Bug 77214 - locale collation problem with LC_COLLATE=*.UTF-8
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 8.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2002-11-03 12:20 UTC by jar
Modified: 2016-11-24 14:59 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2002-11-03 13:38:47 UTC

Description jar 2002-11-03 12:20:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)

Description of problem:
When your locale uses UTF-8 there is a problem
with collation.  Try the following in bash:

	mkdir c
	cd c
	> a
	> b
	echo [A-Z]

The output of echo is:
which is entirely wrong.

If you have LC_COLLATE=C in the environment
the bug does not occur.

This bug will cause shell scripts, at least, to
behave incorrectly.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. run bash interactively
2. run "locale" to make sure "UTF8" appears in the
   LC_COLLATE variable, e.g.
	locale | grep LC_COLLATE
   and the output is something like:
3. run the following commands:
	mkdir c
	cd c
	> a
	> b
	echo [A-Z]	
the output of echo is:
which is wrong

Actual Results:  the output of echo is:
which is wrong

Expected Results:  the output of echo should be:

Additional info:

If you set the environment variable LC_COLLATE=C
the result of running the example is correct:
However, it must be set when the shell is started,
so do
and then run the example from this new shell.
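
The workaround can be sketched as a small script. This is only an illustration: the scratch directory from mktemp and the variable names are assumptions (the report used a directory named "c"), and the "LC_COLLATE=C bash" form follows Comment 1 below. LC_ALL is cleared for the child shell because it would otherwise override LC_COLLATE.

```shell
# Recreate the report's two empty files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
: > a
: > b

# Run the glob in a child bash that starts with C collation.
# The single quotes keep the current shell from expanding [A-Z] itself.
result=$(LC_ALL= LC_COLLATE=C bash -c 'echo [A-Z]')
echo "$result"    # prints the literal pattern [A-Z], since no file name matches

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

With C collation neither "a" nor "b" falls in the range, so bash leaves the pattern unexpanded.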

Comment 1 Miloslav Trmac 2002-11-03 13:17:46 UTC
Range expressions (i.e. [A-Z]) must behave according to LC_COLLATE settings,
i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated
by POSIX. If you need the old behavior, export LC_COLLATE=C.

The only problem is that bash doesn't react immediately (needs LC_COLLATE=C 
bash), and this is fixed in bash-2.05b-7 in rawhide.
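
For scripts that only want "uppercase letters", one option not mentioned in this thread is a POSIX character class instead of a range. The class is driven by character classification rather than collation order, so it should not pick up lowercase names. A minimal sketch, assuming a scratch directory and three made-up file names:

```shell
# Scratch directory with two lowercase names and one uppercase name
# (the file names are illustrative, not from the report).
dir=$(mktemp -d)
cd "$dir" || exit 1
: > a
: > b
: > C

# [[:upper:]] matches uppercase characters only; unlike [A-Z] it does
# not depend on where lowercase letters sort in the current locale.
upper=$(echo [[:upper:]])
echo "$upper"    # prints C

cd / && rm -rf "$dir"
```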

Comment 2 jar 2002-11-03 13:38:40 UTC
I understand, thanks. 
However, won't existing shell scripts break?  Perhaps
LC_COLLATE should be set to "C" in /etc/profile.d/glib2.*
to avoid this?  (Or something like it.)
This was spotted because a friend of mine noticed
that an existing shell script broke
under Red Hat 8.0.

Comment 3 Jakub Jelinek 2002-11-03 15:05:27 UTC
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program that
requires the C collation. For example, if you want to do an ASCII sort, you do
somecommand | LC_ALL=C sort
This is nothing new in 8.0 - exactly the same behaviour was there in the 7.x
series (though in that case the locales weren't en_US.UTF-8 etc., but en_US
or en_US.ISO-8859-15).
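
A small sketch of the per-command override described above. Here printf stands in for "somecommand" and the four input lines are made up for illustration. The assignment before sort applies to that one command only.

```shell
# Feed four sample lines to sort with C collation for this command only.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
echo "$sorted"
# Byte order puts uppercase before lowercase, so the lines come out as:
# A
# B
# a
# b
```

The surrounding shell keeps its own locale settings.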
