
Bug 77214

Summary: locale collation problem with LC_COLLATE=*.UTF-8
Product: [Retired] Red Hat Linux
Reporter: jar
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 8.0
CC: fweimer, mitr
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2002-11-03 13:38:47 UTC

Description jar 2002-11-03 12:20:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)

Description of problem:
When the locale uses UTF-8, there is a problem
with collation.  Try the following in bash:

	mkdir c
	cd c
	> a
	> b
	echo [A-Z]

The output of echo is:
	b
which is entirely wrong.

If you have LC_COLLATE=C in the environment
the bug does not occur.

This bug will cause shell scripts, at least, to
behave incorrectly.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run bash interactively.
2. Run "locale" to make sure "UTF-8" appears in the
   LC_COLLATE variable, e.g.
	locale | grep LC_COLLATE
   and check that the output is something like:
	LC_COLLATE=en_US.UTF-8
3. Run the following commands:
	mkdir c
	cd c
	> a
	> b
	echo [A-Z]	
the output of echo is:
	b
which is wrong
	

Actual Results:  the output of echo is:
	b
which is wrong

Expected Results:  the output of echo should be:
	[A-Z]

Additional info:

If you set the environment variable LC_COLLATE=C,
the result of running the example is correct:
	[A-Z]
However, it must be set when the shell is started,
so run
	LC_COLLATE=C bash
and then run the example from this new shell.
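
As a rough sketch of this workaround, the snippet below runs the same glob in a child bash started with C collation. The scratch directory from mktemp and the variable names are illustrative and not part of the original report. LC_ALL=C is used instead of LC_COLLATE=C only because LC_ALL, when it is already set in the environment, overrides LC_COLLATE.

```shell
#!/bin/bash
# Work in a scratch directory so the glob only sees our two files.
dir=$(mktemp -d)
cd "$dir" || exit 1
: > a
: > b

# Start a child shell with C collation. In the C locale [A-Z] covers only
# the uppercase ASCII letters, so neither file matches and bash leaves the
# pattern unexpanded.
c_result=$(LC_ALL=C bash -c 'echo [A-Z]')
echo "C locale:       $c_result"

# The same glob under the caller's locale. What this prints depends on the
# locale in effect and on the bash version, so no output is assumed here.
cur_result=$(bash -c 'echo [A-Z]')
echo "current locale: $cur_result"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

If the second line prints b, the current locale is using a dictionary-style ordering for the range.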

Comment 1 Miloslav Trmac 2002-11-03 13:17:46 UTC
Range expressions (e.g. [A-Z]) must behave according to the LC_COLLATE setting,
i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated
by POSIX; if you need the old behavior, export LC_COLLATE=C.

The only problem is that a running bash doesn't pick up the change immediately
(it needs LC_COLLATE=C bash); this is fixed in bash-2.05b-7 in rawhide.
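
One way to see why only b matched in the report: in an aAbBcC ordering, a sorts before A and so falls outside the range A-Z, while b sits between A and B. The sketch below prints the C ordering for comparison, using sort as a stand-in for the collation order. The variable names are illustrative, and the second result depends on which locale is active and installed.

```shell
#!/bin/bash
# Sort four letters under C collation: plain byte order, so all uppercase
# ASCII letters come before all lowercase ones.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort | tr -d '\n')
echo "C order:              $c_order"

# Sort the same letters under the caller's locale. In a dictionary-order
# locale the upper and lower case forms of a letter end up next to each other.
loc_order=$(printf '%s\n' b A a B | sort | tr -d '\n')
echo "current locale order: $loc_order"
```

Under C collation the first line shows ABab, which is why [A-Z] matches neither a nor b there.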

Comment 2 jar 2002-11-03 13:38:40 UTC
I understand, thanks. 
 
However, won't existing shell scripts break?  Perhaps
LC_COLLATE should be set to "C" in /etc/profile.d/glib2.*
to avoid this?  (Or something like it.)
 
The reason this was spotted is that a friend
of mine noticed an existing shell script broke
under Red Hat Linux 8.0.


Comment 3 Jakub Jelinek 2002-11-03 15:05:27 UTC
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program that
requires the C collation. For example, if you want to do an ASCII sort, you do
	somecommand | LC_ALL=C sort
This is nothing new in 8.0: exactly the same behaviour was there in the 7.x
series (though in that case the locales weren't en_US.UTF-8 etc., but en_US
or en_US.ISO-8859-15).
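
A small sketch of the per-command approach, using made-up input words: the LC_ALL=C prefix applies only to the sort process, so the calling shell's own locale settings are left untouched.

```shell
#!/bin/bash
# Remember the shell's own LC_ALL (or note that it is unset).
before=${LC_ALL-unset}

# The assignment prefix applies to this one sort invocation only.
# In C collation uppercase ASCII sorts before lowercase, so "Banana"
# comes out ahead of "apple".
first=$(printf '%s\n' apple Banana cherry | LC_ALL=C sort | head -n 1)
echo "first in C order: $first"

# The shell's own setting is the same as before the pipeline ran.
after=${LC_ALL-unset}
echo "LC_ALL before: $before, after: $after"
```

Scoping the override this way keeps the rest of a script running under the user's locale, which is the point of setting it right before the one program that needs C collation.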