| Field | Value |
|---|---|
| Summary | locale collation problem with LC_COLLATE=*.UTF-8 |
| Product | [Retired] Red Hat Linux |
| Component | glibc |
| Status | CLOSED NOTABUG |
| Reporter | jar |
| Assignee | Jakub Jelinek <jakub> |
| QA Contact | Brian Brock <bbrock> |
| Doc Type | Bug Fix |
| Last Closed | 2002-11-03 13:38:47 UTC |
Description jar 2002-11-03 12:20:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)

Description of problem:
When your locale uses UTF-8, there is a problem with collation. Try the following in bash:

mkdir c; cd c; > a; > b; echo [A-Z]

The output of echo is "b", which is entirely wrong. If LC_COLLATE=C is set in the environment, the bug does not occur. This bug will cause shell scripts, at least, to behave incorrectly.

How reproducible:
Always

Steps to Reproduce:
1. Run bash interactively.
2. Run "locale" to make sure "UTF-8" appears in the LC_COLLATE variable, e.g. with "locale | grep LC_COLLATE"; the output should be something like LC_COLLATE=en_US.UTF-8.
3. Run the following commands: mkdir c; cd c; > a; > b; echo [A-Z]

Actual Results:
The output of echo is "b", which is wrong.

Expected Results:
The output of echo should be "[A-Z]".

Additional info:
If you set the environment variable LC_COLLATE=C, the example gives the correct result, "[A-Z]". However, the variable must be set when the shell is started, so run "LC_COLLATE=C bash" and then run the example from this new shell.
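The reproduction steps can be scripted as follows. This is a sketch, not part of the original report: repro_glob is a hypothetical helper, it uses a mktemp directory instead of ./c, and it unsets LC_ALL in a subshell because a set LC_ALL would override LC_COLLATE.

```shell
#!/bin/bash
# Hypothetical helper: run the report's test case under collation locale $1.
repro_glob() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        unset LC_ALL            # LC_ALL would override LC_COLLATE
        cd "$dir" || exit 1
        : > a                   # same effect as "> a" in the report
        : > b
        # Start a fresh bash so LC_COLLATE is picked up at startup,
        # as the report notes is required.
        LC_COLLATE="$1" bash -c 'echo [A-Z]'
    )
    rm -rf -- "$dir"
}

repro_glob C             # no file name matches, so bash prints [A-Z]
repro_glob en_US.UTF-8   # result depends on the installed locale and bash version
```

The first call should print [A-Z] on any system. On a current system the second call may print [A-Z] as well: bash 5.0 and later enable the globasciiranges shell option by default, which makes ranges such as [A-Z] use ASCII order instead of the locale's collation order.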
Comment 1 Miloslav Trmac 2002-11-03 13:17:46 UTC
Range expressions (e.g. [A-Z]) must behave according to the LC_COLLATE setting, i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated by POSIX; if you need the old behavior, export LC_COLLATE=C. The only problem is that a running bash does not pick up a changed LC_COLLATE immediately (you have to start it as LC_COLLATE=C bash); this is fixed in bash-2.05b-7 in Rawhide.
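The effect of dictionary order on a range can be made concrete with a toy model. Assume the aAbB...zZ order mentioned above: the range A-Z then runs from A (second position) to Z (last position), so it takes in b, B, ..., z, Z but not a, which sorts before A. That is consistent with the reporter seeing b but not a. The functions collation_pos and in_range are hypothetical illustrations, not taken from bash or glibc, and real locales collate far more characters than these 52 letters.

```shell
#!/bin/bash
# Hypothetical model: CHAR is in [LO-HI] if its position in the collation
# ORDER lies between the positions of LO and HI. Illustration only; this is
# not how bash or glibc implement range matching.
collation_pos() {              # usage: collation_pos ORDER CHAR
    local prefix=${1%%"$2"*}   # everything before the first occurrence of CHAR
    echo "${#prefix}"
}

in_range() {                   # usage: in_range ORDER LO HI CHAR
    local lo hi c
    lo=$(collation_pos "$1" "$2")
    hi=$(collation_pos "$1" "$3")
    c=$(collation_pos "$1" "$4")
    (( lo <= c && c <= hi ))
}

ascii=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz  # C locale order
dict=aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ   # assumed dictionary order

for f in a b; do
    if in_range "$ascii" A Z "$f"; then echo "C order:    $f is in [A-Z]"; fi
    if in_range "$dict" A Z "$f"; then echo "aAbB order: $f is in [A-Z]"; fi
done
```

Running it prints a single line, "aAbB order: b is in [A-Z]": under byte order neither file name falls in the range, while under the assumed dictionary order only b does.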
Comment 2 jar 2002-11-03 13:38:40 UTC
I understand, thanks. However, won't existing shell scripts break? Perhaps LC_COLLATE should be set to "C" in /etc/profile.d/glib2.* (or something similar) to avoid this? This was spotted because a friend of mine noticed that an existing shell script broke under Red Hat Linux 8.0.
Comment 3 Jakub Jelinek 2002-11-03 15:05:27 UTC
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program that requires the C collation. For example, if you want to do an ASCII sort, run somecommand | LC_ALL=C sort. This is nothing new in 8.0: exactly the same behaviour was there in the 7.x series (though there the locales were en_US or en_US.ISO-8859-15 rather than en_US.UTF-8 etc.).
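A minimal sketch of the per-command pattern described above, with a hypothetical somecommand function standing in for whatever produces the data:

```shell
#!/bin/bash
# Hypothetical stand-in for "somecommand": emits mixed-case words, one per line.
somecommand() {
    printf '%s\n' b a B A
}

# The LC_ALL=C assignment applies to sort only; the rest of the script keeps
# the user's locale. C collation orders by byte value, so all uppercase ASCII
# letters sort before all lowercase ones.
ascii_sorted=$(somecommand | LC_ALL=C sort)
echo "$ascii_sorted"

# Without the override, the order follows the current locale's collation and
# may interleave upper- and lowercase letters.
somecommand | sort
```

The first sort always yields A, B, a, b. LC_ALL=C is the more robust of the two suggestions, since LC_ALL takes precedence over LC_COLLATE when both are set in the environment.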