Bug 6000

Summary: tcsh patterns broken due to locale settings
Product: [Retired] Red Hat Linux
Component: bash
Version: 6.1
Hardware: i386
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: ilh
Assignee: Bernhard Rosenkraenzer <bero>
CC: kevin, santini, zut
Doc Type: Bug Fix
Last Closed: 2000-07-27 18:30:22 UTC

Description ilh 1999-10-15 21:28:48 UTC
The following is very, very wrong and will break a lot of
scripts:

% touch A B C a b c
% echo [A-Z]
A a B b C c
% echo [a-z]
a B b C c

I traced it to LC_ALL="en_US" in /etc/sysconfig/i18n.  If I
do not set LC_ALL, but instead set LC_COLLATE="C", the tcsh
patterns work as expected.  This comes down to a bug in
strcoll() in glibc, or the locale files in /usr/share/locale
are screwed up.  Either way, this is a big bug!

I am running on 6.1 with all the updates.

Comment 1 Bill Nottingham 1999-11-08 16:09:59 UTC
*** Bug 5980 has been marked as a duplicate of this bug. ***

Found a bug in tcsh:

Example: (typed in a running tcsh)
pc-9:~ mkdir t
pc-9:~ cd t
pc-9:~/t touch A B c
pc-9:~/t echo [A-Z]*
A B c
pc-9:~/t sh
[zut@pc-9 t]$ echo [A-Z]*
A B
[zut@pc-9 t]$

sh parses the patterns correctly, tcsh gets them wrong!

------- Additional Comments From ilh@sls.lcs.mit.edu  10/15/99 16:31 -------
I think this is a serious bug having to do with glibc and locale.
With LC_ALL=en_US, this bug always happens for me in 6.1.  With LC_ALL
undefined and LC_COLLATE=C, it works fine.

This bug is going to break a lot of scripts!

Comment 2 kevin lyda 1999-12-27 16:43:59 UTC
I just stumbled on this as well.  very annoying since i use ls -d [A-Z]* to list
docs in large dirs.  this seems to be slow to fix...

Comment 3 kevin lyda 1999-12-28 16:21:59 UTC
if you run the following program (strcoll2.c; cc -o strcoll2 strcoll2.c) like
so:

% sh
% cd /usr/share/i18n/locales
% for f in *; do
strcoll2 $f
done > /tmp/strcoll2.out

you'll see that every locale gets the ordering wrong (except the unsupported
ones which revert to the C locale)

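#include <locale.h>
#include <string.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    const char *lcl, *res;

    /* locale to test: first argument, defaulting to en_US */
    lcl = argc > 1 ? argv[1] : "en_US";
    res = setlocale(LC_ALL, "C");
    printf("setlocale(LC_ALL, \"C\") yields %s\n", res ? res : "(null)");
    printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));
    /* setlocale() returns NULL and keeps the old locale if lcl is unsupported */
    res = setlocale(LC_ALL, lcl);
    printf("setlocale(LC_ALL, \"%s\") yields %s\n", lcl, res ? res : "(null)");
    printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));
    return 0;
}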

Comment 4 kevin lyda 1999-12-28 16:52:59 UTC
oops, forgot these additional notes:

it seems the string collation is done via /usr/share/i18n/locales/en_DK - the
others i checked seem to copy from there.  i've emailed the author listed in the
file: Keld.Simonsen@dkuug.dk to see if he/she has any ideas.  it might be a
parsing/logic bug in glibc, so changing these files won't necessarily help.

either way this is going to affect more than just tcsh, so this needs to get
fixed...

Comment 5 keld 1999-12-30 11:56:59 UTC
This is a feature, not a bug.

It comes from the rule that all forms of one letter, such as its upper
and lower case versions, are sorted before the next letter. Thus all
"a"'s come before all "b"'s.

The behaviour is recognized by ISO and IEEE and The Open Group, and
it is being set in concrete in the new ISO standard 14651, and in
a Unicode TR. [A-Z] is not a good way to say uppercase letters.
Rather use [:upper:] - this also includes accented letters.

Probably sorting behaviour is actually not the problem, but rather
that the [] notation is widely used and dependent on the locale collation.
Standardizers have discussed new syntax for regular expressions, also to include
the full 10646 repertoire in a portable way, but no consensus has been
found yet.

Kind regards
Keld Simonsen

Comment 6 kevin lyda 1999-12-30 12:08:59 UTC
ok, so strcoll() won't change its behaviour.  A note in the man page would be
nice.  I'll do a patch.  I can either try to keep strcoll and add a check for
lower/upper, or dump it for strcmp.  anyone have a vote?  can't do it now
though, but i'll do it in a few hours.  suppose i should ask the tcsh people...

Comment 7 kevin lyda 2000-01-01 13:05:59 UTC
ok, this seems to work for me.  these lines get added to glob.c:

    if (islower(c1) && isupper(c2))
        return (1);

now you can't just put this anywhere!  see below for where it goes.

int
globcharcoll(c1, c2)
    int c1, c2;
{
#if defined(NLS) && defined(LC_COLLATE) && !defined(NOSTRCOLL)
    char s1[2], s2[2];

    if (c1 == c2)
        return (0);
    if (islower(c1) && isupper(c2))
        return (1);
    s1[0] = c1;
    s2[0] = c2;

Comment 8 kevin lyda 2000-01-01 22:08:59 UTC
i've emailed the author of tcsh with the changes (christos@zoulas.com (Christos
Zoulas)), and he's accepted them.  i think 6.09.00 is current, and this change
would be a further version along.  before making an additional patch file to be
applied within the rpm, i'd suggest the new version of tcsh.

if the other people who have seen this bug could report back with any adverse
behaviours i'd appreciate it.  likewise success.  :)

Comment 9 kevin lyda 2000-01-03 00:31:59 UTC
bug ids 6244 & 6398 relate to this one.  if this gets closed, both of those do
as well.  (at least if the fix involves snarfing the new tcsh)

Comment 10 santini 2000-01-06 22:48:59 UTC
bash2 has the same annoying problem (being at home, with no access to bugzilla
db, it took me two hours of digging to trace the problem :-)

Comment 11 Jeff Johnson 2000-01-10 20:18:59 UTC
*** Bug 6398 has been marked as a duplicate of this bug. ***

Comment 12 Jeff Johnson 2000-01-10 20:19:59 UTC
*** Bug 6244 has been marked as a duplicate of this bug. ***

Comment 13 Jeff Johnson 2000-01-10 20:51:59 UTC
Fixed (at least tcsh) in tcsh-6.09-1 in Raw Hide.

Changing component to bash2 ...

Comment 14 Andy Newsam 2000-03-31 09:41:59 UTC
*** Bug 10473 has been marked as a duplicate of this bug. ***

Comment 15 Bernhard Rosenkraenzer 2000-08-08 15:23:30 UTC
You can use LANG="C" to turn off this feature.