Bug 157071 - Perl doesn't lowercase accented characters in UTF-8
Summary: Perl doesn't lowercase accented characters in UTF-8
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: perl
Version: 4
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jason Vas Dias
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-05-06 16:27 UTC by Horst H. von Brand
Modified: 2007-11-30 22:11 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-08 22:32:20 UTC



Description Horst H. von Brand 2005-05-06 16:27:46 UTC
Description of problem:
Perl doesn't lowercase accented characters. I'm running with LANG=en_US.UTF-8.
The steps given below are copy-pasted from a Gnome terminal.

Version-Release number of selected component (if applicable):
perl-5.8.6-10

How reproducible:
Always

Steps to Reproduce:
1. perl -e 'print lc "ÁÉÜÒÊ\n"'
2. perl -e 'use locale; print lc "ÁÉÜÒÊ\n"'
  
Actual results:
ÁÉÜÒÊ

Expected results:
áéüòê

Additional info:
With:

   perl -e 'use utf8; print lc "ÁÉÜÒÊ\n"'

there is no visible output, and od(1) shows junk:

   perl -e 'use locale; print lc "ÁÉÜÒÊ\n"' | od -c
   0000000 303 201 303 211 303 234 303 222 303 212  \n
   0000013

Comment 1 Jason Vas Dias 2005-11-08 22:32:20 UTC
Yes, I know the Perl Unicode implementation is far from user-friendly
or intuitive. This is an upstream issue that is being addressed, but
it does work (just) if used correctly.

Perl's lc/uc DO work for UTF-8, IF the UTF-8 data has been properly decoded into
Perl characters, AND Perl is running in wide-character mode, AND the characters
have defined upper/lower case counterparts in your current locale.

These examples should expose the issues. I suggest you also read the
perlunicode and perllocale man pages.

$ perl -C -e 'use locale; use utf8; use Encode qw(decode); 
$s=decode(utf8,"\xc5\x99\xc4\x9b"); print uc $s,"\n";'
ŘĚ

$ perl -C -e 'use locale; use utf8; use Encode qw(decode); 
$s=decode(utf8,"\xc5\x99\xc4\x9b"); print  $s,"\n";'
řě

$ perl -e 'use Encode qw(decode);  $s=decode(utf8,"\xc5\x99\xc4\x9b"); print 
$s,"\n";'
Wide character in print at -e line 1.
řě

$ perl -C -e 'use Encode qw(decode);  $s=decode(utf8,"\xc5\x99\xc4\x9b"); print
 $s,"\n";'
řě

$ PERL_UNICODE=31 perl -e 'use Encode qw(decode);
$s=decode(utf8,"\xc5\x99\xc4\x9b"); print uc $s,"\n";'
ŘĚ

$ PERL_UNICODE=31 perl -e 'use Encode qw(decode);  $s=decode(utf8,"ŘĚ"); print
lc $s,"\n";'
řě




