Bug 153724 - pdflatex generates pdf file that xpdf and Adobe Acroread cannot search for underscores in
Status: CLOSED WONTFIX
Product: Fedora
Component: tetex
Version: 5
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Jindrich Novy
QA Contact: David Lawrence

Reported: 2005-04-05 11:24 UTC by James Hunt
Modified: 2013-07-02 23:07 UTC
CC List: 3 users (hnassrat, mattdm, pknirsch)
Doc Type: Bug Fix
Last Closed: 2006-09-25 13:14:54 UTC

James Hunt 2005-04-05 11:24:07 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
If you search in a PDF file created with 'pdflatex' from a LaTeX source file for a string containing an underscore character ('_'), the search will fail. The PDF viewer can be 'xpdf' or Adobe Acroread - the result is the same.

Version-Release number of selected component (if applicable):
tetex-latex-2.0.2-21.3

How reproducible:
Always

Steps to Reproduce:
1. Create a LaTeX source file containing an underscore.
2. Run, "pdflatex file.tex" (up to 3 times as necessary).
3. Run, "xpdf file.pdf".
4. Search for a string containing an underscore character by pressing 'f' key and entering the search string.
5. Press return.

Actual Results: Nothing - the string was not found.

Expected Results: xpdf should have found the string and highlighted it.

Additional info:

To recreate the problem, put the 4 lines below in a file called "file.tex", and follow the steps above:

    \documentclass[12pt]{article}
    \begin{document}
    hello\_world.
    \end{document}
______________

If you use this document, you can search for "hello" and this will be found. You can search for "world" and this will be found. However, if you search for "hello_world", this will *NOT* be found.

I initially suspected a problem with xpdf, however, I now believe the problem is with the pdflatex command since I downloaded a PDF from http://www.w3c.org and searched for underscores and these _are_ found by xpdf. This is what I did:

1. curl -O http://www.w3.org/TR/html401/html40.pdf.gz
2. gunzip html40.pdf.gz
3. xpdf html40.pdf
4. Search for string "section_2" by typing 'f' and then typing "section_2" followed by return.
5. The string will be found on page 20 in section, "2.1.2 Fragment identifiers".

I then repeated the steps above using Adobe Acroread ("rpm -q acroread" shows, "acroread-5.07-2"). Again, the string was found.
Note: "pdfinfo file.pdf" returns:

    Creator:      TeX
    Producer:     pdfTeX-1.10b
    CreationDate: Tue Apr 5 11:47:00 2005
    Tagged:       no
    Pages:        1
    Encrypted:    no
    Page size:    595.276 x 841.89 pts (A4)
    File size:    6919 bytes
    Optimized:    no
    PDF version:  1.4

...whilst "pdfinfo html40.pdf" shows:

    Title:        HTML 4.01 Specification
    Subject:
    Keywords:
    Author:
    Creator:      html2ps version 1.0 beta2 patched by Arnaud Le Hors 19990806
    Producer:     GNU Ghostscript 5.10
    CreationDate: Fri Dec 24 18:35:43 1999
    Tagged:       no
    Pages:        389
    Encrypted:    no
    Page size:    612 x 792 pts (letter)
    File size:    3009579 bytes
    Optimized:    no
    PDF version:  1.2

Is the PDF version relevant I wonder? Is pdflatex not generating correct PDF version 1.4 output???

This bug is a major irritant as I've got some very large PDF documents that have a lot of underscores in them and it's a real pain having to scan them by hand to find the sections I want.

Matthew Miller 2006-07-10 21:58:20 UTC

Fedora Core 3 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC5 updates or in the FC6 test release, reopen and change the version to match. Thank you!

James Hunt 2006-07-11 19:47:26 UTC

Yep, it's still a problem. Here are the current versions of my PDF viewers:

    xpdf-3.01-12.1
    gpdf-2.8.2-4.2
    kdegraphics-3.5.3-0.2.fc5 (kpdf)

All 3 PDF readers suffer from the same problem:

- Search for "hello" - finds it
- Search for "world" - finds it
- Search for "hello_world" - doesn't find it
- Search for "_" - doesn't find it.

Jindrich Novy 2006-09-25 13:14:54 UTC

The problem is that teTeX renders the underscore as graphics rather than as a character, so one cannot search for the underscore directly. Note that even pdftotext outputs a space character instead of an underscore, so the underscore is not visible to other PDF viewing utilities either.
Pykler 2009-04-09 23:43:03 UTC

I have an update on this issue. Quoting Karl Berry from the Mac-Tex user group:

> the TeX engine generates a weird graphic rather than using
> the underscore character

You are correct about that (except I wouldn't call it "weird"). The standard definition of \_ is

    \def\_{\leavevmode \kern.06em \vbox{\hrule width.3em}}

> (maybe for a good reason).

Yes, the reason is that it would have been crazy for Knuth to waste a precious slot in the original 1980s fonts (limited to 128 chars) on a character that could perfectly well be created by a rule.

The answer is, don't use \_. Instead, put your address in \tt and use the actual _ character. In plain TeX:

    $<${\tt first\char`\_last@gmail.com}$>$

Then the _ will be pastable (and the output will look better, too).

I'm not sure if you're using LaTeX. If you are, and you load url or hyperref, you'll have a command \url that will let you type it without the extra \char sequence:

    \url{first_last@gmail.com}

(And you'll get better line breaking behavior, too.) Of course a personal definition could be made to do the same thing with plain.

Similar things could be done with other fonts that provide an _ character if you want something other than typewriter, but I don't have recipes at hand. Almost everything besides Knuth's original cm* fonts does have an _ character.

If you feel like reposting this to any of the bug systems, feel free. Hope this helps.

------

I have a recipe, as mentioned above, for switching to something other than the default CM font. This recipe was contributed by Herb Schulz, also from MacTex support:

Try using the Latin Modern font with T1 encoding; that font is an updated design of CM with more characters and built-in, rather than constructed, accented characters. To use the Latin Modern font with T1 encoding add the lines

    \usepackage{lmodern}
    \usepackage[T1]{fontenc}
    \usepackage{textcomp}

to your preamble.

------
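For anyone who wants to retest, here is a minimal sketch that applies both suggestions above to the test file from the original report. It assumes the lmodern and url packages are installed. The expectation that the underscore then becomes searchable rests on T1-encoded fonts providing a real underscore glyph for \_, and it has not been verified against the tetex version this bug was filed for.

```latex
% file.tex: the original test case plus the two workarounds from this thread
\documentclass[12pt]{article}

% Herb Schulz's recipe: Latin Modern in T1 encoding, so that \_ can be
% set from a font glyph rather than drawn as a rule
\usepackage{lmodern}
\usepackage[T1]{fontenc}
\usepackage{textcomp}

% Karl Berry's suggestion for addresses: \url accepts a literal _
\usepackage{url}

\begin{document}
hello\_world.

\url{first_last@gmail.com}
\end{document}
```

Build it with "pdflatex file.tex" as in the original steps, then repeat the search for "hello_world" in the viewer. Whether the search succeeds may still depend on the viewer and on which fonts end up embedded in the PDF.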