
Sun, 06 Dec 2009 @ 20:56:26

searchable ligatures

While working on my resume, which is written in LaTeX, I noticed an interesting problem: none of the words containing the characters 'ff', 'fi', or 'ffi' were searchable. The document displayed correctly on screen, but for some reason Adobe Acrobat Reader could not correctly copy, paste, or search for these letters. Additionally, the Linux command 'pdftotext' converted the PDF to plain text that was not pure ASCII. After discussing the problem on freenode with a few more experienced LaTeX users, I discovered what is probably a lesser-known typography feature: ligatures.

Ligatures are single glyphs that replace a sequence of two or more characters to make printed text more attractive. LaTeX and other typesetting software make these substitutions automatically. In this case, an 'f' followed by 'f' or 'i' is drawn closer to the next letter and set as one combined glyph; in the 'fi' ligature, for example, the hook of the 'f' absorbs the dot of the 'i'.
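To see the substitution for yourself, a small test document like the one below can be compiled with pdfLaTeX. It is only a sketch and the sample words are arbitrary. In pdfLaTeX, an empty group {} between two letters stops TeX from forming a ligature, so the second line is set with separate 'f' and 'i' glyphs for comparison.

```latex
% Sketch: compare automatic ligatures with suppressed ones (pdfLaTeX).
\documentclass{article}
\begin{document}
% Default behaviour: ff, fi and ffi are each set as a single ligature glyph.
office, stuff, file

% An empty group {} between the letters prevents the ligature from forming.
of{}f{}ice, stuf{}f, f{}ile
\end{document}
```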

Normally, a PDF that uses ligatures still carries font encoding information that maps each ligature glyph back to its component letters, so the text can be searched and copied as if there were no ligatures. For some as-yet-unknown reason, my TeX Live distribution creates PDFs whose ligatures are handled correctly by open-source programs such as Evince but cause problems in Adobe products.

The solution was to load the lmodern LaTeX package with pdfLaTeX, which substitutes the Latin Modern fonts for the default ones embedded in the PDF. Unfortunately, this can more than double the size of the PDF.

\usepackage{lmodern}
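For reference, a minimal complete document using the fix might look like the sketch below; the document class and sample sentence are arbitrary choices for illustration. Compiling it with pdflatex and then running pdftotext on the result (with '-' as the output file, poppler's pdftotext writes to standard output) is one way to check how the ligature words are extracted, though searching in Adobe Reader remains the real test.

```latex
% Minimal sketch: load lmodern so pdfLaTeX embeds the Latin Modern fonts.
\documentclass{article}
\usepackage{lmodern}
\begin{document}
% Sample words containing the ffi (office, sufficient), ff (staff)
% and fi (filed) ligatures.
The office staff filed a sufficient number of forms.
\end{document}
```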

Posted by timotheus | Filed under: latex, software
 
© Copyright Timothy Stotts 2002-2009. All rights reserved.