Allow LaTeX to hyphenate words with dashes

Traditional typographic practice avoids hyphenating words that already contain a hyphen, so Knuth built that rule into the TeX program.

However, it is still possible to insert a hyphen that allows breaks, by fooling TeX into thinking we have two separate words (by inserting zero-width glue). This is easy if we're willing to type a special sequence like \Hyphdash or \-/ instead of a hyphen, which is what packages like extdash provide. But if we insist on typing regular hyphens into the TeX file, then “classical” TeX engines (that is, engines other than LuaTeX) provide only one way of having regular hyphens treated as something else: change the catcode of - to active, so that it becomes a macro that expands to something longer. This causes problems whenever - is used anywhere other than regular text, which is quite common (see examples of the problem in Steven's answer). I imagine that if a good solution existed for this in the classical TeX world, some package like extdash would have covered it.
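
To make the extdash route concrete, here is a minimal sketch. It assumes, as far as I recall from the package documentation, that the short form \-/ is only available when the package is loaded with its shortcuts option; the zero-width minipage is there just to force every possible break.

```latex
\documentclass{article}
% Assumption: the shortcuts option enables short forms such as \-/ for \Hyphdash.
\usepackage[shortcuts]{extdash}
\begin{document}
\begin{minipage}[t]{0pt}
% Long form: a hyphen that still lets TeX hyphenate both halves of the compound.
asteroid\Hyphdash shower

% Short form of the same command.
asteroid\-/shower
\end{minipage}
\end{document}
```

Both paragraphs should break inside "asteroid" and "shower" as well as at the hyphen itself.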

(Actually, there's a perfectly sensible approach that is not very popular in the TeX world: preprocess your input file with some “smart” external script, replacing only the hyphens that occur in regular text with \Hyphdash (or its expansion) or whatever. But for some reason, people want to do everything inside TeX.)

With LuaTeX, which allows greater access to TeX internals, a lot of things become possible, and a lot of catcode and macro-related headaches are gone. For example, LuaTeX has a hyphenate callback which can be used to modify the regular hyphenation it does. This callback gets a list of nodes, into which we can insert discretionary nodes (hyphens) wherever we choose. This is much easier than having to do everything with macros (it is hard to set up appropriate input which later expands into exactly what we need, with no other interference). And there's even a lang.hyphenate function, which we can use to simply do what the regular hyphenation does.

The following seems to work. Put this in a file, say hyphenateall.lua:

function hyphenate_always(head, tail)
   local n = head
   while n do
      if node.type(n.id) == 'glyph' and n.char == string.byte('-') then
         -- Insert an infinite penalty before, and a zero-width glue node after, the hyphen.
         -- Like writing "\nobreak-\hspace{0pt}" or equivalently "\penalty10000-\hskip0pt"
         local p = node.new(node.id('penalty'))
         p.penalty = 10000
         head, p = node.insert_before(head, n, p)
         local g = node.new(node.id('glue'))
         head, g = node.insert_after(head, n, g)
         n = g
      end
      n = n.next
   end
   lang.hyphenate(head, tail)
end

luatexbase.add_to_callback('hyphenate', hyphenate_always, 'Hyphenate even words containing hyphens')

Then put \directlua{require('hyphenateall')} in your .tex file and compile it with LuaLaTeX.


For example (borrowing the example from Steven's answer):

\documentclass{article}
% Not needed -- used in this example just to show it's not needed :-)
\newcommand\?{\nobreak-\hspace{0pt}}
\begin{document}

\begin{minipage}[t]{0in}
\hbox{Before:} often hungry Works: often\?hungry What about: often-hungry
\end{minipage}\hspace{1in}%
\directlua{require('hyphenateall')}%
\begin{minipage}[t]{0in}
 \hbox{After:} often hungry Works: often\?hungry What about: often-hungry
\end{minipage}

\end{document}

Note that after the \directlua call, we did not need the special syntax to get the extra hyphenation.


You can do it by changing the catcode of -, but that breaks all kinds of things, since - is not only a dash but also a negative sign in math (as well as a minus sign in dimensions, as in \kern-1pt, \vspace{-\baselineskip}, \rule[-1pt]{1pt}{3pt}).

So the MWE below is a mere demonstration, not a recommendation. It sets the same text in two zero-width minipages, first with the ordinary - and then with - made active.

\documentclass{article}
\let\svdash-
{\catcode`-=\active %
\gdef-{\nobreak\svdash\nobreak\hspace{0pt}}
}
\begin{document}
\begin{minipage}[t]{0in}
\catcode`-=12 %
First,
asteroid

Then,
shower

Last,
asteroid-shower
\end{minipage}\hspace{1in}%
\begin{minipage}[t]{0in}
\catcode`-=\active %
First,
asteroid

Then,
shower

Last,
asteroid-shower
\end{minipage}
\end{document}
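
If the active hyphen is wanted anyway, one way to limit the damage is to confine the catcode change to a group. The following is only a sketch under that assumption; the environment name breakabledashes is hypothetical and not part of any package.

```latex
\documentclass{article}
\let\svdash-
% Define the active hyphen once; the catcode change itself stays inside this group.
{\catcode`\-=\active %
\gdef-{\nobreak\svdash\nobreak\hspace{0pt}}
}
% Hypothetical environment: the hyphen is active only between \begin and \end.
\newenvironment{breakabledashes}{\catcode`\-=\active}{}
\begin{document}
\begin{minipage}[t]{0pt}
\begin{breakabledashes}
asteroid-shower
\end{breakabledashes}
\par\vspace{-1pt}% outside the environment, - is an ordinary minus sign again
\rule[-1pt]{1pt}{3pt}
\end{minipage}
\end{document}
```

Even so, anything inside the environment that uses - as a minus sign (a negative length, for instance) would still break.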

Therefore, the better approach (as noted for the extdash package) is to assign a dedicated macro to this special type of hyphen. Here I choose \?, which produces output identical to that of the prior MWE:

\documentclass{article}
\newcommand\?{\nobreak-\nobreak\hspace{0pt}}
\begin{document}
\begin{minipage}[t]{0in}
First,
asteroid

Then,
shower

Last,
asteroid-shower
\end{minipage}\hspace{1in}%
\begin{minipage}[t]{0in}
First,
asteroid

Then,
shower

Last,
asteroid\?shower
\end{minipage}
\end{document}
