Listing with background color not line breaking correctly

See the bottom of this answer for a method that uses no package other than color.

Update: see the very end of this answer for a yet more flexible environment.

(the update has been updated)


This approach uses alltt rather than listings and requires some modifications to your input:

  1. replace \colorbox by \ccolorbox (to use some macro defined next)
  2. replace the ! at the start of non colored segments by \!
  3. get rid of all other !

Here is the code snippet:

\documentclass{article}

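% \cccolorbox{<color>}: read the following tokens one at a time, put each in
% its own \colorbox and allow a line break after it; stop at the \relax sentinel.
\def\cccolorbox#1#2{\ifx#2\relax\let\next\allowbreak\else
       \def\next{\colorbox{#1}{#2}\allowbreak\cccolorbox{#1}}\fi\next}
% \ccolorbox{<color>}{<text>}: user-level macro; appends the \relax sentinel.
\def\ccolorbox#1#2{\fboxsep0pt\cccolorbox{#1}#2\relax}

% \!: typeset uncolored tokens one at a time with a break point after each;
% hand over to \ccolorbox or \end as soon as one of them is met.
\def\!#1{\ifx#1\ccolorbox\allowbreak\expandafter\ccolorbox\else
         \ifx#1\end\expandafter\expandafter\expandafter\end\else
         #1\allowbreak\expandafter\expandafter\expandafter\!\fi\fi}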

\usepackage{alltt}
\usepackage{color}
\begin{document}\pagestyle{empty}
\begin{alltt}
>unknown protein sequence
\ccolorbox{red}{MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS}\!SAISVKNVHRTRFHFQPPKHWINDPNAP\ccolorbox{red}{MYYNG}\ccolorbox{red}{VY}\!HLFYQYNPKGSVWGNIIWAHSVSKDLINWIHLEPAIY\ccolorbox{red}{PSKKFDKYGTWSGSSTILPNNKPVIIYTGVVDSYNNQVQNYAIPANLSDPFLRKWIKPNNNPL}\!IVPDNSINRTEFRDPTTA\ccolorbox{red}{WMGQDGLWRILIASMRKHRGMALLYRSRDFMKWIKAQ}\ccolorbox{red}{HPLHSSTN}\ccolorbox{red}{TGNWECPDFFPVLFNSTNGLDVSYR}\!GKNVKYVLKNSLDVARFD\ccolorbox{red}{YYTIGMYHTKIDRYIPNNNSIDGWKGL}\!RIDYGNFYASKTFYDPSRNRRVIWGWSNESDVLPDDEIKKGWAGIQGIPRQVWLNLSGKQLLQWPIEELE\ccolorbox{red}{TLRKQKVQLNNKKLSKGEM}\!FEVKGISASQADVEVLFSFSSLNEAEQFDPRWADLYAQDVCAIKG\ccolorbox{red}{STIQGGLGPFGLVTLASKNLEEYTPVFFRVFKAQKSYKILM}\ccolorbox{red}{CSDARR}\!SSMR\ccolorbox{red}{QNEAMYKPSFAGYVDVDLEDMKKLSLRSLIDNSVVESFGAGGKTCITSRVYPTLAIYDNAHLFVFNNGSETITIETLNAWSMDACKMN}
\end{alltt}
\end{document}

result of the proposal code

To avoid boxes of irregular height, add a \strut to each of them:

\def\cccolorbox#1#2{\ifx#2\relax\let\next\allowbreak\else
   \def\next{\colorbox{#1}{\strut #2}\allowbreak\cccolorbox{#1}}\fi\next}

result of adding a strut

(The ABC at the end is there because I appended \!ABC to the protein sequence for testing purposes.)

One may also question the need for an alltt environment. Just using \ttfamily (with a \\ after "unknown protein sequence") should be enough, given some modification to the \! code, which in its current version checks for an \end.


Here is a solution along those lines. It does not use any package (apart from color). Put this in the preamble:

\catcode`\?=\active\catcode`\!=\active
\newenvironment{proteinlisting}
{\fboxsep0pt\catcode`\?=\active\catcode`\!=\active
\def!##1{\ifx##1!\let\next!\else
        \ifx##1?\let\next?\else
        \ifx##1\end\let\next\end\else
        ##1\allowbreak\let\next!\fi\fi\fi\next}%
\def?##1{\ifx##1!\let\next!\else
        \ifx##1?\let\next?\else
        \ifx##1\end\let\next\end\else
        \colorbox{red}{\strut ##1}\allowbreak\let\next?\fi\fi\fi\next}%
\ttfamily}{\par}
\catcode`\?=12 \catcode`\!=12

Then, inside a proteinlisting environment, prefix each colored segment of your sequence with ? and each uncolored one with !:

\begin{proteinlisting}
\noindent>unknown protein sequence\\
\noindent?MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS!SAISVKNVHRTRFHFQPPKHWINDPNAP?MYYNG?VY!HLFYQYNPKGSVWGNIIWAHSVSKDLINWIHLEPAIY?PSKKFDKYGTWSGSSTILPNNKPVIIYTGVVDSYNNQVQNYAIPANLSDPFLRKWIKPNNNPL!IVPDNSINRTEFRDPTTA?WMGQDGLWRILIASMRKHRGMALLYRSRDFMKWIKAQ?HPLHSSTN?TGNWECPDFFPVLFNSTNGLDVSYR!GKNVKYVLKNSLDVARFD?YYTIGMYHTKIDRYIPNNNSIDGWKGL!RIDYGNFYASKTFYDPSRNRRVIWGWSNESDVLPDDEIKKGWAGIQGIPRQVWLNLSGKQLLQWPIEELE?TLRKQKVQLNNKKLSKGEM!FEVKGISASQADVEVLFSFSSLNEAEQFDPRWADLYAQDVCAIKG?STIQGGLGPFGLVTLASKNLEEYTPVFFRVFKAQKSYKILM?CSDARR!SSMR?QNEAMYKPSFAGYVDVDLEDMKKLSLRSLIDNSVVESFGAGGKTCITSRVYPTLAIYDNAHLFVFNNGSETITIETLNAWSMDACKMN
\end{proteinlisting}

The output is the same as in the previous proposal. The environment may be customized to use other colors, and more markers can be added for different colors: just imitate the code.
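As one possible way of imitating the code, here is a minimal sketch with a third marker. The environment name proteinlistingx, the marker | and the color green are assumptions chosen purely for illustration; any character that does not occur in the sequence and is not needed by color should do. The only structural change is that every marker macro must also test for the new marker, which means one more \ifx and one more \fi in each definition.

```latex
\documentclass{article}
\usepackage{color}

% Hypothetical three-marker variant: ! = uncolored, ? = red, | = green.
\catcode`\?=\active \catcode`\!=\active \catcode`\|=\active
\newenvironment{proteinlistingx}
{\fboxsep0pt \catcode`\?=\active \catcode`\!=\active \catcode`\|=\active
% Each marker reads one token. Another marker or \end takes over;
% anything else is typeset and the same marker is called again.
\def!##1{\ifx##1!\let\next!\else
        \ifx##1?\let\next?\else
        \ifx##1|\let\next|\else
        \ifx##1\end\let\next\end\else
        ##1\allowbreak\let\next!\fi\fi\fi\fi\next}%
\def?##1{\ifx##1!\let\next!\else
        \ifx##1?\let\next?\else
        \ifx##1|\let\next|\else
        \ifx##1\end\let\next\end\else
        \colorbox{red}{\strut ##1}\allowbreak\let\next?\fi\fi\fi\fi\next}%
\def|##1{\ifx##1!\let\next!\else
        \ifx##1?\let\next?\else
        \ifx##1|\let\next|\else
        \ifx##1\end\let\next\end\else
        \colorbox{green}{\strut ##1}\allowbreak\let\next|\fi\fi\fi\fi\next}%
\ttfamily}{\par}
\catcode`\?=12 \catcode`\!=12 \catcode`\|=12

\begin{document}
\begin{proteinlistingx}
\noindent>unknown protein sequence\\
\noindent?MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS!SAISVKNVHRTRFHFQPPKHWINDPNAP|MYYNG?VY!HLFYQYNPKGSV
\end{proteinlistingx}
\end{document}
```

Here the segment MYYNG should come out on a green background while the segments prefixed with ? stay red.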


OK, here is one such final variant. Put this in the preamble of the document:

\catcode`\?=\active \catcode`\!=\active
\newenvironment{proteinseqlst}[1][60]
{\fboxsep0pt \catcode`\?=\active \catcode`\!=\active
\ttfamily
\setbox0=\hbox{A}\hsize=\wd0 \multiply\hsize by #1\relax
\def!##1{\ifx##1!\let\next!\else
        \ifx##1?\let\next?\else
        \ifx##1\end\let\next\end\else
        ##1\allowbreak\let\next!\fi\fi\fi\next}%
\def?##1##2{\ifx##2!\let\next!\else
        \ifx##2?\let\next?\else
        \ifx##2\end\let\next\end\else
        \colorbox{##1}{\strut ##2}\allowbreak\def\next{?{##1}}\fi\fi\fi\next}%
}{\par}
\catcode`\?=12 \catcode`\!=12

This environment typesets the amino acid sequence with a fixed number of letters per line, given as an optional parameter (hence within square brackets). The default is 60. Line breaks in the source are allowed and have no influence on the output (but no empty lines, else an error will be raised on the TeX run).
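The letters-per-line count works because the environment sets \hsize to the width of one monospaced character multiplied by the optional argument. The small test document below is only a sketch to check that arithmetic on your own setup; the macro names \onechar and \fortychars are made up for this illustration. With the default article class at 10pt and Computer Modern typewriter, I would expect a character width of roughly 5.25pt and a text width of 345pt, so that both 40 and 60 letters fit on a line, but the numbers depend on your class and fonts.

```latex
\documentclass{article}
\begin{document}
\ttfamily
\setbox0=\hbox{A}% measure one character of the monospaced font
\dimen0=\wd0 \multiply\dimen0 by 40\relax % same arithmetic as for \hsize with [40]
\edef\onechar{\the\wd0}%      hypothetical helper: width of one character
\edef\fortychars{\the\dimen0}% hypothetical helper: width of 40 characters
\noindent one character: \onechar\\
40 characters: \fortychars\\
text width: \the\textwidth
\end{document}
```

If the computed width exceeds the text width, the lines will stick out into the margin, so pick a smaller number of letters per line in that case.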

This environment allows arbitrary colors: put the desired color within braces after a ?, and prefix uncolored segments with a !. Note that xcolor syntax such as ?{yellow!20} is allowed (with xcolor loaded instead of color): the ! inside the braces will be handled by xcolor, not by the environment definition.

The TeX run produces no overfull box warnings in the log file.

With the following input (here arbitrarily broken at 53 characters per input line, which has no influence on the output):

\begin{proteinseqlst}[40]
\noindent>unknown protein sequence\\
\noindent
?{yellow}MELFMKNSSLWGLKFYLFCLFIILSNINRAFASHNIFLDLQSS!
SAISVKNVHRTRFHFQPPKHWINDPNAP?{blue}MYYNG?{green}VY!HL
FYQYNPKGSVWGNIIWAHSVSKDLINWIHLEPAIY?{red}PSKKFDKYGTWS
GSSTILPNNKPVIIYTGVVDSYNNQVQNYAIPANLSDPFLRKWIKPNNNPL!I
VPDNSINRTEFRDPTTA?{green}WMGQDGLWRILIASMRKHRGMALLYRSR
DFMKWIKAQ?{yellow}HPLHSSTN?{blue}TGNWECPDFFPVLFNSTNGL
DVSYR!GKNVKYVLKNSLDVARFD?{yellow}YYTIGMYHTKIDRYIPNNNS
IDGWKGL!RIDYGNFYASKTFYDPSRNRRVIWGWSNESDVLPDDEIKKGWAGI
QGIPRQVWLNLSGKQLLQWPIEELE?{blue}TLRKQKVQLNNKKLSKGEM!F
EVKGISASQADVEVLFSFSSLNEAEQFDPRWADLYAQDVCAIKG?{red}STI
QGGLGPFGLVTLASKNLEEYTPVFFRVFKAQKSYKILM?{blue}CSDARR!S
SMR?{red}QNEAMYKPSFAGYVDVDLEDMKKLSLRSLIDNSVVESFGAGGKT
CITSRVYPTLAIYDNAHLFVFNNGSETITIETLNAWSMDACKMN
\end{proteinseqlst}

The output is:

example with 40 letters per line
