Display issues drawing DNA sequences with TikZ

I'm not sure this addresses all of your specific questions, since the post contains several unrelated ones. It would be better to split them into separate questions, which would make them easier to answer and more helpful to others. So this is more a description of how I would recommend drawing this kind of diagram.

The first thing to figure out is not the diagram itself, but rather

What is the actual information that you want to specify to obtain the desired result?

1. Capture the Data:

Since I am not knowledgeable in DNA sequencing, there may well be a better way that more accurately reflects what you want, but the way I see it is that in each row you have three sets of DNA sequences. In a chat discussion you mentioned that there was not an easy algorithm to automatically determine the color to use, so the natural syntax I see is something such as:

\ThreeDNASequences
    {C/cyan,, A/orange,,,G/blue,,G/red!25,,C/yellow}
    {C/violet,G/brown}
    {A/cyan,G/yellow,C/brown};

where the three parameters represent the left, middle, and right sequences, each entry giving a letter and its color. Consecutive commas denote an empty cell. You need one such call for each row of your diagram, so three calls give three rows:

enter image description here
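To make the list syntax concrete, here is a minimal sketch of how such a `Label/Color` list can be iterated with `\foreach`. It draws a single row only; the default fill `white` for empty cells and the node options are assumptions chosen for illustration, mirroring the full code further down.

```latex
\documentclass[border=3pt]{standalone}
\usepackage{tikz}
\usepackage{xstring}

\begin{document}
\begin{tikzpicture}
    % Each item is Label/Color; an empty item (consecutive commas) is an empty cell.
    \foreach [count=\xi] \Label/\Color in {C/cyan,, A/orange,,, G/blue!30} {%
        % An empty item leaves \Color empty, so fall back to an assumed default fill.
        \IfStrEq{\Color}{}{\def\Color{white}}{}%
        \node [draw=gray, fill=\Color, minimum size=0.5cm] at (0.5*\xi, 0) {\Label};
    }
\end{tikzpicture}
\end{document}
```

Keeping the letter and its color together in one item means the data for a row can be read off directly from the call, without a separate color table.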

2. Draw the Arrows:

The other part of the problem is how to draw the arrows between the various nodes. Using an approach similar to \tikzmark, we name each node using a counter that provides a unique reference to each column, and give the \ThreeDNASequences macro an optional parameter that serves as a prefix to distinguish between the rows.

So \ThreeDNASequences[Top]{}{}{} for the top row and \ThreeDNASequences[Bottom]{}{}{} for the bottom row label the nodes Top-0, Top-1, ..., and Bottom-0, Bottom-1, ..., and these names give access to each node. Since the pattern here is one node connecting to many nodes, it makes sense to define a macro such as:

\ConnectNodes[red, out=100, in=80]
    {Top-12.north}
    {Top-2.north,Top-7.north,Top-9.north};

to specify the options of the lines, and the nodes to connect.

Putting these together you have:

enter image description here
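The one-to-many connection idea can be tried in isolation with a sketch like the following. The node names `Row-0` to `Row-4` and the loop variable `\OtherNode` are made up for this example; the macro body follows the same pattern as the full code below.

```latex
\documentclass[border=3pt]{standalone}
\usepackage{tikz}
\usetikzlibrary{arrows}

% #1 = path options, #2 = the common node, #3 = comma-separated list of other nodes
\newcommand*{\ConnectNodes}[3][]{%
    \foreach \OtherNode in {#3} {%
        % latex'- puts the arrow tip at the start of the path, i.e. at #2
        \draw [latex'-, thick, #1] (#2) to[#1] (\OtherNode);
    }%
}

\begin{document}
\begin{tikzpicture}
    % Five unlabeled cells with illustrative names Row-0 ... Row-4
    \foreach \i in {0,...,4} {%
        \node [draw=gray, minimum size=0.5cm] (Row-\i) at (0.5*\i, 0) {};
    }
    \ConnectNodes[red, out=100, in=80]{Row-4.north}{Row-0.north, Row-2.north}
\end{tikzpicture}
\end{document}
```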

Further Enhancements:

  • Since each node is now accessible by name, the nodes can be used to place the text that goes next to them (below, to the right, or above). Note that in the MWE below I did not provide a unique prefix for the middle row, as I was not concerned with the text that needs to be placed there. When placing the text, you should also give each row's node labels a unique prefix.
  • The arrows may need tweaking via the shorten >= and shorten <= options.
  • The xshift used in \ThreeDNASequences should really be computed from the number of items in the previous sequence. This only matters if the number of columns might change in a different diagram. I used \pgfmathsetmacro{\Shift}{} only as a placeholder marking where this change may be needed.
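As a rough sketch of the last point, the offset of the next sequence could be derived from the item count of the previous one. The helpers `\CountItems` and `\DrawCells`, the macro `\ItemCount`, and the gap `\SequenceGap` are hypothetical names introduced only for this illustration, and both helpers expect a literal list rather than a macro holding one.

```latex
\documentclass[border=3pt]{standalone}
\usepackage{tikz}

\newcommand*{\NodeSize}{0.5cm}%
\newcommand*{\SequenceGap}{1cm}% hypothetical gap between two sequences

% Hypothetical helper: count the items of a literal comma-separated list
% and store the result globally in \ItemCount (0 for an empty list).
\newcommand*{\CountItems}[1]{%
    \gdef\ItemCount{0}%
    \foreach [count=\n] \Item in {#1} {\xdef\ItemCount{\n}}%
}

% Hypothetical helper: draw one row of cells from a literal list of labels.
\newcommand*{\DrawCells}[1]{%
    \foreach [count=\xi] \Label in {#1} {%
        \pgfmathsetlengthmacro{\XShift}{\NodeSize*\xi}%
        \node [draw=gray, minimum size=\NodeSize, xshift=\XShift] {\Label};
    }%
}

\begin{document}
\begin{tikzpicture}
    \DrawCells{C,, A,,, G}
    \CountItems{C,, A,,, G}% empty items are counted as cells too
    % Offset of the next sequence = width of the previous one plus a gap
    \pgfmathsetlengthmacro{\Shift}{\ItemCount*\NodeSize + \SequenceGap}
    \begin{scope}[xshift=\Shift]
        \DrawCells{C, G}
    \end{scope}
\end{tikzpicture}
\end{document}
```

The count is stored with `\xdef` because the body of `\foreach` runs inside a group, so a local definition would be lost after each iteration.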

Code:

\documentclass[border=3pt]{standalone}% the border option belongs to the standalone class
\usepackage{tikz}
\usepackage{xstring}
\usetikzlibrary{calc, arrows}

\newcommand*{\NodeSize}{0.5cm}%
\newcommand*{\YShiftBetweenRows}{-1cm}% Subsequent rows are shifted down so they don't overlap
\tikzset{DNA Style/.style={minimum size=0.5cm, draw=gray, line width=1pt}}

\newlength{\YShift}% 
\newcounter{ColumnCounter}% Column index appended to the node label prefix

% Initialize - These are probably not needed, but prefer to set them
\setlength{\YShift}{0cm}% 
\setcounter{ColumnCounter}{0}


\newcommand*{\DNASequence}[2][Mark]{%
    % http://tex.stackexchange.com/questions/12091/tikz-foreach-loop-with-macro-defined-list
    \def\Sequence{#2}
    \foreach [count=\xi] \Label/\Color in \Sequence {%
        \pgfmathsetmacro{\XShift}{\NodeSize*\xi}%
        \IfStrEq{\Color}{}{\def\Color{white}}{}
        \edef\NodeName{#1-\arabic{ColumnCounter}}
        \node [DNA Style, fill=\Color, xshift=\XShift] (\NodeName) {\Label};
        \stepcounter{ColumnCounter}
    } 
}%


\newcommand*{\ThreeDNASequences}[4][Mark]{% #1 = tikzmark prefix
    \setcounter{ColumnCounter}{0}% reset column counter
    \begin{scope}[yshift=\YShift]
        \DNASequence[#1]{#2} 
        \pgfmathsetmacro{\Shift}{6cm}% Should compute this based on num of items in #2
        \begin{scope}[xshift=\Shift]
            \DNASequence[#1]{#3} 
        \end{scope}
        \pgfmathsetmacro{\Shift}{8cm}% Should compute this based on num of items in #3
        \begin{scope}[xshift=\Shift]
            \DNASequence[#1]{#4} 
        \end{scope}
    \end{scope}
    \pgfmathsetlength{\YShift}{\YShift\YShiftBetweenRows}%
}

\newcommand*{\ConnectNodes}[3][]{%
    % #1 = draw options
    % #2 = ending node
    % #3 = list of starting nodes
    \def\ListOfStartNodes{#3}
    \foreach \StartNode in \ListOfStartNodes {%
        % latex'- puts the arrow tip at #2, the ending node
        \draw[latex'-, thick, #1] (#2) to[#1] (\StartNode);
    }%
}


\begin{document}
\begin{tikzpicture}
    \ThreeDNASequences[Top]
        {C/blue!20,, A/cyan!30,,, G/blue!20,, G/cyan!30,, C/cyan!30}
        {C/blue!20, G/blue!20}
        {A/cyan!30, G/cyan!30, C/cyan!30};

    \ThreeDNASequences
        {A/green!20,, C/orange!50,,, C/green!20,, A/orange!50,, A/orange!50}
        {A/green!20, C/green!20}
        {C/orange!50, A/orange!50, A/orange!50};

    \ThreeDNASequences[Bottom]
        {C/blue!20,, A/cyan!30,,, G/blue!20,, G/cyan!30,, C/cyan!30}
        {C/blue!20, G/blue!20}
        {A/cyan!30, G/cyan!30, C/cyan!30};    

    % Now, draw the arrows as desired

    \ConnectNodes[out=90, in=90]
        {Top-10.north east}
        {Top-0.north,Top-5.north};

    \ConnectNodes[out=-90, in=-90]
        {Bottom-13.south}
        {Bottom-2.south,Bottom-7.south,Bottom-9.south};  

\end{tikzpicture}
\end{document}

I looked at the image a bit more and perhaps this is a slightly different way to show the merging of the cells:

enter image description here

It looks better with just one side showing, but the MWE below also includes the code for the top (commented out):

Code:

\documentclass[border=3pt]{standalone}% the border option belongs to the standalone class
\usepackage{tikz}
\usepackage{xstring}
\usetikzlibrary{calc,fit,backgrounds}

\pgfdeclarelayer{background layer} 
\pgfdeclarelayer{foreground layer} 
\pgfsetlayers{background layer,main,foreground layer}

\newcommand*{\NodeSize}{0.5cm}%
\newcommand*{\YShiftBetweenRows}{-1cm}% Subsequent rows are shifted down so they don't overlap
\tikzset{DNA Style/.style={minimum size=0.5cm, draw=gray, line width=1pt}}

\tikzset{Fit Line Style 1/.style={draw=olive, thick, dotted}}
\tikzset{Fill Style 1/.style={fill=olive!20}}

\tikzset{Fit Line Style 2/.style={draw=green!50!black, thick, dashed}}
\tikzset{Fill Style 2/.style={fill=green!20}}

\newlength{\YShift}% 
\newcounter{ColumnCounter}% Column index appended to the node label prefix

% Initialize - These are probably not needed, but prefer to set them
\setlength{\YShift}{0cm}% 
\setcounter{ColumnCounter}{0}


\newcommand*{\DNASequence}[2][Mark]{%
    % http://tex.stackexchange.com/questions/12091/tikz-foreach-loop-with-macro-defined-list
    \def\Sequence{#2}
    \foreach [count=\xi] \Label/\Color in \Sequence {%
        \pgfmathsetmacro{\XShift}{\NodeSize*\xi}%
        \IfStrEq{\Color}{}{\def\Color{white}}{}
        \edef\NodeName{#1-\arabic{ColumnCounter}}
        \begin{pgfonlayer}{foreground layer}
        \node [DNA Style, fill=\Color, xshift=\XShift] (\NodeName) {\Label};
        \end{pgfonlayer}
        \stepcounter{ColumnCounter}
    } 
}%


\newcommand*{\ThreeDNASequences}[4][Mark]{% #1 = tikzmark prefix
    \setcounter{ColumnCounter}{0}% reset column counter
    \begin{scope}[yshift=\YShift]
        \DNASequence[#1]{#2} 
        \pgfmathsetmacro{\Shift}{6cm}% Should compute this based on num of items in #2
        \begin{scope}[xshift=\Shift]
            \DNASequence[#1]{#3} 
        \end{scope}
        \pgfmathsetmacro{\Shift}{8cm}% Should compute this based on num of items in #3
        \begin{scope}[xshift=\Shift]
            \DNASequence[#1]{#4} 
        \end{scope}
    \end{scope}
    \pgfmathsetlength{\YShift}{\YShift\YShiftBetweenRows}%
}

\newcommand*{\ConnectNodes}[3][]{%
    % #1 = draw options
    % #2 = ending node
    % #3 = list of starting nodes
    \def\ListOfStartNodes{#3}
    \foreach \StartNode in \ListOfStartNodes {%
        % Options in #1 (e.g. <- or solid) override the defaults given here
        \draw[red, <->, thick, #1] (#2) to[#1] (\StartNode);
    }%
}

\newcommand*{\Fit}[3][]{\node [inner sep=2pt, #1, fit= #3] (#2) {};}%

\begin{document}
\begin{tikzpicture}

    \ThreeDNASequences[Top]
        {C/blue!20,, A/cyan!30,,, G/blue!20,, G/cyan!30,, C/cyan!30}
        {C/blue!20, G/blue!20}
        {A/cyan!30, G/cyan!30, C/cyan!30};

    \ThreeDNASequences
        {A/green!20,, C/orange!50,,, C/green!20,, A/orange!50,, A/orange!50}
        {A/green!20, C/green!20}
        {C/orange!50, A/orange!50, A/orange!50};

    \ThreeDNASequences[Bottom]
        {C/blue!20,, A/cyan!30,,, G/blue!20,, G/cyan!30,, C/cyan!30}
        {C/blue!20, G/blue!20}
        {A/cyan!30, G/cyan!30, C/cyan!30};

%    % Now, draw the arrows as desired
%
%   \ConnectNodes[red, out=100, in=80, ]
%       {Top-12.north}
%       {Top-2.north,Top-7.north,Top-9.north};
%
%   \ConnectNodes[blue, out=-80, in=-100]
%       {Bottom-10.south east}
%       {Bottom-2.south,Bottom-7.south,Bottom-9.south};


   % Bottom connections
    \Fit[Fit Line Style 1, Fill Style 1]{LeftB1}{(Top-0.north west) (Bottom-0.south east)}
    \Fit[Fit Line Style 1, Fill Style 1]{LeftB2}{(Top-5.north west) (Bottom-5.south east)}
    \Fit[Fit Line Style 1, Fill Style 1]{RightB1}{(Top-10.north west) (Bottom-11.south east)}


    \ConnectNodes[Fit Line Style 1, solid, <-, out=-120, in=-20]
        {RightB1.south}
        {LeftB1.south, LeftB2.south};

%   % Top connections
%   \Fit[Fit Line Style 2, Fill Style 2]{LeftT1}{(Top-2.north west) (Bottom-2.south east)}
%   \Fit[Fit Line Style 2, Fill Style 2]{LeftT2}{(Top-7.north west) (Bottom-7.south east)}
%   \Fit[Fit Line Style 2, Fill Style 2]{LeftT3}{(Top-9.north west) (Bottom-9.south east)}
%   \Fit[Fit Line Style 2, Fill Style 2]{RightT1}{(Top-12.north west) (Bottom-14.south east)}
%
%
%   \ConnectNodes[Fit Line Style 2, solid, out=100, in=80, <-]
%       {RightT1.north}
%       {LeftT1.north, LeftT2.north, LeftT3.north};
\end{tikzpicture}
\end{document}
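The highlighting used above can be reduced to a small sketch that combines the fit library with a declared layer. Note that this is a variation under stated assumptions, not the exact arrangement of the code above: here the fitted node is drawn on a background layer rather than placing the cells on a foreground layer, and the node name `Left1` is made up for the example.

```latex
\documentclass[border=3pt]{standalone}
\usepackage{tikz}
\usetikzlibrary{fit}

\pgfdeclarelayer{background layer}
\pgfsetlayers{background layer,main}

\begin{document}
\begin{tikzpicture}
    \node [draw=gray, fill=blue!20,  minimum size=0.5cm] (Top-0)    at (0, 0)  {C};
    \node [draw=gray, fill=green!20, minimum size=0.5cm] (Bottom-0) at (0, -1) {A};

    % Draw the highlight behind the cells so their fills stay visible
    \begin{pgfonlayer}{background layer}
        \node [draw=olive, thick, dotted, fill=olive!20, inner sep=2pt,
               fit=(Top-0.north west) (Bottom-0.south east)] (Left1) {};
    \end{pgfonlayer}
\end{tikzpicture}
\end{document}
```

Either arrangement works as long as the filled fit node ends up on a layer below the cells; otherwise its fill would cover the letters.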

1) and 3) Change none to white when filling the parts of the rectangles.

2) Change the style of the line to have an arrow tip on only one end (I wasn't sure which ones you wanted to change, so I chose the two on the bottom).

4) One possibility is to draw the arrows first, and then place the labels (shifting them a little bit vertically, if required).

6) Loops fixed.

Additionally, I changed from the old \tikzstyle syntax to the newer \tikzset.
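As a small illustration of points 1) to 3) and of the syntax change, the sketch below defines the styles with \tikzset, fills empty parts with white, and draws an arrow with a tip on one end only. The node names `pair` and `other` and their positions are invented for the example.

```latex
\documentclass[border=3pt]{standalone}
\usepackage{tikz}
\usetikzlibrary{arrows, shapes.multipart}

% Old, deprecated form:  \tikzstyle{line} = [draw, -latex', thick]
\tikzset{
  line/.style={draw, -latex', thick},% arrow tip at the end of the path only
  seq/.style={rectangle split, rectangle split horizontal,
    rectangle split parts=#1, draw}
}

\begin{document}
\begin{tikzpicture}
  % Empty parts get an explicit white fill instead of none
  \node [seq=2, rectangle split part fill={blue!20, white}] (pair) at (0, 0)
    {C \nodepart{two} \phantom{X}};
  \node [seq=2, rectangle split part fill={white, cyan!30}] (other) at (3, 0)
    {\phantom{X} \nodepart{two} G};
  \path [line] (pair.north) to[out=90, in=90] (other.north);
\end{tikzpicture}
\end{document}
```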

\documentclass[a4paper, 12pt]{article}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows, positioning, calc, patterns, shadows, external}
%%%<
\usepackage{verbatim}
\usepackage{lmodern}
\usepackage{scrextend}
\usepackage{relsize}
\usepackage[active,tightpage]{preview}
\PreviewEnvironment{tikzpicture}
\setlength\PreviewBorder{5pt}%
%%%>
\usetikzlibrary{chains,fit,shapes, shapes.multipart}

\begin{document}
\changefontsizes{20pt}
\begin{tikzpicture}
\tikzset{
  every path/.style={very thick},
  line/.style={draw, -latex', thick},
  seq/.style={rectangle split,
    rectangle split horizontal,
    rectangle split parts=#1,
    minimum height=1cm,
    draw, anchor=center}
}

\matrix[row sep=0.5cm] at (0cm, 4cm)
{
\node [seq=10, rectangle split part fill={blue!20,white, cyan!30,white, blue!20, white, white, cyan!30,white,cyan!30}] (leftrow1)
{C \nodepart{two} \phantom{X} \nodepart{three} A \nodepart{four} \phantom{X} \nodepart{five} G \nodepart{six} \phantom{X} \nodepart{seven} \phantom{X} \nodepart{eight} G \nodepart{nine} \phantom{X} \nodepart{ten} C}; \\

\node [seq=10, rectangle split part fill={green!20, white, orange!50, white, green!20, white, white, orange!50, white, orange!50}] (leftrow2)
{A \nodepart{two} \phantom{X} \nodepart{three} C \nodepart{four} \phantom{X} \nodepart{five} C \nodepart{six} \phantom{X} \nodepart{seven} \phantom{X} \nodepart{eight} A \nodepart{nine} \phantom{X} \nodepart{ten} A}; \\

\node [seq=10, rectangle split part fill={green!20, white, yellow!50, white, green!20, white, white, yellow!50, white, yellow!50}] (leftrow3)
{A \nodepart{two} \phantom{X} \nodepart{three} T \nodepart{four} \phantom{X} \nodepart{five} C \nodepart{six} \phantom{X} \nodepart{seven} \phantom{X} \nodepart{eight} T \nodepart{nine} \phantom{X} \nodepart{ten} T}; \\

\node [seq=10, rectangle split part fill={blue!20, white, cyan!30, white, blue!20, white, white, cyan!30, white, cyan!30}] (leftrow4)
{C \nodepart{two} \phantom{X} \nodepart{three} A \nodepart{four} \phantom{X} \nodepart{five} G \nodepart{six} \phantom{X} \nodepart{seven} \phantom{X} \nodepart{eight} G \nodepart{nine} \phantom{X} \nodepart{ten} C}; \\

\node [seq=10, rectangle split part fill={blue!20, white, cyan!30, white, blue!20, white, white, cyan!30, white, cyan!30}] (leftrow5)
{C \nodepart{two} \phantom{X} \nodepart{three} A \nodepart{four} \phantom{X} \nodepart{five} G \nodepart{six} \phantom{X} \nodepart{seven} \phantom{X} \nodepart{eight} G \nodepart{nine} \phantom{X} \nodepart{ten} C}; \\

\node [seq=10, rectangle split part fill={red!50, white, orange!50, white, red!50, white, white, orange!50, white, orange!50}] (leftrow6)
{A \nodepart{two} \phantom{X} \nodepart{three} C \nodepart{four} \phantom{X} \nodepart{five} A \nodepart{six} \phantom{X} \nodepart{seven} \phantom{X} \nodepart{eight} A \nodepart{nine} \phantom{X} \nodepart{ten} A}; \\
};

\matrix[row sep=0.5cm] at (10cm, 4cm)
{
\node [seq=2, rectangle split part fill={blue!20, blue!20}] (tupletoprow)
{C \nodepart{two} G}; \\
\node [seq=2, rectangle split part fill={green!20, green!20}]
{A \nodepart{two} C}; \\
\node [seq=2, rectangle split part fill={green!20, green!20}]
{A \nodepart{two} C}; \\
\node [seq=2, rectangle split part fill={blue!20, blue!20}]
{C \nodepart{two} G}; \\
\node [seq=2, rectangle split part fill={blue!20, blue!20}]
{C \nodepart{two} G}; \\
\node [seq=2, rectangle split part fill={red!50, red!50}] (tuplebottomrow)
{A \nodepart{two} A}; \\
};

\matrix[row sep=0.5cm] at (13cm, 4cm)
{
  \node [seq=3, rectangle split part fill={cyan!30, cyan!30}] (tripletoprow)
  {A \nodepart{two} G \nodepart{three} C}; \\
  \node [seq=3, rectangle split part fill={orange!50, orange!50}]
  {C \nodepart{two} A \nodepart{three} A}; \\
  \node [seq=3, rectangle split part fill={yellow!50, yellow!50}]
  {T \nodepart{two} T \nodepart{three} T}; \\
  \node [seq=3, rectangle split part fill={cyan!30, cyan!30}]
  {A \nodepart{two} G \nodepart{three} C}; \\
  \node [seq=3, rectangle split part fill={cyan!30, cyan!30}]
  {A \nodepart{two} G \nodepart{three} C}; \\
  \node [seq=3, rectangle split part fill={orange!50, orange!50}]  (triplebottomrow)
  {C \nodepart{two} A \nodepart{three} A}; \\
};

\path [latex'-, thick] (leftrow6.one south) edge[out=270, in=270] node {}(tuplebottomrow);
\path [latex'-, thick] (leftrow6.five south) edge[out=270, in=270] node {}(tuplebottomrow);

\path [line] (leftrow1.three north) edge[out=90, in=90] node {}(tripletoprow);
\path [line] (leftrow1.eight north) edge[out=90, in=90] node {}(tripletoprow);
\path [line] (leftrow1.ten north) edge[out=90, in=90] node {}(tripletoprow);

%loop version works
\foreach \i [count=\x] in {one ,two ,three ,four ,five ,six ,seven ,eight ,nine ,ten }
    \node [below=8pt,fill=white] at (leftrow6.\i south) {$\mathsmaller{\mathbf{X_{\x}}}$};

% loop version works
\foreach \i in {1,...,6}
{
  \node [right] at (leftrow\i.ten east) {$\mathsmaller{\mathbf{X^{(\i)}}}$};
}

\node [below=8pt,fill=white] at (tuplebottomrow.south) {$(\mathsmaller{\mathbf{X_{0}}}, \mathsmaller{\mathbf{X_{4}}})$};
\node [below=8pt,fill=white] at (triplebottomrow.south) {$(\mathsmaller{\mathbf{X_{3}}}, \mathsmaller{\mathbf{X_{7}}}, \mathsmaller{\mathbf{X_{9}}})$};

\node [above=8pt] at (tupletoprow.north) {(0, 4)};
\node [above=8pt,fill=white] at (tripletoprow.north) {(3, 7, 9)};

\end{tikzpicture}
\changefontsizes{12pt}
\end{document}

enter image description here
