How to automatically create a book of abstracts of a scientific conference?

You can try this:

\documentclass{article}
\usepackage{lipsum}% just for this example
\newcommand{\university}[1]{\cr \itshape #1}

\def\onlyabstracts{%
   \long\def\documentclass##1##2\title##3##4\begin##5{\title{##3}##4}
   \def\title##1{\vskip5ex{\centering\LARGE##1\par}}
   \let\author=\authorX \let\and=\andX
   \def\date##1{}
   \let\maketitle=\relax
   \let\abstractX=\abstract
   \def\abstract{\abstractX \aftergroup\endinput}
}
\def\authorX#1{\medskip{\leftskip=0pt plus1fill \rightskip=\leftskip \lineskip=8pt
   \noindent\andA#1\crcr\egroup\egroup\par}}
\def\andA{\vtop\bgroup\baselineskip=14pt\halign\bgroup\hfil\large##\unskip\hfil\cr}
\def\andX{\crcr\egroup\egroup\hskip3em plus1em\andA}

\begin{document}
\onlyabstracts
\input article1
\input article2
\input article3
...
\input article169
\end{document}

If you don't want to write 169times the \input article<number> you can use \loop. For example:

\newcount\tmpnum
\loop \advance\tmpnum by1 \input article\the\tmpnum\relax \ifnum\tmpnum<169 \repeat

I've written a short pythontex solution:

\documentclass{article}
\usepackage{lipsum}
\usepackage{pythontex}

\newcommand{\university}[1]{{\centering\itshape\large #1\par}\vskip5ex}
\newcommand{\abstitle}[1]{{\centering\LARGE #1\par}\vskip3ex}
\newcommand{\absauthor}[1]{{\centering\large #1\par}}
\newcommand{\absname}{{\centering\bfseries\abstractname\par}\vskip1ex}

\begin{document}

\begin{pycode}
from __future__ import print_function
import os

os.chdir("..") #delete this for standalone script

content = open("content.tex", "w+")
path = os.getcwd() + "/abstracts" #directory for abstract tex files

for file in os.listdir(path):
    if file.endswith(".tex"):
        with open(path + "/" + file) as input_data:
            for line in input_data:
                if "\\title{" in line: 
                    line = line.replace("\\title{","\\abstitle{")
                    print(line, file=content)  
                if "\\author{" in line:
                    line = line.replace("\\author{","\\absauthor{")
                    print(line, file=content)
                if "\\university{" in line:
                    print(line, file=content)
                    print("\\absname\n", file=content)  
                if line.strip() == "\\begin{abstract}": 
                    break
            for line in input_data:  
                if line.strip() == "\\end{abstract}":
                    break
                print(line, file=content)
                #additional stuff after abstract content
\end{pycode}
\IfFileExists{content.tex}{
 \input{content.tex}
}{Test}
\end{document}

Place all .tex-files in folder abstracts (or simply change the path) and run:

pdflatex filename.tex
pythontex filename.tex
pdflatex filename.tex

You really have to pay attention to the indents. The following commands can't be nested:

\title{...}
\author{...}
\university{...}

Otherwise, they appear twice.

You can customize everything in LaTeX or add some additional stuff with the python print-function.

enter image description here


I was threatened in the chatroom for writing an answer, so here it is. :)

Personally, I think there cannot be an automatic solution in the sense of working without a minimal effort on complying with a couple of rules. That said, I decided to take the scripting path and provide a very simple yet powerful solution to this question.

First things first: I admit a paper containing at least something along these lines:

\title{How to eat rice}
\author{Foo Bar and Baz Fooz}
...
\begin{abstract}
Here lies my awesome text.
\end{abstract}

I could use regular expressions in order to extract these fields, but I decided to implement my own finite state machine with a few tricks under the sleeve: a very simple counting mechanism in order to detect a straightforward syntactic nesting and avoid premature symbol extraction. Of course, the script is hardcoded, but that's the price when you want things that work. :)

Without further ado, here's the Python script. Hopefully the Python enthusiasts won't kill me for being excessively verbose, but that's just the way I like to code. :) In case you are lost, these are FSM's. Of course, the solution is trivial, but I wouldn't even think of expanding the idea of using them beyond this simple case study, since TeX is not even context free. But that's another story.

from Cheetah.Template import Template
import glob, sys, getopt

def getTitle(text):
    output = ""
    state = 0
    counter = 0
    for symbol in text:
        if state == 0:
            if symbol == '\\':
                state = 1
        elif state == 1:
            if symbol == 't':
                state = 2
            else:
                state = 0
        elif state == 2:
            if symbol == 'i':
                state = 3
            else:
                state = 0
        elif state == 3:
            if symbol == 't':
                state = 4
            else:
                state = 0
        elif state == 4:
            if symbol == 'l':
                state = 5
            else:
                state = 0
        elif state == 5:
            if symbol == 'e':
                state = 6
            else:
                state = 0
        elif state == 6:
            if symbol == '{':
                counter = counter + 1
                state = 7
            elif symbol not in [ ' ', '\n', '\t', '\r' ]:
                state = 0
        elif state == 7:
            output = output + symbol
            if symbol == '{':
                counter = counter + 1
            elif symbol == '\\':
                state = 8
            elif symbol == '}':
                counter = counter - 1
                if counter == 0:
                    return output[:-1]
        elif state == 8:
            output = output + symbol
            state = 7
    return "not found"

def getAuthor(text):
    output = ""
    state = 0
    counter = 0
    for symbol in text:
        if state == 0:
            if symbol == '\\':
                state = 1
        elif state == 1:
            if symbol == 'a':
                state = 2
            else:
                state = 0
        elif state == 2:
            if symbol == 'u':
                state = 3
            else:
                state = 0
        elif state == 3:
            if symbol == 't':
                state = 4
            else:
                state = 0
        elif state == 4:
            if symbol == 'h':
                state = 5
            else:
                state = 0
        elif state == 5:
            if symbol == 'o':
                state = 6
            else:
                state = 0
        elif state == 6:
            if symbol == 'r':
                state = 7
            else:
                state = 0
        elif state == 7:
            if symbol == '{':
                counter = counter + 1
                state = 8
            elif symbol not in [ ' ', '\n', '\t', '\r' ]:
                state = 0
        elif state == 8:
            output = output + symbol
            if symbol == '{':
                counter = counter + 1
            elif symbol == '\\':
                state = 9
            elif symbol == '}':
                counter = counter - 1
                if counter == 0:
                    return output[:-1]
        elif state == 9:
            output = output + symbol
            state = 8
    return "not found"

def getAbstract(text):
    output = ""
    state = 0
    counter = 0
    for symbol in text:
        if state == 0:
            if symbol == '\\':
                state = 1
        elif state == 1:
            if symbol == 'b':
                state = 2
            else:
                state = 0
        elif state == 2:
            if symbol == 'e':
                state = 3
            else:
                state = 0
        elif state == 3:
            if symbol == 'g':
                state = 4
            else:
                state = 0
        elif state == 4:
            if symbol == 'i':
                state = 5
            else:
                state = 0
        elif state == 5:
            if symbol == 'n':
                state = 6
            else:
                state = 0
        elif state == 6:
            if symbol == '{':
                counter = counter + 1
                state = 7
            elif symbol not in [ ' ', '\n', '\t', '\r' ]:
                state = 0
        elif state == 7:
            if symbol == 'a':
                state = 8
            else:
                state = 0
        elif state == 8:
            if symbol == 'b':
                state = 9
            else:
                state = 0
        elif state == 9:
            if symbol == 's':
                state = 10
            else:
                state = 0
        elif state == 10:
            if symbol == 't':
                state = 11
            else:
                state = 0
        elif state == 11:
            if symbol == 'r':
                state = 12
            else:
                state = 0
        elif state == 12:
            if symbol == 'a':
                state = 13
            else:
                state = 0
        elif state == 13:
            if symbol == 'c':
                state = 14
            else:
                state = 0
        elif state == 14:
            if symbol == 't':
                state = 15
            else:
                state = 0
        elif state == 15:
            if symbol == '}':
                counter = 0
                state = 16
            else:
                state = 0
        elif state == 16:
            output = output + symbol
            if symbol == '{':
                counter = counter + 1
            elif symbol == '\\':
                state = 17
            elif symbol == '}':
                counter = counter - 1
        elif state == 17:
            output = output + symbol
            if symbol == 'e':
                state = 18
            else:
                state = 16
        elif state == 18:
            output = output + symbol
            if symbol == 'n':
                state = 19
            else:
                state = 16
        elif state == 19:
            output = output + symbol
            if symbol == 'd':
                state = 20
                payload = 0
            else:
                state = 16
        elif state == 20:
            output = output + symbol
            if symbol == '{':
                counter = counter + 1
                state = 21
            elif symbol not in [ ' ', '\n', '\t', '\r' ]:
                state = 16
            else:
                payload = payload + 1
        elif state == 21:
            output = output + symbol
            if symbol == 'a':
                state = 22
            else:
                state = 16
        elif state == 22:
            output = output + symbol
            if symbol == 'b':
                state = 23
            else:
                state = 16
        elif state == 23:
            output = output + symbol
            if symbol == 's':
                state = 24
            else:
                state = 16
        elif state == 24:
            output = output + symbol
            if symbol == 't':
                state = 25
            else:
                state = 16
        elif state == 25:
            output = output + symbol
            if symbol == 'r':
                state = 26
            else:
                state = 16
        elif state == 26:
            output = output + symbol
            if symbol == 'a':
                state = 27
            else:
                state = 16
        elif state == 27:
            output = output + symbol
            if symbol == 'c':
                state = 28
            else:
                state = 16
        elif state == 28:
            output = output + symbol
            if symbol == 't':
                state = 29
            else:
                state = 16
        elif state == 29:
            output = output + symbol
            if symbol == '}':
                counter = counter - 1
                if counter == 0:
                    return output[:-(payload + 14)]
                else:
                    state = 16
            else:
                state = 16
    return "not found"

def getArticles(path):
    elements = glob.glob(path)
    articles = []
    for element in elements:
        with open(element, 'r') as handler:
            content = handler.read()
            article = {}
            article['title'] = getTitle(content)
            article['author'] = getAuthor(content)
            article['abstract'] = getAbstract(content).strip()
            articles.append(article)
    return articles

def generateTemplate(articles, template, output):
    with open(template, 'r') as handler:
        content = handler.read()
        merge = Template(source=content, searchList=[ { 'data': articles } ])
    with open(output, 'w') as handler:
        handler.write(str(merge))

def main(arguments):
    options, _ = getopt.getopt(arguments, 'p:t:o:', ['path=', 'template=', 'output='])
    if len(options) == 3:
        for flag, value in options:
            if flag in ('-p', '--path'):
                path = value
            if flag in ('-t', '--template'):
                template = value
            if flag in ('-o', '--output'):
                output = value
        articles = getArticles(path)
        generateTemplate(articles, template, output)

main(sys.argv[1:])

Now let's say I have two papers as follows, following the very same structure:

article1.tex

\documentclass{article}

\newcommand{\university}[1]{{(\itshape#1)}}

\title{Title of the first article}
\author{Author 1 \university{Foo University} and Author 2 \university{Potato U}}
\date{}

\begin{document}

\maketitle

\begin{abstract}
This is a lovely abstract for paper 1. How do you like my text?
\end{abstract}

\section{Introduction}

Here goes some text of article 1.

\end{document}

article2.tex

\documentclass{article}

\newcommand{\university}[1]{{(\itshape#1)}}

\title{Title of the second article}
\author{Author 1 \university{Disney University}, Author 2 \university{Duck Academy} and Author 3 \university{University of Nowhere}}

\begin{document}

\maketitle

\begin{abstract}
This is a lovely abstract for paper 2. How do you like my text?
\end{abstract}

\section{Introduction}

Here goes some text of article 2.

\end{document}

Output of both, as one would expect. :)

article1.pdf

a1

article2.tex

a2

Now let's use the script. First of all, we need to create a template using the Cheetah language description. Simply put, we will use a for loop where we iterate through all dictionaries in the data array. Note that $ is used as a placeholder and not as math mode, as we are used in TeX. Don't worry with them, they'll be gone once we merge everything. :) Since we are dealing with a dictionary, we access values from its keys, so $article['title'] will return the title of that article. :) Here's our template:

\documentclass{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

\newcommand{\university}[1]{{(\itshape#1)}}

\begin{document}

\begin{tabular}{p{.33\linewidth}p{.33\linewidth}p{.33\linewidth}}
\hline
Title & Authors & Abstract\\
\hline
#for $article in $data
$article['title'] & $article['author'] & $article['abstract']\\
#end for
\hline
\end{tabular}

\end{document}

We can run the Python script providing the following parameters:

  • --path/-p <full path + file patterns>: the script will read all files in the provided path matching the file pattern. For example, /home/paulo/papers/*.tex will extract info from all .tex files in the papers folder under my home directory.
  • --template/-t <full template path>: the template to be populated with data extracted from files in the previous option. For example, /home/paulo/template.tex.
  • --output/-o <full output path>: the output file with all data merged into the template. For example, /home/paulo/abstracts.tex.

Let's run:

$ python code.py --path /home/paulo/papers/.tex --template /home/paulo/template.tex --output abstracts.tex

And that's it. This is the generated file:

\documentclass{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

\newcommand{\university}[1]{{(\itshape#1)}}

\begin{document}

\begin{tabular}{p{.33\linewidth}p{.33\linewidth}p{.33\linewidth}}
\hline
Title & Authors & Abstract\\
\hline
Title of the first article & Author 1 \university{Foo University} and Author 2 \university{Potato U} & This is a lovely abstract for paper 1. How do you like my text?\\
Title of the second article & Author 1 \university{Disney University}, Author 2 \university{Duck Academy} and Author 3 \university{University of Nowhere} & This is a lovely abstract for paper 2. How do you like my text?\\
\hline
\end{tabular}

\end{document}

And this is the output:

a3

Hope it helps. :)

Tags:

Abstract