How to use a command-line tool to extract the BibTeX entries that contain a search term?

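bibtool -- 'select{"Smith"}' <file>.bib -o smith.bib creates a new bibliography file that contains only the entries in which Smith occurs. The select instruction is quoted so that the shell passes its double quotes through to BibTool unchanged.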

BibTool should already be part of your TeX distribution.
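BibTool's select instruction can also be restricted to one field by naming the field before the regular expression. A minimal sketch, assuming BibTool is installed and on the PATH; the wrapper name bib_select_author and the file names are hypothetical:

```shell
# Hypothetical wrapper around BibTool (assumed to be on the PATH).
# $1: input .bib file, $2: output .bib file,
# $3: regular expression matched against the author field only.
bib_select_author () {
  # The resource instruction follows "--" as ONE quoted argument, so the
  # shell keeps the double quotes BibTool needs around the regex.
  bibtool -- "select{author \"$3\"}" -i "$1" -o "$2"
}

# Usage: bib_select_author mydatabase.bib smith.bib Smith
```

Unlike the unrestricted select above, this skips entries that mention Smith only in, say, the title.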


The bib2bib command-line tool provides flexible and reliable ways to filter and extract BibTeX entries according to given criteria. This (little-known) utility is part of the bibtex2html tool suite. (Note: look for the PDF documentation; the HTML documentation does not cover bib2bib!)

For instance, to extract all entries where Smith is an author, one just writes:

bib2bib -oc smith-citations -ob smith.bib -c 'author : "Smith"' mydatabase.bib

Multiple conditions can be grouped and combined with and, or and not, so you could extend the above query to also include references that mention Smith in the title, the abstract, or any other field.
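A sketch of such a combined query, assuming bib2bib is installed; the wrapper name bib_select_person, the choice of fields, and the "draft" exclusion are illustrative only:

```shell
# Hypothetical wrapper around bib2bib (assumed to be on the PATH).
# Writes to $2 every entry of $1 in which $3 (a regular expression for
# bib2bib) matches the author, editor or title field, but drops entries
# whose note field matches "draft".
bib_select_person () {
  bib2bib -ob "$2" \
    -c "(author : \"$3\" or editor : \"$3\" or title : \"$3\") and not (note : \"draft\")" \
    "$1"
}

# Usage: bib_select_person mydatabase.bib smith.bib Smith
```

Parentheses group the alternatives explicitly, so the not applies only to the note test.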

A big plus of bib2bib compared to "hand-crafted" awk/sed/grep solutions is that it deals very well with string constants, crossrefs and so on (you can either include all dependencies in the output file or let bib2bib expand them, so that the entries are self-contained).


This kind of thing is a complete nuisance to do properly with the standard textutils, because BibTeX syntax is not formally specified and those tools are poor at processing files that are not structured by line.

You can get something close to what you want using tr to convert bib items into lines and back again. E.g.:

<input.bib tr "@\n" "\n\0"|grep -a article|(tr "\n\0" "@\n")

will select all bib items with the word article in them.

Two issues:

  1. The way this handles @ is crufty: occurrences within entries will cause them to be split into two lines and the first @ will be dropped. This can be fixed, but it will make the scripts more complex.
  2. The lines between tr-conversions contain lots of \0 characters, which means that many textutils will either not handle them or will need to have switches passed to them: here grep needs the -a switch.

Postscript

The following function definitions perform a slightly more sophisticated conversion that handles @ better (and uses sed):

bib2unix () { if test $# -gt 0; then cat "$@"; else cat; fi | tr "@\n" "\n\0" | sed "2,\$s/^/@/"; }
unix2bib () { tr -d "\n" | tr "\0" "\n"; }

which can be used like this:

bib2unix input.bib | grep -a @article | unix2bib

to select all articles and print them to STDOUT.
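The functions above still split an entry wherever an @ occurs inside it, for instance in an email address in a note field. A minimal sketch of one way round this, assuming GNU sed (for the \x00 and \n escapes) and assuming every entry starts with an @ in the first column, is to treat only a newline followed by @ as an entry boundary. The names bib2lines and lines2bib are hypothetical:

```shell
# Hypothetical helpers; GNU sed is assumed.
# bib2lines: put each entry on one line, with its internal newlines as NULs.
# Only a newline immediately followed by "@" becomes a line break, so an "@"
# inside a field (e.g. an email address) no longer splits the entry.
# cat with no arguments reads standard input.
bib2lines () {
  cat "$@" | tr '\n' '\0' | sed 's/\x00@/\n@/g'
}

# lines2bib: turn the NULs back into newlines. The line breaks between
# entries were original newlines already, so nothing else needs restoring.
lines2bib () {
  tr '\0' '\n'
}

# Usage: bib2lines input.bib | grep -a 'Smith' | lines2bib
```

If the last entry of the file is selected, the output may end with one extra blank line; otherwise the entries come back as they were.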
