Is there an open source tool for producing bibtex entries from paper PDFs?

I use Zotero, which is a reference manager in its own right; it is available both as a Firefox plugin and as a standalone application. I use the standalone version to extract reference information from PDFs and then export it, in my case to BibTeX (.bib) format. Other export formats are available as well.

This doesn't answer your entire question, but may be useful (for example, you might have got the papers from a list of DOIs in the first place).

Assuming these are PDFs with CrossRef DOIs, if you can extract the DOI from the PDF, you can get the citation directly from CrossRef's API. For the DOI 10.5555/12345678, the query:

http://api.crossref.org/works/10.5555/12345678/transform/application/x-bibtex

returns

@article{Carberry_2008,
    doi = {10.5555/12345678},
    url = {http://dx.doi.org/10.5555/12345678},
    year = 2008,
    month = {aug},
    publisher = {{CrossRef}},
    volume = {5},
    number = {11},
    pages = {1--3},
    author = {Josiah Carberry},
    title = {Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory},
    journal = {Journal of Psychoceramics}
}
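The DOI-extraction step mentioned above can be sketched roughly as follows. This assumes the PDF text has already been pulled out by some other tool; the pattern is a commonly used approximation for modern DOIs rather than a complete grammar, and `find_dois` is a name made up for this illustration.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant code,
# a slash, then a suffix. This is an assumption, not an official grammar,
# and it will miss some unusual or very old DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_dois(text):
    """Return the distinct DOI-like strings in text, in order of appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Sentence punctuation right after a DOI is usually not part of it.
        doi = match.group(0).rstrip(".,;:)")
        if doi not in found:
            found.append(doi)
    return found
```

In practice the first DOI found on the first page is usually the paper's own, but that is a heuristic: reference lists also contain DOIs, so the result may need checking.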

You could write a very small script to scan a list of DOIs and download the citations.
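A minimal sketch of such a script, using only the Python standard library and the URL pattern shown above. The function names and the idea of collecting failures in a separate list are choices made for this illustration, not part of CrossRef's API.

```python
import urllib.request
from urllib.parse import quote

# URL pattern taken from the example query above.
CROSSREF_TEMPLATE = (
    "http://api.crossref.org/works/{doi}/transform/application/x-bibtex"
)


def bibtex_url(doi):
    """Build the CrossRef BibTeX URL for one DOI (slashes are kept as-is)."""
    return CROSSREF_TEMPLATE.format(doi=quote(doi.strip(), safe="/"))


def fetch_bibtex(doi, timeout=30):
    """Download the BibTeX entry for one DOI and return it as text."""
    with urllib.request.urlopen(bibtex_url(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


def download_citations(dois, fetch=fetch_bibtex):
    """Fetch an entry for every non-blank DOI; return (entries, failures)."""
    entries = []
    failures = []
    for doi in dois:
        doi = doi.strip()
        if not doi:
            continue  # skip blank lines in the input list
        try:
            entries.append(fetch(doi))
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            failures.append((doi, str(err)))
    return entries, failures
```

To use it, read the DOIs one per line from a text file, pass the lines to `download_citations`, and write the returned entries to a .bib file. DOIs that are not registered with CrossRef end up in the failures list rather than stopping the run.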

NB: My answer does not differentiate between open-source and closed-source projects, and I have not used any of the solutions in this seemingly long list.

This SO answer suggests that the 2010 London Dev8D meeting, whatever that is, ran a contest for metadata extraction that resulted in pdfssa4met. I cannot find any documentation on the meeting or on anything else that came out of it. The JISC ConnectedWorks project produced a review document that considered Zotero, Mendeley, Google Scholar, CB2BIB, Metadata Extraction Tool, pdfssa4met, pdfmeat, GNU libextractor, FITS, Apache Tika, XPDF, PDFTOHTML, pdf2xml, CiteSeerX, and Paperpile. This list seems to leave out some other solutions, although it is possible that they rely on the same underlying technology. The answers to this TeX.SX question suggest that BibDesk and JabRef do metadata extraction. Papers also seems to do metadata extraction. This blog reviews the metadata extraction performance of WizFolio.

There are also Mr. dLib, pdfextract, and TeamBeam, which seem to have scholarly papers associated with them and therefore seem to have been missed by the JISC review (or were developed afterwards). I also found exiftool.

