How do I convert a Word file's references to .bib format?

I think that there are two fundamental information-related issues that must be dealt with when converting a formatted bibliography to a bib file:

  1. Information that needs to be deleted. This includes almost all formatting-related information, such as the italicizing of journal names and book titles; placing single or double quotes (smart or dumb) around title fields; commas, colons, and periods used as separators between fields; etc. If you use biblatex and biber, you can leave "accented" characters such as ö, é, and ß as is. If you use BibTeX, though, you should replace all accented characters with their LaTeX representations. For the three examples just mentioned, write \"o, \'e, and \ss, respectively. For more on this subject, see the posting "How to write “ä” and other umlauts and accented letters in bibliography".

    This is the easier of the two issues!

  2. Meta-information that needs to be created. This is the much harder issue.

    Among the crucial pieces of meta-information that must be supplied are

    • the entry type: @article, @book, @misc, etc. I've seen "auto-converted" bib files in which the only entry type is @article, even though most of the entries should have been of type @book. Trying to clean up the resulting mess can be enormously frustrating.

    • the "key" -- the item used in the argument of \cite instructions. It's helpful in the long run to employ a key system that's at least somewhat mnemonic. Don't just use "A", "B", "C", etc.

    • the association of various pieces of information within a (formerly formatted) entry with fields. Suppose, for instance, that one has correctly identified a given entry as being of type @article and has settled on a key for this entry. One still has to decide which pieces of information copied from the formatted MS Word bibliography belong to the author, title, year, journal, volume, number, pages, url and (quite possibly!) further fields.

    • Within the author and editor fields, one has to make sure (a) that the keyword and is used correctly and consistently to separate individual authors and (b) that the first, von, surname, and junior components of every author have been identified correctly. For instance, Spanish names often contain a two-part surname that's separated by a space rather than a hyphen. It's important to notice that the surname of "Antonio Garcia Pascual" is "Garcia Pascual", not just "Pascual". (Movie actors frequently have two-part last names as well; cf. Kristin Scott Thomas and Helena Bonham Carter.) Also be on the lookout for "corporate" authors. An author field of "National Aeronautics and Space Administration" may be correct in the narrow sense of not containing typos, but it'll make BibTeX think that it's dealing with two authors separated by the keyword and: the first named "National Aeronautics" and the second named "Space Administration". (Ouch!) Watch for such cases and be prepared to enclose corporate authors in an extra pair of curly braces, e.g., {National Aeronautics and Space Administration}.
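The replacement of accented characters described in item 1 can be partly automated. Below is a minimal Python sketch under stated assumptions: `ACCENT_MAP` and `latexify` are hypothetical names, the table covers only a handful of characters, and each replacement is wrapped in braces (e.g. `{\"o}`), a common BibTeX convention that also keeps `{\ss}` from running into the following letter.

```python
import unicodedata

# Hypothetical, deliberately small table: accented character -> LaTeX form.
# Each replacement is braced so that, e.g., {\ss} cannot merge with the next letter.
ACCENT_MAP = {
    "ä": r'{\"a}', "ö": r'{\"o}', "ü": r'{\"u}',
    "Ä": r'{\"A}', "Ö": r'{\"O}', "Ü": r'{\"U}',
    "é": r"{\'e}", "è": r"{\`e}", "á": r"{\'a}",
    "ß": r"{\ss}",
}

def latexify(text):
    """Return text with every mapped accented character replaced."""
    # Text may store "ö" as "o" plus a combining diaeresis; NFC recombines it.
    text = unicodedata.normalize("NFC", text)
    return "".join(ACCENT_MAP.get(ch, ch) for ch in text)

print(latexify("Gödel"))   # G{\"o}del
print(latexify("Straße"))  # Stra{\ss}e
```

Characters missing from the table pass through unchanged, so it's still worth searching the finished bib file for any remaining non-ASCII characters.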
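For the "key" bullet, one way to get keys that are at least somewhat mnemonic is to combine the first author's surname, the year, and the first significant word of the title. The scheme, the stopword list, and the names `make_key` and `ascii_fold` are all hypothetical; this is a sketch, not a standard.

```python
import re
import unicodedata

# Hypothetical stopword list: words skipped when picking the title word.
STOPWORDS = {"a", "an", "the", "on", "of", "and", "in", "to", "for"}

def ascii_fold(text):
    """Drop accents: decompose characters and discard combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def make_key(surname, year, title):
    """Build a key like 'knuth1984literate': surname + year + first
    significant title word (a hypothetical scheme)."""
    last = re.sub(r"[^a-z]", "", ascii_fold(surname).lower())
    words = re.findall(r"[a-z]+", ascii_fold(title).lower())
    first_word = next((w for w in words if w not in STOPWORDS), "")
    return f"{last}{year}{first_word}"

print(make_key("Knuth", 1984, "Literate Programming"))  # knuth1984literate
print(make_key("Gödel", 1931, "Über formal unentscheidbare Sätze"))  # godel1931uber
```

If two entries end up with the same key, appending a letter ("a", "b", ...) to the later ones is a simple way to keep keys unique.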
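To preview how an author field will be split into individual names, the following sketch approximates BibTeX's rule of splitting on the word "and" only when it sits outside curly braces. The function name `split_authors` is hypothetical, and for simplicity it recognizes only lowercase "and".

```python
def split_authors(field):
    """Split a name list on the word "and" (lowercase only, for simplicity)
    whenever it is not inside curly braces."""
    names, current, depth = [], [], 0
    for word in field.split():
        if word == "and" and depth == 0:
            names.append(" ".join(current))
            current = []
            continue
        # Track brace nesting so a braced corporate name stays in one piece.
        depth += word.count("{") - word.count("}")
        current.append(word)
    names.append(" ".join(current))
    return names

print(split_authors("National Aeronautics and Space Administration"))
# ['National Aeronautics', 'Space Administration']
print(split_authors("{National Aeronautics and Space Administration}"))
# ['{National Aeronautics and Space Administration}']
print(split_authors("Garcia Pascual, Antonio and Knuth, Donald E."))
# ['Garcia Pascual, Antonio', 'Knuth, Donald E.']
```

The last example also shows the "Last, First" form, which tells BibTeX explicitly where a two-part surname such as "Garcia Pascual" ends.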

For a bib file to be really useful in the long run, one needs to take care of (at least) three further attributes, which I call the "three c's": completeness, correctness, and consistency.

  • Is the stuff that was obtained from a formatted external source (say, an MS Word file) complete? For instance, if only the initials of the authors' given names are provided in the source file, it's probably a very good idea to take some time to find out what the full first names are and to insert these full names in the bib file. That way, if at some future point you need to use a bibliographic style that prints out full first names rather than just initials, you needn't go back and find out what those first names may be.

  • Is the information obtained from the formatted external source correct? Are there misspellings of names and of words in the title, and is there missing information (e.g., volume numbers and page ranges) for journal articles? Are you taking care to use curly braces to encase words in title fields that shouldn't be converted to lowercase?

  • Once you've assembled all the bib entries and are reasonably confident that the information is complete and correct, you should also check if the information is consistent across entries. Do you have "The Review of Economics and Statistics" as the journal name in one entry and just "Review of Economics and Statistics" in another entry? Both versions are correct and complete, but they're not consistent with each other. Another consistency issue to look out for is the spelling of author names if their original spelling doesn't use the Latin alphabet. E.g., is it Chebyshef, Tschebysheff, Tschebishev, etc? Is it Ito, Itoh, Itou, etc? Is it Goto or Gotoh? Your readers will thank you if you provide consistency in these areas.
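For the completeness check, a small script can flag names whose given-name part consists only of initials, so you know which full first names still need to be looked up. This sketch assumes names are written in "Last, First" form; `initials_only` and the regular expression are hypothetical heuristics.

```python
import re

# One or more capital-letter initials, optionally with periods or hyphens:
# matches "D.", "D", "D.E.", "J.-P.".
INITIALS = re.compile(r"^[A-Z]\.?(?:-?[A-Z]\.?)*$")

def initials_only(name):
    """True if a 'Last, First' name gives only initials for the first name.
    Names without a comma are not checked (a simplifying assumption)."""
    if "," not in name:
        return False
    given = name.split(",", 1)[1].split()
    return bool(given) and all(INITIALS.match(part) for part in given)

authors = ["Knuth, D. E.", "Knuth, Donald E.", "Lamport, L."]
print([a for a in authors if initials_only(a)])
# ['Knuth, D. E.', 'Lamport, L.']
```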
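For the correctness question about curly braces in title fields, a rough heuristic is to flag unbraced words that contain a capital letter after their first character (acronyms such as GARCH, or names like LaTeX). The names `strip_braced` and `unprotected_caps` are hypothetical, and the heuristic cannot spot ordinary proper nouns such as "Bayes", which still need a manual check.

```python
import re

def strip_braced(title):
    """Remove {...} groups, innermost first, until none remain."""
    previous = None
    while previous != title:
        previous, title = title, re.sub(r"\{[^{}]*\}", " ", title)
    return title

def unprotected_caps(title):
    """Return unbraced words with an uppercase letter after the first character."""
    words = re.findall(r"[A-Za-z]+", strip_braced(title))
    return [w for w in words if any(c.isupper() for c in w[1:])]

print(unprotected_caps("Estimating GARCH Models in {MATLAB}"))  # ['GARCH']
```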
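For the consistency check on journal names, one can reduce each name to a comparison key and report keys that have more than one spelling. The normalization rules (lowercase, drop punctuation and braces, drop a leading "the") and the function names are hypothetical; the sketch also assumes you've already extracted the journal fields into a list, e.g. with a bib-file parser.

```python
import re
from collections import defaultdict

def normalize_journal(name):
    """Reduce a journal name to a comparison key (hypothetical rules)."""
    key = re.sub(r"[^\w\s]", "", name.lower())  # drop punctuation and braces
    key = " ".join(key.split())                 # collapse whitespace
    if key.startswith("the "):
        key = key[4:]
    return key

def inconsistent_journals(names):
    """Map each comparison key to its spelling variants, if there are several."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_journal(name)].add(name)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}

journals = [
    "The Review of Economics and Statistics",
    "Review of Economics and Statistics",
    "Econometrica",
]
print(inconsistent_journals(journals))
```

Transliterated author names (Chebyshev vs. Tschebysheff) can't be caught this way; those still need a human eye.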

Unless you have a tool that (a) does a very good job handling the chores listed under "2" above and (b) provides some assistance with the final three bullet points, you may be better off performing the conversion entirely by hand. Just set up a few templates for entries of type @article, @book, etc. that contain the required and optional fields and take it from there.
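If you go the manual route, the templates can be generated rather than typed. This minimal sketch uses the required and optional fields of the standard BibTeX @article and @book types (for @book, BibTeX also accepts editor in place of author). The names `TEMPLATES` and `template` are hypothetical. Optional fields get an "OPT" prefix, borrowing the Emacs bibtex-mode convention: standard BibTeX styles ignore such unknown fields until you fill them in and delete the prefix.

```python
# Required and optional fields of two standard BibTeX entry types.
TEMPLATES = {
    "article": (["author", "title", "journal", "year"],
                ["volume", "number", "pages", "month", "note"]),
    "book": (["author", "title", "publisher", "year"],
             ["volume", "number", "series", "address", "edition", "month", "note"]),
}

def template(entry_type, key):
    """Return an empty entry skeleton with OPT-prefixed optional fields."""
    required, optional = TEMPLATES[entry_type]
    lines = [f"@{entry_type}{{{key},"]
    lines += [f"  {field:<10} = {{}}," for field in required]
    lines += [f"  {'OPT' + field:<10} = {{}}," for field in optional]
    lines.append("}")
    return "\n".join(lines)

print(template("article", "knuth1984literate"))
```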


An online tool that converts formatted reference lists to bib format is available at http://text2bib.economics.utoronto.ca/