Where can I learn the basics of writing a lexer?

The Dragon Book is probably the definitive guide on the subject, although it can be a bit overwhelming. Language Implementation Patterns and Programming Language Pragmatics are great resources as well.


Basically there are two main approaches to writing a lexer:

  1. Creating a hand-written one in which case I recommend this small tutorial.
  2. Using some lexer generator tools such as lex. In this case, I recommend reading the tutorials to the particular tool of choice.

Also I would like to recommend the Kaleidoscope tutorial from the LLVM documentation. It runs through the implementation of a simple language and in particular demonstrates how to write a small lexer. There is a C++ and an Objective Caml version of the tutorial.

The classical textbook on the subject is Compilers: Principles, Techniques, and Tools also known as the Dragon Book. However this probably falls under the category of "fairly advanced write ups".