Unix Flex Regex for Multi-Line Comments

If you're required to make do with just regex, however, there is indeed a not-too-complex solution:


"/*"( [^*] | (\*+[^*/]) )*\*+\/
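(The spaces are only there for readability. Flex ends a pattern at the first unquoted whitespace, so remove them in an actual rule.)
The full explanation and derivation of that regex are excellently elaborated upon here.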
In short:
  • "/*" marks the start of the comment
  • ( [^*] | (\*+[^*/]) )* says accept any character that is not a * (the [^*]), or accept a run of one or more * as long as the run is followed by a character that is neither a * nor a / (the (\*+[^*/])). This means that all ******... runs will be accepted except for *****/, since you can't find a run of * there that isn't followed by a * or a /.
  • The *******/ case is then handled by the last bit of the regex, \*+\/, which matches one or more * followed by a / to mark the end of the comment.
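
To sanity-check the breakdown above, the same pattern can be tried outside Flex. The sketch below is a translation into Python's `re` syntax, not Flex syntax: the quoted "/*" becomes `/\*`, the groups are made non-capturing, and the readability spaces are dropped. The names `COMMENT` and `first_comment` are made up for illustration.

```python
import re

# The pattern from above, translated to Python's re syntax:
#   /\*                  the opening "/*"
#   (?:[^*]|\*+[^*/])*   non-star characters, or a run of stars that is
#                        followed by something other than * or /
#   \*+/                 the closing run of stars plus the final "/"
COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def first_comment(text):
    """Return the first /* ... */ comment found in text, or None."""
    match = COMMENT.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "/* simple */",
        "/* multi\nline */ x = 1;",
        "/*** stars ***/",
        "/* a */ b /* c */",
        "/* unterminated *",
    ]
    for sample in samples:
        print(repr(sample), "->", repr(first_comment(sample)))
```

Note that the match stops at the first */ rather than running on to the last one in the input, which is the whole point of excluding * and / after a run of stars.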

  • You don't match C-style comments with a simple regular expression in Flex; they require a more complex matching method based on start states. The Flex FAQ shows how (well, it does for the /*...*/ form; handling the other form in just the <INITIAL> state should be simple).
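
The start-state idea can be illustrated outside Flex as a tiny two-state scanner. The sketch below is in Python, not Flex, and is not the FAQ's code. The state names mirror Flex's `<INITIAL>` and a hypothetical `COMMENT` start state, and `strip_comments` is a made-up helper. It only handles the /*...*/ form and, as a simplifying assumption, ignores string literals.

```python
def strip_comments(text):
    """Remove /* ... */ comments with a two-state scanner.

    Raises ValueError if a comment is still open at end of input.
    """
    INITIAL, COMMENT = "INITIAL", "COMMENT"
    state = INITIAL
    out = []
    i = 0
    while i < len(text):
        if state == INITIAL:
            if text.startswith("/*", i):
                state = COMMENT          # comparable to BEGIN(COMMENT) in Flex
                i += 2
            else:
                out.append(text[i])      # ordinary input is passed through
                i += 1
        else:
            if text.startswith("*/", i):
                state = INITIAL          # comparable to BEGIN(INITIAL) in Flex
                i += 2
            else:
                i += 1                   # discard the comment body
    if state == COMMENT:
        raise ValueError("unterminated comment")
    return "".join(out)


if __name__ == "__main__":
    print(strip_comments("int x; /* note */ int y;"))
```

One thing this makes visible is that an unterminated comment ends the scan in a known state, so it can be reported explicitly instead of simply failing to match.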