Handling whitespace in HTTP headers

See RFC 7230, Appendix B, "Collected ABNF":

header-field  = field-name ":" OWS field-value OWS
field-value   = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar   = VCHAR / obs-text

OWS means optional whitespace, and field-value can contain whitespace too. So a parser has to drop the leading and trailing whitespace while keeping the whitespace inside the value.
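When the whole header line is already in memory, trimming is straightforward. The following is a minimal sketch, assuming the line has been read completely and its terminating CRLF removed; the function name split_header_line is made up for illustration and does not come from any library.

```python
from typing import Tuple

OWS = b" \t"  # optional whitespace per RFC 7230: SP / HTAB


def split_header_line(line: bytes) -> Tuple[bytes, bytes]:
    """Split one complete header line (no CRLF) into (field-name, field-value)."""
    name, sep, rest = line.partition(b":")
    if not sep:
        raise ValueError("header line has no colon")
    # Strip only the outer OWS; whitespace inside the value is kept.
    return name, rest.strip(OWS)
```

For instance, split_header_line(b"Connection:  keep-alive \t") gives the name b"Connection" and the value b"keep-alive". This only works because the entire line is buffered first, which is exactly what a streaming parser tries to avoid.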

Leading whitespace is not a problem, but trailing whitespace breaks a streaming design: you can't parse an HTTP/1.1 header in a purely streaming way (without storing anything).

For example: Connection: a b \r\n b c\r\n d \r\n

You have received a, then whitespace. You can't drop this whitespace, because you don't know yet whether it is part of field-value or OWS. So you have to store the whitespace until you receive either a non-whitespace byte or \r\n. The same applies to b + whitespace + c and to d + whitespace.
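The buffering described here can be sketched as a small state machine. This is an illustrative sketch only: the class name ValueTrimmer and its feed/finish methods are hypothetical, it assumes the caller passes only field-value bytes (after the colon) and calls finish() when the line ends, and it does not handle obs-fold.

```python
class ValueTrimmer:
    """Incrementally strips OWS around a field value fed in arbitrary chunks."""

    def __init__(self) -> None:
        self._started = False        # has a non-whitespace byte been seen yet?
        self._pending = bytearray()  # whitespace whose role is still unknown

    def feed(self, chunk: bytes) -> bytes:
        """Return the bytes of this chunk that are known to belong to the value."""
        out = bytearray()
        for byte in chunk:
            if byte in b" \t":
                if self._started:
                    # Could be inner whitespace or trailing OWS: hold it back.
                    self._pending.append(byte)
                # Leading OWS is dropped immediately.
            else:
                # The held whitespace turned out to be inside the value.
                out += self._pending
                self._pending.clear()
                out.append(byte)
                self._started = True
        return bytes(out)

    def finish(self) -> None:
        """Call at end of line: anything still pending was trailing OWS."""
        self._pending.clear()
        self._started = False
```

Feeding b"  a", b" ", b" b" and b" \t" yields b"a", b"", b"  b" and b"": the whitespace after a is emitted only once b arrives, and the whitespace after b is discarded by finish(). The held-back buffer is the stored state that a purely streaming parser cannot avoid, and its size is bounded only by whatever limit the parser puts on header length.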

Popular streaming parsers such as those in nginx and Node.js just stop the header value at the first whitespace. This means these parsers are not 100% compatible with RFC 7230.

obs-fold (line folding) is deprecated, but many old web servers can still produce it. So you have to deal with whitespace hell in header values anyway.
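RFC 7230 defines obs-fold as CRLF followed by one or more SP or HTAB, and allows a recipient to replace each obs-fold with one or more SP before interpreting the value. A minimal sketch of that replacement, assuming the raw field value (everything after the colon, without the final CRLF) is already buffered; the helper name unfold is hypothetical.

```python
import re

# obs-fold = CRLF 1*( SP / HTAB )
_OBS_FOLD = re.compile(rb"\r\n[ \t]+")


def unfold(raw_value: bytes) -> bytes:
    """Replace each obs-fold with a single SP, then strip the outer OWS."""
    return _OBS_FOLD.sub(b" ", raw_value).strip(b" \t")
```

Applied to the value from the example above, unfold(b" a b \r\n b c\r\n d ") returns b"a b  b c d". The double space after the first b is the original trailing space of that line plus the SP that replaces the fold.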

The only way to keep your parser working in a purely streaming way is to pass the whitespace to the user as is, without trimming.


According to section 4.2 of RFC 2616 (HTTP/1.1), the field value may be preceded by whitespace, but the field name may not:

Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive. The field value may be preceded by any amount of LWS (linear white space), though a single SP is preferred. Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT. Applications ought to follow "common form", where one is known or indicated, when generating HTTP constructs, since there might exist some implementations that fail to accept anything beyond the common forms.
