Amazon Athena not parsing cloudfront logs

This is what I ended up with:

CREATE EXTERNAL TABLE logs (
  `date` date,
  `time` string,
  `location` string,
  `bytes` int,
  `request_ip` string,
  `method` string,
  `host` string,
  `uri` string,
  `status` int,
  `referer` string,
  `useragent` string,
  `uri_query` string,
  `cookie` string,
  `edge_type` string,
  `edge_request_id` string,
  `host_header` string,
  `cs_protocol` string,
  `cs_bytes` int,
  `time_taken` string,
  `x_forwarded_for` string,
  `ssl_protocol` string,
  `ssl_cipher` string,
  `result_type` string,
  `protocol` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(?!#.*)(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s*(\\S*)'
) LOCATION 's3://logs'

Note that the double backslashes are intentional: Hive unescapes the string literal before handing the pattern to the regex engine, so `\\S` in the DDL becomes `\S` at match time.

The CloudFront log format changed at some point to append the protocol version as a final field. The trailing `\\s*(\\S*)` makes that last column optional, so this handles both older and newer files.
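If you want to sanity-check the pattern outside Athena, here's a small Python sketch (my own addition, not part of the original answer). Python needs only single backslashes, since the pattern isn't going through Hive's string unescaping:

```python
import re

# Same pattern as the SerDe, written with single backslashes:
# 23 mandatory whitespace-separated fields plus one optional trailing field,
# and a negative lookahead that rejects "#Version"/"#Fields" header lines.
pattern = r'^(?!#.*)' + r'(\S+)\s+' * 22 + r'(\S+)\s*(\S*)'

new_fields = [f"field{i}" for i in range(24)]   # newer format: 24 columns
old_fields = new_fields[:23]                    # older format: 23 columns

new_match = re.match(pattern, "\t".join(new_fields))
old_match = re.match(pattern, "\t".join(old_fields))

print(new_match.groups()[-1])              # 'field23' - last column captured
print(old_match.groups()[-1])              # '' - optional group is empty for old files
print(re.match(pattern, "#Version: 1.0"))  # None - header lines are rejected
```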


Actually, all the answers here share a small mistake: the 4th field must be a BIGINT, not an INT. Otherwise, requests for files larger than 2 GB are not parsed correctly. After a long discussion with AWS Business Support, it appears the correct format is:

CREATE EXTERNAL TABLE your_table_name (
  `Date` DATE,
  Time STRING,
  Location STRING,
  SCBytes BIGINT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  UserAgent STRING,
  UriQS STRING,
  Cookie STRING,
  ResultType STRING,
  RequestId STRING,
  HostHeader STRING,
  Protocol STRING,
  CSBytes BIGINT,
  TimeTaken FLOAT,
  XForwardedFor STRING,
  SSLProtocol STRING,
  SSLCipher STRING,
  ResponseResultType STRING,
  CSProtocolVersion STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://path_to_your_data_directory'
TBLPROPERTIES ('skip.header.line.count' = '2')
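The 2 GB cutoff follows directly from Hive's integer widths: INT is a 32-bit signed type, so its maximum value is 2^31 − 1 (just under 2 GiB), while BIGINT is 64-bit. A quick check of the arithmetic (Python used here purely as a calculator):

```python
# Hive's INT is a 32-bit signed integer; BIGINT is 64-bit signed.
INT_MAX = 2**31 - 1        # 2,147,483,647 bytes, just under 2 GiB
BIGINT_MAX = 2**63 - 1

three_gb_response = 3 * 1024**3   # sc-bytes for a ~3 GiB download

print(three_gb_response > INT_MAX)      # True: overflows INT, value can't be parsed
print(three_gb_response <= BIGINT_MAX)  # True: fits comfortably in BIGINT
```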