Convert JSON file to a CSV file using R

There are several issues with your text file. As you already noticed, you need to remove the lines of the form /* 0 */. What results is still not valid json. If you want to have several json objects in a file, you need to store them in an array. The json objects are the parts that are closed in curly brakets, e.g.,

{
  "_id" : "93ccbdb6-8947-4687-8e12-edf4e40d6650",
  ...
  "totalRecords" : 0,
  "status" : "SUCCESS"
}

The structure of an array of objects is as follows:

[
  {
    ...
  },
  {
    ...
  }
]

To get your file in shape, you need to add a comma between the objects and add the square brackets. You could do this as follows:

raw <- readLines("mydata.txt")

# get rid of the "/* 0 */" lines
json <- grep("^/\\* [0-9]* \\*/", raw, value = TRUE, invert = TRUE)

# add missing comma after }
n <- length(json)
json[-n] <- gsub("^}$", "},", json[-n])

# add brakets at the beginning and end
json <- c("[", json, "]")

This can be read by fromJSON(), so I assume it is valid json:

library(jsonlite)
table <- fromJSON(json)

The table is nested, that is, some of the tables cells contain a data frame or a list themselves. For example,

table[1,2]
##   travelDate travelDuration shopperDuration oneWay  userId                              queryId
## 1   20151206              7              30  FALSE ATP1KKP 93ccbdb6-8947-4687-8e12-edf4e40d6650
##               subRequests
##     1 WAS, LON, AA, , 1,2

You could use flatten() from the jsonlite package, to get a table with one level of nesting less

flatten(table)[1:3, c(1, 6, 12)]
##                                    _id uiSearchRequest.travelDate uiSearchRequest.subRequests
## 1 93ccbdb6-8947-4687-8e12-edf4e40d6650                   20151206         WAS, LON, AA, , 1,2
## 2 b736c374-b8ae-4e99-8073-9c54517fecd5                   20151206         WAS, LON, AA, , 1,2
## 3 3312605f-8304-4ab8-96d6-6e1a03cfbd9e                   20151206         LON, IAD, AA, , 1,2

The last column is still a list. There are many ways you could handle this. One possibility, is to create a row per subrequest, where the contents of all the other columns (X_id, downloadCount, etc.) are repeated. (This is almost the form that you give in your question, with the only difference that you left cells empty in the reapeated columns, while I repeat the contents.) This is how it can be done:

table <- flatten(fromJSON(json))
tab_list <- lapply(1:nrow(table),
                  function(i) data.frame(table[i, -12], table[i, 12],
                              stringsAsFactors = FALSE))
library(dplyr)
flat_table <- bind_rows(tab_list)

The second line creates a list of data frames. These are combined into a single data frame using bind_rows() from dpylr. (To be more precise, flat_table will be a tbl_df, but the difference to a data.frame is small.) This can then be written to a csv file in the usual way:

write.csv(flat_table, file = "mydata.csv")

Tags:

Csv

Json

R