How to find the percentage of NAs in a data.frame?

If you are interested in the percentage of complete cases (rows that contain no NA at all), you can use complete.cases().

Using the same example mentioned here:

x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

Output :

   x  y
1  1 NA
2  2 NA
3 NA  4
4  3  5

Finding Complete cases:

complete.cases(x)

Output :

[1] FALSE FALSE FALSE  TRUE

Proportion of complete cases:

mean(complete.cases(x))

Output:

[1] 0.25

That means 25% of the rows in the data are complete, i.e. only the fourth row is complete; every other row contains at least one NA.
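If you want the figure as an actual percentage, or want to see which rows are incomplete, a minimal base R sketch could look like this (the names ok, pct_complete and incomplete_rows are only illustrative):

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# Logical vector: TRUE for rows without any NA
ok <- complete.cases(x)

# Share of complete rows, scaled to a percentage
pct_complete <- 100 * mean(ok)

# Indices of the rows that contain at least one NA
incomplete_rows <- which(!ok)

print(pct_complete)     # 25
print(incomplete_rows)  # 1 2 3
```

This works because mean() of a logical vector treats TRUE as 1 and FALSE as 0.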

Cheers!


For newer versions of dplyr, where funs() is deprecated, pass a list of formulas instead:

x %>% summarise_all(list(name = ~ sum(is.na(.)) / length(.)))
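As a further option, here is a sketch assuming dplyr 1.0.0 or later is installed, where across() is available and the scoped summarise_all() family is superseded (na_prop is just an illustrative name):

```r
library(dplyr)

x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# across(everything(), ...) applies the lambda to every column;
# mean(is.na(.x)) is the proportion of NAs in that column
na_prop <- x %>%
  summarise(across(everything(), ~ mean(is.na(.x))))

print(na_prop)
```

With a single unnamed function the result keeps the original column names, x and y.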


x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

For the whole dataframe:

sum(is.na(x))/prod(dim(x))

Or

mean(is.na(x))
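To see that the two expressions agree, here is a small check on the example data (na_mask, prop_a and prop_b are illustrative names):

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# is.na() on a data.frame returns a logical matrix of the same shape
na_mask <- is.na(x)

# 3 NAs out of 4 * 2 = 8 cells
prop_a <- sum(na_mask) / prod(dim(x))
prop_b <- mean(na_mask)

print(prop_a)                             # 0.375
print(isTRUE(all.equal(prop_a, prop_b)))  # TRUE
```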

For columns:

apply(x, 2, function(col) sum(is.na(col)) / length(col))

Or

colMeans(is.na(x))
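The same idea can be turned into percentages, or applied per row with rowMeans(). A minimal sketch (col_prop, row_prop and col_pct are illustrative names, and the row-wise variant is an addition, not part of the answer above):

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# Proportion of NAs per column, as a named numeric vector
col_prop <- colMeans(is.na(x))

# Proportion of NAs per row
row_prop <- rowMeans(is.na(x))

# Column percentages, with the worst column first
col_pct <- sort(100 * col_prop, decreasing = TRUE)

print(col_pct)
print(row_prop)
```

Here column y comes first with 50% missing, followed by x with 25%.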

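You could also use dplyr::summarize_all for the column-wise proportions (after library(dplyr); note that funs() is deprecated as of dplyr 0.8.0, so recent versions will emit a warning).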

x %>% summarize_all(funs(sum(is.na(.)) / length(.)))

Which will give

     x   y
1 0.25 0.5
