How to drop rows with nulls in one column pyspark

DataFrames are immutable, so a filter never modifies the DataFrame in place. Filtering on isNotNull() returns a new DataFrame that keeps only the rows where the column is not null, so assign the result back to a variable:

df = df.filter(df.col_X.isNotNull())

Use either na.drop with subset:

df.na.drop(subset=["col_X"])

or filter with isNotNull():

df.filter(df.col_X.isNotNull())

If you want to drop any row in which any value is null, use

df.na.drop()  # same as df.na.drop("any"); the default is "any"

To drop a row only if all of its values are null, use

df.na.drop("all")

To restrict the check to a list of columns, pass subset:

df.na.drop("all", subset=["col1", "col2", "col3"])
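The difference between the "any" and "all" modes, and the effect of limiting the check to certain columns, is easy to mix up. The sketch below is a plain-Python model of those rules, not Spark code: drop_nulls is a hypothetical helper written only for this illustration, and the sample rows are made up. It operates on a list of dicts rather than a DataFrame.

```python
# Plain-Python model of the row-dropping rules; this is NOT Spark code.
# drop_nulls is a hypothetical helper that exists only for this illustration.

def drop_nulls(rows, how="any", subset=None):
    """Return the rows that survive, mimicking df.na.drop(how, subset=...)."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    kept = []
    for row in rows:
        # Check only the requested columns, or every column by default.
        cols = subset if subset is not None else list(row)
        nulls = [row[c] is None for c in cols]
        # "any": drop when at least one checked column is null.
        # "all": drop only when every checked column is null.
        drop = any(nulls) if how == "any" else all(nulls)
        if not drop:
            kept.append(row)
    return kept


# Made-up sample data: None stands in for a Spark null.
rows = [
    {"id": 1, "col_X": "a", "col_Y": 1.0},
    {"id": 2, "col_X": None, "col_Y": 2.0},
    {"id": 3, "col_X": "c", "col_Y": None},
    {"id": None, "col_X": None, "col_Y": None},
]

only_col_x = drop_nulls(rows, subset=["col_X"])  # models df.na.drop(subset=["col_X"])
any_null = drop_nulls(rows)                      # models df.na.drop()
all_null = drop_nulls(rows, how="all")           # models df.na.drop("all")

print([r["id"] for r in only_col_x])
print([r["id"] for r in any_null])
print([r["id"] for r in all_null])
```

With this sample, checking only col_X keeps the row whose null sits in col_Y, the default "any" mode keeps only the fully populated row, and "all" removes only the row that is null in every column.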

