Get the size/length of an array column

You can use the size function, which returns the number of elements in the array. One caveat, as pointed out by @aloplop85: an array holding a single empty string, Array(""), has size 1, not 0. That is correct, because the empty string is itself an element of the array. If your use case needs the size to be zero when the array's only element is an empty string, you can special-case it:

//source data (assumes spark-shell, or a SparkSession named spark)
import org.apache.spark.sql.functions.{col, lit, size, when}
import spark.implicits._
val df = Seq((Array("a","b","c"), 2), (Array("a"), 4), (Array(""), 6)).toDF("friends", "id")
//if the array has exactly one element and that element is the empty string, report 0; otherwise report its size
val df1 = df.withColumn("no_of_friends",
  when(size(col("friends")) === 1 && col("friends")(0) === "", lit(0))
    .otherwise(size(col("friends"))))

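You can verify the output with df1.show(): the rows with ids 2 and 4 keep sizes 3 and 1, while the row with id 6, whose array holds only an empty string, gets no_of_friends = 0.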
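The rule in the snippet above can be checked without a Spark session by modelling each array as a plain Scala Seq[String]. This is a sketch under that assumption, not Spark code: noOfFriends and nonEmptyCount are hypothetical helpers. The first mirrors the when/otherwise expression. The second mirrors a different technique, size(array_remove(col("friends"), "")) (array_remove is in org.apache.spark.sql.functions from Spark 2.4), which drops every empty string before counting and therefore also changes the result for arrays such as Array("a", "").

```scala
// Plain-Scala model of two ways to count "real" elements; no Spark needed.
// noOfFriends and nonEmptyCount are hypothetical helpers for illustration only.

// Mirrors: when(size(friends) === 1 && friends(0) === "", 0).otherwise(size(friends))
// The && short-circuits, so head is never called on an empty Seq.
def noOfFriends(friends: Seq[String]): Int =
  if (friends.size == 1 && friends.head == "") 0 else friends.size

// Mirrors: size(array_remove(friends, ""))
// Every empty string is dropped before counting, not only a lone one.
def nonEmptyCount(friends: Seq[String]): Int =
  friends.count(_ != "")

// Sample rows: the three from the question plus one mixed case where the rules differ
val rows = Seq(Seq("a", "b", "c"), Seq("a"), Seq(""), Seq("a", ""))
rows.foreach { r =>
  println(s"$r -> noOfFriends=${noOfFriends(r)}, nonEmptyCount=${nonEmptyCount(r)}")
}
```

The two rules agree on the sample data from the question and only diverge when an empty string sits alongside other elements, so pick whichever matches what "number of friends" should mean for your data.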


You can use the size function:

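import org.apache.spark.sql.functions.size
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell
val df = Seq((Array("a","b","c"), 2), (Array("a"), 4)).toDF("friends", "id")
// df: org.apache.spark.sql.DataFrame = [friends: array<string>, id: int]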

df.select(size($"friends").as("no_of_friends")).show
+-------------+
|no_of_friends|
+-------------+
|            3|
|            1|
+-------------+

To add as a new column:

df.withColumn("no_of_friends", size($"friends")).show
+---------+---+-------------+
|  friends| id|no_of_friends|
+---------+---+-------------+
|[a, b, c]|  2|            3|
|      [a]|  4|            1|
+---------+---+-------------+