PySpark truncate a decimal


>>> from pyspark.sql.functions import col, lit, pow
>>> from pyspark.sql.types import LongType
>>> num_places = 3
>>> # Scale factor 10 ** num_places, cast to a long
>>> m = pow(lit(10), num_places).cast(LongType())
>>> df = sc.parallelize([(0.6643, ), (0.6446, )]).toDF(["x"])
>>> # Scale up, cast to long to drop the remaining digits, then scale back down
>>> df.withColumn("trunc", (col("x") * m).cast(LongType()) / m)

Spark 1.5.2

You can simply use the format_number(col,d) function, which rounds the numerical input to d decimal places and returns it as a string. In your case:

from pyspark.sql.functions import format_number
raw_data = raw_data.withColumn("LATITUDE_ROUND", format_number(raw_data.LATITUDE, 3))

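You could use the floor() function, scaling by 1000 before and after so that three decimal places survive. So (without testing) I'd suggest:

from pyspark.sql.functions import floor
raw_data = raw_data.withColumn("LATITUDE_TRUNCATED", floor(raw_data.LATITUDE * 1000) / 1000)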
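
floor() returns the largest integer not greater than its input, so keeping three decimal places means multiplying by 1000 before the call and dividing afterwards. Here is a minimal plain-Python sketch of that arithmetic, with math.floor standing in for Spark's floor() column function; the helper name floor_to_places is hypothetical and only there for illustration:

```python
import math


def floor_to_places(x, places):
    """Round x down to `places` decimal places (hypothetical helper)."""
    scale = 10 ** places  # e.g. 1000 for three places
    # Shift the wanted digits left of the decimal point, floor, shift back.
    return math.floor(x * scale) / scale


for latitude in (0.6643, 0.6446):
    print(latitude, "->", floor_to_places(latitude, 3))
# 0.6643 -> 0.664
# 0.6446 -> 0.644
```

The same scale, floor, unscale pattern should carry over to a Spark column expression, though that is untested here.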

But watch out for negative values - as in