Iterating over PySpark GroupedData

The approach below works as long as the list of unique values in the grouping column is small enough to fit in memory on the driver, since those values are collected there. Hope this helps!

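import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Reuse the active session (already available as `spark` in the pyspark shell or a notebook)
spark = SparkSession.builder.getOrCreate()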

# Sample data 
df = pd.DataFrame({'region': ['aa','aa','aa','bb','bb','cc'],
                   'x2': [6,5,4,3,2,1],
                   'x3': [1,2,3,4,5,6]})
df = spark.createDataFrame(df)

# Get unique values in the grouping column
groups = [x[0] for x in df.select("region").distinct().collect()]

# Create a filtered DataFrame for each group in a list comprehension
groups_list = [df.filter(F.col('region')==x) for x in groups]

# show the results
for x in groups_list:
    x.show()


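+------+---+---+
|region| x2| x3|
+------+---+---+
|    cc|  1|  6|
+------+---+---+

+------+---+---+
|region| x2| x3|
+------+---+---+
|    bb|  3|  4|
|    bb|  2|  5|
+------+---+---+

+------+---+---+
|region| x2| x3|
+------+---+---+
|    aa|  6|  1|
|    aa|  5|  2|
|    aa|  4|  3|
+------+---+---+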
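
If you want to look groups up by value rather than by position in a list, the same idea can be wrapped in a small helper. This is only a rough sketch, not anything built into PySpark: `split_by_group` and its `max_groups` parameter are names I made up for illustration. It uses only DataFrame methods (`select`, `distinct`, `limit`, `collect`, `filter`) and refuses to proceed when there are more distinct values than you said you were willing to collect to the driver.

```python
from typing import Any, Dict


def split_by_group(df, column: str, max_groups: int = 1000) -> Dict[Any, Any]:
    """Return {group value: filtered DataFrame} for each distinct value in `column`.

    Hypothetical helper for illustration; `max_groups` is an assumed safety cap.
    """
    # Pull at most max_groups + 1 distinct values to the driver, so an
    # unexpectedly high-cardinality column fails fast instead of exhausting memory.
    rows = df.select(column).distinct().limit(max_groups + 1).collect()
    if len(rows) > max_groups:
        raise ValueError(
            f"Column {column!r} has more than {max_groups} distinct values"
        )

    result = {}
    for row in rows:
        value = row[0]
        # In Spark, `== NULL` never evaluates to true, so a null group needs isNull().
        if value is None:
            condition = df[column].isNull()
        else:
            condition = df[column] == value
        # filter() is lazy: nothing is computed until an action such as show().
        result[value] = df.filter(condition)
    return result
```

With the sample data above, `split_by_group(df, 'region')['aa'].show()` would display only the three `aa` rows. Keep in mind that each filtered DataFrame rescans the source when an action runs on it, so calling `df.cache()` first may be worthwhile if you plan to touch every group.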