How many RDDs does a DStream generate per batch interval?

Yes, there is exactly one RDD per batch interval. It is produced at every batch interval regardless of how many records it contains; it can even contain zero records.

If that weren't the case, and RDD creation were conditioned on the number of elements, you wouldn't have synchronous (micro-batching) streaming, but rather a form of asynchronous processing.
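To make the "one batch per interval, even if empty" point concrete, here is a toy model in plain Python. It is not Spark code: `micro_batches`, the event list, and the interval values are all made up for illustration, and each inner list stands in for the RDD of one batch interval.

```python
from typing import List, Tuple


def micro_batches(
    events: List[Tuple[float, str]],
    batch_interval: float,
    duration: float,
) -> List[List[str]]:
    """Group timestamped events into fixed-width batches.

    One batch is created for every interval, driven only by time,
    so an interval with no events still yields an (empty) batch.
    """
    num_batches = int(duration // batch_interval)
    batches: List[List[str]] = [[] for _ in range(num_batches)]
    for timestamp, value in events:
        index = int(timestamp // batch_interval)
        if 0 <= index < num_batches:
            batches[index].append(value)
    return batches


# Hypothetical events: (arrival time in seconds, payload).
events = [(0.2, "a"), (0.7, "b"), (2.1, "c")]
batches = micro_batches(events, batch_interval=1.0, duration=4.0)

for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} record(s) {batch}")
```

Four seconds at a one-second interval gives four batches, two of them empty. The number of batches is fixed by the clock, not by the data, which is the synchronous behaviour described above.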


It's late to reply to this thread, but it's still worth adding a few more points. Each input DStream generates one RDD per batch interval, so the total number of RDDs per batch across your application depends on how many receivers (input DStreams) you have. If you have only one receiver, or use Kafka as a receiver-less (direct) source, you will get only one RDD per batch.
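As a rough illustration of the multi-receiver case, the plain-Python sketch below models each receiver as its own sequence of batches. The receiver names and data are invented, and `batches_for_interval` and `union_batches` are hypothetical helpers, not Spark APIs. In Spark Streaming itself, DStreams can be merged with `union` so that downstream code sees a single stream.

```python
from typing import Dict, List


def batches_for_interval(
    receivers: Dict[str, List[List[str]]],
    interval: int,
) -> List[List[str]]:
    """Return one batch per receiver for the given interval index."""
    return [stream[interval] for stream in receivers.values()]


def union_batches(batches: List[List[str]]) -> List[str]:
    """Concatenate the per-receiver batches into a single batch."""
    return [record for batch in batches for record in batch]


# Hypothetical data: two receivers, two batch intervals each.
receivers = {
    "receiver-1": [["a", "b"], []],
    "receiver-2": [["x"], ["y", "z"]],
}

per_receiver = batches_for_interval(receivers, 0)
merged = union_batches(per_receiver)

print(f"{len(per_receiver)} batches in interval 0, merged: {merged}")
```

Two receivers give two batches for the same interval, one per stream; after the union there is a single combined batch.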