How to implement continuously running dataflow blocks in TPL?

As a gross oversimplification, Dataflow is a way to process a stream of objects through a set of linked methods (blocks). It doesn't provide or expect any specific way of creating these objects.

If you want a pipeline to stay alive, just don't terminate the application, and don't call Complete() on the pipeline's head block until you are done. If you don't want to use a console application, create a service that builds the pipeline and keeps sending objects to it until the service stops.
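A minimal sketch of a long-lived pipeline, assuming the System.Threading.Tasks.Dataflow package is referenced; the class and method names are hypothetical. The blocks keep accepting messages until Complete() is called on the head block, so a real service would simply keep the feeding loop running until shutdown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical example: a two-stage pipeline that stays alive until Complete() is called.
public static class LongRunningPipeline
{
    public static async Task<List<int>> RunAsync(IEnumerable<string> source, CancellationToken token)
    {
        var results = new List<int>();

        // Stage 1: turn raw text into messages (here: integers).
        var parse = new TransformBlock<string, int>(s => int.Parse(s));

        // Stage 2: consume messages. MaxDegreeOfParallelism defaults to 1,
        // so List.Add is never called concurrently.
        var sink = new ActionBlock<int>(n => results.Add(n));

        // PropagateCompletion makes Complete() on the head flow through to the tail.
        parse.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var line in source)
        {
            if (token.IsCancellationRequested) break;
            // SendAsync waits when a bounded block is full; Post would return false instead.
            await parse.SendAsync(line);
        }

        // Until this call the pipeline stays open; without it sink.Completion never finishes.
        parse.Complete();
        await sink.Completion;
        return results;
    }
}
```

In a service, `source` would be an endless producer (a queue, a socket, a polling loop) and the token would be cancelled on shutdown.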

Messages are just objects that you will create by reading data, in response to events (whatever that means) or any other way.

As for external events, what do you mean by that? That someone will send data to your application? There are many ways this can happen:

If the data comes from another console application, you can pipe the output of one application into the other, parse the data arriving on your command-line application's standard input, create messages and pass them to the pipeline.
If you want a service that listens for requests, you can host a named-pipe, WCF or Web API service that listens for calls and passes the posted data to the pipeline.
If the data comes from a database, you may be able to poll for changes and send any changed data to the pipeline.
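For the piping case, a sketch of the feeding side: read lines from a TextReader (Console.In when input is piped in) and send each one to the head of the pipeline. StdinFeeder and FeedAsync are hypothetical names; the target can be any ITargetBlock<string>.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: turns each input line into a message for the pipeline.
public static class StdinFeeder
{
    public static async Task<int> FeedAsync(TextReader input, ITargetBlock<string> target)
    {
        int count = 0;
        string line;
        // ReadLineAsync returns null at end of stream, e.g. when the upstream process exits.
        while ((line = await input.ReadLineAsync()) != null)
        {
            await target.SendAsync(line);
            count++;
        }
        // No more input: tell the pipeline that no more messages are coming.
        target.Complete();
        return count;
    }
}
```

In the real program you would call `StdinFeeder.FeedAsync(Console.In, headBlock)` and then await the last block's Completion.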

The point is, Dataflow is about processing data, not about listening for events. It's not a full-blown distributed agent system, if that's what you were looking for.

Your batch size is defined by the batchSize argument of the BatchBlock constructor, not by BoundedCapacity (BoundedCapacity is a separate option that limits how many items the block will buffer). A batch will be emitted when:

A number of posts equal to the batch size have been received (specified in the constructor)
The BatchBlock is marked for completion (Complete() is called)
The TriggerBatch method is called
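The three conditions can be seen in one small sketch, assuming the System.Threading.Tasks.Dataflow package; BatchTriggers is a hypothetical name. It records the length of every batch that reaches the downstream block.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BatchTriggers
{
    public static async Task<List<int>> RunAsync()
    {
        var sizes = new List<int>();

        // The batch size is the constructor argument.
        var batch = new BatchBlock<int>(3);
        var sink = new ActionBlock<int[]>(b => sizes.Add(b.Length));
        batch.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        // 1) Three posts fill a batch, so an array of 3 is emitted.
        for (int i = 0; i < 3; i++) batch.Post(i);

        // 2) TriggerBatch emits whatever is queued (here 2 items).
        batch.Post(3);
        batch.Post(4);
        batch.TriggerBatch();

        // 3) Complete() flushes the remaining partial batch (here 1 item).
        batch.Post(5);
        batch.Complete();

        await sink.Completion;
        return sizes; // lengths in order: 3, 2, 1
    }
}
```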

It seems like you want a batch to be emitted when the batch size is reached or a timeout occurs. If that is the case, and if the exact batch size is not critical, I would just give the timer you have a recurring interval, have it call TriggerBatch, and make the block downstream of the BatchBlock ignore empty posts.
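A sketch of the recurring-timer approach using System.Threading.Timer; PeriodicBatchFlusher is a hypothetical name, not part of TPL Dataflow. Each tick calls TriggerBatch, so whatever is queued goes downstream at least once per interval.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: flushes a BatchBlock on a fixed interval so partial
// batches do not wait indefinitely when traffic is low.
public sealed class PeriodicBatchFlusher<T> : IDisposable
{
    private readonly BatchBlock<T> _batch;
    private readonly Timer _timer;

    public PeriodicBatchFlusher(BatchBlock<T> batch, TimeSpan interval)
    {
        _batch = batch;
        // First tick after 'interval', then every 'interval'.
        // TriggerBatch emits whatever is currently queued.
        _timer = new Timer(_ => _batch.TriggerBatch(), null, interval, interval);
    }

    // Stops the periodic flushing; the BatchBlock itself is left untouched.
    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

The downstream block should accept partial batches, and a check such as `if (items.Length == 0) return;` is a cheap guard for empty ones. The trade-off is that the timeout is measured from the last tick rather than from the first item of a batch, so an item can wait up to one full interval.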

What you may actually want, and what is most in line with the philosophy of dataflow programming, is to create a new BatchBlock when you begin posting a series of items and then complete it when the series is done or when a timeout occurs. New posts would create a new BatchBlock if one does not already exist.
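One possible shape for the "one BatchBlock per series" idea, as a sketch rather than a definitive design: SeriesBatcher, Post and EndSeries are hypothetical names. The first post of a series creates a BatchBlock and starts a one-shot timeout; ending the series (explicitly or by timeout) completes that block, which flushes its partial batch to the shared target.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: one BatchBlock per series of posts, completed on EndSeries or timeout.
public sealed class SeriesBatcher<T>
{
    private readonly object _gate = new object();
    private readonly ITargetBlock<T[]> _target;
    private readonly int _batchSize;
    private readonly TimeSpan _timeout;
    private BatchBlock<T> _current;
    private Timer _timer;

    public SeriesBatcher(ITargetBlock<T[]> target, int batchSize, TimeSpan timeout)
    {
        _target = target;
        _batchSize = batchSize;
        _timeout = timeout;
    }

    public bool Post(T item)
    {
        lock (_gate)
        {
            if (_current == null)
            {
                _current = new BatchBlock<T>(_batchSize);
                // No PropagateCompletion: the shared target outlives each series.
                _current.LinkTo(_target);
                // One-shot timer that ends this particular series if nobody else does.
                var block = _current;
                _timer = new Timer(_ => EndSeries(block), null, _timeout, Timeout.InfiniteTimeSpan);
            }
            return _current.Post(item);
        }
    }

    // Ends the current series explicitly.
    public void EndSeries()
    {
        lock (_gate) { EndSeries(_current); }
    }

    private void EndSeries(BatchBlock<T> block)
    {
        lock (_gate)
        {
            // Ignore a late timer callback for a series that has already ended.
            if (block == null || block != _current) return;
            _timer.Dispose();
            _timer = null;
            _current = null;
            // Complete() flushes any partial batch to the target.
            block.Complete();
        }
    }
}
```

Because each series gets its own block and its own timer, the timeout starts with the first post of the series and nothing has to count or watch messages inside another block.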

The problem with trying to implement a timeout timer around the BatchBlock that starts only with the first post of a batch is that you will either need to count and verify the posts going into the BufferBlock or watch the posts coming out of it. Both of these approaches create a lot of ugliness and/or violate block encapsulation.

Tags:

C#

Task Parallel Library

.Net 4.5

Tpl Dataflow
