How to Reference the External Jar in Flink

Flink's Command Line Interface (CLI) allows passing additional jar locations with the -C (--classpath) option, which can be repeated for several jars. We use it to pass dependencies to each job.

Our problem: our jobs evolve over the whole project lifetime, their external dependencies change versions, and we run several processes in the same cluster, so we wanted to select the exact jar versions to load in each run. The $FLINK/lib directory was therefore not enough for us.

Details: we distribute the jars to a fixed directory (different from $FLINK/lib) on every node. We then start the job through the CLI. Because the call is quite long, we do not type it directly but wrap it in a bash script.
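A minimal sketch of such a wrapper, not our exact script. The directory /opt/flink-deps, the default FLINK_HOME, and the jar names are hypothetical placeholders. Note that -C expects a URL, so the path needs a protocol such as file://, and the location must be readable on every node.

```shell
#!/usr/bin/env bash
# Hypothetical directory that holds the versioned jars on every node.
DEPS_DIR="${DEPS_DIR:-/opt/flink-deps}"

# Print a `flink run` command line with one -C flag per dependency jar.
# Usage: build_flink_cmd <job-jar> <dep-jar-name>...
# Assumes paths without spaces, to keep the sketch short.
build_flink_cmd() {
  job_jar="$1"; shift
  cmd="${FLINK_HOME:-/opt/flink}/bin/flink run"
  for dep in "$@"; do
    # -C takes a URL, hence the file:// prefix.
    cmd="$cmd -C file://${DEPS_DIR}/${dep}"
  done
  printf '%s\n' "$cmd $job_jar"
}

# Example: print the command for one job and two pinned dependency versions.
build_flink_cmd my-job.jar dep-a-1.2.0.jar dep-b-3.4.1.jar
```

Printing the command instead of running it makes it easy to check which versions a run will pick up. Piping the output to `sh` executes it.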


In general, building a fat jar is the best way to go. How big does your fat jar get that you consider it "too heavy"?

Copying jars to $FLINK/lib should work. However, you need to restart Flink so that the jars are added to Flink's classpath. This approach therefore does not let you add jars dynamically, but it should work for a set of stable jars.

To manage jars across the whole cluster, it might help to use an NFS folder as $FLINK/lib so that all TaskManagers stay in sync. Alternatively, you can simply write a bash script to distribute your jars.
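A distribution script along those lines could look like the sketch below. The host names and directories are hypothetical, and it assumes rsync over passwordless SSH to each node. It only prints the commands, so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Print one rsync command per host that copies the contents of a local jar
# directory into the target directory on that host.
# Usage: sync_cmds <local-jar-dir> <remote-dir> <host>...
sync_cmds() {
  src="$1"; dest="$2"; shift 2
  for host in "$@"; do
    # The trailing slash on the source copies its contents, not the directory itself.
    printf 'rsync -av %s/ %s:%s/\n' "$src" "$host" "$dest"
  done
}

# Example with hypothetical worker names; pipe the output to `sh` to execute it.
sync_cmds ./jars /opt/flink/lib worker1 worker2
```

Flink still has to be restarted after the copy for jars in $FLINK/lib to be picked up.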


If you want to avoid dependency conflicts, don't copy your jars to ${FLINK}/lib. If you use yarn-cluster as your master, you can use -yt (--yarnship): it copies the jars to HDFS and adds them to the classpath of your distributed program.
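As a rough sketch, a call with -yt could be assembled like this. The ship directory, the job jar name, and the default FLINK_HOME are hypothetical, and the `-m yarn-cluster` form assumes a Flink version that still supports per-job YARN submission through these flags.

```shell
#!/usr/bin/env bash
# Print a `flink run` command that ships a local directory of jars to YARN.
# Usage: yarn_ship_cmd <dir-with-jars> <job-jar>
yarn_ship_cmd() {
  ship_dir="$1"; job_jar="$2"
  printf '%s\n' "${FLINK_HOME:-/opt/flink}/bin/flink run -m yarn-cluster -yt $ship_dir $job_jar"
}

# Example with hypothetical paths; pipe the output to `sh` to execute it.
yarn_ship_cmd /opt/flink-deps my-job.jar
```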

Tags:

Apache Flink