What is Apache Zeppelin?

Zeppelin is a great tool: it lets you use different backends and languages in a single notebook. Here is a simple use case:

  1. Write a description using Markdown
  2. Prepare data using Shell, e.g. download files with curl/wget and load them into HDFS
  3. Do the data analysis with Spark
  4. Create a simple visualisation with SQL
  5. Export the result with Shell
  6. Publish the graph via a link

All of these steps can be done in a single notebook, and much more besides.
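A minimal sketch of the use case above, written in Python: each step becomes one note paragraph whose first line names the interpreter (`%md`, `%sh`, `%spark`, `%sql`). The URL, file names, column name and HDFS paths are hypothetical placeholders, and the payload shape (a `name` plus a list of `title`/`text` paragraphs) assumes the note-creation endpoint `POST /api/notebook` of Zeppelin's REST API; check it against your Zeppelin version. Step 6 (publishing the graph via a link) is usually done from the paragraph's settings menu in the UI, so it has no paragraph here.

```python
import json
from typing import Dict, List

# Hypothetical note for the use case. The URL, file names, the "region" column and
# the HDFS paths are placeholders; each paragraph's first line selects the interpreter.
STEPS = [
    ("Describe", "%md\n## Sales analysis\nDownload the data, analyse it and chart it."),
    ("Prepare data",
     "%sh\ncurl -sSL -o /tmp/sales.csv https://example.com/sales.csv\n"
     "hdfs dfs -put -f /tmp/sales.csv /data/sales.csv"),
    ("Analyse with Spark",
     "%spark\n"
     'val sales = spark.read.option("header", "true").csv("/data/sales.csv")\n'
     'sales.createOrReplaceTempView("sales")\n'
     'sales.groupBy("region").count().write.mode("overwrite").csv("/data/sales_by_region")'),
    ("Visualise with SQL",
     "%sql\nSELECT region, count(*) AS n FROM sales GROUP BY region"),
    ("Export",
     "%sh\nhdfs dfs -getmerge /data/sales_by_region /tmp/sales_by_region.csv"),
]


def interpreter_of(text: str) -> str:
    """Return the interpreter directive on a paragraph's first line, e.g. '%sql'."""
    first_line = text.splitlines()[0].strip()
    return first_line if first_line.startswith("%") else ""


def note_payload(name: str) -> Dict[str, object]:
    """Build a note-creation body (assumed shape for POST /api/notebook)."""
    paragraphs: List[Dict[str, str]] = [
        {"title": title, "text": text} for title, text in STEPS
    ]
    return {"name": name, "paragraphs": paragraphs}


if __name__ == "__main__":
    payload = note_payload("Sales use case")
    print([interpreter_of(p["text"]) for p in payload["paragraphs"]])
    print(json.dumps(payload, indent=2))
```

Registering the DataFrame as a temporary view in the `%spark` paragraph is what lets the `%sql` paragraph query it, since in the default setup both paragraphs share one SparkSession.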

Zeppelin is very similar to the hosted notebook solution offered by Databricks (Databricks.com).


What is a notebook interface?

An interface for running code interactively and for exploring and visualizing data. Notebooks let you mix narrative, rich media and data in one document.


Short answer: a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Long answer:

  1. A Zeppelin notebook gives you an easy, straightforward way to execute arbitrary code in a web notebook. You can run Scala and SQL, and even schedule a note (via a cron expression) to run at a regular interval.

  2. It is easy to mix languages in the same notebook: you can write some SQL, then some Scala, then Markdown to document it all together. You can also switch a notebook to a presentation-style view, for example to present to management or to use as a dashboard.

  3. It is similar to the Jupyter Notebook (formerly known as the IPython Notebook), which has been extremely popular in the Python community. I wouldn't say Zeppelin "replaces" Jupyter; it is rather a similar kind of tool.
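Scheduling can also be scripted. The Python sketch below builds, without sending, a request that attaches a cron schedule to an existing note. It assumes that Zeppelin's note scheduler takes Quartz-style cron expressions (six fields starting with seconds, which is why `0 0/5 * * * ?` means every five minutes) and that the REST endpoint is `POST /api/notebook/cron/{noteId}` with a `{"cron": ...}` body; both are our reading of the Zeppelin docs and worth checking for your version. The server URL and note ID are hypothetical.

```python
import json
import urllib.request

# Quartz-style expressions: seconds minutes hours day-of-month month day-of-week.
EVERY_5_MIN = "0 0/5 * * * ?"
DAILY_AT_MIDNIGHT = "0 0 0 * * ?"


def cron_request(base_url: str, note_id: str, cron: str) -> urllib.request.Request:
    """Build (but do not send) a POST that attaches a cron schedule to a note.

    The /api/notebook/cron/{noteId} path and the {"cron": ...} body are assumptions
    based on Zeppelin's REST API docs; verify them for your Zeppelin version.
    """
    fields = cron.split()
    if len(fields) not in (6, 7):  # Quartz: 6 fields, plus an optional year field
        raise ValueError(f"expected a 6- or 7-field Quartz expression, got {cron!r}")
    body = json.dumps({"cron": cron}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/notebook/cron/{note_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical server and note ID.
    req = cron_request("http://localhost:8080", "2A94M5J1Z", EVERY_5_MIN)
    print(req.get_method(), req.full_url)
```

To actually submit the schedule, pass the request to `urllib.request.urlopen(req)` against a running Zeppelin server. The field-count check is there because a classic five-field Unix crontab line is not a valid Quartz expression.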

Furthermore:

  • Zeppelin supports Spark, PySpark, SparkR and Spark SQL, with a dependency loader.

  • Zeppelin lets you connect to any JDBC data source seamlessly: PostgreSQL, MySQL, MariaDB, Redshift, Apache Hive and so on.

  • Python is supported, with Matplotlib, Conda, pandas SQL and PySpark integrations.
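Connecting a JDBC source mostly comes down to the JDBC interpreter's connection properties. The Python sketch below assembles them for a few of the databases listed above. The property names (`default.driver`, `default.url`, `default.user`, `default.password`) are the JDBC interpreter settings as we understand them, the helper `jdbc_properties` is hypothetical, and so are the host, database and user names. The driver classes and ports are the usual defaults for each database; the matching driver JAR still has to be added as a dependency of the interpreter.

```python
from typing import Dict, Optional

# Hypothetical lookup table: usual JDBC driver class and default port per database.
DRIVERS = {
    "postgresql": ("org.postgresql.Driver", 5432),
    "mysql": ("com.mysql.cj.jdbc.Driver", 3306),
    "mariadb": ("org.mariadb.jdbc.Driver", 3306),
    "hive2": ("org.apache.hive.jdbc.HiveDriver", 10000),  # HiveServer2
}


def jdbc_properties(kind: str, host: str, database: str, user: str,
                    password: str = "", port: Optional[int] = None) -> Dict[str, str]:
    """Return interpreter properties for one JDBC connection (assumed key names)."""
    if kind not in DRIVERS:
        raise ValueError(f"unsupported JDBC kind: {kind!r}")
    driver, default_port = DRIVERS[kind]
    # JDBC URLs follow jdbc:<subprotocol>://<host>:<port>/<database>
    url = f"jdbc:{kind}://{host}:{port or default_port}/{database}"
    return {
        "default.driver": driver,
        "default.url": url,
        "default.user": user,
        "default.password": password,
    }


if __name__ == "__main__":
    props = jdbc_properties("postgresql", "db.example.com", "analytics", "zeppelin")
    for key, value in props.items():
        print(f"{key} = {value}")
```

Once the interpreter is configured with these values, a paragraph that starts with `%jdbc` followed by a SQL query runs against that connection.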