AWS Big Data Blog

Import Zeppelin notes from GitHub or JSON in Zeppelin 0.5.6 on Amazon EMR

Jonathan Fritz is a Senior Product Manager for Amazon EMR

Many Amazon EMR customers use Zeppelin to create interactive notebooks to run workloads with Spark using Scala, Python, and SQL. These customers have found Amazon EMR to be a great platform for running Zeppelin because of strong integration with other AWS services and the ability to quickly create a fully configured Spark environment. Many customers have already discovered Amazon S3 to be a useful way to durably store and move their notebook files between EMR clusters.

With the latest Zeppelin release (0.5.6), included in Amazon EMR release 4.4.0, you can now import notes using links to JSON files in S3, raw file URLs in GitHub, or local files. You can also download a note as a JSON file. This new functionality makes it easier to save and share Zeppelin notes, and it lets you version your notes during development. The import feature is on the Zeppelin home screen, and the export feature is on the toolbar for each note. Additionally, you can still configure Zeppelin to store its entire notebook file in S3 by adding the following configuration for zeppelin-env when creating your cluster (make sure the S3 bucket already exists before you create the cluster):

[
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "my-zeppelin-bucket-name",
          "ZEPPELIN_NOTEBOOK_USER": "user"
        },
        "Configurations": []
      }
    ]
  }
]
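
If you create clusters from scripts, it can help to generate this configuration rather than edit it by hand. The following is a minimal sketch in Python, not an official tool: the function names zeppelin_s3_configuration and write_configuration are hypothetical, and the bucket name and user are the same placeholder values used above. It only builds the JSON structure shown in this post and writes it to a local file.

```python
import json


def zeppelin_s3_configuration(bucket, user="user"):
    """Return the zeppelin-env classification shown above as a Python structure.

    `bucket` is the name of an S3 bucket that must already exist;
    `user` is the placeholder value for ZEPPELIN_NOTEBOOK_USER.
    """
    return [
        {
            "Classification": "zeppelin-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {
                        "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                        "ZEPPELIN_NOTEBOOK_S3_BUCKET": bucket,
                        "ZEPPELIN_NOTEBOOK_USER": user,
                    },
                    "Configurations": [],
                }
            ],
        }
    ]


def write_configuration(path, bucket, user="user"):
    """Write the configuration to `path` as indented JSON (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(zeppelin_s3_configuration(bucket, user), handle, indent=2)


if __name__ == "__main__":
    # Placeholder bucket name; replace it with a bucket you have already created.
    write_configuration("zeppelin-config.json", "my-zeppelin-bucket-name")
```

Assuming you use the AWS CLI, the resulting file could then be supplied through the --configurations option of aws emr create-cluster (for example, --configurations file://./zeppelin-config.json), along with your usual cluster settings.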

Below is a screenshot of the import note functionality. You can specify the URL for a JSON file in S3 or a raw file in GitHub here: