Apache Zeppelin

To access the Apache Zeppelin notebook, first make sure that your cluster’s master security group allows access to the Proxy Agent (port 8442) from your desktop.

Follow these steps to access the Zeppelin notebook and run its queries to explore different access scenarios:
  1. On the Amazon EMR console, click Clusters, select the cluster LF-EMRCluster, and then click View Details.
  2. Click the Security groups for Master link.
  3. Select the ElasticMapReduce-master security group. On the Inbound tab, click Edit inbound rules, add a rule (type: Custom TCP) that allows traffic from your computer's IP address to port 8442, and click Save.

    Turn off your VPN before connecting; some VPNs block high-numbered ports.

  4. Clear your browser cache and cookies, because your previous login to the IdP account is still in the session, or open a private window (Incognito mode in Google Chrome).
  5. To access Apache Zeppelin, copy the EMRMasterNodeDNS value from the CloudFormation stack output. In your browser, navigate to the following URL, replacing EMRMasterNodeDNS with that value. Make sure the URL includes the trailing slash.
    https://EMRMasterNodeDNS:8442/gateway/default/zeppelin/
  6. Once the Proxy Agent’s certificate is accepted, your browser redirects you to your Identity Provider (IdP) login page to authenticate.
    • For Auth0/Okta, use emr-developer@somecompany.com and the password you provided to authenticate.
    • For AD FS, use emr-developer@hadoop.com and the password (Password1!) to authenticate.

    With Okta, you may be asked to set up a forgot-password question the first time you log in.

  7. Once authenticated, you will be redirected to Zeppelin.
  8. A notebook for this exercise is already loaded in Zeppelin. Click the notebook named LakeFormation-EMR-Notebook. From the notebook, run the queries one by one to see different AWS Lake Formation granular access patterns.
  9. One query at the end of the notebook is expected to fail because of limited data permissions. Go back to the AWS Lake Formation console (in a different browser tab) and grant SELECT permission on the tpc.dl_tpc_item table to the IdP user. Then return to your notebook, re-run the query that failed with the AccessDeniedException error, and validate the user's access.
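
The console steps above can also be scripted. The following is a minimal Python sketch, not part of the original exercise, that builds the three pieces of data the steps rely on: the Zeppelin URL with its trailing slash (step 5), the inbound rule for port 8442 (step 3), and the SELECT grant on tpc.dl_tpc_item (step 9). The function names, the rule description text, and the example values are hypothetical. The principal identifier for the IdP user is not given in this guide, so it is left as a value the caller must supply.

```python
import ipaddress


def build_zeppelin_url(master_dns: str) -> str:
    """Return the Proxy Agent URL for Zeppelin, including the trailing slash."""
    host = master_dns.strip().rstrip("/")
    return f"https://{host}:8442/gateway/default/zeppelin/"


def build_ingress_rule(my_ip: str, port: int = 8442) -> dict:
    """Build one IpPermissions entry allowing TCP on `port` from a single IPv4 address."""
    # IPv4Address raises ValueError if my_ip is not a valid IPv4 address.
    cidr = f"{ipaddress.IPv4Address(my_ip)}/32"
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # The description text is an assumption made for this sketch.
        "IpRanges": [{"CidrIp": cidr, "Description": "Zeppelin via Proxy Agent"}],
    }


def build_select_grant(principal_id: str) -> dict:
    """Build keyword arguments for a Lake Formation SELECT grant on tpc.dl_tpc_item."""
    return {
        # principal_id identifies the IdP user; its exact value depends on your setup.
        "Principal": {"DataLakePrincipalIdentifier": principal_id},
        "Resource": {"Table": {"DatabaseName": "tpc", "Name": "dl_tpc_item"}},
        "Permissions": ["SELECT"],
    }
```

If boto3 is installed and AWS credentials are configured, these dictionaries could be passed to boto3.client("ec2").authorize_security_group_ingress(GroupId=..., IpPermissions=[build_ingress_rule(my_ip)]) and boto3.client("lakeformation").grant_permissions(**build_select_grant(principal_id)). The security group ID and the principal identifier are placeholders you would look up in your own account.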