EMR Notebooks

Follow these steps to create an Amazon EMR notebook and run queries from it to explore different access patterns:
  1. If you have already updated your EMR security group, skip to step 5.
  2. On the Amazon EMR console, click Clusters, select the cluster LF-EMRCluster, and then click View Details.
  3. Click the Security groups for Master link.
  4. Select the ElasticMapReduce-master security group. On the Inbound tab, click Edit inbound rules, add a rule (type: Custom TCP) that allows traffic from your computer's IP address to port 8442, and click Save.

    Turn off your VPN, because VPNs sometimes block high-numbered ports.

  5. Clear your browser cache and cookies, because your previous IdP login is still in the session, or open an Incognito window if you are using Google Chrome.
  6. To create an EMR notebook, open the Amazon EMR console. Choose Notebooks and click the Create notebook button.
  7. Enter the notebook name LF-Notebook-<your-idp-name> and an optional description. Select the Amazon S3 notebook path under the Notebook location section.
  8. Select Choose an existing cluster and click the Choose button. Select the EMR cluster LF-EMRCluster, which was created by the CloudFormation template.
  9. Review all the information and finish the creation process by clicking the Create notebook button. The notebook is created; wait for it to reach the Ready state.
  10. Clear your browser cache and cookies again, because your previous IdP login is still in the session, or open an Incognito window if you are using Google Chrome.
  11. Once the notebook is in the Ready state, click either the Open in Jupyter or the Open in JupyterLab button.
  12. You will be redirected to the Proxy Agent on the Amazon EMR cluster. Once the Proxy Agent’s certificate is accepted, your browser redirects you to your Identity Provider (IdP) login page to authenticate.
    • For Auth0/Okta, use emr-developer@somecompany.com and the password you provided to authenticate.
    • For AD FS, use emr-developer@hadoop.com and the password Password1! to authenticate.

    With Okta, the first time you log in you may be asked to choose a forgot-password question and provide an answer.

  13. Once authenticated, you will be redirected to the Jupyter notebook. Download the existing EMR notebook file LF-EMR-Jupyter.ipynb to your local computer.
  14. Import the LF-EMR-Jupyter.ipynb file into your Jupyter notebook.
  15. Once imported, you can execute the queries one by one to see different AWS Lake Formation granular-level access patterns.
  16. One query at the end of the notebook is expected to fail because the IdP user has limited data permissions. Go back to the AWS Lake Formation console (in a different browser tab) and grant SELECT permission on the tpc.dl_tpc_item table to the IdP user. Then return to your notebook, re-execute the query that failed with the AccessDeniedException error, and confirm that the user now has access.
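
The two console changes in this walkthrough (the inbound rule in step 4 and the SELECT grant in step 16) can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials and a default region are configured. The function names are hypothetical, the security group ID and source IP are values you supply, and the principal ARN format for a SAML user is an assumption you should check against your own Lake Formation setup.

```python
def build_ingress_rule(group_id: str, source_ip: str, port: int = 8442) -> dict:
    """Build keyword arguments for EC2 authorize_security_group_ingress.

    Mirrors step 4: a Custom TCP rule from a single IP to the Proxy Agent port.
    """
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # /32 restricts the rule to exactly one IPv4 address.
                "IpRanges": [{"CidrIp": f"{source_ip}/32"}],
            }
        ],
    }


def build_select_grant(principal_arn: str,
                       database: str = "tpc",
                       table: str = "dl_tpc_item") -> dict:
    """Build keyword arguments for Lake Formation grant_permissions.

    Mirrors step 16: SELECT on tpc.dl_tpc_item for the IdP user.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def open_proxy_port(group_id: str, source_ip: str) -> None:
    """Apply the inbound rule (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builders above work without boto3
    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(**build_ingress_rule(group_id, source_ip))


def grant_item_select(principal_arn: str) -> None:
    """Apply the SELECT grant (requires boto3 and AWS credentials)."""
    import boto3
    lakeformation = boto3.client("lakeformation")
    lakeformation.grant_permissions(**build_select_grant(principal_arn))
```

Call open_proxy_port with the ID of the ElasticMapReduce-master security group and your computer's public IP before opening the notebook, and call grant_item_select only after the last query has failed, so that you can observe the AccessDeniedException first. Building the request as a plain dictionary before sending it lets you inspect exactly what will change.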