Blueprints enable data ingestion from common sources using automated workflows. At a high level, Lake Formation
provides two types of blueprints:
- Database blueprints: These blueprints help ingest data from MySQL, PostgreSQL, Oracle, and SQL Server databases into your data lake. You can ingest the data either as a bulk-load snapshot or incrementally, loading new data over time.
- Log file blueprints: These blueprints ingest data in popular log file formats, such as AWS CloudTrail, Elastic Load Balancer, and Application Load Balancer logs.
In this exercise, we will use the Database snapshot blueprint to ingest the entire TPC
database into your data lake.
- Click Blueprints in the left navigation panel, and then click the Use blueprint button.
- Select Database snapshot as the blueprint type.
- For the AWS Glue Database connection name, choose TPCGlueConnector, which was created through CloudFormation to access the TPC database running on RDS.
- For the Source data path, enter "tpc/". Leave the Exclude pattern option at its default.
- Under the Import target section, choose tpc as the target database. For the Target storage location, choose the S3 path that you used in the Data Lake Locations section. For Data format, choose Parquet as the format in which the data is written.
- Under Import options, enter tpc-ingest as the workflow name. Choose LF-GlueServiceRole for the IAM role and enter "dl" as the Table prefix. Leave the remaining fields at their defaults.
- Choose Create. Wait for the blueprint status to change from Creating to the "Successfully created..." message.
- Select the newly created workflow tpc-ingest and start it by choosing Start from the Actions drop-down.
- Ingesting the TPC database into your data lake takes some time. While the workflow runs, the Last run status column shows the current phase of the ingestion process. For this exercise, the Discovering phase takes about 4 minutes and the Importing phase about 20 minutes.
- Move to the next chapter once the tpc-ingest workflow has completed successfully.
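
If you would rather start and monitor the workflow from a script than from the console, the steps above can be sketched with boto3 as follows. This is a minimal sketch, not part of the original exercise. It assumes that the blueprint created an AWS Glue workflow with the same name you entered (tpc-ingest), and that your credentials and default region are already configured. The helper `run_workflow`, the `poll_seconds` default, and the set of terminal statuses are illustrative choices.

```python
import time

# Statuses at which a Glue workflow run is no longer progressing.
TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}


def run_workflow(glue, name, poll_seconds=60, sleep=time.sleep):
    """Start the named Glue workflow and poll until it reaches a terminal status.

    `glue` is a boto3 Glue client (or any object exposing the same two calls).
    Returns the final status string of the run.
    """
    run_id = glue.start_workflow_run(Name=name)["RunId"]
    while True:
        run = glue.get_workflow_run(Name=name, RunId=run_id)["Run"]
        status = run["Status"]
        if status in TERMINAL_STATUSES:
            return status
        # Ingestion takes many minutes, so wait between polls.
        sleep(poll_seconds)


if __name__ == "__main__":
    import boto3  # imported here so the helper can be exercised without AWS access

    # Assumption: the workflow name matches the one entered in the blueprint form.
    final_status = run_workflow(boto3.client("glue"), "tpc-ingest")
    print(f"Workflow tpc-ingest finished with status: {final_status}")
```

A run that ends in COMPLETED can still contain failed actions, so it is worth confirming in the console that the Last run status shows a successful completion before moving on.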