Lake Formation Components

Lake Formation provides a console to set up and manage your data lakes. The AWS Glue API operations that Lake Formation uses are also available through several language-specific SDKs and the AWS Command Line Interface (AWS CLI). For information about using the AWS CLI, see the AWS CLI Command Reference.

Lake Formation uses the Data Catalog to store metadata about data lakes, data sources, transforms, and targets. The AWS Glue API provides a managed infrastructure for defining, scheduling, and running extract, transform, and load (ETL) operations on your data. For more information, see AWS Glue API.

Lake Formation Console

You use the Lake Formation console to define your data lake. The console calls AWS Glue API operations to perform the following tasks:
  • Define AWS Glue objects such as jobs, tables, crawlers, and connections.
  • Schedule when crawlers run.
  • Define events or schedules for job triggers.
  • Search and filter lists of Lake Formation objects.
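The same tasks can be scripted against the AWS Glue API. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured. The helper names (crawler_request, scheduled_trigger_request, define_and_search) and all resource names passed to them are hypothetical; only the boto3 calls create_crawler, create_trigger, and search_tables are actual SDK operations.

```python
def crawler_request(name, role_arn, database, s3_path, schedule):
    """Build keyword arguments for the AWS Glue CreateCrawler operation."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # AWS Glue cron syntax, for example "cron(0 2 * * ? *)" for 02:00 UTC daily.
        "Schedule": schedule,
    }


def scheduled_trigger_request(name, job_name, schedule):
    """Build keyword arguments for the AWS Glue CreateTrigger operation."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def define_and_search(region, crawler, trigger, search_text):
    """Send both requests, then return the names of tables matching search_text."""
    # boto3 is a third-party package; it is imported here so that the
    # request builders above can be used and inspected without it.
    import boto3

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**crawler)
    glue.create_trigger(**trigger)
    found = glue.search_tables(SearchText=search_text, MaxResults=25)
    return [table["Name"] for table in found["TableList"]]
```

Building the request as a plain dictionary first makes it possible to review or log exactly what will be sent before any call reaches AWS.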

Data Catalog

The Data Catalog is your persistent metadata store. It is a managed service that lets you store, annotate, and share metadata in the AWS Cloud in the same way you would in an Apache Hive metastore. Each AWS account has one Data Catalog per AWS Region. It provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos, and use that metadata to query and transform the data.
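Because there is one Data Catalog per account per Region, a client bound to a Region is enough to address it. The sketch below, which assumes boto3 and configured credentials, reads table metadata from one database. The function names summarize_tables and list_catalog_tables are hypothetical; get_paginator("get_tables") and the TableList response key come from the AWS Glue API.

```python
def summarize_tables(pages):
    """Flatten GetTables response pages into (table name, storage location) pairs."""
    summary = []
    for page in pages:
        for table in page.get("TableList", []):
            # Some tables have no storage descriptor or location; report None for those.
            location = table.get("StorageDescriptor", {}).get("Location")
            summary.append((table["Name"], location))
    return summary


def list_catalog_tables(region, database):
    """Read every table in one database of the Data Catalog for the given Region."""
    # boto3 is a third-party package, imported lazily so summarize_tables
    # can be exercised on its own.
    import boto3

    glue = boto3.client("glue", region_name=region)
    pages = glue.get_paginator("get_tables").paginate(DatabaseName=database)
    return summarize_tables(pages)
```

A call such as list_catalog_tables("us-east-1", "sales_db") would return the pairs for that database, where both argument values are placeholders.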

Lake Formation provides a hierarchy of permissions to control access to databases and tables in a Data Catalog. You manage access by granting and revoking permissions on these resources.
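Granting and revoking can be sketched with the Lake Formation API as follows, assuming boto3 and configured credentials. The helper names (table_permission_request, grant, revoke) are hypothetical, as are any principal, database, and table values passed in; grant_permissions and revoke_permissions are the underlying SDK operations.

```python
def table_permission_request(principal_arn, database, table, permissions):
    """Build the arguments shared by GrantPermissions and RevokePermissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),  # for example ["SELECT", "DESCRIBE"]
    }


def grant(region, request):
    """Grant the permissions described by request."""
    import boto3  # third-party AWS SDK, imported lazily

    boto3.client("lakeformation", region_name=region).grant_permissions(**request)


def revoke(region, request):
    """Revoke the permissions described by request."""
    import boto3  # third-party AWS SDK, imported lazily

    boto3.client("lakeformation", region_name=region).revoke_permissions(**request)
```

Both operations take the same principal, resource, and permission list, so a single request can be granted and later revoked without being rebuilt.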