QLDB & Data Lake Lab

Architecture

In this workshop, you will learn how to export data stored in Amazon QLDB to Amazon Simple Storage Service (S3), trigger an ETL workflow in AWS Glue, and query the data in place in S3 using Amazon Athena. This process makes your QLDB ledger data available for query and analysis with AWS data lake technologies.
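The export step can be sketched with the AWS SDK for Python (boto3), whose QLDB client provides export_journal_to_s3 and describe_journal_s3_export. This is a minimal sketch, not the lab's own tooling. The ledger name, bucket, prefix, and IAM role ARN are hypothetical placeholders, and the role is assumed to already allow QLDB to write to the bucket. The client is passed in as a parameter so the helpers can be exercised without AWS credentials.

```python
import time
from datetime import datetime
from typing import Any, Callable, Dict

# Output formats accepted by the ExportJournalToS3 API.
VALID_FORMATS = ("ION_BINARY", "ION_TEXT", "JSON")


def build_export_request(
    ledger_name: str,
    start: datetime,
    end: datetime,
    bucket: str,
    prefix: str,
    role_arn: str,
    output_format: str = "JSON",
) -> Dict[str, Any]:
    """Assemble keyword arguments for qldb_client.export_journal_to_s3()."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be earlier than end")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "Name": ledger_name,
        "InclusiveStartTime": start,
        "ExclusiveEndTime": end,
        "RoleArn": role_arn,  # hypothetical role that QLDB assumes to write to S3
        "OutputFormat": output_format,
        "S3ExportConfiguration": {
            "Bucket": bucket,
            "Prefix": prefix,
            "EncryptionConfiguration": {"ObjectEncryptionType": "SSE_S3"},
        },
    }


def export_and_wait(
    qldb_client: Any,
    request: Dict[str, Any],
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a journal export and poll until it is no longer IN_PROGRESS."""
    export_id = qldb_client.export_journal_to_s3(**request)["ExportId"]
    while True:
        description = qldb_client.describe_journal_s3_export(
            Name=request["Name"], ExportId=export_id
        )["ExportDescription"]
        if description["Status"] != "IN_PROGRESS":
            return description["Status"]  # COMPLETED or CANCELLED
        sleep(poll_seconds)
```

With real credentials you would pass boto3.client("qldb") as qldb_client. The exported objects then appear under the given prefix in the bucket.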

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena scales automatically and executes queries in parallel, so results return quickly even with large datasets and complex queries.
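Queries can also be run programmatically. The sketch below uses the boto3 Athena client's start_query_execution, get_query_execution, and get_query_results calls. It polls until the query reaches a terminal state, then follows NextToken to collect every result page. The database name, table name, and S3 output location in the usage note are hypothetical. The client is injected so the function can be tested without AWS.

```python
import time
from typing import Any, Callable, Dict, List

# Athena query states after which polling can stop.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(
    athena_client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Run one SQL statement and return every result row as a list of strings."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        state = athena_client.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"
        ]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")

    # Results are paginated; follow NextToken until it is absent.
    rows: List[List[str]] = []
    kwargs: Dict[str, str] = {"QueryExecutionId": query_id}
    while True:
        page = athena_client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells carry no VarCharValue; this sketch maps them to "".
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

As an example, run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-athena-results/") uses hypothetical names throughout. For a SELECT, Athena returns the column names as the first row, so the list starts with a header row.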

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there’s no infrastructure to set up or manage.
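Starting and monitoring a Glue workflow run can be sketched with the boto3 Glue client's start_workflow_run and get_workflow_run calls. The workflow name is a hypothetical placeholder for whichever workflow wraps the lab's ETL steps, and the client is injected so the polling logic runs without AWS. A COMPLETED run only means the workflow finished. The second helper therefore also checks the run's action counters, which is this sketch's own choice of success criterion.

```python
import time
from typing import Any, Callable, Dict

# Terminal workflow run statuses; RUNNING and STOPPING mean "keep waiting".
FINAL_STATUSES = ("COMPLETED", "STOPPED", "ERROR")


def run_glue_workflow(
    glue_client: Any,
    workflow_name: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a Glue workflow run and return its final run description."""
    run_id = glue_client.start_workflow_run(Name=workflow_name)["RunId"]
    while True:
        run = glue_client.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
        if run["Status"] in FINAL_STATUSES:
            return run
        sleep(poll_seconds)


def workflow_succeeded(run: Dict[str, Any]) -> bool:
    """Treat a run as successful only if it completed with no failed actions."""
    stats = run.get("Statistics", {})
    return (
        run["Status"] == "COMPLETED"
        and stats.get("FailedActions", 0) == 0
        and stats.get("ErroredActions", 0) == 0
        and stats.get("TimeoutActions", 0) == 0
    )
```

With real credentials, glue_client would be boto3.client("glue"). The default poll_seconds is longer than in the Athena sketch because ETL jobs typically run for minutes rather than seconds.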

This combination of technologies enables customers to keep a complete and verifiable history of all of their transactions with QLDB, while making that data available to the rest of the business for query and analysis in a serverless, cost-effective manner using Glue and Athena.

Duration

This lab should take approximately 45 minutes for most users.

Difficulty

Intermediate

Prerequisites

To complete this lab, you must have access to an AWS account and sufficient privileges to administer QLDB ledgers. For more information, see Accessing Amazon QLDB.