In this section, we’ll use Amazon Athena to query data in S3 that was exported from QLDB and transformed to partitioned Parquet. Since we’re querying the data stored in S3, our queries will not impose any load on our QLDB ledger or interfere with ongoing transactions. We can therefore use QLDB or analytic-style queries that QLDB is not optimized for.
Return to the main page of the AWS console by clicking the AWS logo in the upper-left hand corner of any console page. Go to the Athena page by typing
Athena in the Find Services box or by clicking
Athena under the Analytics section under all All Services.
If this is your first time using Athena, the Athena Welcome page will appear. Click the
Get Started button. Otherwise, you will be taken directly to the Athena Query editor page.
On the top right of the screen, click Settings.
In Settings, enter the S3 location to store output from Athena. You can use any S3 bucket in your account. The name of the S3 bucket created by CloudFormation in a previous exercise is available as an output parameter called GlueOutputBucketName in the qldb-athena-lab-create-workflow CloudFormation stack. Take the name of the bucket, add
s3:// to the beginning of it,
/athena/ to the end of it, and paste it into query result location in the Settings pop-up from Athena. Make sure there is a
/ at the end of the S3 URL or you will not be able to save. Click
At the left of the screen, under Data source select
AWSDataCatalog. Click the Database drop-down. You will see that two databases have been created. Select
Let’s preview the table. Enter the following query in the Athena editor and click
SELECT * FROM "qldb-parquet-db"."full_export" limit 10;
Take a moment to explore the results. You’ll notice that the columns contain structured data. Pay particular attention to the
Let’s assume that you want to know all of the
INSERT queries that were done in the export. Run the query listed below.
A Presto Lambda Expression is used in the following query because
transaction.statements is an array data type. The Lambda Expression allows the filter to be applied to the entire contents of the array.
SELECT FILTER(transactioninfo.statements, statements -> statements.statement LIKE 'INSERT%') as statements FROM "qldb-parquet-db"."full_export"
The results show all queries that contained ‘INSERT%’ but the data in the column is not in an easily readable format.
Let’s perform a similar query that returns a much more usable query result. Take a note that
UNNEST is used instead of
FILTER. This will allow the array to be presented in multiple rows, compared to a single row like the query above.
SELECT unnest_statements.starttime, unnest_statements.statement FROM "qldb-parquet-db"."full_export", UNNEST(transactioninfo.statements) AS t(unnest_statements) WHERE to_iso8601(unnest_statements.starttime) > to_iso8601(current_timestamp - interval '7' day) AND UPPER(unnest_statements.statement) LIKE 'INSERT%'
The results are in a much more readable format.
In this next query, we’ll create a view in Athena that will represent a QLDB ledger table. Let’s use
VehicleRegistration from the QLDB tutorial.
The query below will pull the data from the export and create a VIEW in Athena. This will allow for analysis on a table structure similar to a QLDB table.
Run the command.
CREATE VIEW VehicleRegistration AS SELECT DISTINCT listed_revisions.data.vin, listed_revisions.data.licenseplatenumber, listed_revisions.data.state, listed_revisions.data.city, listed_revisions.data.pendingpenaltyticketamount, listed_revisions.data.validfromdate, listed_revisions.data.validtodate, listed_revisions.data.owners, listed_revisions.metadata.version FROM "qldb-parquet-db"."full_export" CROSS JOIN UNNEST(revisions) AS t(listed_revisions) WHERE listed_revisions.data.owners IS NOT NULL
You will now see a new view named
Views in the object browser on the left.
Run the below query to inspect the new
SELECT * FROM "qldb-parquet-db"."vehicleregistration";
Notice that the results are similar to the QLDB table. However, this view will have multiple revisions of each record if you’ve made updates.
Run the below query to see how many record versions there are per VIN.
SELECT vin, ARRAY_JOIN(ARRAY_AGG(version), ',') version FROM "qldb-parquet-db"."vehicleregistration" GROUP BY vin;
For this query, let’s query the data on a field that is not indexed in the QLDB table. Let’s see the primary person ID for all vehicles registered in Kent WA.
SELECT owners.primaryowner.personid FROM "qldb-parquet-db"."vehicleregistration" WHERE city = 'Kent' AND state = 'WA'
Now we have our ID and can further the investigation.
We have been querying the
full_export table but you might have noticed the
just_revisions table which is partitioned on qldb_table.
Let’s take a peak at the data.
SELECT * FROM "qldb-parquet-db"."just_revisions" limit 10;
With both Glue and Athena, there are many options on how the QLDB export data can be processed and viewed.
How will you use your QLDB export data?