Query With Athena

In this section, we’ll use Amazon Athena to query data in S3 that was exported from QLDB and transformed to partitioned Parquet. Since we’re querying the data stored in S3, our queries will not impose any load on our QLDB ledger or interfere with ongoing transactions. We can therefore use QLDB or analytic-style queries that QLDB is not optimized for.

Return to the main page of the AWS console by clicking the AWS logo in the upper-left hand corner of any console page. Go to the Athena page by typing Athena in the Find Services box or by clicking Athena under the Analytics section under all All Services.

Find Services

If this is your first time using Athena, the Athena Welcome page will appear. Click the Get Started button. Otherwise, you will be taken directly to the Athena Query editor page.

On the top right of the screen, click Settings.

In Settings, enter the S3 location to store output from Athena. You can use any S3 bucket in your account. The name of the S3 bucket created by CloudFormation in a previous exercise is available as an output parameter called GlueOutputBucketName in the qldb-athena-lab-create-workflow CloudFormation stack. Take the name of the bucket, add s3:// to the beginning of it, /athena/ to the end of it, and paste it into query result location in the Settings pop-up from Athena. Make sure there is a / at the end of the S3 URL or you will not be able to save. Click Save.


At the left of the screen, under Data source select AWSDataCatalog. Click the Database drop-down. You will see that two databases have been created. Select qldb-parquet-db.

Perform a simple query

Let’s preview the table. Enter the following query in the Athena editor and click Run query.

SELECT * FROM "qldb-parquet-db"."full_export" limit 10;

Take a moment to explore the results. You’ll notice that the columns contain structured data. Pay particular attention to the transactioninfo and revisions columns.

Query based on statement type

Let’s assume that you want to know all of the INSERT queries that were done in the export. Run the query listed below.

A Presto Lambda Expression is used in the following query because transaction.statements is an array data type. The Lambda Expression allows the filter to be applied to the entire contents of the array.

SELECT FILTER(transactioninfo.statements,
         statements -> statements.statement LIKE 'INSERT%') as statements
FROM "qldb-parquet-db"."full_export"

The results show all queries that contained ‘INSERT%’ but the data in the column is not in an easily readable format.

Let’s perform a similar query that returns a much more usable query result. Take a note that UNNEST is used instead of FILTER. This will allow the array to be presented in multiple rows, compared to a single row like the query above.

SELECT unnest_statements.starttime,
FROM "qldb-parquet-db"."full_export", UNNEST(transactioninfo.statements) AS t(unnest_statements)
WHERE to_iso8601(unnest_statements.starttime) > to_iso8601(current_timestamp - interval '7' day)
        AND UPPER(unnest_statements.statement) LIKE 'INSERT%'

The results are in a much more readable format.

Query raw data

In this next query, we’ll create a view in Athena that will represent a QLDB ledger table. Let’s use VehicleRegistration from the QLDB tutorial.

The query below will pull the data from the export and create a VIEW in Athena. This will allow for analysis on a table structure similar to a QLDB table.

Run the command.

CREATE VIEW VehicleRegistration AS
SELECT DISTINCT listed_revisions.data.vin,
FROM "qldb-parquet-db"."full_export"
CROSS JOIN UNNEST(revisions) AS t(listed_revisions)
WHERE listed_revisions.data.owners IS NOT NULL

You will now see a new view named vehicleregistration under Views in the object browser on the left.

Run the below query to inspect the new VIEW.

FROM "qldb-parquet-db"."vehicleregistration";

Notice that the results are similar to the QLDB table. However, this view will have multiple revisions of each record if you’ve made updates.

Run the below query to see how many record versions there are per VIN.

         ',') version
FROM "qldb-parquet-db"."vehicleregistration"
GROUP BY  vin;

Query on non-indexed fields

For this query, let’s query the data on a field that is not indexed in the QLDB table. Let’s see the primary person ID for all vehicles registered in Kent WA.

SELECT owners.primaryowner.personid
FROM "qldb-parquet-db"."vehicleregistration"
WHERE city = 'Kent' AND
state = 'WA'

Now we have our ID and can further the investigation.


Query revision only data

We have been querying the full_export table but you might have noticed the just_revisions table which is partitioned on qldb_table.

Let’s take a peak at the data.

SELECT * FROM "qldb-parquet-db"."just_revisions" limit 10;

With both Glue and Athena, there are many options on how the QLDB export data can be processed and viewed.

How will you use your QLDB export data?