Query With Athena

In this section, we’ll use Amazon Athena to query data in S3 that was exported from QLDB and transformed to JSON. Since we’re querying the data stored in S3, our queries will not impose any load on our QLDB ledger or interfere with ongoing transactions. We can therefore use QLDB or analytic-style queries that QLDB is not optimized for.

Return to the main page of the AWS console by clicking the AWS logo in the upper-left hand corner of any console page. Go to the Athena page by typing “Athena” in the “Find Services” box or by clicking “Athena” under the “Analytics” section under all “All Services”.

Find Services

If this is your first time using Athena, the Athena “Welcome” page will appear. Click the “Get Started” button. Otherwise, you will be taken directly to the Athena “Query editor” page.

On the top right of the screen, click “Settings”.

In “Settings”, enter the S3 location to store output from Athena. You can use any S3 bucket in your account. The name of the S3 bucket created by CloudFormation in a previous exercise is available as an output parameter called “GlueOutputBucketName” in the “qldb-athena-lab-create-workflow” CloudFormation stack. Take the name of the bucket, add “s3://” to the beginning of it, “/athena/” to the end of it, and paste it into “query result location” in the “Settings” pop-up from Athena. Make sure there is a “/” at the end of the S3 URL or you will not be able to save. Click “Save”.

Settings

At the left of the screen, under “Data source” select “AWSDataCatalog”. Click the “Database” drop-down. You will see that two databases have been created. Select “qldb-json-db”.

Perform a simple query

Let’s preview the table. Enter the following query in the Athena editor and click “Run query”.

SELECT * FROM "qldb-json-db"."glue_job_output" limit 10;

Take a moment to explore the results. You’ll notice that the columns contain structured data. Pay particular attention to the transactioninfo and revisions columns.

Query based on statement type

Let’s assume that you want to know all of the INSERT queries that were done in the export. Run the query listed below.

A Presto Lambda Expression is used in the following query because transaction.statements is an array data type. The Lambda Expression allows the filter to be applied to the entire contents of the array.

SELECT FILTER(transactioninfo.statements,
         statements -> statements.statement LIKE 'INSERT%')
FROM "qldb-json-db"."glue_job_output"

The results show all queries that contained ‘INSERT%’ but the data in the column is not in an easily readable format.

Let’s perform a similar query that returns a much more usable query result. Take a note that UNNEST is used instead of FILTER. This will allow the array to be presented in multiple rows, compared to a single row like the query above.

SELECT unnest_statements.starttime,
         unnest_statements.statement
FROM "qldb-json-db"."glue_job_output", UNNEST(transactioninfo.statements) AS t(unnest_statements)
WHERE unnest_statements.starttime > to_iso8601(current_timestamp - interval '7' day)
        AND unnest_statements.statement LIKE 'INSERT%'

The results are in a much more readable format.

Query raw data

In this next query, we’ll create a view in Athena that will represent a QLDB ledger table. Let’s use VehicleRegistration from the QLDB tutorial.

The query below will pull the data from the export and create a VIEW in Athena. This will allow for analysis on a table structure similar to a QLDB table.

Run the command.

CREATE VIEW VehicleRegistration AS
SELECT DISTINCT listed_revisions.data.vin,
        listed_revisions.data.licenseplatenumber,
        listed_revisions.data.state,
         listed_revisions.data.city,
         listed_revisions.data.pendingpenaltyticketamount,
         listed_revisions.data.validfromdate,
         listed_revisions.data.validtodate,
         listed_revisions.data.owners,
         listed_revisions.metadata.version
FROM "qldb-json-db"."glue_job_output"
CROSS JOIN UNNEST(revisions) AS t(listed_revisions)
WHERE listed_revisions.data.owners IS NOT NULL

You will now see a new view named “vehicleregistration” under “Views” in the object browser on the left.

Run the below query to inspect the new VIEW.

SELECT *
FROM "qldb-json-db"."vehicleregistration";

Notice that the results are similar to the QLDB table. However, this view will have multiple revisions of each record if you’ve made updates.

Run the below query to see how many record versions there are per VIN.

SELECT vin,
         ARRAY_JOIN(ARRAY_AGG(version),
         ',') version
FROM "qldb-json-db"."vehicleregistration"
GROUP BY  vin;

Query on non-indexed fields

For this query, let’s query the data on a field that is not indexed in the QLDB table. Let’s see the primary person ID for all vehicles registered in Kent WA.

SELECT owners.primaryowner.personid
FROM "qldb-json-db"."vehicleregistration"
WHERE city = 'Kent' AND
state = 'WA'

Now we have our ID and can further the investigation.