Optional: QLDB Export and PySpark

In this section, we’ll use a dev-endpoint in AWS Glue to query data in S3 that was exported from QLDB. Since we’re querying the data stored in S3, our queries will not impose any load on our QLDB ledger or interfere with ongoing transactions.

Create the Glue endpoint

Glue dev endpoints can incur significant cost if left running. Be sure to disable or delete the endpoint as soon as your development work is complete.

We’ll create all of our Glue components with CloudFormation in your desired region.

This template will be used to create the following items:

  • IAM Roles
  • Glue endpoint
  • SageMaker notebook

Region               Launch CloudFormation Template
US East (Virginia)   Launch Stack in us-east-1

Alternatively, download the template file to your local workstation and create a CloudFormation stack by uploading it.

You will see the Quick create stack page as shown below. In the Stack name block, leave the Stack name as qldb-athena-dev-endpoint.

Leave the Parameters block as is; no inputs are needed.

Check the “I acknowledge that CloudFormation might create IAM resources.” box and click Create stack.


The stack will take several minutes to create. When it finishes, its status changes to CREATE_COMPLETE.
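If you would rather check the stack from a script than refresh the console, the wait can be sketched with boto3 roughly as follows. This is a minimal sketch, not part of the lab: the helper name wait_for_stack, the polling interval, and the poll limit are illustrative assumptions, and it assumes the stack name and region used above.

```python
import time


def wait_for_stack(cfn, stack_name, poll_seconds=15, max_polls=80):
    """Poll describe_stacks until the stack leaves CREATE_IN_PROGRESS.

    `cfn` is a boto3 CloudFormation client (or any object exposing the
    same describe_stacks method). Returns the final status string.
    """
    for _ in range(max_polls):
        stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
        status = stack["StackStatus"]
        if status != "CREATE_IN_PROGRESS":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{stack_name} is still CREATE_IN_PROGRESS")


def main():
    # boto3 is imported here so the helper above has no AWS dependency.
    import boto3

    # Assumption: the stack was launched in us-east-1 with the default name.
    cfn = boto3.client("cloudformation", region_name="us-east-1")
    print(wait_for_stack(cfn, "qldb-athena-dev-endpoint"))
```

Calling main() needs AWS credentials that are allowed to describe the stack. Any status other than CREATE_COMPLETE (for example ROLLBACK_COMPLETE) means the stack did not create cleanly and its events should be checked in the console.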

Endpoint and Jupyter Notebook

Return to the main page of the AWS console by clicking the AWS logo in the upper-left corner of any console page. Go to the Glue page by typing Glue in the Find Services box or by clicking AWS Glue in the Analytics section under All Services.


On the left of the console, select Dev endpoints.

You will see an Endpoint name of QLDBLabEndpoint, and it should have a Provisioning status of READY.
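The same check can be made outside the console. The sketch below reads the endpoint's status through the Glue GetDevEndpoint API via boto3; the helper name endpoint_status is an illustrative assumption, and the region is assumed to be us-east-1 as in the launch table above.

```python
def endpoint_status(glue, endpoint_name="QLDBLabEndpoint"):
    """Return the provisioning status Glue reports for a dev endpoint.

    `glue` is a boto3 Glue client (or any object exposing the same
    get_dev_endpoint method).
    """
    response = glue.get_dev_endpoint(EndpointName=endpoint_name)
    return response["DevEndpoint"]["Status"]


def main():
    # boto3 is imported here so the helper above has no AWS dependency.
    import boto3

    # Assumption: the lab stack was launched in us-east-1.
    glue = boto3.client("glue", region_name="us-east-1")
    print(endpoint_status(glue))
```

A value of READY matches what the console shows; any other value means the endpoint is still being provisioned or has failed.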

Now, let's make sure the notebook is ready. On the left of the console, click Notebooks under Dev endpoints.

Make sure SageMaker notebooks is selected and that the “Status” is Ready for the notebook named aws-glue-${AWS::StackName}.

Now check the box next to the Notebook name and then click Open notebook.

Select OK on the popup.

You should see your Jupyter Notebook as shown below. If you are interested in learning more about PySpark, you can go through the examples in the Glue Examples directory.

Let’s add a notebook specific to QLDB exports. Download the notebook file linked below; you may need to hold down the Option key on your keyboard to download it.

Once downloaded, click Upload on the top right of the main Jupyter page.

Select the qldb-id-lab3.ipynb file just downloaded and then click Upload again. Once uploaded, click on the file name to open the notebook.

You should see the notebook as shown below. Work through the notebook to complete this lab. Because long-running Glue endpoints are costly, please clean up this lab's resources once you are done.
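Since the IAM roles, Glue endpoint, and SageMaker notebook were all created by the CloudFormation stack, one way to clean up is to delete that stack. The sketch below does this with boto3 and waits for the deletion to finish. The helper name delete_lab_stack is an illustrative assumption, and it assumes the default stack name and that nothing was added to these resources outside the stack.

```python
def delete_lab_stack(cfn, stack_name="qldb-athena-dev-endpoint"):
    """Delete the lab stack and block until CloudFormation reports it gone.

    `cfn` is a boto3 CloudFormation client (or any object exposing the
    same delete_stack and get_waiter methods).
    """
    cfn.delete_stack(StackName=stack_name)
    # The built-in waiter polls until the stack reaches DELETE_COMPLETE
    # and raises an error if the deletion fails.
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
    return stack_name


def main():
    # boto3 is imported here so the helper above has no AWS dependency.
    import boto3

    # Assumption: the lab stack was launched in us-east-1.
    cfn = boto3.client("cloudformation", region_name="us-east-1")
    print("Deleted", delete_lab_stack(cfn))
```

After it returns, it is worth confirming in the Glue console that no dev endpoint remains, since that is the resource that keeps accruing charges.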