Serverless Databasing with the Aurora Data API

by
Tags:
Category:

Earlier this year I wrote that Amazon Aurora Serverless allows you to implement a fully serverless application with a relational database. In this post I dive into this, calling the Aurora Data API from a Python Lambda.

Before you go down this path …

The Data API is not for everyone, but then, neither is a fully-serverless application.

Perhaps the biggest barrier is that it’s a web-service, accessed via AWS SDK, rather than a library that’s targeted toward use by a particular high-level language. You must build your requests at a very low level, and can’t use higher-level frameworks such as object-relational mappers that expect to call a “native” API.

All operations are synchronous and time out after 45 seconds. You can configure a request such that the query keeps running after this timeout, but there’s no way to check the query status or retrieve results if you do (unlike the Redshift Data API). So this isn’t a good way to, for example, trigger long-running reports.

The response size from a Data API call is limited to 1 MB, so it can’t be used to retrieve lots of data.

Because it’s a web service, you’ll also find that the performance is worse than a native library using a DBMS-specific wire protocol over TCP. In my experiments, a simple select takes between 150 and 200 milliseconds running through the Data API, versus 2-3 ms when using the psycopg2 library directly.

And, as I’ve written in the past, I don’t think that Lambda is a good choice for interactive applications due to cold-start times. For responsiveness there’s no way to beat an always-on server using a pool of live database connections.

So what does that leave?

Basically, any back-end task that’s triggered by an external source and needs to update a database. Especially if the events tend to be bursty, so can benefit from Lambda’s “only pay for what you use” pricing and ability to scale. In other words, webhooks.

For example, if you use an email delivery service, you might receive updates when the service delivers the message to an end user, when that user opens the email, and if the user chooses to unsubscribe from future mailings. These callbacks are extremely bursty: nothing until you send out a mailing, then hundreds or thousands of updates in a short timeframe. And while most services retry if they get an error, eventually they will give up.

Another feature of webhooks that makes them attractive for Lambdas is that the service doesn’t care if you respond within single-digit milliseconds, unlike users. And in some cases, you might get a notification that something happened, but have to go back to the service to get details, which may be slow; this can interfere with request processing if you direct the webhook toward your main server.

In the past I’ve used separate application servers to handle webhooks, so that they don’t overwhelm regular website traffic. However, a combination of Lambda, API Gateway, and Aurora Serverless provides a viable alternative. And by using the RDS Data API, you can run outside of a VPC — very useful for the case where your webhook handler has to talk to some service on the Internet.

An example: handling user unsubscribes

If you’re sending marketing emails, allowing users to unsubscribe is a critical feature: angry users aren’t good for business. For that reason, email services put an unsubscribe link into all of the emails that they send. If a user clicks that link, the service can send you a notification using a webhook.

Here’s an example notification from the Twilio SendGrid documentation:

{
  "email": "example@test.com",
  "timestamp": 1513299569,
  "smtp-id": "<14c5d75ce93.dfd.64b469@ismtpd-555>",
  "event": "unsubscribe",
  "category": "cat facts",
  "sg_event_id": "sg_event_id",
  "sg_message_id": "sg_message_id"
}

When we receive this notification, we need to update the email preferences for this user. For this example, they’re stored in a table that looks like this (Postgres \d output):

                    Table "public.email_preferences"
      Column      |            Type             |       Modifiers        
------------------+-----------------------------+------------------------
 user_id          | text                        | not null
 email_address    | text                        | not null
 marketing_opt_in | boolean                     | not null default false
 updated_at       | timestamp without time zone | not null
Indexes:
    "email_preferences_pkey" PRIMARY KEY, btree (user_id)
    "email_preferences_idx_address" UNIQUE, btree (email_address)

The marketing_opt_in flag will be true if the user opts-in to marketing emails, and we want to set to false if the user unsubscribes or reports an email as spam. In a real-world implementation, you might have multiple types of marketing emails, such as new products, how-tos, special offers, and so on, and allow users to control their preferences individually.

If you’d like to try this out yourself, the sample code is available from GitHub, with a CloudFormation template to deploy the Lambda and database.

Database Configuration

First, and most important, the Data API only works (today) with Aurora Serverless. Not “regular” Aurora, and not “traditional” RDS.

Second, you must enable the HTTP Endpoint. And if you don’t know whether or not you have the endpoint enabled, well, I can’t see any place in the Console that will tell you that! Instead, use the describe-db-clusters CLI command and look for HttpEndpointEnabled, or attempt to modify the cluster in the Console and look for the Data API checkbox in the “Connectivity” section. Or, if you deployed with CloudFormation or a similar tool, look at the configuration file.

Third, you must permit password-based authentication. Behind the scenes, the Data API uses the username and password stored in a Secrets Manager secret. The Data API doesn’t know how to use IAM-generated access tokens.

Lastly, be aware that the cluster name, and therefore its ARN, are lowercase. I call this out because it burned me once: I was using a CloudFormation parameter as the database name, and also to construct the ARN (because CloudFormation does not provide this as an attribute of the resource). The parameter value was mixed case, and that was propagated to the ARN, but the database itself thought that its name was lowercase. As a result, all of my calls to the Data API failed.

The Lambda

For this posting, i’m going to gloss over the top-level operation of the Lambda, and dive into the mechanics of making a Data API call. The short description is that the Lambda is invoked from an HTTP Gateway, which means that it will have an invocation event that looks like this. You’ll need to extract the request body, which should consist of a JSON array containing zero or more individual notifications.

For each notification that’s either an unsubscribe or a spam report, we want to update the flag in the database:

try:
    email_address = rec['email']
    notification_type = rec['event']
    logger.debug(f"{email_address}: {notification_type}")
    if notification_type in ['spamreport', 'unsubscribe']:
        opt_out_user(email_address)
except:
    logger.warn("failed to process record", exc_info=True)

And that brings is to the core of this post:

def opt_out_user(email_address):
    logger.info(f"opt-out user: {email_address}")
    client().execute_statement(
        secretArn=secret_arn,
        resourceArn=db_arn,
        sql="""
            update EMAIL_PREFERENCES
            set    MARKETING_OPT_IN = false,
                   UPDATED_AT = :updated_at
            where  EMAIL_ADDRESS = :email_address
            """,
        parameters=[
            {                
                'name':     'email_address',
                'value':    {'stringValue': email_address}
            },
            {
                'name':     'updated_at',
                'typeHint': 'TIMESTAMP',
                'value':    {'stringValue': datetime.now().isoformat(sep=' ', timespec='seconds')}
            }
        ])

All Data API operations — DML or DDL — use the ExecuteStatement or BatchExecuteStatement API calls. The latter allows you to execute the same insert/update/delete statement multiple times with different parameters (and is arguably the better choice if this was a real-world implementation). There’s also an ExecuteSql operation, but it’s been deprecated.

The code is simple: you provide the information used to authenticate, along with the SQL to execute. The SQL can have “parameter placeholders,” identified by a leading colon (in this example, :updated_at and :email_address). These placeholders must correspond to names in the optional Parameters array.

The parameters array requires more explanation. First, note that parameter values are actually objects: at a minimum the provide the actual value, and the type of that provided value. I’m not sure why the type is needed, as it could be inferred from the JSON data structure, but I suspect it’s used to simplify the behind-the-scenes logic for setting parameters.

More interesting, a parameter can also include a type hint, which I use here to set the updated_at column (a timestamp) from a string value. This appears to be used to validate the parameter before calling the database, but based on query logs the value is passed to the database as-provided (rather than using a database-native type or an explicit cast).

Selects

The Lambda just performs an update; what if you want to perform a select?

response = client().execute_statement(
    secretArn=secret_arn,
    resourceArn=db_arn,
    database=db_name,
    includeResultMetadata=True,
    sql="""
        select *
        from   EMAIL_PREFERENCES
        """
)

data = response['records']
meta = response['columnMetadata']

You call the same API, ExecuteStatement, but now you care about what it returns: a records field that contains your data, and a columnMetadata field that describes that data. Here’s what the records field looks like with two records:

[
    [
        { "stringValue": "1b95e43c-f57f-40b6-a669-c10ff7643068" },
        { "stringValue": "user1@example.com" },
        { "booleanValue": true },
        { "stringValue": "2021-11-09 18:06:19" }
    ],
    [
        { "stringValue": "0c737b9a-cc20-491e-ab03-19e5c5bef18c" },
        { "stringValue": "me@example.com" },
        { "booleanValue": true },
        { "stringValue": "2021-11-09 18:06:19" }
    ]
]

Each record is itself a list, with each element in this list representing a single field in the result. If you know exactly what values you selected (a good habit!), then that’s enough. If not, you’ll need to look at metadata. This is also a list, containing one element per field in the output. Here’s a sample:

{
    "arrayBaseColumnType": 0,
    "isAutoIncrement": False,
    "isCaseSensitive": True,
    "isCurrency": False,
    "isSigned": False,
    "label": "user_id",
    "name": "user_id",
    "nullable": 0,
    "precision": 2147483647,
    "scale": 0,
    "schemaName": "",
    "tableName": "email_preferences",
    "type": 12,
    "typeName": "text"
}

In most cases, all you care about is the name field, so that you can associate names to the values from response['records'] list.

Transactions

For my Lambda I’m not using an explicit transaction, instead relying on the “auto-commit” behavior of the API. In a more complex handler, you might need to perform several queries that must either succeed or fail together. To do this, start by calling the BeginTransaction API, which returns a transaction identifier. You then pass this identifier to all operations within the transaction, and finish by calling CommitTransaction (or, on error, RollbackTransaction).

try:
    rsp = client.begin_transaction(
        secretArn=secret_arn, 
        resourceArn=db_arn)
    txn_id = rsp['transactionId']

    client.execute_statement(
        secretArn=secret_arn,
        resourceArn=db_arn,
        transactionId=txn_id,
        sql=#...
        parameters=#...
        )

    client.commit_transaction(
        secretArn=secret_arn,
        resourceArn=db_arn,
        transactionId=txn_id)
except:
    logger.warning("transaction failed", exc_info=True)
    client.rollback_transaction(
        secretArn=secret_arn,
        resourceArn=db_arn,
        transactionId=txn_id)

As with any database code that involves transactions, failing to either commit or rollback will potentially leave tables locked, blocking other sessions. The Data API imposes two limits to avoid this problem: inactive transactions (those that don’t have queries running) are rolled-back after three minutes, will active transactions are rolled-back after 24 hours (this is for the case where you started a transaction but never commit it).

Permissions

I probably should have put this section first, as you need to configure your IAM permissions before you do anything with the Data API. But if you’re using my CloudFormation template, they’re already set up.

First, you’ll need secretsmanager:GetSecretValue, to access the credentials for connecting to the database. Be sure to lock down the resource to just that secret!

And second, you’ll need rds-data permissions for whatever operations you use. I think that it’s simplest to just grant rds-data:*, but again, specify the database cluster ARN as the resource.

Wrapping Up

OK, I realize that this solution requires a VPC in which to deploy the database cluster. So it doesn’t actually fulfill the promise of “zero availability zones” that I mentioned in my last post. But I think that it does get you closer:when using an Aurora Serverless cluster with the data API, the availability zones seem almost an afterthought.