aws

Analyzing Glue Jobs with AWS X-Ray

It’s possible to analyze your Glue jobs using just the logs they produce. Possible. But it’s not a pleasant task: your log messages are buried in messages from the framework, and in the case of a distributed PySpark job they’ll be spread amongst multiple CloudWatch log streams. In this post I look at an alternative: AWS X-Ray, which captures and aggregates “trace segments” that monitor specific sections of your code. With X-Ray, you can easily see where your jobs are spending their time, and compare different runs.

TechChat Tuesdays #57: AWS re:Invent Announcements with Keith Gregory, AWS Practice Lead

Today we welcome Keith Gregory to the show! Keith is our AWS Practice Lead here at Chariot. We cover some announcements from AWS re:Invent and do a deep dive into CodeCatalyst, OpenSearch Serverless, Lambda SnapStart, Redshift streaming ingestion from Kafka/Kinesis, and EventBridge Pipes.

Friends Don’t Let Friends Use JSON (in their data lakes)

I’ve never been a JSON hater, but I’ve recently run into enough pain with JSON as a data serialization format that my feelings are edging toward dislike. However, JSON is a fact of life in most data pipelines, especially those that receive event-stream data from a third-party supplier. This post reflects on some of the problems that I’ve seen, and solutions that I’ve used.

Limiting Cross-stack References in CDK

Several years ago I wrote CloudFormation Tips and Tricks, in which I gave the advice to “use outputs lavishly, exports sparingly.” The reason is that when you export a value from one stack and import it into another you bind those stacks tightly together, and can’t change that exported value. For example, you might create …

TechChat Tuesdays #53: Boycotting Wayland, and banishing the leap second

From the Chariot Blog: We’ve always got great content on the Chariot blog, written by our developers: it’s got over 20 years of tech reviews, tutorials, and more. We’re celebrating our 20th anniversary here at Chariot. Check out this post written by our fearless leader and CEO, Mike Rappaport, on how Chariot’s unique approach has …

The Serverless Stack (SST) Platform

Serverless Stack (serverless-stack.com) is another rapid serverless application development platform for AWS. SST (as it is also known) promises to streamline development and allow local debug of AWS Lambdas. It uses the AWS CDK and a set of its own constructs and configuration settings to make building serverless applications easier, and provide a more helpful …

Managing Internet Access for AWS Workloads

Two months ago I didn’t give much thought to controlling a program’s access to the Internet. Then Log4Shell happened. This post looks at three ways that you can control what an in-VPC application is allowed to talk to.