AWS Resources

All Blog Posts

Analyzing Glue Jobs with AWS X-Ray

It’s possible to analyze your Glue jobs using just the logs they produce. Possible. But it’s not a pleasant task: your log messages are buried in messages from the framework, and in the case of a distributed PySpark job they’ll be spread amongst multiple CloudWatch log streams. In this post I look at an alternative: AWS X-Ray, which captures and aggregates “trace segments” that monitor specific sections of your code. With X-Ray, you can easily see where your jobs are spending their time, and compare different runs.

Friends Don’t Let Friends Use JSON (in their data lakes)

I’ve never been a JSON hater, but I’ve recently run into enough pain with JSON as a data serialization format that my feelings are edging toward dislike. However, JSON is a fact of life in most data pipelines, especially those that receive event-stream data from a third-party supplier. This post reflects on some of the problems that I’ve seen, and solutions that I’ve used.

Featured Videos

All the AWS CodeBuild You Can Stomach in 45 Minutes

In this 45-minute talk, Ken Rimple gives a quick overview of AWS CodeBuild, then dives into a few of the challenges he’s faced: dealing with build errors properly, configuring CodeBuild to run inside of AWS, testing locally so you don’t go crazy waiting 15 minutes each time you deploy a new build, accessing your build artifacts and reports, running tools like Cypress, building and deploying Docker containers to ECS, and more.

AWS: Things I Learned the Hard Way

Amazon Web Services (AWS) is a collection of nearly 200 services. They can be intimidating to the newcomer, and offer many opportunities for mistakes: some expensive, some just inconvenient. In this Lunch and Learn, our panel of AWS experts looks at some of the mistakes they made, and how these could have been avoided.

All Videos

Know your Costs – IoT on AWS – A Philly Cloud Computing Event

Amazon uses a “pay as you go” pricing model: you pay for the resources that you use, and in most cases don’t need to pre-allocate resources. While this allows your business to scale, it means that each component of your data pipeline will incur a separate charge, which can obscure the overall cost of running the pipeline. This talk will examine those charges, along with strategies for partitioning those costs between your clients or organizational units.

Philly ETE 2016 #34 – Tim Wagner – Server-Less Design Patterns for the Enterprise with AWS Lambda

Apps no longer just run on smartphones and tablets – they process verbal commands we speak to devices like Amazon Echo, run as bots in Slack channels, and are rapidly evolving customer experiences that span a range of IoT devices in homes, cars, offices, and industrial settings. Crucial to the success of all these ecosystems is one central idea: Code has to not just run in the cloud, it has to be easy to get it there and scale it there. Serverless computing – calling AWS Lambda functions instead of managing heavyweight applications on infrastructure – is changing how developers think about backends, event-driven processing, and application design. Infrastructure, deployment, and software platform setup that used to take days or weeks of time vanishes, replaced by microservices that do one thing well, require zero effort to deploy, and scale automatically and implicitly just by using them. At the same time, AWS Lambda and other serverless systems have redefined cloud economics by eliminating the possibility of cold servers, creating a radical new price point for applications running in the cloud and freeing developers and COOs alike from worrying about paying for unused capacity. In this talk we’ll define Serverless computing, examine the key trends and innovative ideas behind the technology, and look in detail at design patterns for big data, event processing, mobile backends, and more using AWS Lambda.

Looking to discuss an AWS project with our team? Contact us.