Cost Optimizing an ML Feature Store
A client recently started building a new machine learning (ML) architecture with a feature store as one of the key pieces. The feature store was already burning through a lot…
We recently explored a project to retrieve data from a third-party service. They didn’t offer any push capabilities such as writing to a Kafka or Kinesis stream, or even a webhook. But they did offer a WebSocket interface, so we explored whether we could use that as our streaming source. We didn’t go that route, but I was intrigued enough by the idea to build a proof-of-concept.
When I write Lambdas professionally, Python is my preferred language. It offers decent performance, a straightforward syntax, and high developer productivity. I’ve also used Java, both in demonstration apps and actual client work. But while I have some familiarity with other languages supported by the platform, I’ve never used them. So, with some downtime, I decided to implement the same Lambda in four different languages: Python, Java, JavaScript, and Go, to get a better sense of their strengths and weaknesses.
Partitioning is one of the easiest ways to improve the performance of your data lake, because it reduces the amount of data scanned. But implementing partitions can be surprisingly challenging, as can their effective use. In this post I look at several of the issues that you should consider when partitioning your data.
My prior posts used Lambda to do data transformation. But what if we could use a non-programmatic tool, in keeping with the Extract-Load-Transform mindset of the modern data pipeline? As it turns out, we can: Amazon Athena can write data as well as query it. There are, of course, a few stumbles along the way. In this blog post I walk through the process of aggregating CloudTrail data using SQL.
In this final part of a three-part series, I add another aggregation step to combine a month’s worth of data and write it as Parquet.
So you want to execute some custom CUDA-based AI processing on a GPU, but don’t have the hardware? Have an AWS account? Try using the DLAMI machine instances. This article explains how to get started if you need OS-level access.
When I ran the Lambda from my previous post against Chariot’s CloudTrail repository, it took almost four minutes to process a single day’s worth of data. That seems like a long time, and as a developer I want to optimize everything I write. In this post I look into analyzing the current runtime, and options for improving it.
In this 45-minute talk, Ken Rimple gives a quick overview of AWS CodeBuild, then dives into a few of the challenges he’s faced: dealing with build errors properly, configuring CodeBuild to run inside of AWS, testing locally so you don’t go crazy waiting 15 minutes each time you deploy a new build, accessing your build artifacts and reports, running tools like Cypress, building and deploying Docker containers to ECS, and more.
Amazon Web Services (AWS) is a collection of nearly 200 services. They can be intimidating to the newcomer, and offer many opportunities for mistakes: some expensive, some just inconvenient. In this Lunch and Learn, our panel of AWS experts looks at some of the mistakes they made, and how these could have been avoided.
In this tutorial, Ken Rimple explains how to take a new application from concept to production in AWS in eight weeks.
In this 45-minute webinar, Ken Rimple will give a quick overview of AWS CodeBuild, then dive into a few of the challenges he’s faced.
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: Machine learning and IoT have become commonplace words in the enterprise workplace….
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: Ah, Serverless. The term that means a dozen different things to a…
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: One of the chief benefits of cloud computing is the ability to…
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: In this talk we look at the challenges of making geospatial data…
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: OpenJS Architect is the fastest and simplest framework for rapidly building web…
Looking to discuss an AWS project with our team? Contact us.