In this final part of a three-part series, I add another aggregation step to combine a month’s worth of data and write it as Parquet.
As part of our research on LLMs, we started working on a chatbot project using RAG, Ollama and Mistral. Our developer hardware varied between MacBook Pros (M1 chip) and one Windows machine with a "Superbad" GPU running WSL2 and Docker on WSL. All hail the desktop with the big GPU. We planned …
Large Language Model (LLM) chatbots like ChatGPT are all the rage these days. You may be experimenting with building one of your own using a model runtime engine like Ollama, possibly accessing it with the LangChain API, maybe integrating it with a vector database for your custom data and using Retrieval Augmented Generation (RAG), or …
As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes, you can’t avoid small files. Our CloudTrail repository, for example, has 4,601,675 files as of this morning, 44% of which are under 1,000 bytes. In this post, I develop a Lambda-based data pipeline to aggregate these files, storing them in a new S3 location partitioned by date. Along the way I call out some of the challenges that face such a pipeline.
ViteJS (Vite) has rapidly emerged as one of the most exciting tools in the modern web development ecosystem. Vite offers developers a highly efficient and flexible build process with sane defaults to get a project up and running quickly. One of the many features that make Vite truly stand out is its extensible Plugin API. …
Earlier this year at PhillyETE, we had a great talk by Avdi Grimm and Jessica Kerr, REPLs All The Way Up: A Rubric For Virtuous Feedback Loops. In this talk, one of the key theses was to find ways to make exploring your code easier, via REPLs, scenario setups and other means. Many years ago …
In this post I walk through several execution plans, explain what Redshift is doing in each, and highlight the parts of plans that indicate problems.
Are you running a database with RDS? Would you like to manage it via migrations? This article explains how to use AWS CodeBuild to keep a database schema updated using Flyway, an open-source data migrations tool. Configuration is outlined via CloudFormation snippets. An AWS example repository is provided.
Today’s small microcontrollers offer impressive functionality and provide an opportunity to replace older, more expensive software and hardware. Consider the case where a facility wants to have control over devices or equipment, with rules that evaluate telemetry from sensors and activate, deactivate or regulate equipment and other devices. Many large facilities have systems in …
Writing AWS Lambda functions using the Serverless Framework makes it easy to manage your functions' third-party package dependencies and to keep track of the AWS resources that your service uses. The Serverless Framework automates a lot of the resource allocation and packaging of the functions with a CLI tool named …