Blog

Experiences in Fine-Tuning LLMs: Time + Power = Potato?

Embarking on the journey to fine-tune large language models (LLMs) can often feel like setting sail into uncharted waters, armed with hope and a map of best practices. Yet, despite meticulous planning and execution, the quest for improved performance doesn’t always lead to the treasure trove of success one might anticipate. And I know you may be wondering how potatoes come into play here, but I promise that we’ll get to it. From the challenges of data scarcity to resource…

PostgreSQL Text Search

Introduction

A common problem in software development is searching through text documents. For example, if you have a database of recipes, you might want to search by one or more ingredients, or if you have a collection of server log files, you might want to search for all errors that did not come from the database. This type of functionality is called “text search”. There are a lot of text search libraries like Lucene, or applications like ElasticSearch (which is…

Aggregating Files in your Data Lake – Part 1

As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes, you can’t avoid small files. Our CloudTrail repository, for example, has 4,601,675 files as of this morning, 44% of which are under 1,000 bytes long. In this post, I develop a Lambda-based data pipeline to aggregate these files, storing them in a new S3 location partitioned by date. Along the way I call out some of the challenges that face such a pipeline.

Using the JetBrains AI Assistant from WebStorm

This article logs my experiments with the AI Assistant, a Generative AI service from JetBrains that keeps you in the IDE, asking questions of an expert chatbot. The service provides a pane that is docked alongside your coding tools, so you don’t have to keep jumping out to Google to grab a code snippet. It also provides some refactoring features. Read on for more information.

Android: The Next Generation of Accessible Apps for the Enterprise

The Continuing Mission

In the evolving landscape of Android App development for the Enterprise, there is an aspect that often takes a back seat – accessibility. In brief, this blogpost will cover:

- Why it is important to prioritize accessibility as a fundamental aspect of the development process, particularly for large-scale and complex organizations
- What accessibility features and APIs are available to the modern app developer, along with a quick discussion of some common accessibility problems we often see
- How an…

Data Engineering is more SRE than SQL

Following my post about the Chariot Data Engineering interview, I received some comments along the lines of “wait, you don’t test their SQL skills?!?” Actually, we do: after loading up the test data into Redshift, the candidate creates three progressively more difficult queries. But by then, I’m pretty sure they’ve got the skills we need, because in my experience, SQL is only a small part of a Data Engineer’s job. Site Reliability Engineering (SRE) originated at Google, and focuses on “improv[ing]…
