Blog

PostgreSQL Text Search

Introduction: A common problem in software development is searching through text documents. For example, if you have a database of recipes, you might want to search by one or more ingredients, or if you have a collection of server log files, you might want to search for all errors that did not come from the database. This type of functionality is called “text search”. There are many text search libraries, such as Lucene, and applications, such as Elasticsearch (which is…

Aggregating Files in your Data Lake – Part 1

As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes, you can’t avoid small files. Our CloudTrail repository, for example, has 4,601,675 files as of this morning, 44% of which are under 1,000 bytes long. In this post, I develop a Lambda-based data pipeline to aggregate these files, storing them in a new S3 location partitioned by date. Along the way, I call out some of the challenges that face such a pipeline.

Using the JetBrains AI Assistant from WebStorm

This article logs my experiments with the AI Assistant, a Generative AI service from JetBrains that keeps you in the IDE, asking questions of an expert chatbot. The service provides a pane that is docked alongside your coding tools, so you don’t have to keep jumping out to Google to grab a code snippet. It also provides some refactoring features. Read on for more information.

Android: The Next Generation of Accessible Apps for the Enterprise

The Continuing Mission. In the evolving landscape of Android app development for the enterprise, one aspect often takes a back seat: accessibility. In brief, this blog post will cover: (1) why it is important to prioritize accessibility as a fundamental aspect of the development process, particularly for large-scale and complex organizations; (2) what accessibility features and APIs are available to the modern app developer, along with a quick discussion of some common accessibility problems we often see; (3) how an…

Data Engineering is more SRE than SQL

Following my post about the Chariot Data Engineering interview, I received some comments along the lines of “wait, you don’t test their SQL skills?!?” Actually, we do: after loading up the test data into Redshift, the candidate creates three progressively difficult queries. But by then, I’m pretty sure they’ve got the skills we need, because in my experience, SQL is only a small part of a Data Engineer’s job. Site Reliability Engineering (SRE) originated at Google, and focuses on “improv[ing]…

Leveraging EKS Pod Identity to Inject ASM Secrets: A Step-by-Step Guide

EKS Pod Identity is a feature that enables applications running on EKS to securely access AWS services, such as AWS Secrets Manager, without hardcoding or managing access credentials. Instead, EKS Pod Identity uses IAM roles to grant permissions to pods, allowing them to interact with AWS services seamlessly. In my last post, I showed an example of a pod fetching objects from S3 using pod identity. But let’s create a more real-world example: using pod identity…
