Cost Optimizing an ML Feature Store

A client recently started building a new machine learning (ML) architecture with a feature store as one of the key pieces. The feature store was already burning through a lot of money on AWS Elasticache and it wasn’t even scaled up in production yet! The project was in danger of being shelved without serious cost reductions so I was asked to take a look and see what could be done.

What is a Feature Store?

Simply put, a feature store is a centralized platform for organizing and storing features. To an ML engineer, a feature is any kind of data that can be used as input to a model, usually a set of values. Feature store data is used to train models or fed into a model as input to make predictions. A common feature store implementation uses Redis for the storage backend with an API frontend. As more models and features go into production, feature stores need to be able to scale up as well as out to store and serve more data.

AWS Elasticache Needs Cash

For the storage backend, AWS Elasticache was chosen for its ease in scaling up and out when more storage and IO capacity is needed. AWS also makes operational maintenance activities such as version updates transparent in Elasticache. Best of all, scaling and maintenance can be performed without any downtime, which is a killer feature. To minimize latency across regions, the Global Datastore feature can be used to deploy secondary read-only clusters in other regions.

The ease of standing up and using Elasticache with Global Datastore across three regions is countered by a hefty AWS bill. You pay for every cluster node and node replica, and, of course, for data transfer. The key to driving down Elasticache costs is to store less.

Investigating our Elasticache Usage

When you use less storage, you need smaller clusters. Smaller clusters mean a smaller bill. Easy, right? When I first started diving in, I approached this less like a data scientist and more like an accountant:

  • How many keys were in Redis?
  • What were the keys, and what values were being stored?
  • How many bytes were being used per key?

My first step was to connect to the cluster with redis-cli and run info to see the memory and keyspace statistics. From the results, I saw that there were 30 million keys. At 120GB of total storage, some back-of-the-napkin math gets me to roughly 4KB per key. The key was a delimited value with a unique identifier and the value was a JSON string.
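
For reference, here’s a minimal sketch of that inspection step using the redis-py client; the endpoint hostname is a placeholder, and on a clustered deployment you would run it against each node:

import redis

# Connect to the cache endpoint (placeholder hostname).
r = redis.Redis(host="feature-store.example.cache.amazonaws.com", port=6379)

mem = r.info("memory")     # same data as running `info memory` in redis-cli
keys = r.dbsize()          # number of keys on this node
used = mem["used_memory"]  # bytes of memory used by Redis

print(f"keys: {keys:,}")
print(f"memory used: {used / 2**30:.1f} GiB")
print(f"average per key: {used / keys / 1024:.1f} KiB")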

JSON?

JSON is human readable, but it is not an ideal encoding format here. JSON was used in the proof of concept, and it only became apparent later that it would not scale. At 30 million keys, even dropping a kilobyte per key would add up to big savings. Let’s see if we can do better.

Replacing JSON with Protocol Buffers

About half of the JSON blob is string dictionary metadata and the other half is an array of 200 doubles (64-bit precision real numbers). After a conversation with the team, we confirmed that the string dictionary metadata is not necessary in production. Dropping the metadata was a nice win, but we knew that there were more savings to be found.

The other half of the payload, an array of 200 doubles encoded as JSON, weighs in at about 2,700 bytes. My first thought was “what would the value look like encoded as a Protocol Buffer?” Protocol Buffers (also referred to as Protobuf) is a commonly used binary encoding format that is great for serializing structured data for size and speed.

First, I created a Protobuf definition:

syntax = "proto3";

message Payload {
    repeated double values = 1;
}

Then I populated a sample Protobuf and did some back of the napkin math:

Encoding    Per Key        30 Million Keys
JSON        2,707 bytes    77,447 MB
Protobuf    1,603 bytes    45,862 MB
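
To reproduce the per-key numbers, a sketch like the one below works, assuming the definition above lives in payload.proto and has been compiled with protoc --python_out=. payload.proto. The values are random, so the exact JSON byte count will differ a little from the table (it depends on how many digits each double serializes to), while the Protobuf size is essentially fixed at 8 bytes per double plus a few bytes of field header:

import json
import random

import payload_pb2  # module generated by protoc from the definition above

values = [random.random() for _ in range(200)]

# JSON: each double becomes a decimal string plus delimiters.
json_blob = json.dumps(values)

# Protobuf: a packed repeated double field stores a fixed 8 bytes per value.
proto_blob = payload_pb2.Payload(values=values).SerializeToString()

print(f"JSON:     {len(json_blob):,} bytes")
print(f"Protobuf: {len(proto_blob):,} bytes")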

An array of 200 doubles encoded in Protobuf is 41% smaller than the same array encoded in JSON! Over 30 million keys, it’s a big savings. To validate these storage numbers, I set up a local Redis instance and created a small script to generate a million key/value pairs and then inspected the memory statistics:

Encoding    30 Million Keys
JSON        93,877 MB
Protobuf    57,256 MB
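
Here’s a minimal sketch of that population script, assuming a local Redis on the default port and the same generated payload_pb2 module; the key format and the one-million count are illustrative:

import json
import random

import redis
import payload_pb2  # module generated by protoc from the definition above


def load(r, encode, n=1_000_000):
    """Flush the local instance, load n key/value pairs, return used_memory in bytes."""
    r.flushall()
    pipe = r.pipeline(transaction=False)
    for i in range(n):
        values = [random.random() for _ in range(200)]
        pipe.set(f"feature:{i}", encode(values))
        if i % 1_000 == 999:
            pipe.execute()
    pipe.execute()
    return r.info("memory")["used_memory"]


r = redis.Redis(host="localhost", port=6379)
encoders = {
    "JSON": lambda v: json.dumps(v),
    "Protobuf": lambda v: payload_pb2.Payload(values=v).SerializeToString(),
}
for name, encode in encoders.items():
    print(f"{name}: {load(r, encode) / 2**20:,.0f} MB for 1 million keys")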

With this test harness set up, I was able to try a few experiments like:

  • Does the length of the key matter? (Sort of, but not a lot)
  • Does it matter if we store values in Redis hashes instead of plain key/value pairs? (Sort of, there may be storage efficiencies in key overhead when there are more features; see the sketch after this list)
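
For the hash question, the sketch below shows the two layouts side by side; Redis’s MEMORY USAGE command (memory_usage in redis-py) reports the per-key footprint. The entity and field names are illustrative:

import redis
import payload_pb2  # module generated by protoc from the definition above

r = redis.Redis(host="localhost", port=6379)
blob = payload_pb2.Payload(values=[0.5] * 200).SerializeToString()

# Layout 1: one top-level key per entity/feature pair.
r.set("entity:12345:feature:embedding", blob)

# Layout 2: one hash per entity, one field per feature. With many features
# per entity this can amortize the per-key overhead.
r.hset("entity:12345", "embedding", blob)

print(r.memory_usage("entity:12345:feature:embedding"))
print(r.memory_usage("entity:12345"))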

Less Precision = Less Bytes

Protobuf encoding was a big win but there was more to do. I had another idea: can we round the 64-bit double values down to 32 bits and still have enough precision for the model predictions to be effective? While the team tested that theory, I ran an experiment to see what an array of 200 floats (32-bit precision real numbers) would look like in Redis:

Encoding            30 Million Keys
JSON doubles        93,877 MB
Protobuf doubles    57,256 MB
Protobuf floats     32,308 MB
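
In the Protobuf definition, this just means changing repeated double to repeated float. As a quick illustration of what the precision question looks like, here’s a sketch (using NumPy, purely for illustration) of the round-trip error introduced by squeezing a 64-bit value into 32 bits:

import numpy as np

x = np.float64(0.123456789012345)  # original 64-bit feature value
y = np.float64(np.float32(x))      # what comes back after a 32-bit round trip

print(x, y, abs(x - y))            # error around 1e-8 for a value of this magnitude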

Switching from doubles to floats has almost halved our storage, which makes sense since they are half the size! Can we push even further? Unfortunately, Protobuf doesn’t have a 16-bit real number representation, but it does have variable-length encoding for integer types. If we introduce a quantization step to map our real-valued floats to integers (and vice versa), Protobuf can use variable-length encoding for additional space savings. With my test harness, I transformed the double values into integers by multiplying them by 10,000 and rounding the result to a 32-bit integer:

Encoding             30 Million Keys
JSON doubles         93,877 MB
Protobuf doubles     57,256 MB
Protobuf floats      32,308 MB
Protobuf integers    21,322 MB
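
Here’s a minimal sketch of that quantization step. The 10,000 scale factor comes from above; the choice of a signed varint field (for example, repeated sint32 values = 1;) on the Protobuf side is an assumption about how the integer variant might be defined. For feature values with magnitude around one, the scaled integers typically fit in two or three varint bytes instead of a fixed four or eight:

SCALE = 10_000  # fixed-point scale: four decimal digits of precision survive


def quantize(values):
    """Map real-valued features to integers before encoding."""
    return [round(v * SCALE) for v in values]


def dequantize(ints):
    """Map stored integers back to approximate real values after decoding."""
    return [i / SCALE for i in ints]


values = [0.7312, -0.0045, 1.25]
ints = quantize(values)   # [7312, -45, 12500]
print(dequantize(ints))   # [0.7312, -0.0045, 1.25]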

We’ve gone from 93GB to 21GB, which is pretty good! By changing how the feature store encodes the data in Redis, we’ve achieved significant savings. And because there is an API in front of Redis, we can abstract much of this away from clients so they are none the wiser about what’s going on under the hood.
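
As a sketch of what that abstraction might look like in the API layer (the quantization scale and generated module are carried over from the examples above; the function and key format are hypothetical):

import redis
import payload_pb2  # integer variant of the Payload message, generated by protoc

SCALE = 10_000
r = redis.Redis(host="feature-store.example.cache.amazonaws.com", port=6379)


def get_features(entity_id):
    """Return the feature vector for an entity as plain floats.

    Callers never see how values are stored: the Protobuf decoding and the
    dequantization both happen behind the API.
    """
    blob = r.get(f"feature:{entity_id}")
    if blob is None:
        raise KeyError(entity_id)
    msg = payload_pb2.Payload()
    msg.ParseFromString(blob)
    return [v / SCALE for v in msg.values]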

Conclusion

The team reported back that the model predictions were still good at 32-bit precision, so rounding down to the lower precision was an easy win. A big code change was required to switch from JSON to Protobuf, but the end result (dramatic Elasticache cost savings) is hard to argue with. In the end, the feature store made it into production under the initial budget estimates.

What did we learn? My take is that the assumptions and choices we make at the start of development may not necessarily hold up in production. And when those assumptions turn out to be wrong, we have to be open to finding new solutions and correcting course.