Media Giant Migrates From Bare Metal to AWS


A global media and technology company based in Philadelphia serves tens of millions of customers.

Recommendation engines, machine learning, and artificial intelligence all run on algorithms, and increasingly those algorithms run in the cloud. A Chariot team led by Eric Snyder is helping the company move and process all that data, migrating it to the Amazon Web Services cloud: from relatively inflexible servers to an elastic environment that scales up or down as needed.

The group that Eric works with is responsible for personalizing the customer experience, using a recommendation engine to predict or infer what media to serve each viewer. This kind of algorithm is well established in ecommerce, and the big media players are now catching up. Ecommerce businesses have a lot in common with one another, with similar technology platforms, and while ecommerce shares a lot with large multiple system operators (MSOs), much of the tech is very specific to media companies.

Higher Profit, Higher Engagement Through AI

Some internal groups at the MSO are looking for higher revenue; others are looking for more interaction, in the form of more tunes (when a viewer tunes into something) and longer engagement. They’re also looking to decrease customer churn. There are many different metrics to optimize for. Eric said, “We produce different models, and then we need to prove that these models are correct. Will they be beneficial or not?”

The Chariot team built a platform to test models, and prove statistically whether or not the changes work. “We can measure a metric, let’s say revenue or tune count, and then we can prove that new recommendations are beneficial.”
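As a rough illustration of the statistics involved (the post doesn’t describe the platform’s actual methodology), a two-proportion z-test is one standard way to decide whether a lift in a metric like tune count is real or just noise. The numbers below are made up:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two observed proportions."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (successes_b / n_b - successes_a / n_a) / se

# Hypothetical experiment: 10,000 viewers per arm, counting who tuned in.
z = two_proportion_z_test(successes_a=1200, n_a=10_000,
                          successes_b=1320, n_b=10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 means significant at the 5% level (two-sided)
```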

Migrating From Bare Metal to AWS

Bare metal servers are physical machines that a company houses in its own data centers. They’re not necessarily private cloud servers; bare metal means the company owns the installation and allocation of servers, along with provisioning and configuration.

The media company has been using a combination of in-house, virtual and cloud servers, but a few years ago, a big transition occurred, with the goal of moving away from bare metal servers. As a result, a lot of groups in the company are moving to AWS.

It’s a growing trend in business. Eric said, “The reason for the move to the cloud is more about the services provided by AWS. A lot of them are quite impressive, and are quite good, but it’s generally related to how flexible you can be.” When you’re working with bare metal servers, there’s a lot of administrative overhead, and resources are allocated for one specific thing for a long period of time. With AWS, said Eric, “we’re able to create resources on the fly as needed. That helps us a lot and it really helps the business quite a lot too. They can get a lot of what they need much more quickly.”
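As a minimal sketch of what “resources on the fly” looks like in practice, here’s how a team might launch and later release a compute instance with boto3, AWS’s Python SDK. The AMI ID, instance type, and tag are placeholders, not details from this project:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spin up a machine only when there's work for it to do.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "nightly-model-training"}],
    }],
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}")

# When the job finishes, release the resource -- and stop paying for it.
ec2.terminate_instances(InstanceIds=[instance_id])
```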


What About Security in the Cloud?

As a software engineer building for the cloud, Eric said that it’s important to avoid assumptions that you might have previously made. “You must assume that at any point, any part of your infrastructure can disappear, or go wonky, for lack of a technical term. Your architecture has to reflect that, so you build redundancies into the cloud that maybe you didn’t before.” If a developer wanted additional resources to fail over to, in the past that might’ve been too expensive or too difficult. But with the cloud it’s easy.
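A minimal sketch of that mindset, assuming a hypothetical pair of service endpoints: retry with exponential backoff, and when an endpoint really is gone, fail over to a redundant one.

```python
import time
import urllib.request

ENDPOINTS = [
    "https://recs-primary.example.com/health",    # primary (hypothetical)
    "https://recs-secondary.example.com/health",  # redundant standby
]

def fetch_with_failover(endpoints, retries=3):
    for url in endpoints:
        delay = 1.0
        for _ in range(retries):
            try:
                with urllib.request.urlopen(url, timeout=2) as resp:
                    return resp.read()
            except OSError:
                time.sleep(delay)  # back off, then retry the same endpoint
                delay *= 2
        # This endpoint looks gone; move on to the redundant one.
    raise RuntimeError("all endpoints unavailable")
```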

“Security in the cloud is quite complicated,” said Eric. “AWS in particular takes it quite seriously. There are many different layers, in terms of security groups, roles and different security policies that you can apply.” The most difficult transition to make with any cloud provider is going through new security protocols and security architecture. And that’s a good thing. You can’t get away with deploying insecure solutions. “It’s more obvious when you’re doing something insecure in the cloud.”
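As one illustration of those layers (a generic sketch, not the company’s actual configuration), a security group can be defined so that only the traffic a service needs is allowed in; everything else is denied by default. The group name, VPC ID, and CIDR range below are hypothetical:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="recs-api",                   # hypothetical service
    Description="Recommendation API, HTTPS only",
    VpcId="vpc-0123456789abcdef0",          # hypothetical VPC
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],  # internal callers only
    }],
)
# Anything not explicitly allowed stays blocked -- insecure openings
# have to be declared, which makes them visible.
```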

Reducing Dependence on MapReduce

Most of the processes that the media company had in place were built around MapReduce, which was already an aging technology a year or two ago. “It works, it’s reliable, but there were issues with it in terms of how much data they could process, how often, the cost to do so, the flexibility and the amount of code that had to be written. I think it was fairly obvious to everyone that they needed to change,” said Eric.
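For context, the MapReduce pattern itself is conceptually simple; the overhead Eric describes came from the job plumbing around it at scale. Here is a toy in-memory version of the map, shuffle, and reduce phases, counting tune events per program (the event shape is invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

events = [
    {"program": "news", "user": "a"},
    {"program": "movie", "user": "b"},
    {"program": "news", "user": "c"},
]

# Map: emit (key, 1) pairs.
mapped = [(e["program"], 1) for e in events]

# Shuffle: group the pairs by key.
mapped.sort(key=itemgetter(0))
grouped = groupby(mapped, key=itemgetter(0))

# Reduce: sum the counts for each key.
counts = {key: sum(v for _, v in pairs) for key, pairs in grouped}
print(counts)  # {'movie': 1, 'news': 2}
```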

“They could do what they needed to do. It just required a lot of work. It took longer to do things. So consequently, the platform didn’t evolve as quickly as it does now.” With cloud technologies, the business comes to the team with a request and the team is able to fulfill it in a matter of weeks instead of months. Time to market has improved dramatically.

Massive Amounts of Data

When MapReduce was first introduced, it was a revelation: it allowed teams to work with huge volumes of data. “If you consider a large company providing entertainment services and a nation full of people tuning in to watch something prerecorded or live, on TV or via mobile app, they’re all events that happen and all those events flow through our platform.” There are huge top-of-the-hour spikes of data. When customers do searches and tune into programs, the team uses those signals to help improve the experience.

“We keep track of what’s trending in real time. When customers see what’s trending now, that’s real time and that’s what we do. We collect this data and we massage it and we aggregate it and we use it for input into models to make inferences as to what customers want to watch.”
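A minimal sketch of a real-time trending counter, assuming a simple in-memory sliding window (the actual pipeline is certainly more elaborate): recent tune events are kept in a deque, stale ones fall out of the window, and “trending now” is whatever has the highest recent count.

```python
import time
from collections import Counter, deque

WINDOW_SECONDS = 15 * 60          # assumed window size
events = deque()                  # (timestamp, program_id)

def record_tune(program_id, now=None):
    events.append((now or time.time(), program_id))

def trending(top_n=10, now=None):
    now = now or time.time()
    # Drop events older than the window before counting.
    while events and events[0][0] < now - WINDOW_SECONDS:
        events.popleft()
    return Counter(p for _, p in events).most_common(top_n)
```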

Collecting signals is almost a pure engineering exercise. The team builds up the platform to collect the data, and then massages and filters and pre-processes that data to generate input to various models. Some of these models are based on machine learning to create inferences and produce recommendations. “We can say, based upon this person’s usage history and the inferences we get from the model we built, we need to resort the programs to show customers the ones they’re most likely to watch.”
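A minimal sketch of that resorting step, with a trivial stand-in model (the real models are learned, not a lookup table): score each program for the user, then order the lineup by predicted likelihood of a watch.

```python
def resort_programs(programs, model, user_history):
    """Order programs by the model's predicted probability of a watch."""
    scored = [(model.predict(user_history, p), p) for p in programs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [program for _, program in scored]

class PopularityModel:
    """Trivial stand-in: score from a precomputed popularity table."""
    def __init__(self, popularity):
        self.popularity = popularity
    def predict(self, user_history, program):
        return self.popularity.get(program, 0.0)

model = PopularityModel({"news": 0.8, "movie": 0.5, "sports": 0.6})
print(resort_programs(["movie", "sports", "news"], model, user_history=[]))
# -> ['news', 'sports', 'movie']
```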

Real Time A/B Testing

One of the challenges of building a model is that it’s not concrete imperative programming, where if you do X and Y, you’re going to get Z. “You add signals to your model. You tweak weights of different things. You don’t quite know what the results are going to be. You tune parameters, often called hyperparameters, until the results look the way you want.” That’s one of the reasons the team does a lot of testing, measuring things like precision and recall. Engineers can determine, when a piece of content has been recommended, whether a viewer actually watched it. “That’s how we know if we’re on the right track or not. We build as many different models as we can, and we also do A/B testing.”
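Precision and recall have standard definitions that a short sketch can make concrete: of the items recommended, what fraction were watched (precision), and of the items watched, what fraction had been recommended (recall). The sets below are illustrative:

```python
def precision_recall(recommended, watched):
    recommended, watched = set(recommended), set(watched)
    hits = recommended & watched
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(watched) if watched else 0.0
    return precision, recall

p, r = precision_recall(recommended=["a", "b", "c", "d"],
                        watched=["b", "d", "e"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```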

A customer might be participating in some of these A/B tests. They wouldn’t know it, but they might be receiving the results of an algorithm sent to 10,000 or 100,000 other subscribers.
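One common way to run such tests, offered here as an assumption rather than a description of the actual platform, is to hash each subscriber ID into a stable bucket, so a given user always lands in the same arm of the experiment:

```python
import hashlib

def assign_arm(user_id, experiment, treatment_share=0.10):
    """Deterministically place a user in 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000  # 10,000 stable buckets
    return "treatment" if bucket < treatment_share * 10_000 else "control"

print(assign_arm("subscriber-42", "new-recs-model"))  # same answer every call
```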

The introduction of automated testing has been a revelation to the media company, which had previously done testing manually. Now, the business has a clean user interface to set up different campaigns and A/B tests, and affect the user experience almost instantly.

Listen to Eric Snyder talk about this cloud migration on Chariot’s new podcast series Talk Tech To Me.

Lessons Learned in Cloud Deployment

Engineers worry about being locked into a specific vendor’s solution. “AWS, for example, offers a lot of services that are unique to them. They’re early pioneers of the serverless concept. They developed what they call lambdas, but it’s essentially managed functions that you can run in the cloud. You’re sort of beholden to AWS to use them.

“It’s not something that’s easily shifted from one platform to another,” said Eric, who keeps a close eye on the AWS bill, a big change for developers. “When everything operated on bare metal, if somebody spent the money then that was it, you’re done. But now, it’s more of an ongoing expenditure. So you know day by day, hour by hour, how much money you spent doing what.”

The team gets reports and emails that indicate what it spent money on. As a result, the developers might implement solutions that are perhaps not as trendy, but are more portable. “We’ve seen costs rise and fall based on what we’re doing. If one day, we have to pull everything out and move it in-house again, we need to be able to do that.”
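One way to keep that option open (a sketch of a common pattern, not necessarily this team’s design) is to keep the business logic in plain, portable code and treat the Lambda handler as a thin adapter around it:

```python
def rank_recommendations(user_id, candidates):
    """Portable core logic -- no AWS imports, runs anywhere."""
    return sorted(candidates)  # placeholder for the real ranking

def lambda_handler(event, context):
    """AWS-specific shim: unpack the event, delegate, repack the response."""
    ranked = rank_recommendations(event["userId"], event["candidates"])
    return {"statusCode": 200, "body": ranked}
```

Moving to another platform then means rewriting only the shim, not the core.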

At one point during the project, the team decided to implement Amazon Redshift, a data warehousing product. “It’s a great product. It really works, but it didn’t quite fit our needs. Given the costs, we determined that it wasn’t really worth it. We probably could have decided that earlier on in the process. But because we had invested a certain amount of time getting it set up, we probably stuck with it longer than we needed to. It’s important to constantly evaluate and not be afraid to pull the plug.”

Developers Keep Track of Spending

With processes and platforms billed by the minute, the hour, or the second, you pay for resources used. “It’s almost what I imagine computing was like in the 1970s, sharing mainframe time and paying for every CPU cycle,” said Eric. “You’re sharing all of Amazon’s resources. Therefore, in order to make a profit and distribute the resources, Amazon charges a certain amount based on what you use or the amount of time that you use it. So you are back to watching what you’re using. It’s like we’re all on a toll road now. These streets aren’t free anymore.”

Eric says the key to cost efficiency is planning. When developers make use of AWS resources to release a feature in three weeks, and that new feature is a money maker, it’s totally worthwhile.

Listen to the full podcast