Twenty Years of Big Data
More, cheaper, faster: our own Keith Gregory recounts the changes in big data, data storage, and data engineering over the last two decades.
A well-designed data strategy is critical to success. Here are 3 philosophies to help you design an optimal data strategy for your business.
Amazon Athena is a service that lets you run SQL queries against structured data files stored in S3. It takes a “divide and conquer” approach, spinning up parallel query execution engines that each examine only a portion of your data. The performance of these queries, however, depends on how you consolidate and partition your data. In this post I compare query times for a moderately large dataset, looking for the “sweet spot” between number of files and individual file size.
Our CMO Tracey Welson-Rossman sits down with Leslie Richards, the General Manager of SEPTA, to discuss the extensive role of data in public transit.
In this interview, Lanaya Nelson from Motion Insurance discusses how harnessing drivers’ telematics and GPS data is disrupting the auto industry.
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: As massive amounts of new geospatial data are collected, it is increasingly challenging to search and find data of interest. Upcoming NASA missions, such as NISAR and SWOT, will be generating tens of terabytes a day, …
Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2020. Abstract: In this talk we look at the challenges of making geospatial data accessible and rapidly consumable in disaster response scenarios. The wide variety and large volume of commercial and public data available in AWS coupled with scalable …
This talk will review two common use cases for captured metric data: 1) real-time analysis, visualization, and quality assurance, and 2) ad-hoc analysis. The most common open source streaming options will be mentioned; however, this talk will focus on Apache Flink specifically. A brief discussion of Apache Beam will also be included in the context of the larger discussion of a unified data processing model.
Apache Spark is one of the most popular general-purpose distributed systems of the past few years. It has APIs in Scala, Java, and Python, and more recently there have been a few different attempts to provide support for R, C#, and Julia. This talk looks at Apache Spark from a performance/scaling point of view and the work we …
I was lucky enough last week to attend PHLAI, a Comcast-sponsored conference on machine learning and artificial intelligence. The dreary weather did not dampen our spirits as practitioners and business stakeholders met to discuss one of the most important trends in our lifetime.