The Science of Big Data – Dec. 13

h2. Event

p. The Science of Big Data – Dec. 13

h3. Details

p. _This is the re-scheduled event, which is now happening on December 13, 2012._

p. In the last few years, we’ve seen a huge explosion in the amount of data being tracked and stored: data from purchases, website browsing patterns, job data, climate and financial statistics, and much more. Your social media interactions are also available from Facebook, Twitter, and other sources. Even governments have exposed data, for example through the Open Government Initiative.

h2. Learn about Data Science

p. In this Chariot Solutions event, we turn our focus to the science of processing and understanding huge amounts of data. Join Stuart Sierra, Cliff Moon, Mark Headd and Tom Santero as they show you tools and techniques for working with data, such as Map/Reduce (and the Hadoop open-source project) and NoSQL databases such as Riak, and even how to deal with streams of operational data and react to network and system outages in real time.

p. Register today and learn about this emerging field of data analysis, trending, and discovery.

h2. Sessions

h3. Breaking Down Big Data with Datomic

_Stuart Sierra, Relevance_

p. Every database technology is a product of its time, optimized for a particular hardware profile. Today, “Big Data” is often synonymous with the Map/Reduce architecture and Hadoop but, as technology continues to evolve, cost/performance tradeoffs are changing. This talk will highlight some recent trends in data storage technology and distributed systems, with a focus on how Datomic takes advantage of those trends. Datomic is a new non-SQL database featuring ACID transactions, distributed storage, and a unique data model that encompasses change over time.

h3. Weathering The Storm

_Cliff Moon, Boundary_

p. It’s 3am and you are woken up by the steely voice of a pager duty robot. The ingress point for your data ingest is failing. You find that it’s interacting poorly with downstream components, causing cascading failures that prevent full recovery. How do you recover when the firehose never stops? How do you keep it from happening again? We’ll discuss flow control issues in high-scale data processing systems and how they can be mitigated using some of the tools that the OS and network give us.

h3. Adventures with Eventual Consistency

_Tom Santero, Basho_

p. There are many reasons you would want to run an eventually consistent database (such as Cassandra or Riak) in production, but a knowledge of the underlying system is essential. Unfortunately, many of the concepts surrounding consistency in distributed systems, such as the CAP theorem, are often talked about but largely misunderstood by the general public. While this talk is not another explanation of the CAP theorem, Tom will illuminate its nature from an architectural point of view and then go on to discuss the challenges one must face and the considerations one must weigh when designing applications with concurrency in mind.

h3. Big Data in the Big City

_Mark J. Headd, Chief Data Officer, City of Philadelphia_

p. The way that governments view the vast stores of data they collect and maintain is changing.

p. Through open data and transparency programs, governments are opening up their data sets to outside developers, civic activists, academics and journalists – creating new opportunities for application development, research and policy analysis.

p. This collaboration with outside data consumers is creating unique and unexpected uses of government data, allowing for an unprecedented view into how governments are performing and creating new ways of analyzing social policies.

p. This “open data movement” is also changing the way that governments themselves think about their data. It’s helping to break down the traditionally siloed and parochial view among government departments about how data is used. Government managers are now looking across the enterprise (and across governments) for opportunities to collaborate with each other, and are starting to think strategically about their data and how it is used to inform policy decisions.

p. This session will provide an in-depth analysis of the technology behind open data applications and provide technical details on some of the most compelling applications being built with government open data.

h3. Hadoop at Comcast: a case study

_Andrew Oswald, Chariot Solutions_

p. From system conception to processing trillions of bytes a day, this talk will focus on challenges faced, how those challenges are being solved, and how your organization can benefit from lessons learned.

h2. Speakers

h3. Mark Headd

_CDO, City of Philadelphia_

p. Mark Headd is a writer, speaker, teacher and thought leader on communication technologies and open government. Self-taught in programming, he has been developing telephone, mobile, speech recognition and messaging applications for almost 10 years.

p. In August 2012, the Nutter Administration selected Mark to become the City of Philadelphia’s first Chief Data Officer, to lead Mayor Nutter’s open data and government transparency initiatives.

p. Mark has worked for technology companies from the Delaware Valley to Silicon Valley. He previously worked as Director of Government Relations at Code for America, culminating a period of almost 2 years of collaboration with the organization on open government and civic hacking projects around the country.

p. Mark previously served in government, working for three years as the chief policy and budget advisor for the State of Delaware’s Department of Technology and Information. He has also served as Director of the Delaware Government Information Center, as Technology Adviser to former Delaware Governor Thomas R. Carper, and in the New York State Senate as a budget and finance analyst.

p. For the last several years, Mark has been active in the OpenAccessPhilly initiative, which is focused on encouraging citizen engagement, digital inclusion and technology-driven innovation in Philadelphia. He has been active in the support and promotion of the data portal, and helped in organizing the recent OpenData Race.

p. Mark has built open government software applications for the District of Columbia, the Sunlight Foundation, the New York State Senate, and the cities of New York, San Francisco, Toronto, Baltimore and Philadelphia. He is an organizer and participant in civic hacking events across the country, including Philadelphia and Baltimore.

p. He holds a Master’s degree in Public Administration from the Maxwell School of Citizenship and Public Affairs at Syracuse University, and is a former adjunct instructor at the University of Delaware teaching a course in electronic government.

h3. Cliff Moon


p. Cliff Moon is Founder and Chief Technical Officer at Boundary. Prior to Boundary, Cliff was a lead engineer for Powerset (a natural language search engine acquired by Microsoft) where he was instrumental in the design, implementation, launch, and operation of many of the company’s production services. Cliff is an active contributor to open source projects, developing the first open-source implementation of Amazon Dynamo and originating the Dynamo Framework. Cliff is an active and well-regarded member of the NoSQL, Scala, and Erlang communities.

h3. Tom Santero


p. Tom Santero (@tsantero) joined Basho Technologies in March 2012. Prior to Basho, Tom spent several years in the finance industry in equity sales and trading, which eventually led him to build automated trading systems in his spare time. As Technical Evangelist at Basho, Tom is tasked with educating engineers and architects on all things Riak, supporting the Riak community and hacking on various ad-hoc projects. When he’s not working, Tom can usually be found somewhere in Brooklyn, discussing technology and enjoying a good beer.

h3. Stuart Sierra


p. Stuart Sierra is a developer at Relevance, a member of Clojure/core, and the co-author of Practical Clojure (Apress, 2010). He is heavily involved in the development of Clojure and has written numerous open-source libraries. Prior to 2010, he worked at Columbia Law School on AltLaw, a groundbreaking open-source search engine for legal scholarship, making use of Hadoop, Lucene, and semantic web technologies. Stuart lives in New York City.

h3. Andy Oswald

_Chariot Solutions_

p. Andrew Oswald has been a consultant at Chariot Solutions for seven years, the past two of which were spent helping define Comcast’s Video on Demand “big data” presence. He’s designed numerous distributed systems along the way, making pragmatic use of datastores such as Riak and Redis, technology-agnostic messaging through AMQP (RabbitMQ), and polyglot system infrastructure in Erlang/OTP and Java (particularly the java.util.concurrent classes).

p. When not strictly focused on work-related activities, Mr. Oswald enjoys playing guitar, studying data science, relaxing with his wife, and being entertained by their gray tabby, Olive.

h3. Josh Angotti


p. Josh Angotti joined Comcast in 2005 and became a product manager for the Video on Demand (VOD) platform in 2006. As a Director of Product Management he converged Comcast’s widely dispersed VOD networks, improved centralized administrative tools, and established a unified environment through which Comcast can reach customers more efficiently and with more predictable results. Presently Josh is framing how Comcast will leverage the centralized VOD system and apply the knowledge gained to improve content offerings, navigation options, and timing of program availability.

h2. Event Details

December 13, 2012
8:00 AM – 3:30 PM

Quorum, University City Science Center
3711 Market Street
Suite 800
Philadelphia, PA 19104