Sunday 11:30 a.m.–11:50 a.m. in Terrace
Python and Hadoop: Big Data Application Development with PyCascading
Craig Hawco
Audience level: Intermediate
Description
Big Data is not an area usually associated with Python. We'll discuss some of the available options, what to consider when interfacing with the rest of your data solution, and the advantages and shortcomings of working with Python in the Hadoop ecosystem.
Abstract
MapReduce
What is MapReduce, and where did it come from? We'll talk about the origins of this computational model and the implementations currently available to us.
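To make the computational model concrete, here is a minimal in-memory sketch of the classic word-count job in plain Python. It only illustrates the three phases (map, shuffle/sort, reduce). It is not Hadoop's API, and the function names are hypothetical. A real framework would run the map and reduce phases in parallel across many machines and handle the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle/sort: group all emitted values by key, as the framework
    # does between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    # Reducer: combine all the values seen for a single key.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_phase(mapped)]


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    for word, count in word_count(docs):
        print(word, count)
```

The mapper and reducer each see only one record or one key at a time. That restriction is what lets a framework distribute the work.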
Framework Options
Once we've established the platform options, we'll turn to the framework options. What APIs can I code against? What are they like to use? How well do they work together?
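One of the most basic ways to use Python with Hadoop is Hadoop Streaming. With Streaming, the mapper and reducer are ordinary executables that read lines from stdin and write tab-separated key/value lines to stdout, and the framework sorts mapper output by key before passing it to the reducer. The sketch below assumes that contract. The single-script layout and the `map`/`reduce` command-line switch are hypothetical conventions chosen for illustration.

```python
import sys
from itertools import groupby


def mapper(lines):
    # Streaming feeds raw input lines on stdin; emit "key<TAB>value" lines.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    # Mapper output arrives sorted by key, so consecutive lines that share
    # a key can be grouped and summed.
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical switch: "map" runs the mapper, anything else the reducer.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

You can check such a script locally before submitting it to a cluster by imitating the shuffle with `sort`, e.g. `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` (file names are placeholders).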
Overview of Cascading
Next, we'll discuss what Cascading is and why it offers a different computational model for writing data processing applications. We'll also look at the APIs built on top of it and how they work together.
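Cascading lets you describe a whole job as a pipe assembly instead of hand-writing mappers and reducers. In that assembly, data flows from a source tap through Each, GroupBy, and Every pipes to a sink tap, and Cascading's planner turns it into one or more MapReduce jobs. The toy `Pipe` class below is a hypothetical, in-memory analogy of that style. It is not the Cascading or PyCascading API. It only shows how a chain of per-tuple functions, groupings, and aggregations can be declared first and executed later.

```python
from itertools import groupby
from operator import itemgetter


class Pipe:
    """Hypothetical in-memory stand-in for a Cascading-style pipe assembly.

    Stages are only recorded when declared and run when run() is called,
    loosely mirroring how a planner sees the whole assembly up front.
    """

    def __init__(self, stages=()):
        self.stages = list(stages)

    def each(self, fn):
        # Per-tuple function (like an Each pipe); fn returns zero or more tuples.
        return Pipe(self.stages + [("each", fn)])

    def group_by(self, index):
        # Group tuples on one field (like GroupBy); in MapReduce this forces a shuffle.
        return Pipe(self.stages + [("group", index)])

    def every(self, aggregator):
        # Aggregate each group (like an Every pipe); expects a preceding group_by.
        return Pipe(self.stages + [("every", aggregator)])

    def run(self, source):
        data = list(source)
        for kind, arg in self.stages:
            if kind == "each":
                data = [out for row in data for out in arg(row)]
            elif kind == "group":
                key = itemgetter(arg)
                data = [(k, list(g)) for k, g in groupby(sorted(data, key=key), key=key)]
            elif kind == "every":
                data = [arg(k, rows) for k, rows in data]
        return data


# Word count declared as an assembly: split lines, group on the word, count.
flow = (
    Pipe()
    .each(lambda t: [(w.lower(), 1) for w in t[0].split()])
    .group_by(0)
    .every(lambda word, rows: (word, sum(r[1] for r in rows)))
)

if __name__ == "__main__":
    print(flow.run([("the quick fox",), ("The dog",)]))
```

Because the whole assembly is known before anything runs, a real planner can decide where the map/reduce boundaries fall. The toy version only executes the stages in order.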
PyCascading
Finally, we'll walk through some sample MapReduce applications written in PyCascading, compare it with other Cascading APIs, and look at what still needs improving.