
Sunday 11:30 a.m.–11:50 a.m. in Terrace

Python and Hadoop: Big Data Application Development with PyCascading

Craig Hawco

Audience level:
Intermediate

Description

Big Data rarely comes up in discussions of Python. We'll discuss some of the available options, considerations for interfacing with the rest of your data solution, and the advantages and shortcomings of working with Python in the Hadoop ecosystem.

Abstract

MapReduce

What is MapReduce? Where did it come from? We'll talk about the origins of this computational model and what existing implementations are available to us.
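For readers new to the model, here is a minimal single-process sketch of the three MapReduce phases (map, shuffle, reduce) applied to word counting. It is an illustration in plain Python, not Hadoop code and not material from the talk; the function names map_phase, shuffle, reduce_phase, and word_count are assumptions chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count on a list of strings returns a dictionary of word totals. In a real cluster the map and reduce calls would run on many machines, with the shuffle moving data between them.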

Framework Options

Once we've established what the platform options are, we'll discuss some of the framework options. What APIs can I code against? What are they like? How well do they play together?

Overview of Cascading

Next we'll discuss what Cascading is and why it offers a different computational model for writing data processing applications. We'll also look at some of the available options and how they work together.
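To give a feel for the difference, the sketch below expresses word counting as a chain of stages that records flow through, rather than as explicit map and reduce functions. It is a plain-Python analogy only: the helpers each, group_by, every, and assemble are hypothetical names for this illustration and are not the Cascading or PyCascading API.

```python
from functools import reduce
from itertools import groupby


def each(fn):
    """Stage that applies fn to every record and flattens the results."""
    return lambda records: (out for rec in records for out in fn(rec))


def group_by(key_fn):
    """Stage that sorts records by key and yields (key, members) groups."""
    def stage(records):
        ordered = sorted(records, key=key_fn)
        return ((k, list(g)) for k, g in groupby(ordered, key=key_fn))
    return stage


def every(agg_fn):
    """Stage that aggregates the members of each group."""
    return lambda groups: ((k, agg_fn(members)) for k, members in groups)


def assemble(*stages):
    """Chain stages so the output of one feeds the next."""
    return lambda source: reduce(lambda data, stage: stage(data), stages, source)


# Hypothetical word-count assembly built from the stages above.
word_count_flow = assemble(
    each(lambda line: ((w, 1) for w in line.lower().split())),
    group_by(lambda pair: pair[0]),
    every(lambda pairs: sum(n for _, n in pairs)),
)
```

The application is described as a flow of data through reusable stages, and a planner (here, just function composition) decides how to execute it.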

PyCascading

Finally, we'll go over writing some sample MapReduce applications in PyCascading, how it compares to other Cascading APIs, and what needs improving.
