Writing A Hadoop MapReduce Program In Python
by Michael G. Noll on September 21, 2007 (last updated: October 19, 2011)
In this tutorial, I will describe how to write a simple MapReduce program for Hadoop in the Python programming language.
Table of Contents:
- Motivation
- What we want to do
- Prerequisites
- Python MapReduce Code
  - Map: mapper.py
  - Reduce: reducer.py
  - Test your code (cat data | map | sort | reduce)
- Running the Python Code on Hadoop
  - Download example input data
  - Copy local example data to HDFS
  - Run the MapReduce job
- Improved Mapper and Reducer code: using Python iterators and generators
  - mapper.py
  - reducer.py
- Related Links