When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig. Author Alan Gates is a co-founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. This second edition, updated with programming examples, provides comprehensive coverage of key features such as the Pig Latin scripting language and the Grunt shell.
Authors: Alan Gates and Daniel Dai
Date: November 2016
Audience: Data programmers
Category: Data Science
- Understand Pig's data model, including scalar and complex data types
- Write Pig Latin scripts to sort, group, join, project, and filter your data
- Use Grunt to work with the Hadoop Distributed File System (HDFS)
- Build complex data processing pipelines with Pig's macros and modularity features
- Embed Pig Latin in Python for iterative processing and other advanced tasks
- Use Pig with Apache Tez to build high-performance batch and interactive data processing applications
- Create your own load and store functions to handle data formats and storage mechanisms