Hive on Hadoop for MongoDB
Written by Kay Ewbank
Thursday, 22 August 2013
There’s a new version of 10gen’s MongoDB Connector for Hadoop with added support for Apache Hive and incremental MapReduce jobs.
The MongoDB Connector for Hadoop presents MongoDB as a Hadoop-compatible file system so that real-time data from MongoDB can be read and processed by Hadoop MapReduce jobs. It examines the MongoDB collection and calculates a set of splits from the data. Each split is assigned to a node in the Hadoop cluster, and the nodes work in parallel, each pulling the data for its splits from MongoDB (or from BSON files) and processing it locally. Hadoop then merges the results and streams the output back to MongoDB or to BSON.
The major changes in the new version start with support for Apache Hive, which allows SQL-like queries across live MongoDB data sets. Hive is a query engine for Hadoop that provides an alternative to writing MapReduce jobs for analyzing Hadoop Distributed File System (HDFS) datasets.
Using Hive with MongoDB won’t be completely straightforward. Some MongoDB data types, such as ObjectId, don’t have direct matches in Hive, and because the underlying data models differ, it may be tricky to work out how to express the mappings between Hive fields and MongoDB fields so that all cases are handled correctly.
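The split, process and merge flow described above can be illustrated with a small sketch. This is not the connector's code: it is a toy model in plain Python, where a list of dictionaries stands in for a MongoDB collection and a thread pool stands in for the Hadoop nodes. The names `calculate_splits`, `process_split`, `merge_results` and `run_job`, and the sample data, are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a MongoDB collection: a list of documents.
CITIES = ["London", "Paris", "London", "Berlin",
          "Paris", "London", "Berlin", "London"]
documents = [{"_id": i, "city": city} for i, city in enumerate(CITIES)]


def calculate_splits(docs, split_size):
    """Partition the documents into contiguous chunks (the 'splits')."""
    return [docs[i:i + split_size] for i in range(0, len(docs), split_size)]


def process_split(split):
    """Work done by one 'node': count the cities within its own split."""
    return Counter(doc["city"] for doc in split)


def merge_results(partials):
    """Combine the per-split counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def run_job(docs, split_size=3, workers=3):
    """Split the data, process the splits in parallel, then merge."""
    splits = calculate_splits(docs, split_size)
    # Threads play the role of Hadoop nodes in this sketch (an assumption
    # made for illustration; a real cluster distributes work across machines).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_split, splits))
    return merge_results(partials)


result = run_job(documents)
print(result)
```

In a real deployment, the merged output would be written back to MongoDB or to BSON rather than printed, and the split boundaries would be derived from the collection itself rather than from list positions.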