Hadoop is a open-source software framework distributed under Apache free license which supports data driven distributed applications. This Java language based framework was inspired by Google’s MapReduce and Google File System concepts. It enables applications to work with thousands of nodes and petabytes of data.
Hadoop has multiple sub-projects actively developed and used by many users worldwide. Yahoo is one of the major contributors to the Hadoop project.
Following are some of common Hadoop sub-projects.
- Hadoop Common – Common Utilities
- HDFS – Hadoop Distributed File System
- MapReduce - Framework for Distributed data processing
Other Hadoop related apache products include:
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A Scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: A high-performance coordination service for distributed applications.