Friday, March 18, 2011

Data processing open-source frameworks

A list of open-source frameworks that can be used for data processing and data mining.

Hadoop Ecosystem
· Hadoop Common – The common utilities that support the other Hadoop subprojects
· HDFS – A distributed file system that provides high-throughput access to application data
· MapReduce – A software framework for distributed processing of large data sets on compute clusters
· Pig – A high-level data-flow language and execution framework for parallel computation
· Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying
· HBase – A scalable, distributed database that supports structured data storage for large tables
· ZooKeeper – A high-performance coordination service for distributed applications
· Mahout – A scalable machine learning and data mining library
· Avro – A data serialization system
· Chukwa – A data collection system for managing large distributed systems
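
To give a feel for the MapReduce model mentioned in the list, here is a minimal word-count sketch in plain Python. It does not use Hadoop or any of its APIs. It only imitates the map, shuffle, and reduce phases in a single process, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in-process over a list of strings (hypothetical helper)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["pig hive pig", "hive hbase"]))
```

In a real cluster the map and reduce steps would run in parallel on many machines and the shuffle would move data over the network; this sketch only shows how the data flows between the phases.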
