Applications often need to access data from many large data sets. Depending on requirements, the characteristics of the data files are tuned to meet several criteria: the framework must deliver high aggregate data bandwidth and scale its service to many nodes in a single cluster. Hadoop is well suited to this, supporting millions of files in each distinct instance. Continuous maintenance is needed to keep the Hadoop architecture flexible and to preserve its core value.
To maintain a lasting presence and remain a strong competitor, Apache Hadoop has branched out into many related projects, establishing its brand among newer technologies in the day-to-day IT landscape. Many Hadoop subprojects have been cultivated to provide complementary services, and enhancements were made to the core to add higher levels of abstraction. To lower barriers and enable interoperation on a global scale, subprojects such as HBase, Ambari, Flume, Cassandra, HCatalog, ZooKeeper, Spark, Solr, Oozie, Pig, Tez, Slider, Kafka, and Mahout have been launched as part of this multifaceted strategy.
Advancements in technology
Some milestones in the rapid evolution of data processing platforms and technologies are listed below:
2003: The Google File System (GFS) paper was published; its design became the foundation for today's Hadoop HDFS.
2004: Google published "MapReduce: Simplified Data Processing on Large Clusters," the programming model behind Hadoop MapReduce.
2005: In support of the "Nutch" search engine project, Doug Cutting and Michael J. Cafarella developed Hadoop, based on GFS and MapReduce. Yahoo funded the Hadoop project.
2006: Yahoo donated the project to Apache. Hadoop was able to sort 1.8 TB of data on 188 cluster nodes in 47.9 hours.
2007: The first release of Hadoop was made, carrying HBase, and labs were created for Pig.
2008: In the spirit of innovation, Hadoop delighted the market by sorting 1 TB of data in 209 seconds on a 910-node cluster, beating the previous record of 297 seconds. Hadoop became the world's fastest system of its kind and won the Terabyte Sort Benchmark. The YARN JIRA was opened.
2009: Hadoop sorted a petabyte of data, and Yahoo used Apache Hadoop to sort one terabyte in 62 seconds. Avro and Chukwa joined the Hadoop framework family.
2010: The Hadoop subprojects Apache Hive, Apache Pig, and Apache HBase matured, greatly boosting the storage capacity and potential processing power of the Apache Hadoop framework.
2011: The ZooKeeper Hadoop subproject was completed. ZooKeeper helps maintain configuration information, synchronize distributed processes, and provide group services.
2012: Some of the resource-management burden was separated from MapReduce, with YARN taking over that responsibility. Apache Hadoop 1.0 became available.
2013: YARN was deployed in production, and Ambari, Cassandra, and Mahout were added. Apache Hadoop 2.2 became available.
2014: Secure operation of the DataNode without the need for root access, faster wire encryption, and support for archival storage were added. Apache Hadoop 2.6 became available.
2015: The file output committer was accelerated for large jobs with many output files, and the ability to limit the number of running Map/Reduce tasks of a job was added. Apache Hadoop 2.7 became available.
2016: An intra-DataNode disk balancer was introduced, and more than two standby NameNodes were supported. Release 3.0.0-alpha1 became available.
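The "secure DataNode without root access" item from 2014 refers to Hadoop 2.6's support for authenticating the data transfer protocol with SASL, which removes the need to bind privileged ports as root. A sketch of the relevant hdfs-site.xml fragment might look like the following; the property names come from the Hadoop secure-mode documentation, but the port numbers and protection level are illustrative assumptions that depend on the deployment:

```xml
<!-- hdfs-site.xml: run a secured DataNode without root by using SASL
     on the data transfer protocol. Values here are illustrative. -->
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <!-- A non-privileged port (>1024), so root is not needed to bind it. -->
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:10019</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:10022</value>
</property>
<property>
  <!-- SASL on the data transfer protocol requires HTTPS for web UIs. -->
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
```

Before 2.6, a secure DataNode had to start as root (via jsvc) purely to bind privileged ports; SASL makes that unnecessary.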
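The MapReduce model from the 2004 paper can be illustrated with a minimal, self-contained sketch. This is plain Python, not the Hadoop API: a map phase emits key/value pairs, a shuffle groups them by key (as the framework does between phases), and a reduce phase aggregates each group. Word count is the canonical example.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group all emitted values by key, as the framework would
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate the values for one key; word count just sums them.
    return (key, sum(values))

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())

counts = word_count(["the quick brown fox", "the lazy dog"])
print(counts["the"])  # 2
```

In a real cluster the map tasks run in parallel on different splits and the shuffle moves data across the network, but the data flow is exactly this three-step pipeline.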
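The ZooKeeper services mentioned under 2011 (configuration maintenance, synchronization, group membership) rest on a versioned hierarchical store of "znodes." The following is a toy, in-process imitation of that idea, not the real ZooKeeper client API: every write bumps a version number, and conditional writes fail when the caller's expected version is stale, which is how distributed clients avoid clobbering each other's updates.

```python
class ZNodeStore:
    """Toy, in-process imitation of ZooKeeper's versioned znode tree.

    Real ZooKeeper is a replicated network service; this sketch only
    shows the coordination idea of versioned, conditional writes.
    """

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"node exists: {path}")
        self._nodes[path] = (data, 0)

    def get(self, path):
        # Returns (data, version); clients remember the version they read.
        return self._nodes[path]

    def set(self, path, data, expected_version):
        # Optimistic concurrency: reject writes based on a stale read.
        _, version = self._nodes[path]
        if version != expected_version:
            raise ValueError("version mismatch")
        self._nodes[path] = (data, version + 1)

store = ZNodeStore()
store.create("/config/replication", "3")
data, version = store.get("/config/replication")
store.set("/config/replication", "5", expected_version=version)
print(store.get("/config/replication"))  # ('5', 1)
```

A second client that had read version 0 would now get a version-mismatch error on write, forcing it to re-read the current configuration first.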