Hadoop Design Axioms
Apache Hadoop is a simple, coherent, and systematic model designed to store and process large data sets across clusters of commodity machines. Its core is simple, modular, and extensible, and it is designed to run on cost-effective commodity hardware. Each machine holds an independent local portion of the data set and performs computation only on the data available locally. By bringing computation closer to the data, latency and network bottlenecks are avoided.
HDFS (Hadoop Distributed File System) splits data across the local storage of the cluster's machines, allowing terabytes and petabytes of data to be stored and processed across the network. Portability across heterogeneous software and hardware has helped HDFS progress and gain wide adoption as the platform of choice for a large set of applications.
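The block-splitting and replication idea behind HDFS can be sketched in a few lines of plain Python. This is an illustration only, not the HDFS implementation or its API; the block size, node names, and replication factor below are toy values chosen for the example (HDFS defaults are a 128 MB block size and a replication factor of 3).

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Split a byte string into fixed-size blocks (the last block may be smaller)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Place each block on `replication` distinct nodes, round-robin style.

    Real HDFS placement is rack-aware; this toy version only shows the idea
    that every block lives on several machines at once.
    """
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"x" * 1000                                   # a 1000-byte "file"
blocks = split_into_blocks(data, block_size=256)     # 4 blocks: 256+256+256+232
placement = assign_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(len(blocks))        # 4
print(placement[0])       # ['node1', 'node2', 'node3']
```

Because every block is replicated on multiple nodes, the loss of any one machine does not lose data, and computation can be scheduled on whichever replica's node is least busy.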
Hadoop achieves a high degree of parallelism. The Apache Hadoop MapReduce (MR) programming model processes data in parallel by dividing the work into independent tasks and aggregating the results. MapReduce schedules, processes, and aggregates tasks independently, and failures are handled in isolation, which yields acceptable fault tolerance: failure is treated as normal and expected, and the system self-heals. Apache Hadoop thus combines elegant storage, fast performance, and processing that scales linearly.
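The divide-and-aggregate ideology of MapReduce can be shown with a classic word count, written here as a minimal sketch in plain Python rather than against the actual Hadoop API: a map phase emits key/value pairs, a shuffle phase groups the values by key, and a reduce phase aggregates each group. The input records are a made-up example.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

records = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)   # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because each map call touches only its own record and each reduce call touches only its own key's group, Hadoop can run many map and reduce tasks on different machines at once and simply re-run any task whose machine fails.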
The Story behind the Hadoop Name
Powerful, reliable, efficient, and easy to use, Hadoop was created by Doug Cutting as an open-source implementation of Google's GFS and MapReduce, and is licensed under Apache. Doug usually drew inspiration from his family when naming his software projects, and his main concern was that no similar name already existed as a web domain. The widely used text search library Apache Lucene was named after Doug's wife's middle name, which was also her maternal grandmother's first name. His son frequently used the word "Nutch" and named his yellow stuffed elephant Hadoop; while hunting for names, Doug used those catchy words for his software projects.