Apache™ Hadoop™

Apache™ Hadoop™ is a software library that provides a framework for distributed processing of large data sets across clusters of computers using a simple programming model. It is developed under the Apache open source community and scales from a single server to several hundred servers, each offering local computation and storage. Everything is organized in clusters, with an emphasis on detecting and handling failures at the application layer rather than relying on the hardware.

Hosting Hadoop is easy with Java VPS hosting.

The entire project, which is presented in this document, is based on several subprojects:

Hadoop Common: The common utilities that support the other Hadoop subprojects.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.

Hadoop Common

This subproject contains the common utilities needed to support the other Hadoop subprojects. Several requirements have to be met in order to install and run it correctly: Java 1.6 has to be installed, along with ssh services. The latest Hadoop release can be downloaded from one of the Apache mirrors.

The release comes as an archive, so it has to be unpacked into the desired destination folder. Once unpacked, Hadoop supports three modes:

- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode

By default, Hadoop runs as a standalone service in a single Java process. It can also be started on a single node with several Java processes, which provides Pseudo-Distributed Mode. Finally, the most used one is Fully-Distributed Mode, which runs on several hardware nodes, each hosting several Hadoop daemons in separate Java processes. Detailed instructions as well as configuration examples are available from Hadoop's official pages.
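As a minimal sketch of how the modes differ from a client's point of view: the file system a Hadoop client talks to is selected by the fs.default.name configuration property, which defaults to the local file system (Standalone Mode). The hdfs://localhost:9000 address below is the conventional pseudo-distributed setting and is an assumption here, not something prescribed by this wiki:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 
 public class ModeExample {
     public static void main(String[] args) throws Exception {
         // Standalone Mode: no configuration, operations go to the local file system
         Configuration local = new Configuration();
         System.out.println(FileSystem.get(local).getUri()); // prints file:///
 
         // Pseudo-Distributed Mode: point the client at a NameNode on this host
         // (hdfs://localhost:9000 is an assumed, conventional address)
         Configuration pseudo = new Configuration();
         pseudo.set("fs.default.name", "hdfs://localhost:9000");
         System.out.println(FileSystem.get(pseudo).getUri());
     }
 }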

Hadoop Distributed File System (HDFS™)

HDFS™ is the primary distributed storage used by Hadoop applications. It consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. Detailed descriptions can be viewed at the official pages. HDFS™ can run on a single node as well as on a cluster of nodes.
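To make the NameNode/DataNode split concrete, here is a small sketch of the HDFS client API, assuming a pseudo-distributed cluster with a NameNode at hdfs://localhost:9000 (an assumed address). The client only ever names the NameNode; block reads and writes are directed to the DataNodes behind the scenes:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
 public class HdfsExample {
     public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         // Assumed NameNode address for a pseudo-distributed setup
         conf.set("fs.default.name", "hdfs://localhost:9000");
         FileSystem fs = FileSystem.get(conf);
 
         // Write a file: the NameNode records the metadata,
         // the block contents go to DataNodes
         Path file = new Path("/tmp/hello.txt");
         FSDataOutputStream out = fs.create(file);
         out.writeUTF("Hello, HDFS!");
         out.close();
 
         // Read it back
         FSDataInputStream in = fs.open(file);
         System.out.println(in.readUTF());
         in.close();
     }
 }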

Hadoop MapReduce

Hadoop MapReduce is a framework offering a simple programming interface for applications that process huge amounts of data on multiple hardware nodes maintained as clusters. It works by splitting the input data set into separate, independent tasks that run in a completely parallel manner; monitoring and re-executing failed tasks is fully handled by the framework. The framework consists of two main modules: a single master JobTracker and one slave TaskTracker per cluster node. The master schedules the jobs' component tasks on the slaves, monitors them, and re-executes failed tasks, while the slaves execute tasks as instructed by the master. Please take a look at the Examples.
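The classic illustration is word counting. The sketch below follows the WordCount program shipped with the Hadoop examples (class names such as TokenizerMapper are illustrative): the mapper emits a (word, 1) pair for every token, and the reducer sums the counts per word.

 import java.io.IOException;
 import java.util.StringTokenizer;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 
 public class WordCount {
     // Map phase: emit (word, 1) for every word in the input split
     public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
         private final static IntWritable one = new IntWritable(1);
         private Text word = new Text();
 
         public void map(Object key, Text value, Context context)
                 throws IOException, InterruptedException {
             StringTokenizer itr = new StringTokenizer(value.toString());
             while (itr.hasMoreTokens()) {
                 word.set(itr.nextToken());
                 context.write(word, one);
             }
         }
     }
 
     // Reduce phase: sum the counts emitted for each word
     public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
         private IntWritable result = new IntWritable();
 
         public void reduce(Text key, Iterable<IntWritable> values, Context context)
                 throws IOException, InterruptedException {
             int sum = 0;
             for (IntWritable val : values) {
                 sum += val.get();
             }
             result.set(sum);
             context.write(key, result);
         }
     }
 
     public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         Job job = new Job(conf, "word count");
         job.setJarByClass(WordCount.class);
         job.setMapperClass(TokenizerMapper.class);
         job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
         job.setReducerClass(IntSumReducer.class);
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(IntWritable.class);
         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
 }

Running the job against an input and output directory on HDFS, the JobTracker schedules one map task per input split on the TaskTrackers and re-executes any that fail, exactly as described above.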

Many companies and organizations acknowledge these features and use Hadoop for their research and production projects.
