Basic Hadoop Framework Learning


In my previous blogs we discussed what Big Data Hadoop is and its jobs, Apache Spark, and an introduction to Apache Spark SQL. In this blog we will discuss the basic frameworks of the Hadoop ecosystem. So let's get started.

Frameworks:

HADOOP

Hadoop is basically a software framework written in Java. It is used for processing large amounts of data in a distributed environment, and it lets developers set up clusters of computers that start with a single node and can scale up to thousands of nodes.

HIVE

Hive is a data warehousing framework built on Hadoop. It allows for structuring data and querying it with an SQL-like language called HiveQL. Developers can use Hive and HiveQL to run complex MapReduce jobs over structured data in a distributed file system without writing those jobs by hand. Hive is the closest thing to a relational database in the Hadoop ecosystem.
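To get a feel for what a MapReduce job behind a "count per group" query does, here is a minimal sketch in plain Python. It is not Hive or Hadoop code: the sample rows and the function names `map_phase`, `shuffle_phase` and `reduce_phase` are made up for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical sample rows of (page, user), standing in for a table.
rows = [("home", "ann"), ("cart", "bob"), ("home", "cy"), ("home", "ann")]


def map_phase(records):
    """Emit a (key, 1) pair for every row, as a mapper would."""
    for page, _user in records:
        yield page, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


page_counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(page_counts)  # {'home': 3, 'cart': 1}
```

On a real cluster the map and reduce steps run in parallel on many nodes, and the shuffle moves data between them over the network. The point of Hive is that you describe the result in a query and leave these phases to the engine.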

PIG

Pig is an application for transforming large data sets. Like Hive, Pig has its own language, called Pig Latin. Pig Latin allows developers to write complex MapReduce jobs without having to write them in Java.

FLUME

Flume is a distributed service that helps collect, aggregate and move large amounts of log data. It's written in Java and typically delivers the data directly into HDFS.

DRILL

Apache Drill is a schema-free SQL query engine for data exploration. Drill is billed as real SQL and not just "SQL-like," which allows developers or analysts to use existing SQL knowledge to begin writing queries in minutes. Apache Drill is extensible with user-defined functions.

KAFKA

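Kafka is another great tool for messaging in the Hadoop ecosystem. It is often used as a queuing system when working with Storm: producers write messages to Kafka, and Storm reads them from there for processing.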
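The queuing idea can be sketched in a few lines of Python. This is only an in-process stand-in built on the standard `queue` module, not the Kafka or Storm API. The names `events`, `producer` and `consumer` are assumptions made for this example.

```python
import queue
import threading

events = queue.Queue()  # in-process stand-in for a message queue, NOT Kafka
processed = []          # results collected by the consumer
SENTINEL = None         # marker telling the consumer there is no more data


def producer(messages):
    """Put each message on the queue, then signal the end of the stream."""
    for message in messages:
        events.put(message)
    events.put(SENTINEL)


def consumer():
    """Take messages off the queue in arrival order and process them."""
    while True:
        message = events.get()
        if message is SENTINEL:
            break
        processed.append(message.upper())


worker = threading.Thread(target=consumer)
worker.start()
producer(["click", "view", "purchase"])
worker.join()
print(processed)  # ['CLICK', 'VIEW', 'PURCHASE']
```

The producer and the consumer never call each other directly. They only share the queue, which is what lets each side run at its own pace.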

TEZ

Tez allows for building applications that process tasks arranged as a DAG (directed acyclic graph). Basically, Tez allows Hive and Pig scripts to be executed with fewer MapReduce jobs, which makes them run faster.
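As a rough illustration of what "processing DAG tasks" means, the sketch below describes a small job as a graph of stages and asks for an order in which they can run. It uses Python's standard `graphlib` module (Python 3.9+), not the Tez API, and the stage names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages; each one maps to the set of stages it depends on.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
    "write_output": {"aggregate"},
}


def execution_order(graph):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
print(order)
```

The two scan stages do not depend on each other, so they may appear in either order and could run at the same time. Every other stage appears only after the stages it depends on.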

ZEPPELIN

Zeppelin is a web-based notebook for interactive data analytics. It makes data visualization as easy as drag and drop. Zeppelin works with Hive, Spark (all languages) and Markdown.

SPARK

Spark is a general-purpose engine for data processing that works in memory and is suited to near-real-time workloads. It boasts speeds up to 100 times faster than Hadoop MapReduce for in-memory processing. Spark supports Scala, Python and Java. It also contains a machine learning library, MLlib, which provides scalable machine learning algorithms comparable to Mahout.
