Today's Question:  What does your personal desk look like?        GIVE A SHOUT

 HADOOP


  Hadoop or Spark: Which One is Better?

What is Hadoop?Hadoop is one of the widely used Apache-based frameworks for big data analysis. It allows distributed processing of large data set over the computer clusters. Its scalable feature leverages the power of one to thousands of system for computing and storage purpose. A complete Hadoop framework comprised of various modules such as:Hadoop Yet Another Resource Negotiator (YARNMapReduce (Distributed processing engine)Hadoop Distributed File System (HDFS)Hadoop CommonThese four modules lie in the heart of the core Hadoop framework. There are many more modules available over the interne...

2,415 0       COMPARISON HADOOP SPARK


  Cleansing data with Pig and storing JSON format to HBase with Pig UDF

IntroductionThis post will explain you the way to clean data and store JSON format to HBase. Hadoop architect experts also explain Apache Pig and its advantages in Hadoop in this post. Read more and find out how they do it.This post contains steps to do some basic clean the duplication data and convert the data to JSON format to store to HBase. Actually, we have some built-in lib to parse JSON in Pig but it is important to manipulate the JSON data in Java code before store to HBase.Apache Pig is data flow language and is built on the top of Hadoop, it helps to process, extract, loading, cleans...

8,978 3       HADOOP ARCHITECT JSON APACHE HBASE PIG UDF


  Data governance Challenges and solutions in Apache Hadoop

Do you understand meaning of data governance? This is taken as most critical part of an organization that deals with sensitive data of an enterprise. If organization wanted to know who is accessing their sensitive data and what action has been taken by the viewers then data governance is wonderful solution to consider.In this article, we will discuss on data governance solutions and what are the challenges that are faced by organization during implementation of data governance. We will also discuss on Cloudera Navigator in brief to address the challenges of data governance in Hadoop architect....

5,171 0       HADOOP INTEGRATION HADOOP DEVELOPMENT


  Why to opt for Hadoop?

Hadoop is a open source that stores and processes big data. The framework is written in Java for distributed processing and distributed storage of very large data.Hadoop is Scalable.It is a scalable platform because it stores and distributed large amount of data sets to hundreds and thousands of servers that operate in parallel. Traditional database systems cannot process large amount of data. But, hadoop enable business to run applications involving thousands of Terabyte data.Hadoop is Cost Effective.In traditional database management System, it is cost prohibitive to process mass volume...

2,537 0