
 BIG DATA


  Is The Cloud Finally Catching Up With Mighty Oracle?

Oracle for years has seemed impervious to cloud computing. First Larry Ellison dismissed it. Then he sort of touted it, his version at least. But all along, Oracle was growing nicely. The industry chatter didn’t seem to matter. Big companies buy big software systems. Something changed this winter. Oracle’s software license sales limped up just 2% in December, and the company blamed customer budget cuts and fears over the European debt crisis. Sales to Europe, Africa and the Middle East make up a third of Oracle’s revenues. The stock took an instant 8% hit, but perhaps...

27,513 0       CLOUD TREND ORACLE INCORPORATION


  Cleansing data with Pig and storing JSON format to HBase with Pig UDF

Introduction: This post explains how to clean data and store it in JSON format in HBase. Hadoop architects also explain Apache Pig and its advantages in Hadoop. Read more and find out how they do it. This post contains steps to remove duplicate data and convert the data to JSON format for storage in HBase. Pig has built-in libraries for parsing JSON, but it is important to manipulate the JSON data in Java code before storing it in HBase. Apache Pig is a data flow language built on top of Hadoop; it helps to process, extract, load, cleans...

9,020 3       JSON HADOOP ARCHITECT APACHE HBASE PIG UDF


  How Kafka achieves high throughput low latency

Kafka is a message streaming system with high throughput and low latency. It is widely adopted by many large companies. A well-configured Kafka cluster can achieve very high throughput with millions of concurrent writes. How does Kafka achieve this? This post tries to explain some of the techniques Kafka uses. Page cache + sequential disk write: Every time Kafka receives a record, it eventually writes it to a file on disk. But if it wrote to disk every time it received a record, performance would suffer. In fact, Kafka has a fantastic design here, which is that it utilizes the pag...

8,455 0       BIG DATA KAFKA


  Redis Cluster and Common Partition Techniques in Distributed Cache

In this post, I will discuss a few common partition techniques in distributed caches. In particular, I will elaborate on my understanding of the use of Redis Cluster. Please note that at the time of writing, the latest version of Redis is 4.0.10. Many articles on the same topic take a different view from this post, mainly because they are probably outdated. In particular, they may refer to the Redis Cluster implementation in Redis 3. Redis Cluster has been improved a lot since Redis 4. Common Partition Techniques: Here, we refer to horizontal partitioning, which...

5,497 0       REDIS DISTRIBUTED CACHE CLOUD COMPUTING


  Products born for Cloud

Cloud computing has become increasingly popular among companies. With everything running in the cloud, it greatly reduces investment in infrastructure and training, and it also improves the accessibility and flexibility of the services companies provide. With its popularity, many products have been born or are becoming popular to help build apps and move them to the cloud. Some well-known names among these products are Vagrant, Docker/LXC, Chef and OpenStack. These tools can help create, test and deploy applications without worrying too much about platform differences. What is the relationship of ...

5,428 0       CLOUD OPENSTACK DOCKER VARGRANT LXC CHEF


  Data governance Challenges and solutions in Apache Hadoop

Do you understand the meaning of data governance? It is considered the most critical part of an organization that deals with sensitive enterprise data. If an organization wants to know who is accessing its sensitive data and what actions those viewers have taken, data governance is a solution worth considering. In this article, we will discuss data governance solutions and the challenges organizations face when implementing data governance. We will also briefly discuss Cloudera Navigator as a way to address the challenges of data governance in Hadoop architect....

5,237 0       HADOOP INTEGRATION HADOOP DEVELOPMENT


  Top 5 Reasons Not to Use Hadoop for Analytics

As a former diehard fan of Hadoop, I LOVED the fact that you can work on up to petabytes of data. I loved the ability to scale to thousands of nodes to process a large computation job. I loved the ability to store and load data in a very flexible format. In many ways, I loved Hadoop, until I tried to deploy it for analytics. That’s when I became disillusioned with Hadoop (it just "ain't all that"). At Quantivo, we’ve explored many ways to deploy Hadoop to answer analytical queries (trust me – I made every attempt to include it in my day job)...

5,164 0       CLOUD COMPUTING HADOOP ANALYTICS


  Make Big Data Collection Efficient with Hadoop Architecture and Design Tools

Hadoop architecture and design is popular for distributing a small body of code across a large number of computers. That is why big data collection can be made more efficient with Hadoop architecture and design. Hadoop is an open source system in which you are free to make changes and design new tools according to your business requirements. Here we will discuss the most popular tools in the Hadoop development category and how they help on big projects. Ambari and Hive – When you are designing a cluster, plenty of repetitive tasks take a lot of effort and time. Now Hadoop architecture a...

4,967 0       HADOOP ARCHITECTURE HADOOP HIVE ARCHITECTURE HADOOP ARCHITECTURE AND DESIGN