
SEARCH KEYWORD -- BIG DATA



  Video website in big data era

Big data initially meant data sets too large to be analyzed, but the term later came to cover the methods used to analyze huge amounts of data in order to extract value from them. The approach is gradually gaining attention. Such data is difficult both to analyze and to store, and handling it requires new approaches. In China, many companies now use open source Hadoop distributed clusters to meet their data statistics needs. Since we can get segmented d...

   Netflix,Big data,Data mining     2013-04-11 04:20:40

  Why Opt for Hadoop?

Hadoop is an open source framework that stores and processes big data. It is written in Java and provides distributed processing and distributed storage for very large data sets. Hadoop is scalable: it stores and distributes large data sets across hundreds or thousands of servers that operate in parallel. Traditional database systems cannot process such large amounts of data, but Hadoop enables businesses to run applications involving thousands of terabytes of data. Hadoop is ...

       2015-09-22 10:17:43

  Cracking the Data Lineage Code

What is Data Lineage?  Data lineage describes the life-cycle of data, from its origins to how it is manipulated over time until it reaches its present form. The lineage explains the various processes involved in the data flow of an organization and the factors that influence each process. In other words, data lineage provides data about your data.  Data lineage helps organizations of all sizes handle Big Data, as finding the creation point of the data and its evolution provides valuabl...

   BIG DATA, DATA LINEAGE,BUSINESS     2019-08-08 12:41:42

  How Kafka achieves high throughput and low latency

Kafka is a message streaming system with high throughput and low latency. It is widely adopted by large companies. A well-configured Kafka cluster can achieve very high throughput with millions of concurrent writes. How does Kafka achieve this? This post tries to explain some of the techniques Kafka uses. Page cache + sequential disk write: Every time Kafka receives a record, it eventually writes it to a file on disk. But if it wrote to disk every time it received a record, it would ...

   BIG DATA,KAFKA     2019-03-08 09:42:57

  Why Most of Us Get Confused With Data Quality Solutions and Bad Data?

In this post, Big Data professionals explain how to clear up this misunderstanding. C-level executives use data collected by their BI and analytics initiatives to make strategic decisions that give the company a competitive advantage. Problems arise when that data is inaccurate or incorrect, because big data informs the company's big bets and affects both its direction and its future. Bad data can lead to inappropriate results and losses. Some interesting ...

   BIGDATA     2018-02-21 06:01:35

  Data Scientists and Their Harder Skills than Big Data

The field of data science is often confused with that of big data. Data science helps a company's decision makers by taking a logical approach. Who is a Data Scientist? A data scientist reviews a huge collection of data (which may extend to a couple of terabytes of disk space or thousands of Excel sheets). It is not feasible for a single person to handle, sort and analyze this much data. Here we require the help of data science, and most recently, the field of A...

   BIG DATA     2017-12-13 04:22:55

  Make Big Data Collection Efficient with Hadoop Architecture and Design Tools

Hadoop architecture and design are popular for spreading a small array of code across a large number of computers. That is why big data collection can be made more efficient with Hadoop architecture and design. Hadoop is an open source system, so you are free to make changes and design new tools according to your business requirements. Here we discuss the most popular Hadoop development tools and how they help with big projects. Ambari and Hive: When you are designing...

   HADOOP ARCHITECTURE,HADOOP HIVE ARCHITECTURE,HADOOP ARCHITECTURE AND DESIGN     2015-09-17 05:24:44

  Embrace open source

In the past few days, there has been a lot of tech news related to open source. For example, Microsoft enabled Linux on its Windows Azure cloud, Facebook open sourced its C++ library Folly, and Samsung joined the Linux Foundation. More and more big companies now realize the power of open source and are willing to contribute to the open source community. This benefits not only developers but also the companies themselves. By providing some open source libraries or projects, developers may reduce their...

   Open source,Microsoft,Samsung,Facebook,Linux     2012-06-06 05:37:59

  20 Database Design Best Practices

Use well-defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID ...). Use singular for table names (i.e. use StudentCourse instead of StudentCourses). A table represents a collection of entities, so there is no need for plural names. Don’t use spaces in table names. Otherwise you will have to use ‘{‘, ‘[‘, ‘“’ etc. characters to define tables (i.e. for accessing table Student Course you'll write “Student Cour...

   Database design,20 tips,Well defined name,Design pattern     2012-02-07 12:10:48

  Hadoop or Spark: Which One is Better?

What is Hadoop? Hadoop is a widely used Apache framework for big data analysis. It allows distributed processing of large data sets across computer clusters. Its scalability lets it harness anywhere from one to thousands of systems for computing and storage. A complete Hadoop framework comprises various modules, such as Hadoop Yet Another Resource Negotiator (YARN), MapReduce (distributed processing engine), Hadoop Distributed File System (HDFS) and Hadoop Common. Thes...

   COMPARISON,HADOOP,SPARK     2018-11-22 07:08:57