
SEARCH KEYWORD -- Data mining



  Video website in big data era

Big data originally referred to data sets too large to be analyzed, but the term later came to mean the methods used to analyze huge amounts of data in order to extract value from them. This approach is gradually gaining attention. Such data is difficult both to analyze and to store, and handling it requires new approaches. In China, many companies now use open source Hadoop distributed clusters to meet their data statistics needs. Since we can get segmented d...

   Netflix,Big data,Data mining     2013-04-11 04:20:40

  Top 5 Reasons Not to Use Hadoop for Analytics

As a former diehard fan of Hadoop, I LOVED the fact that you can work on up to Petabytes of data.  I loved the ability to scale to thousands of nodes to process a large computation job.  I loved the ability to store and load data in a very flexible format.  In many ways, I loved Hadoop, until I tried to deploy it for analytics.   That’s when I became disillusioned with Hadoop (it just "ain't all that"). At Quantivo, we’ve explored many ways to deploy H...

   Cloud computing,Hadoop,Analytics     2012-04-17 13:43:26

  Data Scientists and Their Harder Skills than Big Data

The field of data science is often confused with that of big data. Data science helps a company's decision makers by applying a logical approach.  Who is a Data Scientist?  A Data Scientist reviews a huge collection of data (which may extend to a couple of terabytes of disk space or thousands of Excel sheets). A chunk of data this large cannot feasibly be handled, sorted, and analyzed by a single person. Here we require the help of data science, and most recently, the field of A...

   BIG DATA     2017-12-13 04:22:55

  What Sort of Projects Can You Work on Using Web Scraping?

Web scraping is a technique used to extract large amounts of data from websites and save it onto your computer in a database or table format. Businesses use this information for many reasons, such as lead generation, SEO, or better understanding their customers.  However, you can also do a lot of fun projects in your free time that don’t have to be associated with work or a company. Before you start using your programming skills to find and extract inform...

   WEB SCRAPING     2020-09-10 09:16:44
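
To make the "extract and save in a table format" idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not code from the article: the sample HTML, the `LinkExtractor` class, and the helper names are all hypothetical, and a real scraper would fetch pages over HTTP instead of parsing an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<html><body>
  <a href="/posts/1">First post</a>
  <a href="/posts/2">Second post</a>
  <p>No link here</p>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def to_csv(rows):
    """Render the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["href", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


links = scrape_links(SAMPLE_HTML)
csv_text = to_csv(links)
print(csv_text)
```

Keeping extraction (`scrape_links`) separate from storage (`to_csv`) means the same rows could just as easily be written to a database.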

  Why to opt for Hadoop?

Hadoop is an open source framework that stores and processes big data. The framework is written in Java for distributed storage and distributed processing of very large data sets. Hadoop is scalable: it stores and distributes large data sets across hundreds or thousands of servers that operate in parallel. Traditional database systems cannot process data at this scale, but Hadoop enables businesses to run applications involving thousands of terabytes of data. Hadoop is ...

       2015-09-22 10:17:43

  Google has done more for the world with ngrams

Data is a valuable asset for a company in the Internet world. With user data, a company can gain many benefits: it can push targeted ads to users by analyzing their behavior, and it can even sell the data to third parties. Data is very important for a company's success, so some companies keep their data secret in order to gain an advantage over competitors. However, Google seems to do it another way. Google shared its ngrams text corpus publicly, which basically contains valuable info...

   Ngram,NLP,Data     2013-12-12 07:56:02

  Exit main thread and keep other threads running in C

In C programming, returning from the main function terminates the whole process. To let only the main thread exit and keep the other threads alive, you can call thrd_exit in the main function. Check the following code: #include <stdio.h> #include <threads.h> #include <unistd.h> int print_thread(void *s) { thrd_detach(thrd_current()); for (size_t i = 0; i < 5; i++) { sleep(1); printf("i=%zu\n", i); } thrd_exit(0); } int main(void) { ...

   C LANGUAGE,MULITHREAD,MAIN THREAD     2020-08-14 21:20:04

  Google releases Analytics real time API

According to TechCrunch, Google has finally released its Analytics real time API. Although the real time feature was launched two years ago, there was no convenient way for webmasters to adjust the data so that it could be viewed properly. Now developers can use the API to retrieve the data they want and use it however they like. For now, developers need to apply for access to the API. Once you have access, you can query your own real time data and use it as you wish. For ex...

   Google Analytics,API,Real time     2013-08-02 00:05:56

  How does PHP session work?

This article explains how PHP sessions work internally. Below are the steps: 1. Session support in PHP is loaded into the PHP core as an extension. When the session extension is loaded, PHP calls core functions to get the session save_handler, i.e. the interface or functions for reading and writing session data. By default, PHP handles session data by reading and writing files on the server. PHP also supports custom methods for handling session data; we can use sess...

   PHP, session, mechanism     2012-12-28 13:36:49
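
The save-handler idea described above (a small read/write interface backed by files on the server) can be sketched outside PHP. The following Python sketch is only an analogy, not PHP's implementation: the `FileSessionStore` class and its method names are hypothetical, and JSON stands in for PHP's own session serialization format. The `sess_<id>` file naming mirrors what PHP's default file handler uses.

```python
import json
import secrets
import tempfile
from pathlib import Path


class FileSessionStore:
    """Hypothetical file-backed session store: one file per session ID."""

    def __init__(self, save_path):
        self.save_path = Path(save_path)

    def _file(self, session_id):
        # One file per session, named after the session ID.
        return self.save_path / f"sess_{session_id}"

    def read(self, session_id):
        """Return the stored session data, or an empty dict for a new session."""
        path = self._file(session_id)
        if not path.exists():
            return {}
        return json.loads(path.read_text())

    def write(self, session_id, data):
        """Persist the session data at the end of a request."""
        self._file(session_id).write_text(json.dumps(data))

    def destroy(self, session_id):
        """Delete the session file, e.g. on logout."""
        self._file(session_id).unlink(missing_ok=True)


# Simulate two requests that share one session ID (normally carried in a cookie).
store = FileSessionStore(tempfile.mkdtemp())
session_id = secrets.token_hex(16)

session = store.read(session_id)      # first request: nothing stored yet
session["user"] = "alice"
store.write(session_id, session)

restored = store.read(session_id)     # second request: data is back
store.destroy(session_id)
after_destroy = store.read(session_id)
print(restored, after_destroy)
```

Because the caller only depends on `read`, `write`, and `destroy`, the file backend could be swapped for a database or cache without changing the calling code, which is the point of a pluggable save handler.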

  How Kafka achieves high throughput low latency

Kafka is a message streaming system with high throughput and low latency, and it is widely adopted by large companies. A well-configured Kafka cluster can achieve very high throughput with millions of concurrent writes. How does Kafka achieve this? This post tries to explain some of the techniques Kafka uses. Page cache + sequential disk write: Every time Kafka receives a record, it eventually writes it to a file on disk. But if it wrote to disk every time it received a record, it would ...

   BIG DATA,KAFKA     2019-03-08 09:42:57
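
The "page cache + sequential write" idea can be illustrated with a small append-only log. This is a sketch under stated assumptions, not Kafka's actual code (Kafka runs on the JVM and manages its own log segments): the `AppendOnlyLog` class, the length-prefixed record format, and the `fsync_every` parameter are hypothetical. Each append goes to the end of the file and is handed to the operating system, where it sits in the page cache; the costly `fsync` to physical disk happens only once per batch.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Hypothetical log that appends sequentially and fsyncs in batches."""

    def __init__(self, path, fsync_every=100):
        self._file = open(path, "ab")      # append mode: writes go to the end
        self._fsync_every = fsync_every
        self._pending = 0                  # records written since last fsync
        self.fsync_count = 0               # how many times we forced a disk sync

    def append(self, record: bytes):
        # Length-prefixed record (4-byte big-endian size, then the payload).
        self._file.write(struct.pack(">I", len(record)) + record)
        self._file.flush()                 # hand bytes to the OS page cache
        self._pending += 1
        if self._pending >= self._fsync_every:
            self.sync()

    def sync(self):
        os.fsync(self._file.fileno())      # force cached pages to disk
        self.fsync_count += 1
        self._pending = 0

    def close(self):
        if self._pending:
            self.sync()
        self._file.close()


def read_all(path):
    """Read the log back sequentially, one length-prefixed record at a time."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            records.append(f.read(length))
    return records


log_path = os.path.join(tempfile.mkdtemp(), "topic-0.log")
log = AppendOnlyLog(log_path, fsync_every=100)
for i in range(250):
    log.append(f"record-{i}".encode())
log.close()

records = read_all(log_path)
print(len(records), log.fsync_count)
```

With 250 records and `fsync_every=100`, the log syncs after records 100 and 200 and once more on close, so only three disk syncs cover 250 writes.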