Google has done more for the world with ngrams

Data is a valuable asset for a company in the Internet world. With user data, a company can gain many benefits: it can push targeted ads to users by analyzing their behavior, and it can even sell the data to third parties. Because data is so important to a company's success, some companies keep their data secret to gain an advantage over competitors. Google, however, seems to take a different approach.

Google has shared its n-gram text corpus publicly. The corpus contains valuable information about all the books Google has scanned and the search queries users have made. Using this data, researchers or competing firms can build new natural language processing (NLP) applications. Data like this is very valuable and nearly unique at this scale, so why does Google give it away for free?

Jon Orwant, a Google Research manager, shared the reasons behind this decision:

  1. It would help researchers conduct experiments they couldn't perform any other way.
  2. It would help draw attention to an ancillary benefit of scanning millions of books.
  3. It would help others improve their NLP applications.
  4. Why not share?

Jon is also very pleased with this decision. He notes that Google makes many charitable donations and hopes this release can be viewed as an act of charity in which the medium is not money but data.

In contrast, most companies keep private the data they spent great effort gathering, and they have no obligation to share such valuable data publicly. Still, we hope that more and more companies will open their doors to data sharing for the benefit of the world.



