Big Data Buzz

Black and white photo overlaid with graphics depicting the relative size of a gigabyte, terabyte and petabyte
Source: Stefanobe, Flickr

Have you been hearing the term Big Data bandied about a lot of late? Before you dismiss it as yet another buzzword, I suggest you take some time to learn why Big Data is such a big deal.

Big Data refers to the flood of digital information of all kinds that is generated daily. This information ranges from tweets to online transactions, from temperature readings to traffic congestion. Big Data is often "unstructured," meaning it's simply a collection of words, numbers, streams from digital sensors, images, dates, etc.

On its own, this information may seem meaningless. However, once organized and/or interpreted through techniques like data mining, text analytics, natural language processing and so forth, patterns in these mountains of data can be identified and employed to answer myriad questions.
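To give a feel for what "finding patterns" can mean at the smallest possible scale, here is a minimal Python sketch of one basic text analytics step: counting which words turn up most often in a pile of unstructured messages. The sample messages, the stop-word list and the function name word_counts are all invented for illustration; real Big Data tools apply far more sophisticated methods to vastly larger collections.

```python
from collections import Counter
import re

# Hypothetical sample of unstructured text (invented for this illustration).
messages = [
    "Stuck in traffic on Lake Shore Drive again",
    "Traffic is terrible downtown this morning",
    "Feeling feverish, might be the flu",
    "Flu season is here, half the office is out sick",
]

# A tiny, made-up list of common words to ignore.
STOP_WORDS = {"in", "on", "is", "this", "be", "the"}

def word_counts(texts):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and pull out runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

counts = word_counts(messages)

# Show the words that occur most often across the messages.
for word, n in counts.most_common(2):
    print(word, n)
```

Even in four short messages, "traffic" and "flu" stand out as recurring topics. Scaled up to millions of messages, the same kind of counting is one way analysts begin to spot trends.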

The newfound ability to use computers to collect, organize and analyze petabytes of data, and to base decisions on the results, has been heralded as revolutionary. Check out the following books and decide for yourself if we are in the midst of a revolution or a control-obsessed, "bizarre digitized version of the Enlightenment."

In Big Data, author Viktor Mayer-Schönberger discusses the possibilities and perils of data mining. He examines phenomena such as the ability of Google engineers to detect the spread of the flu virus more efficiently than the Centers for Disease Control and Prevention by tracking flu-related search queries, and the ability of retailers to predict a pregnancy based on a woman's purchases.

In Dataclysm, Christian Rudder, a co-founder of the OkCupid dating site, uses data from Twitter, Facebook, Reddit, OkCupid and other sites to look principally at three areas: "the data of people connecting," "the data of division" and the data concerning "the individual alone." Further, Rudder uses data to explore the divisive topics of racism and the loss of privacy. This title is also available as an eBook, downloadable eAudiobook and audiobook CD.

Erez Aiden and Jean-Baptiste Michel describe their research involving the text mining of over 30 million digitized books in Uncharted. Through their digital detective work, the two discover trends in English word and phrase usage, censorship and fame. The authors also comment on the copyright and privacy issues encountered while developing their tool, the Ngram Viewer.
