Big Data and the Internet of Things

I recently wrote about the amazing amount of data being produced by people on the internet or mobile web. For example, every minute, YouTube gets 48 hours of video uploads, and nearly a quarter of a billion emails are sent… yes, that’s what people do every minute.

However, that pales in comparison to the amount of data that will soon be generated by things – sensors, machines, monitors, switches and so on. Scroll down the infographic below (from Cisco, 2011) to see the immensity of big data that will soon be flowing from the Internet of Things.

As you can see, when farmers attach health monitors to cattle, each one sends 200MB of data per year. Apparently the total number of cattle in the US was 89.3 million. If every head of cattle had such a monitor, that would result in over 16 petabytes of data collected annually.

A petabyte is about a billion megabytes. That’s big data.
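
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The cattle count and the 200MB figure come from the text above; the two petabyte definitions (decimal and binary) are my own assumption about how the total might have been rounded, not something the infographic states.

```python
# Figures quoted in the post (from the 2011 Cisco infographic).
CATTLE_IN_US = 89_300_000      # head of cattle
MB_PER_COW_PER_YEAR = 200      # megabytes sent by one health monitor per year

total_mb = CATTLE_IN_US * MB_PER_COW_PER_YEAR

# Assumption: two common ways of defining a petabyte.
MB_PER_PB_DECIMAL = 10**9      # 1 PB = 1,000,000,000 MB (decimal units)
MB_PER_PB_BINARY = 1024**3     # 1 PB = 1,073,741,824 MB (binary units)

pb_decimal = total_mb / MB_PER_PB_DECIMAL
pb_binary = total_mb / MB_PER_PB_BINARY

print(f"Total per year:    {total_mb:,} MB")
print(f"Decimal petabytes: {pb_decimal:.2f}")
print(f"Binary petabytes:  {pb_binary:.2f}")
```

Counted in binary units the total comes to roughly 16.6 petabytes, which fits the "over 16" figure; counted in decimal units it is closer to 17.9. Either way, it is a lot of cow data.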

Another example in the graphic, one that was written about last year, is that we could track every heartbeat of every person using wearable heart monitors.

When we used to talk about “Data” (with a capital D) we might imagine research data from scientific endeavors, or corporate data tracking sales orders or factory output. Data was tracked because it was intrinsic to the activities that are central to our daily toil. Now data is extrinsic, coming from outside – whether generated by people (YouTube videos and emails) or machines (energy sensors, health sensors, traffic sensors).

The fact that we are interested in, and able to utilize, data coming to us from outside our immediate sphere of influence is what defines the age of big data.


Lots of data, all the time

This infographic is already, no doubt, out of date. It is from last June, but was probably out of date in July! People ask where big data comes from. This is a partial answer, as it covers “just” the data generated on the web and by related web-facing activity. However, it starts to give you a feeling for where all this big data (data growing faster than Moore’s law) is coming from.

How Much Data is Created Every Minute?
Infographic by Visual News

Big data doesn’t mean right answers

My friend Judah Levine sent me a link to this great article in InfoWorld, “Big data's pitfall: Answers that are clear, compelling, and wrong”.

This reminded me of an article from The Atlantic that I really enjoyed, “The Data Vigilante”, which is about small data as much as big, but is clearly relevant for the big data world.

All this is to say that big data amplifies the problems of garbage-in, garbage-out, but introduces other, more complex problems too. Big data, almost by definition, requires statistical analysis (it’s too big to just look at). How many of us, however, really know enough statistics to know the right analysis to conduct? The chart below from the Psychology Department web page at Muhlenberg College shows some simple examples.

If you read about big data proving this, that, or the other result, can you get a feel for whether any of these problems might be obfuscating the truth?