What is the “big data problem”?
“On the night of July 9, 1958 an earthquake along the Fairweather Fault in the Alaska Panhandle loosened about 40 million cubic yards (30.6 million cubic meters) of rock high above the northeastern shore of Lituya Bay. This mass of rock plunged from an altitude of approximately 3000 feet (914 meters) down into the waters of Gilbert Inlet (see map below). The impact generated a local tsunami that crashed against the southwest shoreline of Gilbert Inlet. The wave hit with such power that it swept completely over the spur of land that separates Gilbert Inlet from the main body of Lituya Bay. The wave then continued down the entire length of Lituya Bay, over La Chaussee Spit and into the Gulf of Alaska. The force of the wave removed all trees and vegetation from elevations as high as 1720 feet (524 meters) above sea level. Millions of trees were uprooted and swept away by the wave. This is the highest wave that has ever been known.“ (quoted from http://geology.com/records/biggest-tsunami.shtml)
Now lets use our imagination a bit, and pretend we’re on a digital world, and that an even bigger wave can be seen on the horizon, only that the wave is made up of 1’s and 0’s. That’s the current status of information on the net right now.
A huge wave of data is being generated every second, ranging from user generated information such as tweets, status updates, uploaded pictures, blog posts, comments, text messages, e-mails and so on to machine generated data, like server access logs, error logs, transaction logs, etc.
And that’s not even the problem, the problem is that we need to start thinking in terms of TB or even PB of information, billions of rows instead of millions of them in order to be able to handle this big wave that’s coming.
























