Preface
What are data? This seems like a simple enough question; however, depending on the interpretation, the definition of data can be anything from “something recorded” to “everything under the sun.” Data can be summed up as everything that is experienced, whether it is a machine recording information from sensors, an individual taking pictures, or a cosmic event recorded by a scientist. In other words, everything is data. However, recording and preserving that data has always been the challenge, and technology has limited the ability to capture and preserve data.
The human brain’s memory storage capacity is estimated to be around 2.5 petabytes (or 2.5 million gigabytes). Think of it this way: If your brain worked like a digital video recorder in a television, 2.5 petabytes would be enough to hold 3 million hours of TV shows. You would have to leave the TV running continuously for more than 300 years to use up all of that storage space. The available technology for storing data pales in comparison, creating a technology segment called Big Data that is growing exponentially.
Today, businesses are recording more and more information, and that information (or data) is growing, consuming more and more storage space and becoming harder to manage, thus creating Big Data. The reasons for recording such massive amounts of information vary. Sometimes the reason is adherence to compliance regulations; at other times it is the need to preserve transactions; and in many cases it is simply part of a backup strategy.
Nevertheless, it costs time and money to save data, even if it’s only for posterity. Therein lies the biggest challenge: How can businesses continue to afford to save massive amounts of data? Fortunately, those who have come up with the technologies to mitigate these storage concerns have also come up with a way to derive value from what many see as a burden. It is a process called Big Data analytics.
The concepts behind Big Data analytics are actually nothing new. Businesses have been using business intelligence tools for many decades, and scientists have been studying data sets to uncover the secrets of the universe for many years. However, the scale of data collection is changing, and the more data you have available, the more information you can extrapolate from them.
The challenge today is to find the value of the data and to explore data sources in more interesting and applicable ways to develop intelligence that can drive decisions, find relationships, solve problems, and increase profits, productivity, and even the quality of life.
The key is to think big, and that means Big Data analytics.
This book will explore the concepts behind Big Data, how to analyze that data, and the payoff from interpreting the analyzed data.
Sometimes the best information on a particular technology comes from those who are promoting that technology for profit and growth, hence the birth of the white paper. White papers are meant to educate and inform potential customers about a particular technology segment while gently goading those potential customers toward the vendor’s product.
That said, it is always best to take white papers with a grain of salt. Nevertheless, white papers prove to be an excellent source for researching technology and have significant educational value. With that in mind, I have included the following white papers in the appendix of this book, and each offers additional knowledge for those who are looking to leverage Big Data solutions: “The MapR Distribution for Apache Hadoop” and “High Availability: No Single Points of Failure,” both from MapR Technologies.