Knowing what Big Data is and knowing its value are two different things. Even with an understanding of Big Data analytics, the value of the information can still be difficult to visualize. At first glance, the well of structured, unstructured, and semistructured data seems almost unfathomable, with each bucket drawn being little more than a mishmash of unrelated data elements.
Finding what matters and why it matters is one of the first steps in drinking from the well of Big Data and the key to avoiding drowning in information. However, this question still remains: Why does Big Data matter? It seems difficult to answer for small and medium businesses, especially those that have shunned business intelligence solutions in the past and have come to rely on other methods to develop their markets and meet their goals.
For the enterprise market, Big Data analytics has proven its value, and examples abound. Companies such as Facebook, Amazon, and Google have come to rely on Big Data analytics as a core part of their marketing strategies as well as a means of serving their customers better.
For example, Amazon has leveraged its Big Data well to create an extremely accurate representation of what products a customer should buy. Amazon accomplishes that by storing each customer’s searches and purchases and almost any other piece of information available, and then applying algorithms to that information to compare one customer’s information with all of the other customers’ information.
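Amazon's actual recommendation systems are proprietary, but the company has publicly described an item-to-item collaborative filtering approach: compare what one customer bought with what all other customers bought, and surface the items that most often appear together. A minimal sketch of that idea, using hypothetical purchase histories:

```python
from collections import defaultdict

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "alice": {"jacket", "gloves", "boots"},
    "bob":   {"jacket", "gloves", "shovel"},
    "carol": {"jacket", "boots"},
    "dave":  {"tent", "stove"},
}

def cooccurrence_counts(purchases):
    """Count how often each pair of items appears in the same history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in purchases.values():
        for a in items:
            for b in items:
                if a != b:
                    counts[a][b] += 1
    return counts

def recommend(item, purchases, top_n=2):
    """Recommend the items most often bought alongside `item`."""
    counts = cooccurrence_counts(purchases)
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

print(recommend("jacket", purchases))  # ['boots', 'gloves']
```

A production system works over millions of customers and items, but the core comparison is the same: every customer's history contributes evidence about which items belong together.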
Amazon has learned the key trick of extracting value from a large data well: applying processing power and analytical depth to a massive amount of data to separate what is important from what is extraneous. The company has successfully captured the data “exhaust” that any customer or potential customer leaves behind and used it to build an innovative recommendation and marketing engine.
The results are real and measurable, and they offer a practical advantage for a customer. Take, for example, a customer buying a jacket in a snowy region. Why not suggest purchasing gloves to match, or boots, as well as a snow shovel, ice melt, and tire chains? For an in-store salesperson, those recommendations may come naturally; for Amazon, Big Data analytics is able to interpret trends and bring understanding to the purchasing process by simply looking at what customers are buying, where they are buying it, and what they have purchased in the past. Those data, combined with other public data such as census, meteorological, and even social networking data, create a unique capability that serves both the customer and Amazon.
Much the same can be said for Facebook, where Big Data comes into play for critical features such as friend suggestions, targeted ads, and other member-focused offerings. Facebook accumulates information using analytics that leverage pattern recognition and data mash-ups across several data sources, such as a user’s preferences, history, and current activity. Those data are mined, along with the data from all of the other users, to create focused recommendations, which are reported to be quite accurate for the majority of users.
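Facebook's actual friend-suggestion pipeline is far more elaborate, but one well-known building block of this kind of recommendation is ranking non-friends by mutual-friend count. A simplified sketch over a hypothetical friendship graph:

```python
from collections import Counter

# Hypothetical friendship graph: user -> set of friends.
friends = {
    "ana":  {"ben", "cara"},
    "ben":  {"ana", "cara", "dan"},
    "cara": {"ana", "ben", "eve"},
    "dan":  {"ben"},
    "eve":  {"cara"},
}

def suggest_friends(user, friends, top_n=2):
    """Rank non-friends by the number of mutual friends shared with `user`."""
    mutual = Counter()
    for friend in friends[user]:
        for candidate in friends[friend]:
            if candidate != user and candidate not in friends[user]:
                mutual[candidate] += 1
    return [name for name, _ in mutual.most_common(top_n)]

print(suggest_friends("ana", friends))
```

The real system folds in many more signals (shared groups, activity, profile data), but mutual connections illustrate how mining everyone's data improves suggestions for each individual user.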
Google leverages the Big Data model as well, and it is one of the originators of the software elements that make Big Data possible. However, Google’s approach and focus are somewhat different from those of companies like Facebook and Amazon. Google aims to use Big Data to its fullest extent, to judge search results, predict Internet traffic usage, and service customers with Google’s own applications. From the advertising perspective, Web searches can be tied to products that fit the criteria of the search by delving into a vast mine of Web search information, user preferences, cookies, histories, and so on.
Of course, Amazon, Google, and Facebook are huge enterprises and have access to petabytes of data for analytics. However, they are not the only examples of how Big Data has affected business processes. Examples abound from the scientific, medical, and engineering communities, where huge amounts of data are gathered through experimentation, observation, and case studies. For example, the Large Hadron Collider at CERN can generate one petabyte of data per second, giving new meaning to the concept of Big Data. CERN relies on those data to determine the results of experiments using complex algorithms and analytics that can take significant amounts of time and processing power to complete.
Many pharmaceutical and medical research firms are in the same category as CERN, as well as organizations that research earthquakes, weather, and global climates. All benefit from the concept of Big Data. However, where does that leave small and medium businesses? How can these entities benefit from Big Data analytics? These businesses do not typically generate petabytes of data or deal with tremendous volumes of uncategorized data, or do they?
For small and medium businesses (SMBs), Big Data analytics can deliver value for multiple business segments. That is a relatively recent development within the Big Data analytics market. Small and medium businesses have access to scores of publicly available data, including most of the Web and social networking sites. Several hosted services have also come into being that can offer the computing power, storage, and platforms for analytics, changing the Big Data analytics market into a “pay as you go, consume what you need” entity. This proves to be very affordable for the SMB market and allows those businesses to take things slowly and experiment with what Big Data analytics can deliver.
Although the barriers of data volume and cost have been somewhat eliminated, significant obstacles remain for SMBs looking to leverage Big Data. Those obstacles include the purity of the data, analytical knowledge, an understanding of statistics, and several other philosophical and educational challenges. It all comes down to analyzing the data not just because they are there but for a specific business purpose.
For SMBs looking to gain experience in analytics, the first place to turn is the Web—namely, for analyzing web site traffic. Here an SMB can use a tool like Blekko (http://www.blekko.com) to look at traffic distribution to a web site. This information can be very valuable for SMBs that rely on a company web site to disseminate marketing information, sell items, or communicate with current and potential customers. Blekko fits the Big Data paradigm because it looks at multiple large data sets and creates visual results that have meaningful, actionable information. Using Blekko, a small business can quickly gather statistics about its web site and compare them with those of a competitor’s site.
Although Blekko may be one of the simplest examples of Big Data analytics, it does illustrate the point that even in its simplest form, Big Data analytics can benefit SMBs, just as it can benefit large enterprises. Of course, other tools exist, and new ones are coming to market all of the time. As those tools mature and become accessible to the SMB market, more opportunities will arise for SMBs to leverage the Big Data concept.
Gathering the data is usually half the battle in the analytics game. SMBs can search the Web with tools like 80Legs, Extractiv, and Needlebase, all of which offer capabilities for gathering data from the Web. The data can include social networking information, sales lists, real estate listings, product lists, and product reviews and can be gathered into structured storage and then analyzed. The gathered data prove to be a valuable resource for businesses that look to analytics to enhance their market standings.
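The services named above handle crawling and extraction at scale, but the underlying task of turning a web page into structured records can be sketched with nothing more than the standard library. The page markup and the `product` class below are hypothetical, chosen only for illustration:

```python
from html.parser import HTMLParser

# A hypothetical product-listing page; a real gathering job would fetch
# thousands of pages with a crawler rather than use an inline string.
SAMPLE_PAGE = """
<ul>
  <li class="product">Snow Shovel</li>
  <li class="product">Ice Melt</li>
  <li class="other">About us</li>
</ul>
"""

class ProductExtractor(HTMLParser):
    """Pull the text of every <li class="product"> into a structured list."""
    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "product") in attrs:
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_product = False

    def handle_data(self, data):
        if self.in_product and data.strip():
            self.products.append(data.strip())

extractor = ProductExtractor()
extractor.feed(SAMPLE_PAGE)
print(extractor.products)  # ['Snow Shovel', 'Ice Melt']
```

Once the gathered items land in structured storage (a database table, for instance), they become the raw material for the analytics described throughout this chapter.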
Big Data, whether done in-house or on a hosted offering, provides value to businesses of any size—from the smallest business looking to find its place in its market to the largest enterprise looking to identify the next worldwide trend. It all comes down to discovering and leveraging the data in an intelligent fashion.
The amount of data in our world has been exploding, and analyzing large data sets is already becoming a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. Business leaders in every sector are going to have to deal with the implications of Big Data, either directly or indirectly.
Furthermore, the increasing volume and detail of information acquired by businesses and government agencies—paired with the rise of multimedia, social media, instant messaging, e-mail, and other Internet-enabled technologies—will fuel exponential growth in data for the foreseeable future. Some of that growth can be attributed to increased compliance requirements, but a key factor in the increase in data volumes is the increasingly sensor-enabled and instrumented world. Examples include RFID tags, vehicles equipped with GPS sensors, low-cost remote sensing devices, instrumented business processes, and instrumented web site interactions.
The question may soon arise of whether Big Data is too big, leading to a situation in which determining value may prove more difficult. This will evolve into an argument for the quality of the data over the quantity. Nevertheless, it will be almost impossible to deal with ever-growing data sources if businesses don’t prepare to deal with the management of data head-on.
Before 2010, managing data was a relatively simple chore: Online transaction processing systems supported the enterprise’s business processes, operational data stores accumulated the business transactions to support operational reporting, and enterprise data warehouses accumulated and transformed business transactions to support both operational and strategic decision making.
The typical enterprise now experiences a data growth rate of 40 to 60 percent annually, which in turn increases financial burdens and data management complexity. Such growth might seem to imply that the data themselves are becoming less valuable for many businesses, more a liability or low-value commodity than an asset.
Nothing could be further from the truth. More data mean more value, and countless companies have proved that axiom with Big Data analytics. To exemplify that value, one needs to look no further than at how vertical markets are leveraging Big Data analytics, which leads to a disruptive change.
For example, smaller retailers are collecting click-stream data from web site interactions and loyalty card data from traditional retailing operations. This point-of-sale information has traditionally been used by retailers for shopping basket analysis and stock replenishment, but many retailers are now going one step further and mining the data for a customer buying analysis. Those retailers are then sharing those data (after normalization and identity scrubbing) with suppliers and warehouses to bring added efficiency to the supply chain.
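Shopping-basket analysis of the kind described above typically reduces to computing association-rule metrics such as support (how often two items appear together) and confidence (how often buying one item leads to buying the other). A minimal sketch over hypothetical point-of-sale baskets:

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support and confidence for the rule: antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    support = both / n                       # P(antecedent and consequent)
    confidence = both / ante if ante else 0.0  # P(consequent | antecedent)
    return support, confidence

# Hypothetical baskets from point-of-sale records.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

support, confidence = rule_metrics(baskets, "bread", "milk")
print(support, confidence)  # 0.5 and roughly 0.667
```

A retailer sharing these normalized statistics with suppliers, rather than raw customer records, is what makes the supply-chain efficiencies described above possible without exposing individual identities.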
Another example of finding value comes from the world of science, where large-scale experiments create massive amounts of data for analysis. Big science is now paired with Big Data. There are far-reaching implications in how big science is working with Big Data; it is helping to redefine how data are stored, mined, and analyzed. Large-scale experiments are generating more data than can be held at a lab’s data center (e.g., the Large Hadron Collider at CERN generates over 15 petabytes of data per year), which in turn requires that the data be immediately transferred to other laboratories for processing—a true model of distributed analysis and processing.
Other scientific quests are prime examples of Big Data in action, fueling a disruptive change in how experiments are performed and data interpreted. Thanks to Big Data methodologies, continental-scale experiments have become both politically and technologically feasible (e.g., the Ocean Observatories Initiative, the National Ecological Observatory Network, and USArray, a continental-scale seismic observatory).
Much of the disruption is fed by improved instrument and sensor technology; for instance, the Large Synoptic Survey Telescope has a 3.2-gigapixel camera and generates over 6 petabytes of image data per year. It is the platform of Big Data that is making such lofty goals attainable.
The validation of Big Data analytics can be illustrated by advances in science. The biomedical corporation Bioinformatics recently announced that it has reduced the time it takes to sequence a genome from years to days and has cut the cost to the point where sequencing an individual’s genome for $1,000 will be feasible, paving the way for improved diagnostics and personalized medicine.
The financial sector has seen how Big Data and its associated analytics can have a disruptive impact on business. Financial services firms are seeing larger transaction volumes driven by smaller trade sizes, increased market volatility, and technological improvements in automated and algorithmic trading.
One of the surprising outcomes of the Big Data paradigm is the shift of where the value can be found in the data. In the past, there was an inherent hypothesis that the bulk of value could be found in structured data, which usually constitute about 20 percent of the total data stored. The other 80 percent are unstructured and were often viewed as having little or no value.
That perception began to change once the successes of search engine providers and e-retailers showed otherwise. The analysis of that unstructured data produced the click-stream analytics (for e-retailers) and search engine predictions that launched much of the Big Data movement. These first examples of successfully processing large volumes of unstructured data led other industries to take note, which in turn has led enterprises to mine and analyze structured and unstructured data in conjunction to look for competitive advantages.
Unstructured data bring complexity to the analytics process. Technologies such as image processing for face recognition, search engine classification of videos, and complex data integration during geospatial processing are becoming the norm in processing unstructured data. Add to that the need to support traditional transaction-based analysis (e.g., financial performance), and it becomes easy to see complexity growing exponentially. Moreover, other capabilities are becoming a requirement, such as web click-stream data driving behavioral analysis.
Behavioral analytics is a process that determines patterns of behavior from human-to-human and human-to-system interaction data. It requires large volumes of data to build an accurate model. The behavioral patterns can provide insight into which series of actions led to an event (e.g., a customer sale or a product switch). Once these patterns have been determined, they can be used in transaction processing to influence a customer’s decision.
While models of transactional data analytics are well understood and much of the value is realized from structured data, it is the value found in behavioral analytics that allows the creation of a more predictive model. Behavioral interactions are less understood, and they require large volumes of data to build accurate models. This is another case in which more data equal more value, an axiom backed by research suggesting that a sophisticated algorithm with little data is less accurate than a simple algorithm with a large amount of data. Evidence of this can be found in the algorithms used for voice and handwriting recognition and crowdsourcing.
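At its simplest, determining which series of actions led to an event means counting the action windows that immediately precede it across many sessions. The session logs and action names below are hypothetical; a production model would use far more data and richer features:

```python
from collections import Counter

# Hypothetical session logs: the ordered actions in each visit.
sessions = [
    ["search", "view", "review", "sale"],
    ["search", "view", "sale"],
    ["view", "review", "exit"],
    ["search", "view", "review", "sale"],
]

def paths_to_event(sessions, event="sale", window=2):
    """Count the windows of `window` actions that immediately precede `event`."""
    paths = Counter()
    for actions in sessions:
        for i, action in enumerate(actions):
            if action == event:
                paths[tuple(actions[max(0, i - window):i])] += 1
    return paths

print(paths_to_event(sessions).most_common(1))
```

Here the most common path to a sale is viewing a product and then reading a review, the kind of pattern that, once identified, can be used during live transaction processing to nudge a customer toward the actions that historically precede a purchase.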
New developments for processing unstructured data are arriving on the scene almost daily, with one of the latest and most significant coming from the social networking site Twitter. Making sense of its massive database of unstructured data was a huge problem—so huge, in fact, that it purchased another company just to help it find the value in its massive data store. The success of Twitter revolves around how well the company can leverage the data that its users generate. This amounts to a great deal of unstructured information from the more than 200 million accounts the site hosts, which together generate 230 million Twitter messages a day.
To address the problem, the social networking giant purchased BackType, the developer of Storm, a software product that can parse live data streams such as those created by the millions of Twitter feeds. Twitter has released the source code of Storm, making it available to others who want to pursue the technology. Twitter is not interested in commercializing Storm.
Storm has proved its value for Twitter, which can now perform analytics in real time and identify trends and emerging topics as they develop. For example, Twitter uses the software to calculate how widely Web addresses are shared by multiple Twitter users in real time.
With the capabilities offered by Storm, a company can process Big Data in real time and garner knowledge that leads to a competitive advantage. For example, calculating the reach of a Web address could take up to 10 minutes using a single machine. However, with a Storm cluster, that workload can be spread out to dozens of machines, and a result can be discovered in just seconds. For companies that make money from emerging trends (e.g., ad agencies, financial services, and Internet marketers), that faster processing can be crucial.
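Storm's published reach example actually counts the distinct followers of everyone who tweeted a URL, fanned out across a cluster of workers. A deliberately simplified, single-process sketch of the core aggregation (distinct sharers per URL) shows why the problem parallelizes so well: each worker can hold the state for a partition of URLs and merge sets independently.

```python
from collections import defaultdict

class ReachTracker:
    """Track the reach (distinct sharers) of each URL as events stream in.

    In a Storm topology this state would be partitioned across many
    workers; this single-process sketch shows only the core aggregation.
    """
    def __init__(self):
        self.sharers = defaultdict(set)

    def record(self, user, url):
        """Consume one (user, url) share event from the stream."""
        self.sharers[url].add(user)

    def reach(self, url):
        """Current number of distinct users who shared `url`."""
        return len(self.sharers[url])

tracker = ReachTracker()
events = [("u1", "a.com"), ("u2", "a.com"), ("u1", "a.com"), ("u3", "b.com")]
for user, url in events:
    tracker.record(user, url)

print(tracker.reach("a.com"))  # 2 distinct sharers, duplicates ignored
```

Because set unions are associative, dozens of machines can each aggregate a slice of the stream and combine results, which is how a ten-minute single-machine computation collapses to seconds on a Storm cluster.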
Like Twitter, many organizations are discovering that they have access to a great deal of data, and those data, in all forms, could be transformed into information that can improve efficiencies, maximize profits, and unveil new trends. The trick is to organize and analyze the data quickly enough, a process that can now be accomplished using open source technologies and lumped under the heading of Big Data.
Other examples abound of how unstructured, semistructured, and structured Big Data stores are providing value to business segments. Take, for example, the online shopping service LivingSocial, which leverages technologies such as the Apache Hadoop data processing platform to garner information about what its users want.
The process has allowed LivingSocial to offer predictive analysis in real time, which better services its customer base. The company is not alone in its quest for squeezing the most value out of its unstructured data. Other major shopping sites, shopping comparison sites, and online versions of brick-and-mortar stores have also implemented technologies to bring real-time analytics to the forefront of customer interaction.
However, in that highly competitive market, finding new ways to interpret the data and process them faster is proving to be the critical competitive advantage and is driving Big Data analytics forward with new innovations and processes. Those enterprises and many others learned that data in all forms cannot be considered a commodity item, and just as with gold, it is through mining that one finds the nuggets of value that can affect the bottom line.