In 2008, Chris Anderson, then editor-in-chief of Wired, published a founding article on the Big Data concept in which he announced the arrival of a new way of thinking in these terms: “We come from a marketers’ world where there is good and bad, a single hypothesis to test or a good model to measure. Big Data has nothing to do with that. We have to think like Google.” Anderson explains that Google never tried to understand the content of the websites it indexes, nor to know why one page was better than another. Its algorithms focused only on the statistical analysis of incoming links and other purely mathematical parameters. Anderson concluded that taxonomy, ontology and psychology were now useless, and that it is enough to observe that people do what they do, by tracking and measuring their behavior, rather than trying to understand why they do it. “With enough data, the numbers speak for themselves.”
This is the area of application of Big Data that seems to spell the end of market research as we know it. Indeed, in an era of instantaneous information about consumer behavior, what is the value of studies and research that attempt to detect and establish truths in a world where the subject of our investigations is ever more fickle and disloyal? What is the point of questioning, estimating and extrapolating if we can directly observe the behaviors and reactions we are interested in?
To understand where Big Data stands in relation to marketing research, and to see the threats or opportunities it could generate, we need to define the concept, describe how it is implemented and explain the concrete benefits we can expect from it.
What is Big Data?
Big Data is not a technology per se. It is rather a recent phenomenon resulting from the masses of raw data generated by our society through its networks, its interconnections and its exchanges. These masses grow larger, faster and more varied every day.
Every day, about 2.5 quintillion bytes of digital information are produced in the world. According to Stephen Gold, marketing director of IBM Watson Solutions (who previously managed marketing at SPSS), 90% of all the data available in the world was produced during the last two years.
Before the digital era, the production and preservation of new data was generally deliberate and structured; the data had a direct meaning and a precise location. It came from a minority of individuals, firms or organizations. The arrival of the Web, the interconnection of databases and cloud computing have changed everything.
Today, each individual, voluntarily or not, produces a significant volume of data. Blogs, forums, messaging services, tweets, social networks, sharing sites and other collaborative platforms allow anyone to easily give an opinion and to produce a mass of heterogeneous information about everything. These data are stored on servers all over the world. In parallel, our behaviors and actions on the Web automatically generate a great mass of personal data. Our search and browsing histories, the products we look for and the articles we buy on the Internet are all trackable. We also generate data through our increasingly connected devices (mobile phones, tablets, televisions, cars and even household appliances) and through the electronic chips we use every day (bank cards, transport cards, access cards, subscription cards, loyalty cards). All these “sensors” can continuously report our position, our behaviors, our habits…
It is this data stream, or more exactly its exploitation, that underlies the Big Data concept, considered today a unique opportunity for firms but also for any kind of administrative, scientific or cultural organization.
Some do not hesitate to compare Big Data to oil or to a gold mine. They certainly have in mind both the wealth it holds and the effort needed to access it. Indeed, one of the issues facing a company or an organization today lies in its ability to organize the capture and then the storage of abundant, unstructured data. But the real challenge is to be able to analyze these data and make sense of them in order, ultimately, to improve its performance, its organization and its knowledge.
The 3 Vs of Big Data
To delimit Big Data properly, the concept has been characterized by “the 3 Vs”: Volume, Velocity and Variety.
When we talk about volume in Big Data, we often mention the 7 terabytes (7,000 billion bytes) created daily by Twitter or the 10 terabytes of content exchanged every day on Facebook. We also mention YouTube, which receives about 48 hours of video per minute. Google, Amazon and the other big players in the digital economy likewise manage data streams and handle impressive volumes. These firms store the data on thousands of servers lined up in many data centers covering thousands of square meters. To get an idea of these infrastructures, you can take a virtual tour of Google’s sites at this address: http://www.google.com/about/datacenters/.
This data volume has been growing quickly since the beginning of the Web. Eric Schmidt, executive chairman of Google, stated in 2010 that every two days we create as much data as was created between the beginning of humanity and the year 2003. The pace has increased significantly since then.
In the early 2000s, Web players felt the need to manage an exponentially growing mass of information, well before other sectors discovered it. When they found they had more data than they could handle, companies such as Google, and then Yahoo and Facebook, sought to set up a new storage technology distributed across many servers, with fast access to this scattered information. This open source technology was named Hadoop and is used today by many of the major Web players.
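To make the idea of data spread over many servers more concrete, here is a minimal sketch of the map-and-reduce style of processing that Hadoop is known for. It is a single-process simulation written in plain Python, not the Hadoop API: the "shards" stand in for data held by different servers, and the function names (`map_shard`, `merge_counts`, `word_count`) and the sample sentences are invented for illustration.

```python
from collections import Counter
from functools import reduce


def map_shard(lines):
    """Map step: count the words in the lines held by one (simulated) server."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial counts into a single count."""
    return left + right


def word_count(shards):
    """Count words per shard independently, then merge the partial results."""
    partials = [map_shard(shard) for shard in shards]
    return reduce(merge_counts, partials, Counter())


# Hypothetical data: each inner list plays the role of one server's content.
shards = [
    ["big data is big", "data is everywhere"],
    ["big servers store data"],
]
totals = word_count(shards)
print(totals.most_common(3))
```

Each shard is processed without knowing about the others, which is what allows the work to be distributed; only the small partial counts need to be brought together at the end.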
Although Web companies were the first to need to manage big data volumes, they no longer have a monopoly in this field. Indeed, every big company today has large quantities of information to handle and store. Large retailers, banks, mobile phone operators, mail-order companies and transport companies receive significant data streams at every moment. In every sector, information from stores, contact centers, sales teams or websites can arrive at any time. Volumes depend on the activity but also on the size of the data and on the level of granularity chosen for managing the information: video captures, pictures, emails, web pages, consumer reviews, tweets, scanned documents, RFID chips, electronic meters, sensor recordings, website visit details, log files… Every company can also access unstructured content gathered by Web monitoring tools set up to follow what is said about the company, its competitors or its market segments. With storage costs decreasing, people tend to store anything that can be stored without knowing how the information will be exploited.
A consequence of increasing volume is a significant diversification of data formats and data sources. It is estimated today that more than 80% of the data available in the world are unstructured, against 20% structured. The volume of unstructured data is rising 15 times faster than that of structured data, and it comes from ever more varied sources.
While traditional companies were used to handling structured data, storing them in relational databases and querying them in SQL, the mass of data created today comes in varied formats. Data from social media, automatically captured elements and mobile devices do not fit naturally into existing computer architectures. This calls for a change of approach and of infrastructure that is quite difficult to grasp. In parallel, new skills are needed to set up these new architectures.
French decision makers and IT directors are beginning to grasp this issue, even if their view of Big Data seems rather fragmented. Indeed, when asked about the phenomenon, they spontaneously associate it with the Variety criterion more than with the Volume and Velocity criteria. This is the finding of a recent survey by Markess International on the new prospects Big Data offers for exploiting customer data. In France, 110 decision makers from companies and administrations (of more than 250 employees) were interviewed. The survey shows that decision makers see the contribution of Big Data in the analysis of new data sources (including social data). However, the fact that the two other Vs (Volume and Velocity) are less associated with Big Data probably reveals a misunderstanding of the concept and of what it demands in terms of new infrastructures for data management and storage (Volume) and for real-time or stream processing (Velocity). The survey shows that few respondents (even among IT decision makers) know the Hadoop ecosystem (see the box), an open source platform that forms the base of the Big Data infrastructures of many companies in this field (Facebook, Adobe, Amazon, Google, IBM, LinkedIn, Microsoft, etc.).
Velocity is the frequency at which information arrives. Massive unstructured data reach companies’ information systems faster and faster. Handling streams quickly, processing data in many ways and exploiting inputs instantaneously are essential needs of Big Data. Batch mode, which is central to many CRM, ERP and decision support systems and which produces data “refreshed” at regular intervals, seems unsuited to handling and rapidly exploiting a constant stream of information. Collecting and sharing data in real time is becoming an absolute prerequisite of the Big Data approach.
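The contrast between batch mode and stream handling can be illustrated with a small sketch. It assumes a hypothetical stream of basket amounts and compares an indicator recomputed over the whole data set (batch) with the same indicator updated as each value arrives (stream). The class name `RunningMean` and the sample figures are invented for illustration.

```python
class RunningMean:
    """Keeps an average up to date one value at a time, without storing history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        """Fold one new value into the average and return the current mean."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Batch mode: wait until all values are available, then compute once."""
    return sum(values) / len(values)


# Hypothetical basket amounts arriving one after the other.
basket_amounts = [20.0, 35.0, 50.0, 15.0]

stream = RunningMean()
# The indicator is available after every single event...
stream_means = [stream.update(amount) for amount in basket_amounts]

# ...whereas the batch figure only exists once the whole period is closed.
final_batch_mean = batch_mean(basket_amounts)
print(stream_means, final_batch_mean)
```

Both approaches end on the same figure; the difference is that the streaming version has a usable value at every moment, which is what the Velocity criterion asks for.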
We only have to look at the way Google Analytics reveals in real time the number of people visiting our website, their geographical origin, the pages being viewed and many other instantaneous parameters to realize that we now live in an era of immediacy, with all the underlying technical and organizational demands this implies. When we consider that all this information is managed in real time, worldwide, for millions of sites, the performance is impressive. Of course, we are talking here about the leading Web company, which manages huge data volumes. Few organizations in the world could claim to reach this level of performance in Big Data. Google’s initiatives and innovations are often a source of inspiration.
Big Data philosophy
The 3 Vs definition is consistent with the Big Data concept but does not explain it well. It can lead people to think that Big Data is just what we usually do with our data, only bigger, more varied and faster. This is absolutely not the case: the concept has its own logic, totally different from what we already know.
Chris Anderson, whom we mentioned in the introduction, explains this by first noting that traditional science has always been based on models, theories, hypotheses or mechanisms that had to be understood and generalized in order to grasp the complexity of the world. Statistics from marketing surveys are an example. Anderson recalls the saying of the famous statistician George Box, “All models are wrong, but some are useful”, and explains that models and theories in every science hold in one context but not in another. Take Newtonian mechanics, which is right for planetary motion but totally wrong for infinitely small structures (which are the domain of quantum mechanics).
Unlike usual scientific approaches, the Big Data approach uses an empirical process that no longer relies on understanding underlying mechanisms but on observing facts. Anderson gives the example of Google Translate, which translates from any language into any other (it handles 65 languages!) without having any idea of the grammatical structures of these languages. The translation relies on a system called “statistical machine translation”, which analyzes millions of documents already translated by human beings (books, UN documents, websites…) and draws correspondences from them, producing translations that are usually correct even if not perfect. To learn more about this mechanism you can watch a video at: http://translate.google.com/about.
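The principle of translating from observed correspondences rather than from grammar can be illustrated with a deliberately tiny sketch. It is not Google's system: it only counts which words appear together in a handful of invented sentence pairs and keeps, for each source word, the target word with the strongest association (a Dice score). The functions `train` and `translate` and the four-sentence corpus are assumptions made for illustration; real statistical translation works on phrases and on millions of documents.

```python
from collections import Counter
from itertools import product


def train(parallel_corpus):
    """Build a word-to-word table from sentence pairs, using co-occurrence only."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in parallel_corpus:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word is a candidate match for every target word.
        pair_counts.update(product(src_words, tgt_words))

    best = {}
    for (src, tgt), together in pair_counts.items():
        # Dice score: high when the two words nearly always appear together.
        score = 2 * together / (src_counts[src] + tgt_counts[tgt])
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    return {src: tgt for src, (tgt, _) in best.items()}


def translate(sentence, table):
    """Translate word by word; unknown words are left unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())


# Hypothetical human-translated sentence pairs (English, French).
corpus = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
    ("a dog", "un chien"),
]
table = train(corpus)
print(translate("the dog", table))
```

No grammatical rule is written anywhere in this sketch: the table comes entirely from counting, which is the point Anderson makes about this approach.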
To continue the reflection, Anderson considers that this Google Translate approach perfectly illustrates “Big Data” thinking, which does away with models and theories and uses only existing elements, whose sheer volume makes it possible to extract satisfactory results through statistical calculation. And even when we find links between elements that, to our minds, have nothing to do with each other, the large volume of data used to establish these links leads us to believe that we are dealing with real phenomena, even if they are unexplained (Anderson cites the correlation between Pampers and the Super Bowl!). The idea of Big Data is to give up seeking explanations and simply to take this reality into account in order to put it to concrete use.
Big Data contribution
According to a McKinsey survey, a retailer that makes optimal use of Big Data can increase its operating margin by more than 60%! The same report indicates, for instance, that 30% of Amazon’s sales are due to the recommendations made to its users. Others can do the same, since every company or organization is likely today to have a quantity of information and indicators that could considerably improve its performance. Some of these data did not exist in the past or could not in practice be taken into account and analyzed. The marketing, commercial and operational contribution of analyzing data streams of the Big Data kind can be valuable and varied. Properly capturing, managing, simulating, analyzing and exploiting Big Data makes it possible to reveal hidden relationships, detect new opportunities, refine offers, drive organizational changes and adjust communication.
In brief, good use of Big Data strengthens a firm’s performance and competitiveness by enabling it to respond faster and better to its customers’ needs.
It makes it possible to:
- Replace or support human decisions with automated algorithms able to handle the volume and velocity of data.
- Guarantee more clarity and transparency in decision mechanisms.
- Test and assess marketing alternatives faster (perhaps instantaneously).
- Reveal new, foreseeable opportunities.
- Predict movements and trends more easily.
- Substantially reduce the costs of acquiring and using information compared with classical methods (such as marketing research).
How to implement a Big Data project?
Such a promising approach deserves our attention. But how can we go about it in practice? When a firm tackles this field for the first time and seeks to use types of data it is not used to dealing with, many questions arise:
- Which relevant and significant data should we focus on?
- Which indicators could we implement?
- What data quality issues are we going to face?
- How much will it cost?
- How long will it take?
Answering these questions naturally becomes easier with experience. But as Big Data constantly brings new varieties of data, ever faster and in increasing volumes, we can always come up against a new unknown in the equation and continually have to solve new problems.
The first Big Data project is usually launched when management realizes that the firm is missing opportunities by ignoring the data at its disposal. The natural approach is to ask the IT department to study the project. Marketing and IT teams exhaustively review the data that could be collected and used, sometimes with precise aims in mind. More often, though, the temptation to exploit data and the big promises of Big Data encourage them to look everywhere so as not to miss anything in this new field. At the end of this preliminary study, which can be long and difficult, the IT team tries to develop a solution that meets the established requirements, taking the identified data and organizing them for processing. Faithful to the approaches and methods in which they were trained, and which in their view remain the rule to follow, IT engineers try to fit these data into the strict structures that will allow them to be used with common database management tools.
The indicators developed in this way are tested and then put into production, and marketing teams are allowed to use them. After a few weeks, users who struggle to turn the process into real benefits ask for adaptations or even a complete change of approach. The ensuing iterations can last a long time and wear everyone out without producing a good result. Bill Franks, a Big Data specialist and the author of “Taming the Big Data Tidal Wave” (already translated into Chinese, Japanese and Korean), analyzes this classical plan and points out that it does not work in the Big Data world because it is better suited to cases where all the elements are known, the risks are identified and the steps are established. In the Big Data world, using new data sources to address new issues with new means is quite difficult.
According to Bill Franks, the right solution is to “start small”. He advises starting by defining simple indicators that need neither too much data nor too much collection time. An online store can begin by identifying the products viewed by each visitor in order to send promotional offers to those who did not buy one of these products and so prompt them to make a purchase. To limit the process and make the experiment easier, it is advisable not to take all the data but only a limited slice. For instance, we can take the connections of a given month for only some products. This focus avoids having to deal with large volumes of data and useless information. It makes the data files easier for operational marketing staff to handle, and makes it easier to define subsets and run tests by measuring the results obtained. Such small, intuitive projects enable the company to become familiar with the opportunities on offer, to identify potential problems, and to gauge the financial effort, the techniques and the possible returns.
We should not begin by listing everything that is possible, but enter Big Data step by step through small initiatives that we will develop and extend as our knowledge grows.
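The “start small” example of the online store can be sketched in a few lines. The sketch assumes a hypothetical log in which each event records a visitor, a product, an action (`"view"` or `"purchase"`) and a day. It keeps only one month and a short list of products, then lists, for each visitor, the products viewed but not bought. The function name `promo_targets`, the field names and the sample data are all invented for illustration.

```python
from datetime import date


def promo_targets(events, month, products):
    """Return {visitor: products viewed but not bought} for one month and product list.

    `month` is a (year, month) tuple and `products` the set of products in scope.
    """
    viewed, bought = {}, {}
    for event in events:
        in_month = (event["day"].year, event["day"].month) == month
        if not in_month or event["product"] not in products:
            continue  # outside the limited slice of data we chose to study
        if event["action"] == "purchase":
            bought.setdefault(event["visitor"], set()).add(event["product"])
        elif event["action"] == "view":
            viewed.setdefault(event["visitor"], set()).add(event["product"])

    targets = {}
    for visitor, seen in viewed.items():
        remaining = seen - bought.get(visitor, set())
        if remaining:
            targets[visitor] = remaining
    return targets


# Hypothetical site log.
events = [
    {"visitor": "v1", "product": "tent", "action": "view", "day": date(2012, 5, 3)},
    {"visitor": "v1", "product": "tent", "action": "purchase", "day": date(2012, 5, 4)},
    {"visitor": "v2", "product": "tent", "action": "view", "day": date(2012, 5, 10)},
    {"visitor": "v2", "product": "stove", "action": "view", "day": date(2012, 6, 1)},
    {"visitor": "v3", "product": "kayak", "action": "view", "day": date(2012, 5, 12)},
]

targets = promo_targets(events, month=(2012, 5), products={"tent", "stove"})
print(targets)
```

Restricting the month and the product list keeps the file small enough for marketing staff to inspect by hand, and the scope can be widened once the first results have been measured.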
And what about marketing research?
According to the ESOMAR report on the market research industry, marketing research activities that do not involve asking people questions (that is, neither questionnaires nor focus groups) accounted for more than half of the sector’s revenue in 2010. These activities include store audits, audience measurement, receipt analysis, web analytics and so on. These segments already connect to Big Data and are perhaps a prelude to it within research institutes’ activity.
As Ray Poynter (Director of “Vision Critical University” and a regular speaker at ESOMAR) says: “Two visions of Big Data are emerging from the marketing research point of view, even if there is a significant overlap between them.” According to Poynter, the first is the brand-centric view, in which we are interested in all the data the brand holds internally: data from CRM, loyalty cards, market studies, social networks… In the second, firms that manage panels can access multiple data about people and use them for several brands and several customers.
Both visions leave little room for studies based on questioning people, whether quantitative or qualitative. We can expect these techniques to lose ground over the next few years. The risk is that, beyond a certain amount of data, firms end up considering customer questionnaires useless. Indeed, who would pay thousands of euros to ask a few questions to a sample of 500 or 1,000 people when instantaneous data are available on hundreds of thousands of customers?
As a result, firms would quickly come round to Google’s view that the meaning of things (the why, obtained by questioning) is pointless compared with the instantaneous tracking of how they unfold (the what, obtained by observing events). The challenge for research institutes would then be to prove that one can shed light on the other, and that analyzing consumers’ goals and motivations can add real value to the understanding of how their purchasing behavior evolves.
In any case, the phrase “Big Data: Too big to ignore”, which was used as the title of a book about Big Data and often appears in the American press, should inspire institutes and push them to take a rapid interest in the phenomenon. Knowledge of the methods, technologies and benefits can give them ideas for new products and new platforms that are yet to be invented. Today firms are chasing specialists in exploiting Big Data. In the countries most advanced in data processing (such as the United States), data analysts and data scientists are sought everywhere. Some suggest entrusting Big Data internally to a CAO (Chief Analytics Officer) reporting directly to executive management and able to understand data collection and analysis from both the technical point of view (in collaboration with IT directors) and the marketing point of view (in collaboration with marketing directors). In such organizations, this contact person will need external resources and will be authorized to call on institutes offering expertise in data collection and analysis. Conversely, if the organization does not appoint Big Data specialists itself, the relevant institutes can position themselves as preferred specialists in the field and offer their expertise to both IT directors and marketing directors.
Research institutes fit this role well. Indeed, at the heart of Big Data we find issues of optimizing data collection, exploiting unstructured data (as in qualitative research), analyzing data statistically and setting up dashboards and reports. All of these are already part of the institutes’ work, even if the Big Data field requires them to acquire additional technical and computing expertise.
Besides, companies that want to use Big Data will need contacts able to advise them on the technical side but who also understand the underlying marketing challenges. IT services companies are less able to provide such profiles, while institutes can set up mixed teams of Big Data specialists with technical understanding and research directors who are experts in understanding the customer’s business.
For institutes, the challenge is first to take an interest in the Big Data phenomenon and to acquire experience in this field quickly in order to evolve with market needs (and not become the future Kodaks of the digital era). One key point will certainly be to start bringing in more technical profiles, better able than a research analyst to understand the technological challenges. Another is to surround themselves with technology partners able to assist them in Big Data projects by providing suitable advice and/or tools.
We can welcome the recent announcement by Fleur Pellerin, the minister in charge of the digital economy, that she will promote the development of a real Big Data sector in France so that the country can “play a leading role in this new market”. This would necessarily lead to the creation of suitable training programs able to supply the market with the right profiles.
Research institutes can see Big Data as an opportunity rather than a threat. Now is the time to take a position, especially in the French market, where the latest surveys show that decision makers still view the phenomenon from a distance (a recent IDC survey shows that 70% of the 160 companies interviewed have neither initiatives nor plans on the subject). So let us quickly adopt a proactive approach in order to benefit from the paradigm shift rather than merely endure it.