Social data is prediction, not marketing
Your social media dashboard is a rearview mirror. The data can be headlights.
In 2010, two researchers at HP Labs, Sitaram Asur and Bernardo Huberman, published a paper. They took Twitter chatter about upcoming films, ran it through a model, and predicted box office revenue with startling accuracy. The volume and tone of conversation before a film opened told them how much money it would make. Not focus groups. Not tracking studies. Tweets.
The paper got cited a lot in academic circles. In the marketing departments of actual companies, it changed nothing. Three years later, the average brand is still using social data to count likes and calculate engagement rate. They’ve built dashboards that tell them what happened last week. They’re sitting on a dataset that could tell them what’s going to happen next month, and they’re using it as a rearview mirror.
Google Flu Trends launched in 2008. The idea was simple. Track what people search for, map it against flu outbreaks, predict where the flu is going before the CDC can. For a few years, it worked. Then it stopped working.
In the 2012-2013 flu season, Flu Trends overestimated the prevalence of flu by roughly a factor of two. It predicted that nearly 11 percent of the US population had influenza at the peak in January. The actual number was about half that. Media coverage of flu was driving searches. People who didn’t have the flu were googling “flu symptoms” because the news told them to. The signal got swamped by its own echo.
This is the story everyone tells now when they want to argue that social data is unreliable. They’re wrong about the lesson. The lesson isn’t that social data can’t predict. The lesson is that you have to understand what you’re measuring. Google was measuring search interest in flu. It thought it was measuring flu. Those are different things.
The same mistake is everywhere in marketing. Brands measure “engagement” and think they’re measuring interest. They measure “sentiment” and think they’re measuring opinion. They measure “reach” and think they’re measuring attention. The metric is not the thing. It never was.
I worked with a fashion retailer last year. They had three years of social conversation data about their products. Thousands of posts per week. Comments, shares, complaints, requests. They were using all of it to generate two numbers: engagement rate and sentiment score. Two numbers from three years of behavioural signal.
We pulled the raw data apart instead. What were people actually saying? Not the sentiment score, the words. It turned out that when conversation about a product shifted from “I want this” to “where can I find this,” purchases followed within two weeks. When conversation shifted from praise to frustration about sizing, returns followed within a month. The pattern was in the language, not in the sentiment score. The sentiment score couldn’t see it because the sentiment score treats “I want this so bad” and “where can I find this” as equally positive. They’re not the same signal. One is desire. The other is intent.
Intent is predictive. Desire is noise. The dashboard couldn’t tell them apart.
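To make the distinction concrete, here is a minimal sketch of how posts might be split into desire and intent before any scoring happens. It is not the method used with the retailer. The phrase lists and the function names label_post and intent_share are hypothetical, invented for illustration, and a real version would be built from the brand's own conversation data rather than a handful of regular expressions.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical phrase lists, invented for illustration only.
DESIRE_PATTERNS = [r"\bi want\b", r"\bi need\b", r"\bobsessed\b"]
INTENT_PATTERNS = [
    r"\bwhere can i (find|buy|get)\b",
    r"\bin stock\b",
    r"\bwhat size\b",
]


def label_post(text: str) -> str:
    """Return 'intent', 'desire', or 'other' for a single post."""
    lowered = text.lower()
    # Intent is checked first: a post asking where to buy counts as intent
    # even if it also expresses desire.
    if any(re.search(p, lowered) for p in INTENT_PATTERNS):
        return "intent"
    if any(re.search(p, lowered) for p in DESIRE_PATTERNS):
        return "desire"
    return "other"


def intent_share(posts: Iterable[str]) -> float:
    """Fraction of desire-or-intent posts that express intent."""
    counts = Counter(label_post(p) for p in posts)
    signal = counts["intent"] + counts["desire"]
    return counts["intent"] / signal if signal else 0.0
```

Tracked week by week for a single product, a rising intent_share is the kind of shift a sentiment score flattens, because both labels read as positive.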
The prediction use cases aren’t theoretical. They’re happening now, mostly in industries that don’t think of themselves as social media companies.
A cluster of posts about stomach complaints in a specific city is a public health signal. People tweet symptoms before they see a doctor. The data is there. It just hasn’t been wired into the system that could use it.
A spike in complaints about a specific product feature on social media predicts what will get returned in stores next month. A spike in photography and sharing of a specific item predicts what will sell out next season. The fashion industry still sends buyers to trade shows and trusts their intuition. The data about what people actually want is already public, free, and updating every second. Nobody’s reading it.
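A "spike" needs a definition before anyone can act on it. One simple option, sketched here under stated assumptions, is to compare each day's count of complaint posts with the mean and standard deviation of the preceding days. The function name find_spikes, the seven-day window, the threshold of three standard deviations, and the floor on the deviation are all illustrative choices, not figures from any of the cases described here.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_spikes(
    daily_counts: Sequence[float],
    window: int = 7,
    threshold: float = 3.0,
    min_sigma: float = 1.0,
) -> List[int]:
    """Return indices of days whose count is far above the trailing baseline."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        # Floor the deviation so a perfectly flat baseline does not
        # cause a division by zero or flag trivial changes.
        sigma = max(stdev(baseline), min_sigma)
        if (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes
```

A trailing window means each day is judged only against what came before it, which is what a live feed would have available. The cost is that one spike inflates the baseline for the following week, so a second spike right after the first is harder to catch.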
After Nate Silver called every state in the 2012 election, every political operation in America started paying attention to social data. But they used it for messaging. Buy ads. Target demographics. Shape narratives. The prediction signal is different. It’s in what people share, what they stop sharing, who they start arguing with. Network structure predicts voting behaviour better than content does. Volume predicts turnout. Almost nobody in politics is reading the data this way yet.
The problem is structural. Social data lives in the marketing department. Marketing departments think in campaigns. Campaigns have start dates, end dates, budgets, and KPIs. Social data doesn’t care about any of that. It’s continuous. It’s messy. It doesn’t fit into a monthly report because it updates every second and the interesting patterns take months to emerge.
To use social data for prediction, you need to give it to people who think in systems. Data scientists. Strategists. People whose job is to find patterns in noise, not to prove that last month’s campaign hit its targets.
I watched a consumer goods company learn this the hard way. Their social team flagged a spike in negative conversation about a product in Southeast Asia. The social team wrote a report. The report went to marketing. Marketing said “we’ll address it in the next campaign cycle.” Three months later, the product was pulled from shelves in two countries because of a quality issue that social data had caught in week one.
The signal was there. The organisation couldn’t hear it because the signal was sitting in a department that only thinks in campaign cycles.
I’m not arguing that social data replaces surveys or focus groups or sales data. Those have structure, depth, and precision that social data lacks. What social data has is scale, speed, and spontaneity. People don’t curate their social posts the way they curate survey responses. They say what they actually think, in the moment, before the marketing department has a chance to tell them what to say.
That rawness is hard to work with. You have to clean it, contextualise it, separate the signal from the noise. But once you do, you have a real-time map of what millions of people care about, updated every second, at zero marginal cost.
The brands that wire this into their decision-making will know what’s coming before their competitors do. Everyone else will keep building dashboards that show what happened last month and calling it insight.


