Futurist.com welcomes Anne Boysen as a new member of our Think Tank group. Anne is available for presentations, interviews, and consulting, and you can read a bit about her here. Anne originally published this article on how data science enhances foresight on LinkedIn; see the link at the bottom of the page.
2018 generated more data than most of recorded human history
That’s a lot of information! In his book Superforecasting, Philip Tetlock suggests that good forecasting is more often the accumulated wisdom of heterogeneous groups of well-informed amateurs who peruse information from a variety of sources than the predictions of a few subject matter experts. These forecasters have also been trained to recognize their own biases. Interestingly, some of the key attributes associated with Tetlock’s forecasting skills happen to be the same attributes computers use when processing large amounts of data.
Humans process information 10 million times more slowly than computers.
And while we don’t like to admit it, we are all victims of cognitive biases. We prefer to digest information immediately available to us and overlook the more complex. We get inspired by success stories but never hear the stories that could forewarn us of alternative paths. We mistake noise for signals and fail to see the signals in the noise. The truth is, we humans suck at information processing.
“Prediction is difficult, especially about the future”
I often hear the argument in this quote raised against predictive analytics. But the premise is misleading. While the second clause was probably added in jest, specifying The Future might not be so preposterous after all. If you toss only one die, you have a one-in-six chance of guessing the outcome. But try to predict the outcome of several dice tossed in a sequence and you’ll quickly run out of good guesses, or face more possible sequences than there are stars in the universe. “The Future” is like those sequential dice tosses. That is, if you ever need a univocal account of an unbounded future. (Good luck with that.)
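The dice arithmetic is easy to check. Here is a minimal sketch, assuming fair six-sided dice and a rough order-of-magnitude guess of 10^24 stars in the observable universe (that figure is an illustrative assumption, not a number from this article):

```python
SIDES = 6
STARS_ESTIMATE = 10**24  # assumption: rough order-of-magnitude figure

def sequences(n_dice: int) -> int:
    """Number of distinct ordered outcomes for n dice tossed in sequence."""
    return SIDES ** n_dice

def chance_of_exact_guess(n_dice: int) -> float:
    """Probability of guessing one whole sequence correctly."""
    return 1 / sequences(n_dice)

def dice_needed_to_exceed(limit: int) -> int:
    """Smallest number of dice whose possible sequences outnumber `limit`."""
    n = 1
    while sequences(n) <= limit:
        n += 1
    return n

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, sequences(n), chance_of_exact_guess(n))
    print("dice needed:", dice_needed_to_exceed(STARS_ESTIMATE))
```

Under that assumption, a few dozen tosses are already enough for the number of possible sequences to pass the star count.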
This article is not for those situations. This is about analyzing trends in quickly changing data, applying complex models to see change not visible to most of us, and predicting what is likely to happen next in your specific area of choice.
Unpacking the buzzwords
To envision how data science can be applied to decision making, we first need to understand the nature of its components.
Data is at the core of any research-related activity, whether analyzed by humans or machines. It can be structured, such as the data we find in relational databases and CRM systems, or unstructured, meaning it was not organized with analytics in mind. By an often-cited estimate, about 80% of all digital data is unstructured. I like to think of the difference as manicured gardens versus natural forests where everything grows organically. Structured data is often easier to work with because it is already organized. Unstructured data requires a bit more preparation but gives us access to an enormous pool of insights. It’s like putting together a bouquet of unusual wildflowers.
Data Mining (DM) is the process of turning raw data into useful information. I wish the term also included data harvesting, because that would revive a historical analogy to the westward expansion in the U.S.: the gold rush in the 19th century and the oil boom in the 20th. If data is the new oil, its value lies in our ability to explore, extract and analyze it. In this context we could get an understanding of what “mineral rights” mean in the digital age. Are you in charge of the precious minerals under your own digital turf?
Predictive Analytics (PA) calculates future outcomes for selected target variables. Unlike traditional statistics, PA processes large amounts of data, often sorted into tables or matrices, and checks its models with cross-validation rather than with sampling methods and p-values. Where descriptive statistics finds general trends in a population, PA estimates what each case is likely to do in the future. This is how companies with millions of customers can hyper-personalize their offerings to the individual.
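To show what cross-validation means in practice, here is a minimal sketch using only the Python standard library. The customer data (tenure in months, monthly spend) is made up for illustration, and the model is a plain least-squares line, chosen for brevity rather than because it is what any particular PA tool uses. The data is split into k folds; each fold is held out once while the model is fitted on the rest, and the held-out errors are averaged.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_mae(xs, ys, k=5):
    """Average held-out mean absolute error across k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th case is held out
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        errors = [abs(ys[i] - (a + b * xs[i])) for i in sorted(held_out)]
        fold_errors.append(mean(errors))
    return mean(fold_errors)

# Hypothetical customers: tenure in months and monthly spend.
tenure = list(range(1, 21))
spend = [10 + 2 * t + (1 if t % 2 else -1) for t in tenure]

if __name__ == "__main__":
    print("cross-validated error:", round(k_fold_mae(tenure, spend), 2))
    a, b = fit_line(tenure, spend)
    print("predicted spend for a 24-month customer:", round(a + b * 24, 1))
```

The cross-validated error says how far off the model tends to be on cases it has not seen, which is the question that matters when predicting for each individual case.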
Machine Learning (ML): Just like you can train a dog to sniff out things you’d barely notice, data analytics will help you sniff out relationships in your data. ML differs from data analytics the way your child differs from your dog (unless your dog is your child, of course). While a dog depends on us its whole life, we teach our children to grow up to make their own decisions (hopefully). Like children, machines learn from both instruction and independent exploration. The learning is supervised when the algorithm is trained on labelled data and then applies what it has learned to classify new data. Unsupervised methods let the algorithm itself find interesting patterns and associations and cluster similar types of data together. Both children and machines also improve through reinforcement learning, that is, by trial and error guided by rewards. And where children learn to tell fact from fiction by playing games of make-believe, machines improve similar abilities with Generative Adversarial Networks. GANs are a type of deep neural network architecture that pits two neural networks against each other in a game of True or False. Imagine if humans could play similar games in social media channels to tell fake from verifiable news!
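The difference between supervised and unsupervised learning can be seen in a few lines. This is a minimal sketch with made-up 2-D points and two deliberately simple techniques picked for illustration: a nearest-centroid classifier for the supervised case and k-means clustering for the unsupervised case. The labels "low" and "high" and the naive choice of starting centers are assumptions of the example.

```python
from math import dist
from statistics import mean

def centroid(points):
    """Average position of a group of points."""
    return tuple(mean(axis) for axis in zip(*points))

def train_supervised(labelled):
    """Supervised: learn one centroid per label from (point, label) pairs."""
    groups = {}
    for point, label in labelled:
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}

def classify(model, point):
    """Assign a new point the label of the nearest learned centroid."""
    return min(model, key=lambda label: dist(model[label], point))

def k_means(points, k=2, rounds=10):
    """Unsupervised: group unlabelled points around k centers."""
    centers = list(points[:k])  # naive start: the first k points
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Move each center to the middle of its cluster (keep it if empty).
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

points = [(1, 1), (8, 8), (1, 2), (9, 8), (2, 1), (8, 9)]
labels = ["low", "high", "low", "high", "low", "high"]

if __name__ == "__main__":
    model = train_supervised(list(zip(points, labels)))
    print(classify(model, (2, 2)))   # uses the labels it was taught
    print(k_means(points))           # finds the two groups without labels
```

In the first case the algorithm is told what the groups are called and learns to recognize them; in the second it only discovers that two groups exist.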
Text mining and Natural Language Processing (NLP) use human language as input. User-generated content is a treasure trove of foresight-related information. Unlike numeric data, which can readily be put into plots or statistical models to find differences, plain text usually requires some front-loaded data preparation, such as stemming and removing stop words. One of the more exciting NLP applications is deriving immediate opinion trends from the sentiment of text “in the wild”, such as social media posts. Thankfully, linguists and data scientists have already done the meticulous job of turning words and word combinations into formats computers can understand. In my next post I’ll explain how you can find early social trends by mining substantial amounts of real-time text data in several languages in just a few minutes.
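Here is a minimal sketch of the preparation steps mentioned above (stop word removal and stemming) followed by a simple word-counting sentiment score. Everything in it is a toy assumption: the stop word list, the suffix-stripping stemmer, and the tiny sentiment lexicon are hand-made for illustration and are far cruder than the resources real NLP libraries provide.

```python
import re

# Toy resources, assumed for this example only.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "this", "that",
              "and", "of", "to", "i", "it", "so", "they"}
SUFFIXES = ("ing", "ed", "es", "ly", "s", "e")

def stem(word):
    """Crude stemmer: strip the first matching suffix, keep 3+ letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# The lexicon is stored as stems so it matches the prepared tokens.
POSITIVE = {stem(w) for w in ("love", "great", "amazing", "happy")}
NEGATIVE = {stem(w) for w in ("hate", "terrible", "disappointing")}

def prepare(text):
    """Lowercase, tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def sentiment(text):
    """Positive minus negative word count; above zero leans positive."""
    tokens = prepare(text)
    return (sum(t in POSITIVE for t in tokens)
            - sum(t in NEGATIVE for t in tokens))

if __name__ == "__main__":
    for post in ("I loved the new scooters, they are amazing!",
                 "Hated the delays, so disappointing"):
        print(prepare(post), sentiment(post))
```

Averaging such scores over many posts per day is one simple way to turn raw text into an opinion trend line.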
Computer vision and image classification. A picture can say a thousand words. Every day 1.8 billion images are shared, pixels which may forebode the emergence of new trends. Image analytics is probably still some years away, but this type of data will increasingly inform business strategies and planning. It is also a more legally and ethically thorny data type, involving privacy as well as copyright issues (please read about privacy below). Unlike tabular data that can be queried with SQL, visual data requires deep neural networks. Artificial Neural Networks mimic the nonlinear, highly complex structure of neurons in the human brain. An input layer, such as the pixels of an image, feeds information that triggers the firing of “neurons” connected in various hidden layers until the output layer produces a result, for example a label for what the image shows. While convolutional neural networks are the most commonly used in image analytics, a brand-new method called capsule networks can better predict the spatial relationship between the individual elements in the vectors. This reduces the misclassification rate of “noisy” images and can make it easier for us to find meaningful information from objects in images and thereby discover new trends.
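The flow from input layer through hidden layers to output can be sketched as a single forward pass. This is a minimal illustration, not a trained model: the 2x2 "image", the hand-picked weights, and the task (scoring whether the left and right halves differ in brightness) are all assumptions made up for the example. Real image networks learn millions of weights from data.

```python
from math import exp

def relu(x):
    """Nonlinear activation: a neuron 'fires' only for positive input."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any number into a score between 0 and 1."""
    return 1 / (1 + exp(-x))

# Pixels are ordered [top-left, top-right, bottom-left, bottom-right].
# Hand-picked, hypothetical weights: one hidden neuron responds when the
# left side is brighter, the other when the right side is brighter.
HIDDEN_WEIGHTS = [[1, -1, 1, -1], [-1, 1, -1, 1]]
HIDDEN_BIASES = [0, 0]
OUTPUT_WEIGHTS = [3, 3]
OUTPUT_BIAS = -3

def forward(pixels):
    """Input layer -> hidden layer -> output score for 'vertical edge'."""
    hidden = [
        relu(sum(w * p for w, p in zip(weights, pixels)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    z = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    return sigmoid(z)

if __name__ == "__main__":
    print(forward([1, 0, 1, 0]))  # bright left column, dark right column
    print(forward([1, 1, 1, 1]))  # uniformly bright image
```

Training a network means adjusting those weights automatically until the output scores match known examples.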
But what about privacy?
The best part about doing data mining as a strategic forecaster to get “big picture” information is the lack of incentive to possess personally identifiable information (PII). Unlike predictive modelers who optimize offerings for individual users, we are interested in trends that can be measured with anonymized data. Individual cases are interesting only in so far as they can nuance our understanding of collective behaviors. But these records can, and actually should, have PII removed or replaced with neutral values.
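Removing PII or replacing it with neutral values before analysis can look like this. A minimal sketch: the field names, the sample records, and the "REDACTED" placeholder are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

PII_FIELDS = {"name", "email", "phone"}  # hypothetical field names
NEUTRAL = "REDACTED"                     # hypothetical neutral value

def anonymize(record, drop=False):
    """Return a copy with PII fields dropped or set to a neutral value."""
    if drop:
        return {k: v for k, v in record.items() if k not in PII_FIELDS}
    return {k: (NEUTRAL if k in PII_FIELDS else v)
            for k, v in record.items()}

def trend(records, field):
    """Count how often each value of a non-PII field occurs."""
    return Counter(r[field] for r in records)

records = [
    {"name": "A. Example", "email": "a@example.com", "topic": "e-bikes"},
    {"name": "B. Example", "email": "b@example.com", "topic": "e-bikes"},
    {"name": "C. Example", "email": "c@example.com", "topic": "drones"},
]

if __name__ == "__main__":
    cleaned = [anonymize(r) for r in records]
    print(cleaned[0])
    print(trend(cleaned, "topic"))
```

The trend counts are identical before and after cleaning, which is the point: the big picture does not need the names. Stripping direct identifiers is only a first step, though, since combinations of the remaining fields can sometimes still single a person out.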
Data Science and AI are not new, and our lifestyle would come to a grinding halt without them. In Strategic Foresight, though, we’re barely dipping our toes into them. One thing is for sure: we have an exciting journey ahead!
Disclaimer by Anne Boysen, the author: The views expressed on this web site are mine alone and do not necessarily reflect the views of my employer, organizations affiliated with my employer, or past or current clients and customers I have worked with.
Anne Boysen is the newest member of the Futurist.com Think Tank. If you would like to learn more about how you might use data science to enhance foresight, contact Anne here. This article first appeared on LinkedIn.