Today artificial neural networks are making art, writing speeches, identifying faces and even driving cars. It feels as if we’re riding the wave of a novel technological era, but the current rise in neural networks is actually a renaissance of sorts.

It may be hard to believe, but artificial intelligence researchers were already beginning to see the promise of neural networks in their mathematical models during World War II. But by the 1970s, the field was ready to give up on them entirely.

“[T]here were no impressive results until computers grew up, that is until the past 10 years,” Patrick Henry Winston, a professor at MIT who specializes in artificial intelligence, says. “It remains the most important enabler of deep learning.”

Neural Nets

Today’s neural networks are essentially decision-making systems built on mathematical logic that resembles, for lack of a better analogy, the firing of synapses in the human brain. Several layers of artificial neurons, or nodes, are used to arrive at the solution to a problem. As data is fed through the layers, a simple computation occurs at each node, and the result is passed to the next layer of neurons for another round of computations. During training, the math that occurs at each neuron is slightly adjusted based on how far the previous result was from the desired answer. In this way, a neural network can teach itself patterns in data that match a desired solution and optimize the path to it, sort of like tuning a guitar. The more data you feed a neural net, the better it gets at tuning its neurons and finding a desired pattern.
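To make the layer-by-layer flow concrete, here is a minimal sketch in Python of data passing through a small network. The function names, the layer sizes and the hand-picked weights are all illustrative assumptions rather than anything from a real system, and the sigmoid "squashing" function is just one common choice for the computation at each node.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each node takes a weighted sum of its inputs, adds a bias, and squashes it."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the data through each layer in turn; one layer's output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 3 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
output = network_forward([1.0, 0.0], layers)
print(output)
```

Nothing here learns yet. Training amounts to adjusting the numbers in `layers` until the output lands closer to the answers you want.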

While the field has emerged in recent years as a tour de force for computer experts and even some hobbyists, the history of the neural network stretches back much further, to the dawn of computers. The very first map of a neural network came in 1943 in a paper from Warren Sturgis McCulloch and Walter Pitts. But McCulloch’s framework had little at all to do with computing; instead, he was focused on the structure and function of the human brain. The McCulloch-Pitts model of neuron function, of course, arose during a time when the technology to monitor such activity didn’t exist.

McCulloch and Pitts believed each neuron in the brain functioned like an on-off switch (like binary numbers 1 and 0), and that combinations of these neurons firing on or off yielded logical decisions. At the time, there were many competing theories to describe the way the brain operated, but according to a paper by Gualtiero Piccinini of the University of Missouri, St. Louis, the McCulloch-Pitts model did something others hadn’t: It whittled brain function down to something that resembled a simple computer, and that sparked interest in building an artificial brain from scratch.
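The on-off idea can be sketched in a few lines of Python. This is a common textbook simplification of the McCulloch-Pitts neuron, not a reproduction of the 1943 paper: the function names, weights and thresholds are illustrative assumptions, and the negative weight used for NOT stands in for the paper's notion of an inhibitory input.

```python
def mcp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of binary inputs reaches the threshold, else stay off (0)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def logical_and(a, b):
    # Both inputs must be on to reach a threshold of 2.
    return mcp_neuron([a, b], [1, 1], threshold=2)

def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return mcp_neuron([a, b], [1, 1], threshold=1)

def logical_not(a):
    # An inhibitory (negative) connection: the unit fires only when its input is off.
    return mcp_neuron([a], [-1], threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
```

Wiring units like these together is what lets combinations of firing neurons stand in for logical decisions.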

Early Success

The first successful—and that’s a generous term—neural network concept was the Perceptron algorithm from Cornell University’s Frank Rosenblatt. The Perceptron was originally envisioned to be a machine, though its first implementation was as a class of neural networks that could make fairly rudimentary decisions. Eventually, the algorithm was incorporated into a refrigerator-sized computer called the Mark 1, which was an image recognition machine. It had an array of 400 photocells linked with its artificial neural network, and it could identify a shape when it was held before its “eye.”
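For a feel of how such a machine could learn, here is a minimal Python sketch of the standard textbook perceptron learning rule. It is not Rosenblatt's hardware procedure: the four-pixel "images", the labels and the function names are hypothetical toy choices, a far cry from the Mark 1's 400 photocells.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights only when a prediction is wrong."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical 2x2 "images", flattened row by row (top-left, top-right,
# bottom-left, bottom-right). Label 1 = left side lit, label 0 = right side lit.
samples = [
    ([1, 0, 1, 0], 1),  # full left bar
    ([1, 0, 0, 0], 1),  # partial left bar
    ([0, 1, 0, 1], 0),  # full right bar
    ([0, 0, 0, 1], 0),  # partial right bar
]

weights, bias = train_perceptron(samples)
for inputs, target in samples:
    print(inputs, "expected", target, "predicted", predict(weights, bias, inputs))
```

The rule only settles on an answer when a single weighted sum can separate the two classes, which this toy data was chosen to allow.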

A few years later in 1959, ADALINE arrived via researchers at Stanford University, and was at the time the biggest artificial brain. But it, too, could only handle a few processes at a time and was meant as a demonstration of machine learning rather than being set to a specific task.

These small but tantalizing advancements in computing fueled the hysteria surrounding artificial intelligence in the 1950s, with Science running the headline “Human Brains Replaced?” in a 1958 issue about neural networks. Intelligent robots stormed into science fiction at a swifter clip. This same cycle, though, has repeated itself with many automated processes throughout history. As Adelheid Voskuhl points out in Androids in the Enlightenment, clockwork-run automata constructed in the 18th century were seen as a threat to humanity and proof that machines would rule the world in due time. But these androids of the Enlightenment were nothing more than glorified puppetry.

In the middle of the 20th century, research was slow-moving and couldn’t keep pace with public imagination, as University of Toronto psychology professor Eyal Reingold points out. Reports that the artificial brain was on the verge of replacing the human mind were as far from reality then as could be.

“Unfortunately, these earlier successes caused people to exaggerate the potential of neural networks, particularly in light of the limitation in the electronics then available,” he wrote in a history of artificial intelligence. “This excessive hype, which flowed out of the academic and technical worlds, infected the general literature of the time.”

Winter is Coming

It wasn’t fears of a robot takeover that nearly killed AI research by the early 1970s, though; it was a combination of factors. While MIT computer science professor Marvin Minsky is often credited with providing a death knell to Perceptrons, there was much more to the story.

There was the problem of cuts to government funding. The government was funneling more money into translation programs that could convert Russian to English near-instantaneously. Early neural nets showed these abilities with a 250-word vocabulary, but subsequent research was sluggish at best. In the mid-1960s, a government commission called the Automatic Language Processing Advisory Committee deemed machine translation “hopeless.”

As highlighted by Gary Yang, a 1973 report called the Lighthill Report also pointed out that several areas where machine learning could be applied, like autopilot functions, were actually better served by much less technologically advanced methods.

Nils Nilsson, a retired Stanford University computer science professor, worked on these early generations of artificial intelligence. One of his bigger claims to fame was Shakey, a robot built in the 1960s that could perform rudimentary image recognition. It was so named because it wobbled as it moved, using a TV camera to capture and understand the world around it. It could interpret computer inputs about an object in the room and interact with it in certain ways. It was also an early neural net success, but it wasn’t enough.

Winston says that one of the problems was that neural networks weren’t able to have an all-encompassing approach. He says Marvin Minsky’s Perceptron paper showed that other artificial intelligence research areas were needed — and that the technology wasn’t there yet.

“Minsky’s writings were for a special category of Perceptrons,” Nilsson says. “The main reason that research on neural nets fell out of favor in the 60s was that no one then could figure out a way to train multi-layer neural nets.”

To boil it down: Minsky’s paper demonstrated that, even at its most complex, the Perceptron class of AI was too binary in its thinking, hindering the ability of machine learning to attack more complex tasks. In Minsky’s view, you needed different kinds of artificial intelligence to talk to each other, which may have been beyond the capabilities of hardware at the time.

“Minsky was all about thinking you needed multiple representations, methods, and approaches,” he says.

The neural network thus began to recede from the public imagination, ushering in what’s been called the “AI Winter,” when artificial intelligence research funding dried up and many lines of research ground to a halt. This included neural networks, and AI research shifted to other areas of focus.

“People worked on a variety of things: expert systems, the use of logic to do reasoning, speech recognition, computer vision, and robots,” Nilsson says. Expert systems, meant to be vast repositories of knowledge from experts computerized into logic statements, led to a second sort of AI winter when their abilities, too, were overhyped.

Making a Comeback

But in 1974, Paul Werbos, then a Harvard PhD student, introduced a way to improve neural networks. By layering several neural networks on top of one another, you could have certain neurons error check others in a process called backpropagation, a way an artificial brain could “second guess” itself and search for a new decision.

This was important. Previous neural nets could get hung up on the same decision. If you layered several decisions into an ultimate outcome, the machine could essentially use one part of the neural network to double check another part. This, in effect, gave it a layered complexity to its line of thinking. Instead of thinking in the black and white of true / false Perceptron inputs, it could interpret a neutral value to arrive at a decision by weighing several factors.

In effect, it would move beyond logic statements and into complex machine learning.
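The “second guessing” described above is informal. In its standard textbook form, backpropagation measures the error at the output and passes a share of the blame backward through the layers so every weight can be nudged. The Python sketch below shows that textbook form on a tiny two-layer network; it is not Werbos’s original formulation. The exclusive-or (XOR) task, the starting weights, the learning rate and all function names are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    """Return the hidden-layer activations and the final output for one input."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

def train_step(params, data, learning_rate):
    """One full-batch gradient-descent step on the summed half squared error."""
    w_hidden, b_hidden, w_out, b_out = params
    n_hidden, n_in = len(w_hidden), len(w_hidden[0])
    grad_w_hidden = [[0.0] * n_in for _ in range(n_hidden)]
    grad_b_hidden = [0.0] * n_hidden
    grad_w_out = [0.0] * n_hidden
    grad_b_out = 0.0
    for x, target in data:
        hidden, output = forward(params, x)
        # Error signal at the output node (chain rule through the sigmoid).
        delta_out = (output - target) * output * (1.0 - output)
        grad_b_out += delta_out
        for j in range(n_hidden):
            grad_w_out[j] += delta_out * hidden[j]
            # Pass the error backward to hidden node j through its outgoing weight.
            delta_hidden = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
            grad_b_hidden[j] += delta_hidden
            for i in range(n_in):
                grad_w_hidden[j][i] += delta_hidden * x[i]
    return (
        [[w - learning_rate * g for w, g in zip(ws, gs)]
         for ws, gs in zip(w_hidden, grad_w_hidden)],
        [b - learning_rate * g for b, g in zip(b_hidden, grad_b_hidden)],
        [w - learning_rate * g for w, g in zip(w_out, grad_w_out)],
        b_out - learning_rate * grad_b_out,
    )

# XOR: the output should be 1 only when exactly one input is on.
xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# Hypothetical starting weights: 2 inputs -> 2 hidden nodes -> 1 output node.
params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)

initial_loss = mean_squared_error(params, xor_data)
for _ in range(5000):
    params = train_step(params, xor_data, learning_rate=0.5)
final_loss = mean_squared_error(params, xor_data)
print("loss before:", initial_loss, "loss after:", final_loss)
```

How far the error falls, and whether the network fully masters the task, depends on the starting weights, the learning rate and the number of steps. The point of the sketch is only that the hidden layer, which the Perceptron had no way to train, now gets corrected too.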

It was a bold, forward-thinking paper, perhaps a little too forward-thinking. No computer hardware at the time could handle such complex operations.

Nilsson also points to the 1986 publication of Parallel Distributed Processing: Explorations in the Microstructure of Cognition by David E. Rumelhart of the University of California at San Diego and James L. McClelland of Carnegie Mellon University. It improved on the work of Werbos by presenting what was then one of the best maps of the human neural network.

This map also helped refine Werbos’s ideas, showing how neurons worked in the brain, and how you could apply this to artificial neural nets. You could work around the inability to understand neutral functions by having other connected neural nets work out a more refined “neutral” answer. It just needed a kick from a couple of areas: “great advances in computer power and large databases that allowed ‘deep learning,’” as Nilsson said. Those advances in computing have arrived. Today, researchers have the processing power and access to troves of data stored in “the cloud” to teach algorithms new functions.

ADALINE and its primitive cousins may have faded from public perception as machine learning has come into its own over the past decade. But this revolution, decades in the making, wasn’t hampered by these neural networks. Instead, they were somehow both too primitive and too advanced for their time, but their time has certainly come.