I’m very happy that any area of machine learning (aka, statistical inference and decision-making) is beginning to make an impact on real-world problems.
I’m also overall happy with the rebranding associated with the usage of the term “deep learning” instead of “neural networks”. In other engineering areas, the idea of using pipelines, flow diagrams and layered architectures to build complex systems is quite well entrenched, and our field should be working (inter alia) on principles for building such systems. The word “deep” just means that to me—layering (and I hope that the language eventually evolves toward such drier words…). I hope and expect to see more people developing architectures that use other kinds of modules and pipelines, not restricting themselves to layers of “neurons”.
With all due respect to neuroscience, one of the major scientific areas for the next several hundred years, I don’t think that we’re at the point where we understand very much at all about how thought arises in networks of neurons, and I still don’t see neuroscience as a major source of detailed ideas on how to build inference and decision-making systems. Notions like “parallel is good” and “layering is good” could well have been (and indeed were) developed entirely independently of thinking about brains.
In pre-backpropagation days, anything that the brain couldn’t do was to be avoided; we needed to be pure in order to find our way to new styles of thinking. And then Dave Rumelhart started exploring backpropagation—clearly leaving behind the neurally-plausible constraint—and suddenly the systems became much more powerful. This made an impact on me. Let’s not impose artificial constraints based on cartoon models of topics in science that we don’t yet understand.
My understanding is that many if not most of the “deep learning success stories” involve supervised learning (i.e., backpropagation) and massive amounts of data. Layered architectures involving lots of linearity, some smooth nonlinearities, and stochastic gradient descent seem to be able to memorize huge numbers of patterns while interpolating smoothly (not oscillating) “between” the patterns; moreover, there seems to be an ability to discard irrelevant details, particularly if aided by weight-sharing in domains like vision where it’s appropriate. There are also some of the advantages of ensembling. Overall an appealing mix. But this mix doesn’t feel singularly “neural” (particularly the need for large amounts of labeled data).
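To make that recipe concrete, here is a minimal sketch—not any particular published system—of the ingredients named above: stacked linear maps, a smooth nonlinearity, and stochastic gradient descent on a supervised loss. The synthetic data, layer sizes, and learning rate are all invented purely for illustration.

```python
import numpy as np

# Minimal sketch: a two-layer network = linear map, smooth nonlinearity,
# linear map, trained by stochastic gradient descent on a supervised loss.
# All sizes and data here are synthetic, chosen only for illustration.

rng = np.random.default_rng(0)

# Synthetic supervised data: inputs X, targets y = a smooth function of X.
X = rng.normal(size=(1000, 10))
y = np.sin(X @ rng.normal(size=(10, 1)))

# Parameters of a 10 -> 32 -> 1 layered architecture.
W1 = rng.normal(scale=0.1, size=(10, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1));  b2 = np.zeros(1)

lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, len(X), size=batch)   # stochastic mini-batch
    xb, yb = X[idx], y[idx]

    # Forward pass: linearity, smooth nonlinearity (tanh), linearity.
    h = np.tanh(xb @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - yb                              # squared-error loss gradient

    # Backpropagation: chain rule through the two layers.
    gW2 = h.T @ err / batch
    gb2 = err.mean(axis=0)
    gh = err @ W2.T * (1 - h ** 2)               # derivative of tanh
    gW1 = xb.T @ gh / batch
    gb1 = gh.mean(axis=0)

    # Stochastic gradient descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
print("final mean squared error:", mse)
```

Nothing in this sketch is specifically “neural”; it is a composition of differentiable maps fit by gradient descent to labeled examples, which is the point.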
Indeed, it’s unsupervised learning that has always been viewed as the Holy Grail; it’s presumably what the brain excels at and what’s really going to be needed to build real “brain-inspired computers”. But here I have some trouble distinguishing the real progress from the hype. It’s my understanding that in vision, at least, the unsupervised learning ideas are not what’s responsible for the recent headline results; it’s supervised training based on large labeled data sets.
One way to approach unsupervised learning is to write down various formal characterizations of what good “features” or “representations” should look like and tie them to various assumptions that seem to be of real-world relevance. This has long been done in the neural network literature (but also far beyond). I’ve seen yet more work in this vein in the deep learning work and I think that that’s great. But I personally think that the way to go is to put those formal characterizations into optimization functionals or Bayesian priors, and then develop procedures that explicitly try to optimize (or integrate) with respect to them. This will be hard, and developing good approximations remains an ongoing problem. In some of the deep learning work that I’ve seen recently, there’s a different tack—one uses one’s favorite neural network architecture, analyzes some data, and says “Look, it embodies those desired characterizations without having them built in”. That’s the old-style neural network reasoning, where it was assumed that just because it was “neural” it embodied some kind of special sauce. That logic didn’t work for me then, nor does it work for me now.
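A small sketch of what I mean by putting the characterization into the optimization functional: if the desired property of a representation is, say, sparsity, write it down as a penalty and optimize it directly, rather than hoping it emerges from the architecture. The dictionary, data, and penalty weight below are all made up for illustration; the procedure is plain iterative soft-thresholding on a sparse-coding-style objective.

```python
import numpy as np

# Sketch: encode a desired property of the representation (here, sparsity)
# directly in the objective  0.5*||x - D z||^2 + lam * ||z||_1  and optimize
# it, instead of hoping an architecture exhibits it after the fact.
# Dictionary D, data x, and lam are arbitrary illustrative choices.

rng = np.random.default_rng(1)
D = rng.normal(size=(20, 50))            # overcomplete dictionary (held fixed here)
x = rng.normal(size=20)                  # one data point to encode
lam = 0.5                                # weight on the sparsity penalty

def soft_threshold(v, t):
    """Proximal step for the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Iterative soft-thresholding (ISTA): gradient step on the reconstruction
# term, then proximal step on the sparsity term.
step = 1.0 / np.linalg.norm(D, 2) ** 2   # step size from the Lipschitz constant
z = np.zeros(50)
for _ in range(500):
    grad = D.T @ (D @ z - x)
    z = soft_threshold(z - step * grad, step * lam)

objective = 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))
print("nonzero coefficients:", int(np.count_nonzero(z)),
      "objective:", round(objective, 3))
```

The point of the sketch is the methodology, not the particular penalty: the property you want is stated explicitly in the functional, so you can check whether the procedure achieves it, rather than inferring it post hoc from the behavior of a favorite architecture.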
Lastly, and on a less philosophical level, while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to that tool when I’m consulting out in industry. I find that industry people are often looking to solve a range of other problems, often not involving “pattern recognition” problems of the kind I associate with neural networks. E.g.:
(1) How can I build and serve models within a certain time budget so that I get answers with a desired level of accuracy, no matter how much data I have?
(2) How can I get meaningful error bars or other measures of performance on all of the queries to my database?
(3) How do I merge statistical thinking with database thinking (e.g., joins) so that I can clean data effectively and merge heterogeneous data sources?
(4) How do I visualize data, and in general how do I reduce my data and present my inferences so that humans can understand what’s going on?
(5) How can I do diagnostics so that I don’t roll out a system that’s flawed, or figure out that an existing system is now broken?
(6) How do I deal with non-stationarity?
(7) How do I do some targeted experiments, merged with my huge existing datasets, so that I can assert that some variables have a causal effect?
Although I could possibly investigate such issues in the context of deep learning ideas, I generally find it a whole lot more transparent to investigate them in the context of simpler building blocks.
Based on seeing the kinds of questions I’ve discussed above arise again and again over the years, I’ve concluded that statistics/ML needs a deeper engagement with people in CS systems and databases, not just with AI people, which has been the main kind of engagement in previous decades (and still remains the focus of “deep learning”).