I helped work on a thing last weekend that I can’t write about, yet, and then last week I found my way to San Jose for Nvidia’s GPU Technology Conference, and fine, all right, OK, I’m convinced: now that the smartphone boom is plateauing, AI/deep learning is the new coal face of technology — and, at least for now, Nvidia bestrides it like many parallel colossi.
I use the metaphor “coal face” advisedly. It’s the place where advances are being made, where the most value is being created … but it’s also a messy business, often with little visibility, with many ways to go terribly wrong. Neural networks are still more applied science than engineering, though the field is beginning to move along that spectrum. The Nvidia GPU conference featured a sizable zone of scientific posters exploring the cutting edge of GPU usage, something you don’t see at a lot of tech conferences. It turns out some of those smug academic Ph.D.s were onto something after all. Who knew?
I feel slightly awkward writing about this stuff; I suppose it’s how tech journalists who aren’t engineers themselves must feel most of the time. All the hands-on AI/ML/deep-learning/neural-network experience I have is some time spent playing around with TensorFlow, a graduate-level neural-network course I took back in the day, and some book research. Normally when I write about software I do so with the confidence of someone who’s been paid to write software for a long time. Now, though, I feel a bit like an explorer in a strange land.
What even to call it? None of the names we use actually quite seem to fit. “Artificial intelligence” and “machine intelligence” have become the de facto standard, but both write a misleading check the technology’s still a long way off from cashing. “Pattern recognition” elides the fact that we’ve gone way past recognition, and into translation and generation of patterns. “Machine learning” includes a host of historical techniques which don’t seem so relevant any more, in the age of neural networks, and yet “neural networks” is both too narrow and too broad. “Deep learning” has something of a specific technical meaning, but it seems the least bad option, and it’s what Nvidia CEO Jensen Huang uses, so let’s go with that. For now. In ten years, though, who knows, maybe “AI” will seem somewhat less presumptuous.
Deep learning opens up entire new categories of problems and makes them solvable. Only 2.5 years ago XKCD could joke that it would take “a research team and five years” to write software to determine whether a picture contained the image of a bird, and most of the industry nodded knowledgeably and laughed. Thirty months later, that joke seems painfully dated. Today we measure the solution not in years of developer time, but in days of GPU time.
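To make that shift concrete, here is a toy sketch of my own (not anything from Nvidia or XKCD): instead of hand-coding rules for what a bird looks like, you learn a classifier from labeled examples. The “images” below are just two-dimensional feature vectors and the model is plain logistic regression, but the shape of the approach is the one that made the bird problem tractable.

```python
import numpy as np

# Hypothetical, dependency-light illustration: learn a yes/no decision
# from labeled examples rather than writing rules by hand. Two clusters
# of synthetic "feature vectors" stand in for bird / no-bird pictures.
rng = np.random.default_rng(1)
pos = rng.normal([2, 2], 0.5, (100, 2))    # pretend: bird features
neg = rng.normal([-2, -2], 0.5, (100, 2))  # pretend: no-bird features
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

# Logistic regression trained by full-batch gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)     # gradient of the log loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

acc = np.mean((p > 0.5) == y)  # training accuracy on the toy data
```

A real image classifier replaces the two-feature vectors with millions of pixels and the single sigmoid with many stacked layers, but the training loop is recognizably the same.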
So what are the implications, and opportunities, and possible abuses, of computer vision which is as semantically powerful as ours, able — if sufficiently well trained, and not overfitted — to recognize objects, threats, possibilities, human age and expression and social class? Of the ability to generate patterns, and copy them, to reimagine any image in the style of any other, be it Old Master or photo filter? Of the capacity to do this kind of pattern recognition, translation, and generation for any sufficiently broad set of data, not just images? Or of implicit biases, or inexplicable reasoning, that wind up secretly poisoning such systems?
…Yeah, I don’t know either. (Other than: yet more uses for images means yet more cameras in ever more places.) What I can tell you is that the real explosion in deep learning applications will come when this technology is accessible to the average engineer, just as smartphones really started to take off when anyone with a phone and a laptop could start submitting apps to the App Store.
And we’re a ways away from that yet. This technology is moving so fast that the tooling around it seems to be failing to keep up, and also, conceptually, it’s all still a little opaque and forbidding to the average engineer. A whole lot of modern software engineers are autodidacts without much formal training in algorithms, much less higher math. Even if they do have formal training, big-O notation, Turing machines, NP-completeness, etc., are one thing; tensor operations and stochastic gradient descent algorithms are quite another.
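For the curious, the core idea behind stochastic gradient descent is less forbidding than the name suggests. A minimal sketch of mine, assuming nothing beyond NumPy: fit y = 2x + 1 from noisy samples by nudging two parameters against the gradient of the error, one sample at a time. Deep learning runs the same loop over millions of parameters on GPUs.

```python
import numpy as np

# Toy stochastic gradient descent: recover the line y = 2x + 1
# from noisy observations, updating after every single sample.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, 200)
ys = 2.0 * xs + 1.0 + rng.normal(0, 0.05, 200)

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(50):
    for x, y in zip(xs, ys):
        pred = w * x + b
        err = pred - y       # gradient of 0.5 * err**2 w.r.t. pred
        w -= lr * err * x    # step each parameter downhill
        b -= lr * err

print(round(w, 1), round(b, 1))  # ≈ 2.0 1.0
```

The “stochastic” part is just that each update uses one noisy sample (or a small batch) instead of the whole dataset, which is what makes the method cheap enough to run on enormous training sets.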
What’s more, to do anything really interesting, sheer brainpower won’t be enough; you’ll need massive datasets to train your system. Does that make deep learning winner-take-all before it even started? Even if you find open data, and you have a novel deep-learning notion, you’ll still need large quantities of very expensive GPUs or very expensive GPU time. This stuff isn’t nearly as accessible, yet, as apps were during the smartphone boom…
…which is of course part of its charm. App and web coding has descended into something pretty close to plumbing, in places, of late. I’ve stopped asking interviewees what their favorite algorithm is, because even those who understand what I’m asking really struggle to answer, and it doesn’t seem fair to ask a question which may not really be all that germane to their job. So frankly it’s something of a relief to see that the next big thing is limited, for now, to a high priest/priestesshood with a deep fundamental and theoretical grounding. It’s the way things are supposed to be. And it’s extra evidence, if you’ll pardon the circular reasoning, that this is the next big thing.
And yet I sound a note of caution. There’s only so much these high priests and priestesses can do, and when both the science and the tools are still under frantic development, progress tends to be patchy and punctuated rather than pervasive. As with the Internet in its early years, don’t overestimate deep learning’s impact in the short term; but, as with the Internet, don’t underestimate its impact in the long run.