Transfer Learning

You can never understand one language until you understand at least two.

This statement by the English writer Geoffrey Willans feels intuitive to anyone who has studied a second language. The idea is that learning to speak a foreign language inevitably deepens one’s understanding of one’s native language. Goethe, in fact, found this such a powerful concept that he felt moved to make a similar, but more extreme, assertion:

He who does not know foreign languages does not know anything about his own.

As compelling as this may be, what is perhaps surprising is that the essence of this idea—that learning or improvement in one skill or mental function can positively influence another one—is present not only in human intelligence, but also in machine intelligence. The effect is called transfer learning, and besides being an area of fundamental research in machine learning, it has potentially wide-ranging practical applications.

Today, the field of machine learning, the scientific study of algorithms whose capabilities improve with experience, is making startling advances. Some of these advances have led to computing systems that are competent in skills associated with human intelligence, sometimes to levels that not only approach human capabilities but, in some cases, exceed them. This includes, for example, the ability to understand, process, and even translate languages. In recent years, much of the research in machine learning has focused on the algorithmic concept of deep neural networks, or DNNs, which learn essentially by inferring patterns, often of remarkable complexity, from large amounts of data. For example, a DNN-based machine can be fed many thousands of snippets of recorded English utterances, each one paired with its text transcription, and from these discern the patterns of correlation between the speech recordings and the paired transcriptions. These inferred correlation patterns become precise enough that, eventually, the system can “understand” English speech. In fact, today’s DNNs are so good that, given enough training examples and a powerful enough computer, they can listen to a person speaking and make fewer transcription errors than a human transcriber would.

What may be surprising to some is that computerized learning machines exhibit transfer learning. For example, let’s consider an experiment involving two machine-learning systems, which for the sake of simplicity we’ll refer to as machines A and B. Machine A uses a brand-new DNN, whereas machine B uses a DNN that has been trained previously to understand English. Now, suppose we train both A and B on identical sets of recorded Mandarin utterances, along with their transcriptions. What happens? Remarkably, machine B (the previously English-trained one) ends up with better Mandarin capabilities than machine A. In effect, the system’s prior training on English ends up transferring capabilities to the related task of understanding Mandarin.
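The A-versus-B comparison can be imitated at toy scale. The sketch below is not the speech experiment described above: it replaces the DNNs with plain linear regression trained by gradient descent, and it assumes that the "English" and "Mandarin" tasks are two sets of target weights that differ only slightly. All names (w_english, w_mandarin, train, and so on) are hypothetical and chosen only for illustration. Machine A starts from zero weights; machine B starts from weights learned on a large "English" set; both then see the same 20 "Mandarin" examples.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # number of input features (an arbitrary choice for this toy)


def make_task(weights, n_examples):
    """Draw random inputs and noise-free targets for a linear task."""
    X = rng.normal(size=(n_examples, DIM))
    return X, X @ weights


def train(X, y, init_weights, lr=0.02, steps=500):
    """Plain gradient descent on mean squared error, starting from init_weights."""
    w = init_weights.copy()
    for _ in range(steps):
        grad = (2.0 / len(y)) * (X.T @ (X @ w - y))
        w -= lr * grad
    return w


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


# Assumption: the two tasks are "related", modeled here as nearby weight vectors.
w_english = rng.normal(size=DIM)
w_mandarin = w_english + 0.1 * rng.normal(size=DIM)

X_en, y_en = make_task(w_english, 2000)      # plentiful "English" data
X_zh, y_zh = make_task(w_mandarin, 20)       # scarce "Mandarin" data
X_test, y_test = make_task(w_mandarin, 1000) # held-out "Mandarin" data

# Machine A: brand-new model, trained on Mandarin only.
w_a = train(X_zh, y_zh, np.zeros(DIM))

# Machine B: first trained on English, then on the same Mandarin data.
w_pretrained = train(X_en, y_en, np.zeros(DIM))
w_b = train(X_zh, y_zh, w_pretrained)

mse_a = mse(X_test, y_test, w_a)
mse_b = mse(X_test, y_test, w_b)
print(f"Machine A (from scratch) held-out error: {mse_a:.4f}")
print(f"Machine B (pretrained)   held-out error: {mse_b:.4f}")
```

With only 20 examples for 50 unknown weights, the Mandarin data alone cannot pin down the answer, so machine A is left guessing in the directions the data never covers, while machine B inherits sensible values for them from its English training. That is the sense in which prior training "transfers" in this toy setting; how well the picture carries over to real DNNs and real languages is a separate, empirical question.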

But there is an even more astonishing outcome of this experiment. Machine B not only ends up better on Mandarin, but B’s ability to understand English is also improved! It seems that Willans and Goethe were onto something—learning a second language enables deeper learning about both languages, even for a machine.

The idea of transfer learning is still the subject of basic research, and as such, many fundamental questions remain open. For example, not all “transfers” are useful: for transfer to work well, the learned tasks appear to need to be “related” in ways that still elude precise definition or scientific analysis. There are also connections to related concepts in other fields, such as cognitive science and learning theory, that have yet to be elucidated. And while it is intellectually dangerous for any computer scientist to anthropomorphize computer systems, we cannot avoid acknowledging that transfer learning creates a powerful, alluring analogy between learning in humans and learning in machines. If general artificial intelligence is ever to become real, transfer learning seems likely to be one of the fundamental factors in its creation. For the more philosophically minded, formal models of transfer learning may contribute to new insights and taxonomies for knowledge and knowledge transfer.

There is also exceptionally high potential for applications of transfer learning. Much of the practical value of machine learning, for example in search and information retrieval, has traditionally come from systems that learn from the massive datasets and large numbers of people available on the World Wide Web. But what can web-trained systems learn about smaller communities, organizations, or even individuals? Can we foresee a future where intelligent machines are able to learn useful tasks that are highly specialized to a specific individual or small organization? Transfer learning opens the possibility that all the intelligence of the web can form the foundation of machine-learned systems, from which more individualized intelligence is then learned. Achieving this would amount to another step towards the democratization of machine intelligence.