Daniel L. Everett
Linguistic Researcher; Dean of Arts and Sciences, Bentley University; Author, How Language Began

"We should really not be studying sentences; we should not be studying language — we should be studying people" Victor Yngve

Communication is the key to cooperation. Although cross-cultural communication for the masses requires translation techniques that exceed our current capabilities, the groundwork of this technology has already been laid and many of us will live to see a revolution in automatic translation that will change everything about cooperation and communication across the world.

This goal was conceived in the late 1940s in a famous memorandum by Rockefeller Foundation scientist, Warren Weaver, in which he suggested the possibility of machine translation and tied its likelihood to four proposals, still controversial today: that there was a common logic to languages; that there were likely to be language universals; that immediate context could be understood and linked to translation of individual sentences; and that cryptographic methods developed in World War II would apply to language translation. Weaver's proposals got off the ground financially in the early 1950s as the US military invested heavily in linguistics and machine translation across the US, with particular emphasis on the research of the team of Victor Yngve at the Massachusetts Institute of Technology's Research Laboratory of Electronics (a team that included the young Noam Chomsky).

Yngve, like Weaver, wanted to contribute to international understanding by applying the methods of the then incipient field that he helped found, computational linguistics, to communication, especially machine translation. Early innovators in this area also included Claude Shannon at Bell Labs and Yehoshua Bar-Hillel who preceded Yngve at MIT before returning to Israel. Shannon was arguably the inventor of the concept of information as an entity that could be scientifically studied and Bar-Hillel was the first person to work full-time on machine translation, beginning the program that Yngve inherited at MIT.

This project was challenged early on, however, by the work of Chomsky, from within Yngve's own lab. Chomsky's conclusions about different grammar types and their relative generative power convinced people that grammars of natural languages were not amenable to machine translation efforts as they were practiced at the time, leading to a slowdown in and reduction of enthusiasm for computationally-based translation.

As we have subsequently learned, however, the principal problem faced in machine translation is not the formalization of grammar per se, but the inability of any known formalization, including Chomsky's, to integrate context and culture (semantics and pragmatics in particular) into a model of language appropriate for translation. Without this integration, mechanical translation from one language to another is not possible.

Still, mechanical procedures able to translate most contents from any source language into accurate, idiomatically natural constructions of any target language seem less utopian to us now because of major breakthroughs that have led to several programs in machine translation (e.g. the Language Technologies Institute at Carnegie Mellon University). I believe that we will see within our lifetime the convergence of developments in artificial intelligence, knowledge representation, statistical grammar theories, and an emerging field — computational anthropology (informatic-based analysis and modeling of cultural values) — that will facilitate powerful new forms of machine translation to match the dreams of early pioneers of computation.

The conceptual breakthroughs necessary for universal machine translation will also require contributions from Construction Grammars, which view language as a set of conventional signs (varieties of the idea that the building blocks of grammar are not rules or formal constraints, but conventional phrase and word forms that combine cultural values and grammatical principles), rather than a list of formal properties. They will have to look at differences in the encoding of language and culture across communities, rather than trying to find a 'universal grammar' that unites all languages.

At least some of the steps are easy enough to imagine. First, we come up with a standard format for writing statistically-based Construction Grammars of any language, a format that displays the connections between constructions, culture, and local context (such as the other likely words in the sentence or other likely sentences in the paragraph in which the construction appears). This format might be as simple as a flowchart or a list. Second, we develop a method for encoding context and values. For example, what are the values associated with words; what are the values associated with certain idioms; what are the values associated with the ways in which ideas are expressed? The last of these can be seen in the notion of sentence complexity: the Pirahã of the Amazon (among others) reject recursive structures in syntax because such structures violate principles of information rate and of new vs. old information in utterances, principles that are very important in Pirahã culture. Third, we establish lists of cultural values and the most common contexts, and how these link to individual constructions. Automating the procedure for discovering or enumerating these links will take us to the threshold of automatic translation in the original sense.
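To make the three steps a little more concrete, here is a minimal sketch in Python of what one entry in such a format might look like, and how links between constructions, context, and values might be used to pick among competing constructions. Everything in it is an assumption made for illustration: the `Construction` record, its field names, the `score` and `choose` functions, the additive scoring rule, and the toy weights are all hypothetical, not an existing standard or a method proposed in the essay.

```python
from dataclasses import dataclass, field


@dataclass
class Construction:
    """One entry in a hypothetical statistical Construction Grammar."""
    form: str      # conventional word or phrase pattern
    meaning: str   # language-neutral label for what the form conveys
    # Step 2: cultural values associated with the form (value -> weight).
    cultural_values: dict[str, float] = field(default_factory=dict)
    # Step 1: words likely to occur near the form (word -> weight).
    context_words: dict[str, float] = field(default_factory=dict)


def score(construction, context, values):
    """Add up the weights of the context words and cultural values present.

    A deliberately crude rule, assumed here only for illustration.
    """
    context_score = sum(
        w for word, w in construction.context_words.items() if word in context
    )
    value_score = sum(
        w for v, w in construction.cultural_values.items() if v in values
    )
    return context_score + value_score


def choose(grammar, meaning, context, values):
    """Step 3: pick the best-scoring construction that expresses `meaning`."""
    candidates = [c for c in grammar if c.meaning == meaning]
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, context, values))


# Toy data, invented for this sketch; the weights are not measured.
TOY_GRAMMAR = [
    Construction(
        form="How do you do?",
        meaning="greeting",
        cultural_values={"formality": 0.9},
        context_words={"sir": 0.6, "madam": 0.4},
    ),
    Construction(
        form="Hey",
        meaning="greeting",
        cultural_values={"solidarity": 0.8},
        context_words={"dude": 0.7},
    ),
]

formal = choose(TOY_GRAMMAR, "greeting", {"sir"}, {"formality"})
casual = choose(TOY_GRAMMAR, "greeting", {"dude"}, {"solidarity"})
print(formal.form)
print(casual.form)
```

A real format would need weights estimated from corpora and ethnographic description rather than set by hand, and a far richer notion of context than a bag of neighboring words. The sketch only shows that the three kinds of links can be stored together in one simple, list-like record.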

Information and its exchange form the soul of human cultures. So just imagine the possible change in our perceptions of 'others' when we are able to type in a story and have it automatically and idiomatically translated with 100% accuracy into any language for which we have a grammar of constructions. Imagine speaking into a microphone and having your words come out in the language of your audience, heard and understood naturally. Imagine anyone being able to take a course in any language from any university in the world, over the internet or in person, without first having to learn the language of the instructor.

These will always be unreachable goals to some degree. It seems unlikely, for example, that all grammars and cultures are even capable of expressing everything from all languages. However, we are developing tools that will dramatically narrow the gaps and help us decide where and how we can communicate particular ideas cross-culturally. Success at machine translation might not end all the world's sociocultural or political tensions, but it won't hurt. One struggles to think of a greater contribution to world cooperation than progress toward universal communication, enabling all and sundry to communicate with nearly all and sundry. Babel means 'the gate of god'. In the Bible, the story of Babel is about the origin of world competition and suspicion. As humans approached the entrance to divine power by means of their universal cooperation via universal communication, so the biblical story goes, language diversity was introduced to destroy our unity and deprive us of our full potential.

But automated, near-universal translation is coming. And it will change everything.