Computational linguistics

Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective, as well as the study of appropriate computational approaches to linguistic issues.

Traditionally, computational linguistics has been performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Today, computational linguists often work as members of interdisciplinary teams, which can include regular linguists, experts in the target language, and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others.

Computational linguistics has theoretical and applied components. Theoretical computational linguistics focuses on issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use. [1]

The Association for Computational Linguistics defines computational linguistics as:

The scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena. [2]


Computational linguistics is often grouped within the field of artificial intelligence, but it actually predates the development of artificial intelligence. Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. [3] Since computers could make arithmetic calculations much faster and more accurately than humans, it was thought to be only a matter of time before they could also process language. [4] Computational and quantitative methods have also been used historically in reconstructing earlier forms of modern languages and in subgrouping modern languages into language families. Earlier methods such as lexicostatistics and glottochronology have proven to be premature and inaccurate. However, recent interdisciplinary studies that borrow concepts from biological studies, especially gene mapping, have produced more sophisticated analytical tools and more reliable results. [5]

When machine translation failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed. Computational linguistics was born as the name of the new field of study devoted to developing algorithms and software for intelligently processing language data. The term “computational linguistics” itself was first coined by David Hays, a founding member of both the Association for Computational Linguistics and the International Committee on Computational Linguistics. [6] When artificial intelligence came into existence in the 1960s, the field of computational linguistics became the sub-division of artificial intelligence dealing with human-level understanding and production of natural languages. [citation needed]

In order to translate one language into another, it was observed that one had to understand the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of sentence structure). In order to understand syntax, one also had to understand the semantics and the lexicon (or ‘vocabulary’), and even something of the pragmatics of language use. Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers. [7]

Nowadays, research within the scope of computational linguistics is done at computational linguistics departments, [8] computational linguistics laboratories, [9] computer science departments, [10] and linguistics departments. [11] [12] Some research in the field of computational linguistics aims to create systems that support human-machine interaction. Programs meant for human-machine communication are called conversational agents. [13]


Just as computational linguistics can be done by experts in a variety of fields and in a wide assortment of departments, its research covers a diverse range of topics. The following sections discuss four main areas of discourse: developmental linguistics, structural linguistics, linguistic production, and linguistic comprehension.

Developmental approaches

Language is a cognitive skill which develops throughout the life of an individual. This developmental process has been examined using a number of techniques, and a computational approach is one of them. Human language development does provide some constraints which make it harder to apply a computational method to understanding it. For instance, during language acquisition, human children are largely exposed only to positive evidence. [14] This means that during the linguistic development of an individual, evidence is provided only for what is correct, not for what is incorrect. This is insufficient information for a simple hypothesis testing procedure for information as complex as language, [15] and so it places certain boundaries on a computational approach to modeling language development and acquisition in an individual.

Attempts have been made to model the developmental process of language acquisition in children from a computational angle, leading to both statistical grammars and connectionist models. [16] Work in this realm has also been proposed as a method to explain the evolution of language through history. Using models, it has been shown that languages can be learned with a combination of simple input presented incrementally as the child develops better memory and a longer attention span. [17] This was simultaneously posed as a reason for the long developmental period of human children. [17] Both conclusions were drawn because of the strength of the artificial neural network which the project created.

The ability of infants to learn language has also been modeled using robots [18] in order to test linguistic theories. To enable the robots to learn as children might, a model was created based on an affordance model, in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure, which greatly simplifies the learning process and sheds light on linguistic development. Notably, this could only have been tested empirically using a computational approach.

As the understanding of the linguistic development of an individual within a lifetime is continually improved using neural networks and learning robotic systems, it is also important to keep in mind that languages themselves change and develop through time. Computational approaches to understanding this phenomenon have produced interesting results. Using the Price equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution, but also gives insight into the evolutionary history of modern-day languages. [19] This modeling effort achieved, through computational linguistics, what would otherwise have been impossible.
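A Pólya urn is a simple model of self-reinforcing change: each time a variant is drawn from the urn, it is returned together with extra copies of itself, so variants that are already common tend to become more common. The sketch below is a generic illustration under that assumption, with hypothetical variant names and a fixed random seed; it is not the model from the cited study and leaves out the Price equation entirely.

```python
import random
from collections import Counter


def polya_urn(initial: dict[str, int], steps: int,
              reinforcement: int = 1, seed: int = 0) -> Counter:
    """Simulate a Pólya urn over competing linguistic variants.

    Each step draws one token with probability proportional to the
    current counts, then adds `reinforcement` extra copies of it.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    urn = Counter(initial)
    for _ in range(steps):
        variants = list(urn)
        weights = [urn[v] for v in variants]
        drawn = rng.choices(variants, weights=weights, k=1)[0]
        urn[drawn] += reinforcement
    return urn


# Hypothetical example: two spellings of a past-tense form start out equal.
final = polya_urn({"dreamed": 1, "dreamt": 1}, steps=1000)
total = sum(final.values())
for variant, count in final.most_common():
    print(f"{variant}: {count / total:.2f}")
```

Running it with different seeds shows the characteristic behavior of the urn: early random draws can lock in very different long-run proportions.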

The understanding of linguistic development in humans, both within a lifetime and across evolutionary time, has thus been improved by advances in computational linguistics. The ability to model and modify systems at will provides a way of testing hypotheses that would otherwise be intractable.

Structural approaches

In order to create better computational models of language, an understanding of language’s structure is crucial. To this end, the English language has been meticulously studied using computational approaches to better understand how the language works on a structural level. One of the most important prerequisites for studying linguistic structure is the availability of large linguistic corpora, or samples. These give computational linguists the raw data necessary to run their models and gain a better understanding of the underlying structures present in the vast amount of data contained in any single language. One of the most cited English linguistic corpora is the Penn Treebank. [20] This corpus contains over 4.5 million words of American English. It has been annotated primarily using part-of-speech tagging and syntactic bracketing, and has yielded substantial empirical observations related to language structure. [21]
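Syntactic bracketing nests each constituent in labeled parentheses, with part-of-speech tags on the innermost brackets. As a minimal sketch (the example sentence is made up, and this hand-written reader is an illustration rather than Penn Treebank tooling; libraries such as NLTK provide `Tree.fromstring` for real use), the following reads a bracketed string and recovers the (word, tag) pairs:

```python
import re


def parse_bracketed(text: str):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples.

    Leaves are plain word strings; an unlabeled outer bracket gets label "".
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def tagged_words(tree):
    """Collect (word, part-of-speech tag) pairs from a parsed tree's leaves."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]  # preterminal: (TAG word)
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


tree = parse_bracketed("(S (NP (DT The) (NN parser)) (VP (VBZ works)) (. .))")
print(tagged_words(tree))
# [('The', 'DT'), ('parser', 'NN'), ('works', 'VBZ'), ('.', '.')]
```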

Theoretical approaches to the structure of languages have also been developed. These works give computational linguistics a framework within which to work out hypotheses that will further the understanding of language in a myriad of ways. One of the original theoretical theses addressed the internalization of grammar and the structure of language. [15] In its models, rules or patterns increase in strength with the frequency of their encounter. [15] The work also posed a question for computational linguists to answer: how does an infant learn a specific and non-normal grammar (Chomsky Normal Form)? [15] Theoretical efforts like these set the direction for research in the field and are crucial to its growth.
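The idea that rules strengthen with the frequency of their encounter can be sketched in a few lines. The rule strings, the fixed increment, and the normalization into relative preferences are all illustrative assumptions, not details of the thesis cited above:

```python
from collections import Counter


class FrequencyLearner:
    """Toy learner in which each rule's strength grows with every encounter."""

    def __init__(self, increment: float = 1.0):
        self.increment = increment
        self.strength = Counter()

    def observe(self, rule: str) -> None:
        # Every time a rule is encountered in the input, it gets stronger.
        self.strength[rule] += self.increment

    def preference(self, rule: str) -> float:
        """Relative strength of `rule` among all rules seen so far."""
        total = sum(self.strength.values())
        return self.strength[rule] / total if total else 0.0


learner = FrequencyLearner()
for rule in ["S -> NP VP", "NP -> Det N", "S -> NP VP", "NP -> Pronoun", "S -> NP VP"]:
    learner.observe(rule)
print(learner.preference("S -> NP VP"))  # 0.6 (3 of 5 encounters)
```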

Structural information about languages allows for the discovery and implementation of similarity recognition between pairs of text utterances. [22] For instance, it has recently been shown that, based on the structural information present in patterns of human discourse, conceptual recurrence plots can be used to model and visualize trends in data and create reliable measures of similarity between natural textual utterances. [22] This technique is a strong tool for further probing the structure of human discourse. Without the computational approach to this question, the vastly complex information present in discourse data would have remained inaccessible to scientists.
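Conceptual recurrence plots compare every utterance in a discourse with every other one. The sketch below substitutes plain word-count vectors and cosine similarity for the concept-based representation used in the cited work, so it is a simplification under that assumption; the dialogue is invented:

```python
import math
import re
from collections import Counter


def bag_of_words(utterance: str) -> Counter:
    """Lower-cased word counts for one utterance."""
    return Counter(re.findall(r"[a-z']+", utterance.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recurrence_matrix(utterances: list[str]) -> list[list[float]]:
    """Similarity of every utterance to every other utterance in a discourse."""
    vectors = [bag_of_words(u) for u in utterances]
    return [[cosine(v, w) for w in vectors] for v in vectors]


dialogue = [
    "How was the trip to the coast?",
    "The coast was cold but the trip was fun.",
    "Did you see the new library?",
]
for row in recurrence_matrix(dialogue):
    print(" ".join(f"{x:.2f}" for x in row))
```

Each printed row corresponds to one row of the recurrence plot; high off-diagonal values mark utterances that revisit the same words.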

Information regarding the structural data of a language is available for English as well as other languages, such as Japanese. [23] Using computational methods, Japanese sentence corpora were analyzed and a pattern of log-normality was found in relation to sentence length. [23] Though the exact cause of this log-normality remains unknown, it is precisely this sort of intriguing information which computational linguistics is designed to uncover. This information could lead to further important discoveries regarding the underlying structure of Japanese, and could have any number of effects on the understanding of Japanese as a language. Computational linguistics thus allows additions to the scientific knowledge base to be made quickly and reliably.
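Checking for log-normality amounts to asking whether the logarithms of the sentence lengths are normally distributed. A minimal sketch, assuming lengths are positive numbers in whatever unit the corpus uses, fits the distribution by maximum likelihood; the function names and the sample lengths are hypothetical, and this is not the procedure of the cited study:

```python
import math
import statistics


def fit_lognormal(lengths: list[int]) -> tuple[float, float]:
    """Maximum-likelihood log-normal fit: mean and std. dev. of log lengths.

    Assumes every length is positive (the log of 0 is undefined).
    """
    logs = [math.log(n) for n in lengths]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # the MLE uses the 1/n (population) form
    return mu, sigma


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the fitted log-normal distribution at length x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


# Hypothetical sentence lengths, not data from the cited corpus.
lengths = [12, 18, 25, 9, 31, 15, 22, 40, 17, 11, 14, 27]
mu, sigma = fit_lognormal(lengths)
print(f"mu={mu:.3f} sigma={sigma:.3f} median length={math.exp(mu):.1f}")
```

Comparing `lognormal_pdf` against a histogram of the observed lengths (or the log lengths against a normal curve) is one simple way to judge the fit.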

Without a computational approach to the structure of linguistic data, much of the information available now would still be hidden in the vast amount of data within any single language. Computational linguistics allows scientists to parse huge amounts of data reliably and efficiently, creating the possibility for discoveries that other approaches could not easily make.

Production approaches


The production of language is equally complex, both in the information it provides and in the skills a fluent producer must have. That is to say, comprehension is only half the problem of communication. The other half is how a system produces language, and computational linguistics has made some notable discoveries in this area.

Alan Turing: computer scientist and developer of the Turing Test as a method of measuring the intelligence of a machine.

In a now famous paper published in 1950, Alan Turing proposed the possibility that machines might one day have the ability to “think”. As a thought experiment on what might define thought in machines, he proposed an “imitation test” in which a human subject has two text-only conversations, one with a fellow human and another with a machine attempting to respond like a human. Turing proposed that if the subject cannot tell the difference between the human and the machine, it may be concluded that the machine is capable of thought. [24] Today this test is known as the Turing test, and it remains an influential idea in the area of artificial intelligence.

Joseph Weizenbaum: MIT professor and computer scientist who developed ELIZA, a primitive computer program utilizing natural language processing.

One of the earliest and best-known examples of a computer program designed to converse naturally with humans is the ELIZA program, developed by Joseph Weizenbaum at MIT in 1966. The program emulated a Rogerian psychotherapist when responding to written statements and questions posed by a user. It appeared capable of understanding what was said to it and responding intelligently, but in truth it simply followed a pattern matching routine that relied on understanding only a few keywords in each sentence. Its responses were generated by recombining the unknown parts of the sentence around translated versions of the known words. For example, in the phrase “It seems that you hate me”, ELIZA understands “you” and “me”, which match the general pattern “you [some words] me”, allowing ELIZA to update the words “you” and “me” to “I” and “you” and reply “What makes you think I hate you?”. In this example ELIZA has no understanding of the word “hate”; it simply reuses the word in its reply. [25]
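The decomposition and reassembly step described above can be sketched with regular expressions. The two rules and the pronoun table are illustrative assumptions, not Weizenbaum’s original DOCTOR script, but the first rule reproduces the “you [some words] me” example:

```python
import re

# Swap first- and second-person words when echoing the user's text back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

# Each rule pairs a decomposition pattern with a reassembly template.
# These two rules are illustrative, not ELIZA's original script.
RULES = [
    (re.compile(r"\byou (.+?) me\b", re.IGNORECASE),
     "What makes you think I {0} you?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     "Why do you feel {0}?"),
]


def reflect(fragment: str) -> str:
    """Swap pronouns in a captured fragment, word by word."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(statement: str) -> str:
    """Apply the first matching rule; the captured words are never interpreted."""
    text = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # default when no pattern matches


print(respond("It seems that you hate me"))
```

As in ELIZA, the program never interprets the captured words; “hate” is simply copied into the reply.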

Some projects are still trying to solve the problem which first started computational linguistics off as its own field. However, the methods have become more refined, and consequently the results generated by computational linguists have become more enlightening. In an effort to improve computer translation, several models have been compared, including hidden Markov models, smoothing techniques, and the specific refinements of those to apply them to verb translation. [26] The model which was found to produce the most natural translations of German and French words was a refined alignment model with a first-order dependence and a fertility model. [16] The authors also provide efficient training algorithms for the models presented, which give other scientists the ability to improve further on their results. This type of work is specific to computational linguistics, and has applications which could improve understanding of how language is produced and comprehended by computers.
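The refined alignment models with first-order dependence and fertility described above are too involved for a short example. The sketch below instead shows the simpler IBM Model 1, which learns word translation probabilities t(e | f) by expectation maximization and which such refinements build on; it omits the NULL word, and the three-sentence German-English corpus is a toy assumption:

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(e | f), the probability that foreign word f translates to e.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Uses expectation maximization, starting from uniform probabilities.
    """
    english_vocab = {e for _, es in pairs for e in es}
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in pairs:
            for e in es:
                # E-step: spread one unit of count over the possible alignments of e.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for f in ["das", "haus", "buch", "ein"]:
    best = max((e for (e, g) in t if g == f), key=lambda e: t[(e, f)])
    print(f, "->", best)
```

Although the corpus never states directly that “buch” means “book”, the co-occurrence statistics shift probability mass toward the consistent pairings over the iterations.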

Work has also been done in making computers produce language in a more natural way. Algorithms have been constructed that can modify a system’s style of production based on some factor, such as linguistic input from a human. [27] This work takes a computational approach via parameter estimation models to categorize the vast array of linguistic styles seen across individuals and to simplify it so that a computer can work in the same way, making human-computer interaction much more natural.

Text-based interactive approach

Many of the earliest and simplest models of human-computer interaction, such as ELIZA, involve text-based input from the user to generate a response from the computer. By this method, words typed by a user trigger the computer to recognize specific patterns and reply accordingly, through a process known as keyword spotting.
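Keyword spotting in its simplest form is a lookup: scan the input for known keywords and return the reply attached to the highest-ranked one found. The keyword table and priorities below are hypothetical:

```python
import re

# Hypothetical keyword table: (keyword, priority, canned reply).
KEYWORDS = [
    ("mother", 3, "Tell me more about your family."),
    ("dream", 2, "What does that dream suggest to you?"),
    ("computer", 1, "Do computers worry you?"),
]


def spot_keyword(user_input: str) -> str:
    """Return the reply tied to the highest-priority keyword in the input."""
    words = set(re.findall(r"[a-z]+", user_input.lower()))
    hits = [(priority, reply) for keyword, priority, reply in KEYWORDS
            if keyword in words]
    if not hits:
        return "Please go on."  # fallback when no keyword is spotted
    return max(hits, key=lambda hit: hit[0])[1]


print(spot_keyword("I had a dream about my mother"))
```

Ranking keywords lets a more specific topic (here “mother”) win over a more generic one when both appear in the same input.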

Speech-based interactive approach

Recent technologies have placed more emphasis on speech-based interactive systems. These systems, such as Siri of the iOS operating system, operate on a similar pattern-recognizing technique as that of text-based systems, but in these systems the user input is conducted through speech recognition. This branch of linguistics involves processing the user’s speech as sound waves and interpreting the acoustics and language patterns in order for the computer to recognize the input. [28]

Comprehension approaches

Much of the focus of modern computational linguistics is on comprehension. With the proliferation of the Internet and the abundance of easily accessible written human language, the ability to create a program capable of understanding human language would open up many opportunities, including automated search services, automated customer service, and online education.

Early work in comprehension included applying Bayesian statistics to the task of optical character recognition, as illustrated by Bledsoe and Browning in 1959, in which a large dictionary of possible letters was generated by “learning” from example letters, and the probabilities that each of those learned examples matched the new input were then combined to make a final decision. [29] Other attempts at applying Bayesian statistics to language analysis included the work of Mosteller and Wallace (1963), in which an analysis of the words used in the Federalist Papers was used to attempt to determine their authorship (concluding that Madison most likely authored the disputed papers). [30]
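The Bayesian idea behind authorship attribution can be illustrated with a much simpler model than Mosteller and Wallace actually used: a multinomial naive Bayes classifier over a handful of function words, with add-one smoothing and equal priors. Marker words such as “upon” and “whilst” are commonly cited in accounts of the study, but the training snippets below are invented, not text from the Federalist Papers:

```python
import math
import re
from collections import Counter

# Function words used as style markers (an illustrative choice).
MARKERS = ["upon", "whilst", "while", "by", "also"]


def marker_counts(text: str) -> Counter:
    """Count occurrences of the marker words in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in MARKERS)


def train(samples: dict[str, list[str]], alpha: float = 1.0) -> dict:
    """Per-author log P(word | author) with add-alpha (Laplace) smoothing."""
    model = {}
    for author, texts in samples.items():
        counts = Counter()
        for text in texts:
            counts.update(marker_counts(text))
        denom = sum(counts.values()) + alpha * len(MARKERS)
        model[author] = {w: math.log((counts[w] + alpha) / denom) for w in MARKERS}
    return model


def classify(model: dict, text: str) -> str:
    """Return the author with the highest log-likelihood (equal priors)."""
    counts = marker_counts(text)
    scores = {
        author: sum(n * logp[w] for w, n in counts.items())
        for author, logp in model.items()
    }
    return max(scores, key=scores.get)


# Invented training snippets, not text from the Federalist Papers.
samples = {
    "Hamilton": [
        "It depends upon the will of the people, upon which all rests.",
        "While the states remain divided, upon this point we insist.",
    ],
    "Madison": [
        "Whilst the powers are divided, the government is checked by itself.",
        "Whilst factions form, they are also controlled by their number.",
    ],
}
model = train(samples)
print(classify(model, "Whilst the states deliberate, they act also by consent."))
```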

In 1971 Terry Winograd developed an early natural language processing engine capable of interpreting naturally written commands within a simple rule-governed environment. The primary language parsing program in this project was called SHRDLU, which was capable of carrying out a somewhat natural conversation with the user giving it commands, but only within the scope of the toy environment designed for the task. This environment consisted of blocks of different shapes and colors, and SHRDLU was capable of interpreting commands such as “Find a block which is taller than the one you are holding and put it into the box.” and asking questions such as “I do not understand which pyramid you mean.” in response to the user’s input. [31] While impressive, this kind of natural language processing has proven much more difficult outside the limited scope of the toy environment. Similarly, a system developed for NASA called LUNAR was designed to provide answers to naturally written questions about the geological analysis of lunar rocks returned by the Apollo missions. [32] These kinds of problems are referred to as question answering.

Initial attempts at understanding spoken language were based on work done in the 1960s and 1970s in signal modeling, where an unknown signal is analyzed to look for patterns and to make predictions based on its history. An initial and somewhat successful approach to applying this kind of signal modeling to language was achieved with the use of hidden Markov models, as detailed by Rabiner in 1989. [33] This approach attempts to determine probabilities for the arbitrary number of models that could be used in generating speech, as well as modeling the probabilities for various words generated from each of these possible models. Similar approaches were employed in early speech recognition attempts starting in the late 1970s at IBM, using word/part-of-speech pair probabilities. [34]
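Hidden Markov models of the kind Rabiner describes are usually decoded with the Viterbi algorithm, which finds the most probable sequence of hidden states for an observed sequence. The sketch below applies it to part-of-speech tagging, in the spirit of the word/part-of-speech approach mentioned above; all probabilities are hypothetical toy values, and unseen words receive a fixed floor probability instead of proper smoothing:

```python
import math

# Hypothetical toy HMM for part-of-speech tagging.
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.1, "saw": 0.1},
    "VERB": {"barks": 0.5, "saw": 0.4, "dog": 0.1},
}
FLOOR = 1e-6  # stand-in probability for unseen (tag, word) pairs


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    def log_emit(state, word):
        return math.log(EMIT[state].get(word, FLOOR))

    # best[i][s] = (log prob of best path ending in s at position i, previous state)
    best = [{s: (math.log(START[s]) + log_emit(s, words[0]), None) for s in STATES}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for s in STATES:
            score, back = max(
                (prev[p][0] + math.log(TRANS[p][s]), p) for p in STATES
            )
            column[s] = (score + log_emit(s, word), back)
        best.append(column)
    # Trace back from the best final state through the stored backpointers.
    state = max(STATES, key=lambda s: best[-1][s][0])
    tags = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        tags.append(state)
    return list(reversed(tags))


print(viterbi("the cat saw a dog".split()))
```

Working in log probabilities avoids numerical underflow on long sentences, which is why the sketch sums logs instead of multiplying probabilities.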

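More recently, these types of statistical approaches have been applied to more difficult tasks such as topic identification, using Bayesian parameter estimation to infer topic probabilities in text documents. [35]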


Modern computational linguistics is often a combination of studies in computer science and programming, mathematics (particularly statistics), language structures, and natural language processing. Combined, these fields most often lead to the development of systems that can recognize speech and perform some task based on that speech. Examples include speech recognition software such as Apple’s Siri feature, spellcheck tools, speech synthesis programs, and machine translation programs. [36]

Computational linguistics can be particularly helpful in situations involving social media and the Internet. For example, filters in chatrooms or on website searches require computational linguistics. Chat operators often use filters to identify certain words or phrases and deem them inappropriate so that users cannot submit them. [36] Another example of using filters is on websites. Schools use filters so that websites with certain keywords are blocked from children’s view. There are also many programs in which parents use parental controls to put content filters in place. Computational linguists can also develop programs that group and organize content through social media mining. An example of this is Twitter, in which programs can group tweets by subject or keywords. [37] Computational linguistics is also used for document retrieval and clustering. When you do an online search, documents and websites are retrieved based on the frequency of terms related to what you typed into a search engine. For instance, if you search “red, large, four-wheeled vehicle,” with the intention of finding pictures of a red truck, the search engine will still find the information desired by matching words such as “four-wheeled” with “car”. [38]
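Frequency-based retrieval can be sketched with TF-IDF weighting, which scores a document higher when it contains query terms often and those terms are rare across the collection. Plain term matching would never connect “vehicle” with “truck”, so this sketch adds a hypothetical synonym table for query expansion; the documents are invented, and real search engines use far richer methods:

```python
import math
import re
from collections import Counter

# Hypothetical synonym table used to expand queries before matching.
SYNONYMS = {"vehicle": ["car", "truck"], "large": ["big"]}


def tokenize(text: str) -> list[str]:
    """Lower-cased words, keeping hyphenated compounds like 'four-wheeled'."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def rank(query: str, documents: list[str]) -> list[tuple[float, int]]:
    """Rank documents by summed TF-IDF weight of the (expanded) query terms."""
    docs = [Counter(tokenize(d)) for d in documents]
    n = len(docs)
    terms = set(tokenize(query))
    for term in list(terms):
        terms.update(SYNONYMS.get(term, []))
    scores = []
    for i, doc in enumerate(docs):
        score = 0.0
        for term in terms:
            df = sum(1 for d in docs if term in d)
            if df and doc[term]:
                idf = math.log(n / df) + 1.0  # +1 keeps terms found in every doc
                score += doc[term] * idf
        scores.append((score, i))
    return sorted(scores, reverse=True)


documents = [
    "A big red truck with four-wheeled drive",
    "Recipe for red lentil soup",
    "Photos of a small car parked outside",
]
print(rank("red, large, four-wheeled vehicle", documents))
```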


Computational linguistics can be divided into major areas depending on the medium of the language being processed, whether spoken or textual, and on the task being performed, whether analyzing language (recognition) or synthesizing language (generation).

Speech recognition and speech synthesis deal with how spoken language can be understood or created using computers. Parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together. Machine translation remains the sub-division of computational linguistics dealing with having computers translate between languages. The possibility of automatic language translation, however, has yet to be realized, and it remains a notoriously hard branch of computational linguistics. [39]

Some of the areas of research that are studied by computational linguistics include:

  • Computational complexity of natural language, largely modeled on automata theory, with the application of context-sensitive grammar and linearly bounded Turing machines
  • Computational semantics, including defining suitable logics for linguistic representation, automatically constructing them and reasoning with them
  • Computer-aided corpus linguistics, which has been used since the 1970s in the field of discourse analysis [40]
  • Design of parsers or chunkers for natural languages
  • Design of taggers like POS-taggers (part-of-speech taggers)
  • Machine translation, one of the earliest and most difficult applications of computational linguistics, which draws on many subfields
  • Simulation and study of language evolution in historical linguistics/glottochronology
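Parser design is often introduced through the CKY algorithm, which decides whether a sentence can be derived from a context-free grammar in Chomsky Normal Form (every rule has the form A → B C or A → word). A minimal recognizer is sketched below; the grammar is a hypothetical toy:

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (hypothetical).
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("DET", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {  # word -> set of A such that A -> word
    "the": {"DET"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cky_recognize(words: list[str], start: str = "S") -> bool:
    """Return True if the grammar can derive `words` from `start`."""
    n = len(words)
    # table[i][j] holds the nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]


print(cky_recognize("the dog saw the cat".split()))
```

The table is filled from short spans to long ones, so the recognizer runs in time cubic in sentence length; restricting the grammar to Chomsky Normal Form is what makes this bottom-up filling possible.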


The subject of computational linguistics has had a recurring impact on popular culture:

  • The 1983 WarGames movie features a young computer hacker who interacts with an artificially intelligent supercomputer. [41]
  • A 1997 film, Conceiving Ada, focuses on Ada Lovelace, considered one of the first computer scientists, as well as themes of computational linguistics. [42]
  • Her, a 2013 film, depicts a man’s interactions with the “world’s first artificially intelligent operating system.” [43]
  • The 2014 film The Imitation Game follows the life of computer scientist Alan Turing, developer of the Turing Test. [44]
  • The 2015 film Ex Machina centers around human interaction with artificial intelligence. [45]