Computational lexicology

Computational lexicology is a branch of computational linguistics concerned with the use of computers in the study of the lexicon. It has been more narrowly described by some scholars (Amsler, 1980) as the use of computers in the study of machine-readable dictionaries. It is distinguished from computational lexicography, which more properly denotes the use of computers in the construction of dictionaries, though some researchers have used the two terms as synonyms.

History

Computational lexicology emerged as a separate discipline within the context of computer-readable dictionaries, starting with the creation of the machine-readable tapes of the Merriam-Webster Seventh Collegiate Dictionary and the Merriam-Webster New Pocket Dictionary in the 1960s by John Olney et al. at System Development Corporation. Today, computational lexicology is best known through the creation and applications of WordNet. Computational lexicology has also been applied in computer science: in 1987, Byrd, Calzolari, Chodorow, and others developed computational tools for text analysis, in particular a model designed for coordinating the associations among the senses of polysemous words. [1]

Study of lexicon

Computational lexicology has contributed to the understanding of the content and limitations of dictionaries for computational applications (i.e., it clarified that earlier dictionary work was not sufficient for the needs of computational linguistics). Through the work of computational lexicologists, nearly every aspect of a dictionary entry has been studied, including:

  1. what constitutes a headword – used to generate spelling correction lists;
  2. what variants and inflections the headword forms – used to empirically understand morphology;
  3. how the headword is delimited into syllables;
  4. how the headword is pronounced – used in speech generation systems;
  5. the parts of speech the headword takes on – used for POS taggers;
  6. any special subject or use codes assigned to the headword – used to identify text document subject matter;
  7. the headword's definitions and their syntax – used for disambiguation of words in context;
  8. the etymology of the headword – used to characterize text vocabulary by language of origin;
  9. the example sentences;
  10. the run-ons (additional words and multi-word expressions that are formed from the headword); and
  11. related words, such as synonyms and antonyms.
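The kinds of headword information enumerated above can be modeled as a simple record. The sketch below is purely illustrative; the field names are hypothetical and do not reflect any particular dictionary's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    """Hypothetical machine-readable dictionary entry (illustrative only)."""
    headword: str
    inflections: list[str] = field(default_factory=list)    # variants and inflected forms
    syllables: list[str] = field(default_factory=list)      # syllable division
    pronunciation: str = ""                                 # e.g. a phonetic transcription
    parts_of_speech: list[str] = field(default_factory=list)
    subject_codes: list[str] = field(default_factory=list)  # special subject or use codes
    definitions: list[str] = field(default_factory=list)
    etymology: str = ""                                     # language of origin
    examples: list[str] = field(default_factory=list)       # example sentences
    run_ons: list[str] = field(default_factory=list)        # derived words and multi-word forms
    synonyms: list[str] = field(default_factory=list)
    antonyms: list[str] = field(default_factory=list)

# A minimal entry exercising a few of the fields:
entry = LexicalEntry(
    headword="run",
    inflections=["runs", "ran", "running"],
    parts_of_speech=["verb", "noun"],
)
print(entry.headword, entry.parts_of_speech)
```

Each field corresponds to one of the numbered items above, which is why machine-readable dictionaries proved useful well beyond lexicography itself.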

Many computational linguists became disenchanted with existing dictionaries as a resource for computational linguistics because they lacked sufficient syntactic and semantic information for computer programs. The work on computational lexicology quickly led to efforts in two additional directions.

Successors to Computational Lexicology

First, collaborative activities between computational linguists and lexicographers led to an understanding of the role that corpora played in creating dictionaries. Most computational lexicologists moved on to building large bodies of text data of the kind that had been used to create dictionaries. The ACL/DCI (Data Collection Initiative) and the Linguistic Data Consortium (LDC) went down this path. With such corpora, actual language use could be analyzed directly to create computational linguistic systems. Part-of-speech-tagged corpora and semantically tagged corpora were created in order to test and develop POS taggers and word-sense disambiguation technology.

The second direction was towards the creation of Lexical Knowledge Bases (LKBs). A Lexical Knowledge Base is intended for computational linguistic purposes, especially computational lexical semantics. It resembles a dictionary, but one that is fully explicit about the meanings of words and the links between senses. Many researchers began creating the resources they wished dictionaries had been, had those been designed for use in computational analysis. WordNet can be considered such a development, as can newer efforts at describing syntactic and semantic information such as the FrameNet work of Fillmore. Outside of computational linguistics, work on ontologies in artificial intelligence can be seen as an evolutionary effort to build a lexical knowledge base for AI applications.
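The core idea of an LKB as described above can be sketched as a graph of word senses connected by typed semantic relations. This is a minimal illustration loosely modeled on WordNet's synset structure; the class, sense identifiers, and relation names are all hypothetical:

```python
from collections import defaultdict

class LexicalKnowledgeBase:
    """Toy LKB: senses as nodes, typed semantic relations as labeled edges."""

    def __init__(self):
        self.glosses = {}                   # sense id -> definition text
        self.relations = defaultdict(list)  # sense id -> [(relation, target sense id)]

    def add_sense(self, sense_id, gloss):
        self.glosses[sense_id] = gloss

    def add_relation(self, source, relation, target):
        self.relations[source].append((relation, target))

    def related(self, sense_id, relation):
        # All senses linked to sense_id by the given relation type.
        return [t for r, t in self.relations[sense_id] if r == relation]

kb = LexicalKnowledgeBase()
kb.add_sense("dog.n.01", "a domesticated canine")
kb.add_sense("canine.n.01", "a member of the family Canidae")
kb.add_relation("dog.n.01", "hypernym", "canine.n.01")
print(kb.related("dog.n.01", "hypernym"))  # -> ['canine.n.01']
```

Unlike a conventional dictionary, every sense and every link between senses is explicit and machine-traversable, which is what makes such a resource usable for computational lexical semantics.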

Standardization

Optimizing the production, maintenance, and extension of computational lexicons is one of the crucial aspects impacting NLP. The main problem is interoperability: various lexicons are frequently incompatible. The most frequent question is: how can two lexicons, or fragments of lexicons, be merged? A related problem is how a lexicon can be adapted to the needs of a specific NLP program or application.

In this respect, the various data models of computational lexicons have been studied by ISO/TC 37 since 2003 within the framework of the Lexical Markup Framework, leading to an ISO standard in 2008.

References

  1. Byrd, Roy J., Nicoletta Calzolari, Martin S. Chodorow, Judith L. Klavans, Mary S. Neff, and Omneya A. Rizk. "Tools and methods for computational lexicology." Computational Linguistics 13, no. 3-4 (1987): 219–240.