Though Natural Language Processing today is mostly associated with the bleeding edge of Computer Science, the concept is in reality over 60 years old. Let us discuss that history a little before answering the question: what is Natural Language Processing?
Since the 1950s, the field of Computer Science has developed in numerous radical strides. That development, in turn, has been one of the key contributing factors behind many incredible technologies whose benefits we enjoy today, such as satellites, weather monitoring systems and the internet. What has also increased exponentially since then is the computing power that can be fit into a given amount of space. Today, the little chips in our handheld devices, rarely consuming more than 4-5 watts, are capable of executing millions of instructions per second.
A direct result of such vast computing power is the rapid rise in the scope and complexity of the techniques applied in computer science. Today, with the advent of Artificial Intelligence and Machine Learning, machines have been built that can outmatch humans in various workloads traditionally believed to be better suited to humans.
Computer scientists today are working on incorporating an understanding of human languages into machines, so that a day may come when human-human communication and human-machine communication become indistinguishable from each other. The sub-field of linguistics, computer science and artificial intelligence concerned with enabling this is called Natural Language Processing. Before we delve deeper into the topic, however, here is a little information on how it all began.
HISTORY OF NATURAL LANGUAGE PROCESSING
Natural Language Processing has its roots in the 1950s. In 1950, Alan Turing published an article titled “Computing Machinery and Intelligence” which proposed what is now called the Turing test as a criterion of intelligence. It involves a task that tests the automated interpretation and generation of natural language.
Initially, Natural Language Processing was performed using a predefined, complex, handwritten set of rules that the machine followed to understand the data it was confronted with. This technique came to be known as Symbolic NLP and was prevalent from the 1950s up to the early 90s.
During the 80s, a revolutionary new approach based on Machine Learning algorithms started to emerge, which came to be known as Statistical NLP. It proved far more manageable and produced many early successes.
Today, Natural Language Processing is performed using representation learning and deep neural networks, which deliver state-of-the-art results in many natural language tasks such as language modelling and parsing. This method of NLP is known as Neural NLP.
WHY IS NLP IMPORTANT?
Imagine you are employed by a firm to analyze public sentiment regarding their new product, i.e. whether the majority of customers like or dislike it. You might start by scouring people’s opinions on social media sites like Facebook, Instagram or Twitter, watching reviews of the product online, reading articles, etc. It becomes really tedious once you factor in that there might be millions of customers around the world with individual opinions, as well as hundreds of reviews and articles, for you to go through one by one.
A machine, on the other hand, does not fatigue and can, given the proper tools, read, interpret and analyze sentiment from a piece of text in a short amount of time. It also applies the same criteria consistently to every document it analyzes. Considering the staggering amount of natural language data generated online every day, automating the process proves crucial to analyzing text and speech data comprehensively and efficiently.
Human language is also vast and complicated. We have seemingly infinite ways of expressing ourselves, both in speech and in writing. Furthermore, there are hundreds of languages and dialects, each with its own rules of speaking and writing, grammar, terms and slang, as well as various regional accents. We also mumble, stutter and borrow words from other languages. NLP helps resolve ambiguity in language and introduces useful numeric structure to the data, which is very helpful for downstream applications such as speech recognition and text analytics.
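The “numeric structure” mentioned above can be illustrated with a minimal bag-of-words sketch in plain Python. This is only one of many possible representations, and the sentences and vocabulary here are made up for illustration:

```python
from collections import Counter

def bag_of_words(sentences):
    """Map each sentence to a vector of word counts over a shared vocabulary."""
    tokenized = [s.lower().split() for s in sentences]
    # Shared, sorted vocabulary across all sentences
    vocab = sorted({w for tokens in tokenized for w in tokens})
    # One count vector per sentence, aligned to the vocabulary
    vectors = [[Counter(tokens)[w] for w in vocab] for tokens in tokenized]
    return vocab, vectors

vocab, vectors = bag_of_words(["the product is great", "the product is bad"])
print(vocab)    # -> ['bad', 'great', 'is', 'product', 'the']
print(vectors)  # -> [[0, 1, 1, 1, 1], [1, 0, 1, 1, 1]]
```

Once text is turned into vectors like these, standard machine learning tools can operate on it, which is exactly the bridge NLP provides.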
WHAT IS NATURAL LANGUAGE PROCESSING USED FOR?
The various tasks where NLP is used can be broadly categorized into the following: Text and Speech Processing, Morphological Analysis, Syntactic Analysis, Lexical Semantics, Relational Semantics, Discourse and other Advanced NLP Tasks. Let’s go through them one by one.
Text and Speech Processing:
Text and Speech Processing includes tasks such as:
- Optical Character Recognition: extracting text from an image containing it.
- Speech Recognition and Segmentation: converting words from a sound clip into text.
- Text-to-speech: converting words from text into their spoken representation.
Morphological Analysis
This type of analysis consists of tasks such as returning the base dictionary form of a word and separating words into their smallest meaningful units, called morphemes. The difference between a word and a morpheme is that a word is complete on its own, which is not always the case for a morpheme. Morphological analysis may also be used for determining how a term is used in a sentence, i.e. as a noun, verb, etc.
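A tiny suffix-stripping stemmer gives a feel for rule-based morphological analysis. This is a toy sketch, not a real lemmatizer; the suffix list below is illustrative only:

```python
# Illustrative suffix rules, ordered longest-first so the longest match wins
SUFFIXES = ["ingly", "edly", "ing", "ed", "ly", "es", "s"]

def stem(word):
    """Strip the longest matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(stem("walking"))  # -> walk
print(stem("jumped"))   # -> jump
print(stem("cat"))      # -> cat (no suffix applies)
```

Real systems such as the Porter stemmer use far richer rule sets, and lemmatizers additionally consult a dictionary to return a proper base form (e.g. “better” → “good”).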
Syntactic Analysis
Syntactic analysis deals mostly with grammar and sentence structure: generating a grammar that describes a language’s syntax, determining where sentences end, and analyzing the grammar of a given sentence.
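“Determining where sentences end” is harder than it sounds, because a period does not always end a sentence. A minimal rule-based splitter, with a hand-picked abbreviation list assumed for illustration, might look like this:

```python
import re

# A few abbreviations whose trailing period does NOT end a sentence
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e."}

def split_sentences(text):
    """Split on ., ! or ? followed by whitespace and a capital letter."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep scanning
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("Dr. Smith arrived. He was late."))
# -> ['Dr. Smith arrived.', 'He was late.']
```

Without the abbreviation check, “Dr.” would be treated as a sentence boundary, which is precisely the kind of ambiguity real sentence segmenters must handle.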
Lexical Semantics
Lexical semantics helps in understanding the computational meaning of words in a sentence, or in gathering information regarding semantic representations from data. It is also used for a task called Named Entity Recognition (NER), in which the machine identifies and classifies named real-world entities such as places, people, organizations, etc. Lexical semantics can also be used for analyzing the sentiment of a text and extracting relevant terms, as well as for disambiguating words that have more than one meaning, choosing the sense that best suits the context in which they are used.
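The sentiment analysis mentioned above can be sketched, in its very simplest form, as a lexicon lookup. Real systems use far larger lexicons and account for context and negation; the word lists here are made up for illustration:

```python
# Toy sentiment lexicons (illustrative only)
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    """Return positive minus negative word count; >0 positive, <0 negative."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("I love this great product"))       # -> 2
print(sentiment("terrible battery, poor screen"))   # -> -2
```

Note how crude this is: “not good” would score as positive, and “battery,” fails to match because of the comma, which is why practical systems combine lexicons with tokenization, negation handling and learned models.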
Relational Semantics
Relational semantics is used to derive relational meaning from text, such as relations among named entities, e.g. who possesses what. It is often performed through a combination of simpler NLP tasks, with relations realized in the form of graphs or in a logical formalism.
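A minimal sketch of such relation extraction is a hand-written pattern that yields (subject, relation, object) triples; the single “owns” pattern below is an assumption for illustration, whereas real systems learn many such patterns:

```python
import re

# One illustrative relation pattern: "<Name> owns <Name>"
OWNS = re.compile(r"([A-Z][a-z]+) owns ([A-Z][a-z]+)")

def extract_relations(text):
    """Return (subject, relation, object) triples found in the text."""
    return [(subj, "owns", obj) for subj, obj in OWNS.findall(text)]

print(extract_relations("Alice owns Wonderland. Bob owns Nothing."))
# -> [('Alice', 'owns', 'Wonderland'), ('Bob', 'owns', 'Nothing')]
```

Triples of this form are exactly what gets assembled into the knowledge graphs mentioned above.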
Discourse
What is meant by discourse is the understanding of how utterances stick together to form something meaningful. For example, if we took sentences from random pages of a book, the chunk of text formed by concatenating them would probably be nonsensical, or incoherent. A coherent discourse must have the following characteristics:
- Coherent relations between utterances
- Relationships between entities
Advanced NLP Applications
Advanced NLP Applications include the following:
- Automatic Summarization: Automatically producing a readable summary from a chunk of text.
- Dialogue Management: Computer systems having the capability to converse with a human.
- Machine Translation: Automatically translating text from one human language to another.
- Natural Language Generation (NLG): converting information from computer databases into human-readable language.
- Natural Language Understanding (NLU), i.e. “the comprehension by computers of the structure and meaning of human language (e.g., English, Spanish, Japanese), allowing users to interact with the computer using natural sentences” – Gartner
- Question Answering: automatically determining an answer to a question posed in natural language.
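Of the applications above, automatic summarization is perhaps the easiest to sketch. A classic extractive baseline scores each sentence by the frequency of its words and keeps the top scorers; this is a crude illustration with made-up sentences, not a production approach:

```python
from collections import Counter

def summarize(sentences, n=1):
    """Keep the n sentences whose words are most frequent in the document."""
    words = [w for s in sentences for w in s.lower().split()]
    freq = Counter(words)
    # Score each sentence by summing the document-wide frequency of its words
    scored = sorted(sentences, key=lambda s: -sum(freq[w] for w in s.lower().split()))
    top = scored[:n]
    return [s for s in sentences if s in top]  # preserve original order

doc = [
    "NLP systems process text",
    "Weather was nice",
    "Text processing systems analyze text",
]
print(summarize(doc, n=1))  # -> ['Text processing systems analyze text']
```

Modern abstractive summarizers instead generate new sentences with neural models, but frequency-based extraction remains a useful baseline.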
CONCLUSION
Since the 50s, Natural Language Processing has come a long way in terms of the complexity of problems it can handle and the variety of tasks it can perform. It still holds great promise in the field of cognitive and AI applications. That potential is being increasingly explored in a sub-field of Natural Language Processing called Natural Language Understanding (NLU). NLU goes beyond the structural understanding of language: it can be used to derive the intent conveyed, resolve context and ambiguity, and perhaps generate well-formed natural language on its own.