Ever since the advent of Artificial Intelligence (AI), humankind has dreamt of machines that can converse in natural language. With the rise of deep learning (DL) techniques, high-performance computing, and big data, that dream today appears closer than ever before. This article gives an overview of Natural Language Processing (NLP), its challenges, and recent DL trends in NLP research.
What is NLP?
Natural language processing (NLP) is an interdisciplinary field spanning linguistics and computer science. The goal of NLP research is to design and develop efficient computational models to analyze, understand, and generate human language. Its applications range from simple spell checkers in text editors to sophisticated chatbots in virtual assistants like Alexa and Siri.
Why is it challenging?
Unlike programming languages, which have fixed syntax and well-defined semantic rules, human languages are inherently ambiguous. A thought can be expressed with different sets of words, arranged in multiple ways. A word can have different meanings in different contexts. A collection of words, when arranged differently, may convey the same or a different thought. These challenges make natural language understanding (NLU) an exciting research topic. Over the past few decades, NLP has evolved from traditional rule-based systems to shallow machine learning models, and further to deep neural network models.
Deep Learning in NLP
The first step in any DL-NLP model is to convert the input text into a format that a machine learning algorithm can understand: vectors (or tensors). Word embeddings are real-valued dense vectors used to represent word meaning. Word2Vec (by Google), GloVe (by Stanford), and fastText (by Facebook AI) are some of the most widely used word embeddings. These embeddings are generated from vast text corpora using shallow neural networks. The models, as well as their trained word vectors for nearly 2 million words in different languages, are freely available for download, and they are supported by most DL libraries. Researchers around the globe have been trying to improve these word embeddings.
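To make the idea concrete, here is a minimal sketch of what word embeddings buy you. The vectors below are made-up toy values, not real Word2Vec or GloVe embeddings (those typically have 100-300 dimensions and are learned from large corpora), but the principle is the same: related words get similar vectors, which we can measure with cosine similarity.

```python
import numpy as np

# Toy 4-dimensional "embeddings" with invented values; real pretrained
# vectors would be loaded from a Word2Vec/GloVe/fastText file instead.
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.75, 0.70, 0.12, 0.04]),
    "apple": np.array([0.05, 0.10, 0.90, 0.70]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king~queen: {sim_royal:.3f}, king~apple: {sim_fruit:.3f}")
```

With real pretrained embeddings the lookup table would simply be much larger; the similarity computation is identical.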
The next step is to combine these word vectors into an abstract representation of larger units of text, as required by the task at hand. The model uses specialized deep neural network architectures that are powerful enough to extract relevant features from these inputs and generate the desired output. Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), encoder-decoder models, and Transformers are the favorite DL architectures of NLP researchers. Whether there are better ways to compose sentence vectors from these word embeddings is an open research problem.
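The simplest possible composition is to average the word vectors (mean pooling). The architectures above learn far richer, order-aware compositions, but the sketch below (with random stand-in word vectors rather than trained embeddings) shows the basic operation they all perform: mapping a variable-length sentence to one fixed-size vector.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in word vectors; in practice these would come from
# pretrained embeddings such as Word2Vec or GloVe.
vocab = {w: rng.normal(size=8) for w in
         ["the", "cat", "sat", "dog", "ran", "fast"]}

def sentence_vector(tokens):
    """Mean pooling: average the word vectors of a sentence.
    RNN/LSTM/Transformer encoders replace this with learned,
    order-sensitive composition."""
    return np.mean([vocab[t] for t in tokens], axis=0)

v1 = sentence_vector(["the", "cat", "sat"])
v2 = sentence_vector(["the", "dog", "ran", "fast"])
print(v1.shape, v2.shape)  # both sentences map to the same fixed size
```

Note that mean pooling discards word order ("dog bites man" and "man bites dog" get the same vector), which is precisely the weakness the sequence architectures address.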
In the past, NLP models were designed and trained for a single specific task and were good at only that task. Recently, the paradigm has shifted to transfer learning and pre-trained models. The idea behind transfer learning is to extensively train large neural networks on generalized language-understanding tasks using large datasets. These pre-trained models acquire a general understanding of the language and can be fine-tuned for a wide variety of NLP tasks with little or no extra training. Some of the popular pre-trained models that stand out in 2020 are Google's T5, BERT, XLNet, ALBERT, Facebook's RoBERTa, OpenAI's GPT-2, and Nvidia's Megatron. Some of these models are publicly available and can be used off-the-shelf for many NLP applications. Pre-trained models also let researchers build on models trained on datasets they cannot access, or that would be too computationally expensive for them to train from scratch.
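A minimal sketch of the transfer-learning idea, using NumPy stand-ins rather than a real pre-trained model: a frozen "encoder" (here just a fixed random projection, playing the role that a model like BERT would play) supplies features, and only a small task-specific head is trained on top. The data and labels are synthetic, constructed so the toy task is learnable.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a frozen pretrained encoder; its weights are never updated.
W_enc = rng.normal(size=(16, 4))
def encode(x):
    return np.tanh(x @ W_enc)

# Synthetic data; labels are built from the frozen features so that the
# task is actually predictable from them.
X = rng.normal(size=(200, 16))
feats = encode(X)
y = (feats[:, 0] + feats[:, 1] > 0).astype(float)

# "Fine-tuning" here trains only a logistic-regression head; full
# fine-tuning would also update the encoder weights.
w, b = np.zeros(4), 0.0
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1 / (1 + np.exp(-(feats @ w + b)))
    w -= 0.5 * feats.T @ (p - y) / len(y)
    b -= 0.5 * float(np.mean(p - y))

acc = float(np.mean(((feats @ w + b) > 0) == (y == 1)))
print(f"head accuracy on the toy task: {acc:.2f}")
```

The point of the sketch is the division of labor: the expensive general-purpose encoder is reused as-is, while the cheap task-specific head is all that needs training for a new task.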
Well-known NLP tasks used to train and evaluate a model’s language understanding include:
- Named Entity Recognition (NER): Which words in a sentence refer to a person, organization, or other named entity?
- Recognizing Textual Entailment (RTE) / Natural Language Inference (NLI): Given two sentences, does the first sentence entail or contradict the second sentence?
- Coreference Resolution: Given a pronoun like “it” in a sentence that discusses multiple objects, to which object does “it” refer?
- Acceptability: Is the given sentence grammatically acceptable or not?
- Sentiment Analysis: Is the given review positive, negative, or neutral?
- Sentence similarity measure: How similar are the given two sentences in their meaning?
- Paraphrase Identification: Is sentence B a paraphrase of sentence A?
- Question NLI: Given a question-paragraph pair, does the paragraph contain the answer to the given question?
- Question Answering: Does sentence B correctly answer question A?
NLP tasks like Machine Translation and dialogue systems require not just language understanding but also language generation. Common Sense Reasoning (CSR) is another task recently added to DL-NLP benchmarks. Another interesting recent research topic is summarizing programming-language code in natural-language text, which could be useful for the automatic documentation of source code. Refer to the GLUE and SuperGLUE benchmarks for more challenges, the best-performing models, and free resources.
NLP also has applications in other domains like Biomedical text mining, Healthcare, Business, Recruitment, Defence and National security, Finance, and Education, to name a few.
Tips for beginners
For beginners in DL-NLP research, I would recommend Linear Algebra and Probability and Statistics courses as prerequisites. A basic knowledge of machine learning and a more detailed understanding of deep learning will give you a better start. It is worth learning PyTorch, as most of the source code published on GitHub is in PyTorch. If your research does not involve building any special kind of neural network, Keras is a good option. TensorFlow and Theano are also popular choices.
Note: The article was originally published in the 2020 Yearbook edition of Threads — the official newsletter of the Computer Science and Engineering Department of NIT Calicut
Credit: BecomingHuman By: Jeena KK