The Generative Pre-trained Transformer (GPT) is a family of Natural Language Processing (NLP) models developed by OpenAI. At the time of their release, these models were among the most advanced of their kind, and they can be misused in the wrong hands. GPT is an unsupervised generative model: it takes an input such as a sentence and tries to generate an appropriate continuation, and the data used to train it is not labelled.
GPT-2, short for “Generative Pretrained Transformer 2”, is an unsupervised, transformer-based deep learning language model that OpenAI announced in February 2019. It is trained with a single objective: predicting the next word in a sequence of text. The model is open source, and its largest version has 1.5 billion parameters. Because its training dataset is diverse, it generates adequate text across a variety of domains. Compared with its predecessor, GPT, GPT-2 has more than 10x the parameters and was trained on more than 10x the data.
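To make "predicting the next word" concrete, here is a minimal sketch in Python. It is not GPT-2: it is a toy model that only counts which word follows which, and the corpus and function names are invented for illustration. The idea it shares with GPT-2 is that the model learns from raw, unlabelled text.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the raw text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy corpus (an assumption for this example, not real training data).
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "sat"))  # on
```

GPT-2 replaces the lookup table with a transformer network that conditions on the whole preceding context rather than on one word, but the training signal is the same: the next word in the text.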
GPT-2 can learn language tasks such as reading comprehension, summarization, and translation from raw text, without task-specific training data.
Natural language generation has limitations that must be accounted for. This is an active area of research, but the field is still too young to have overcome them. They include repetitive text, misunderstanding of highly technical and specialized topics, and misunderstanding of contextual phrases.
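The repetition problem is easy to reproduce in miniature. The sketch below assumes a hypothetical "most likely next word" table and always picks the top choice (greedy decoding); the helper names are invented for this example. Once the choices form a cycle, the output loops forever, and a simple n-gram check is enough to detect it.

```python
def greedy_generate(next_word, start, length):
    """Repeatedly append the single most likely next word."""
    out = [start]
    for _ in range(length):
        out.append(next_word[out[-1]])
    return out

def has_repeated_ngram(words, n):
    """Return True if any run of n consecutive words occurs more than once."""
    seen = set()
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in seen:
            return True
        seen.add(gram)
    return False

# Hypothetical table: each word maps to its single most likely successor.
next_word = {"i": "think", "think": "that", "that": "i"}

generated = greedy_generate(next_word, "i", 8)
print(" ".join(generated))               # i think that i think that i think that
print(has_repeated_ngram(generated, 3))  # True
```

Real systems usually reduce this effect by sampling from the predicted distribution instead of always taking the top word, or by penalizing n-grams that have already appeared.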
Language is a vast and complex domain. A human being typically needs years of training and exposure to understand not only the meaning of words but also how to form sentences, give contextually meaningful answers, and use appropriate slang. This complexity is also an opportunity to create customized and scalable models for different domains. One example provided by OpenAI is fine-tuning GPT-2 on the Amazon Reviews dataset to teach the model to write reviews conditioned on things like star rating and category.
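One common way to condition a language model on attributes like these is to prepend them to each training example as plain text, then start generation from the same prefix. The sketch below shows that idea only; the tag format and function names are assumptions made for this example, not the format OpenAI used.

```python
def make_prompt(category, stars):
    """Build the control prefix (hypothetical tag format)."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return f"<category={category}> <stars={stars}> "

def format_conditioned_example(category, stars, review):
    """Turn one review into a training string that starts with its control prefix."""
    return make_prompt(category, stars) + review.strip()

# Training time: every review is stored together with its attributes.
example = format_conditioned_example("Books", 5, " Loved it. ")
print(example)  # <category=Books> <stars=5> Loved it.

# Generation time: give the model only the prefix and let it continue.
print(make_prompt("Electronics", 1))
```

After fine-tuning on many such strings, the model tends to associate the prefix with the style of text that follows it, so changing the star rating in the prompt changes the tone of the generated review.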
Simply put, GPT-3 is the third release of the “Generative Pre-trained Transformer” and the successor to GPT-2. Version 3 takes the GPT model to a whole new level: it has 175 billion parameters, more than 100x the 1.5 billion of GPT-2. GPT-3 was trained mostly on a filtered version of the openly available “Common Crawl” dataset, supplemented with other text sources such as English Wikipedia.
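A quick back-of-the-envelope calculation puts the two parameter counts quoted in this article side by side. The memory figure assumes each parameter is stored as a 16-bit (2-byte) number and counts only the weights, so treat it as a rough lower bound rather than a hardware requirement.

```python
GPT2_PARAMS = 1.5e9   # largest GPT-2 model
GPT3_PARAMS = 175e9   # GPT-3

def weights_size_gb(n_params, bytes_per_param=2):
    """Approximate size of the weights in gigabytes (assumes 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9

ratio = GPT3_PARAMS / GPT2_PARAMS
print(round(ratio, 1))               # 116.7
print(weights_size_gb(GPT2_PARAMS))  # 3.0
print(weights_size_gb(GPT3_PARAMS))  # 350.0
```

Under that assumption, the GPT-2 weights fit on a single consumer graphics card, while the GPT-3 weights alone would need to be spread across many accelerators.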
GPT-3 was designed to be more robust than GPT-2 and to handle more niche topics. GPT-2 was known to perform poorly on tasks in specialized areas such as music and storytelling. GPT-3 goes further, with tasks such as answering questions, writing essays, summarizing text, translating between languages, and generating computer code. The ability to generate computer code is a major feat in itself.
For a long time, many programmers have worried about being replaced by artificial intelligence, and that now looks closer to becoming reality. Just as deepfake videos are gaining traction, so too are AI-driven speech and text that mimic people. Soon it may be difficult to tell whether you are talking to a real person or an AI when speaking on the phone or communicating on the Internet (for example, in chat applications).
Although GPT-3 is a language prediction model, a more precise description is a sequential text prediction model: it generates text one piece at a time, each piece conditioned on the text so far. Its architecture is regarded as among the most advanced of its kind, largely thanks to the vast amount of data used to pre-train it. To generate a sentence from an input, GPT-3 draws on the patterns of meaning and usage it picked up during pre-training and tries to output a meaningful continuation for the user. The model is not told what is correct or incorrect, because it does not use labelled data; it is a form of unsupervised learning.
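Training without labelled data works because the text supplies its own targets: every position in a sentence yields a "context" and the word that actually came next. The sketch below illustrates that idea with whitespace-separated words and an invented helper name; GPT models use subword tokens and far longer contexts.

```python
def next_token_pairs(tokens, context_size):
    """Slide over the text and pair each context window with the token that follows it."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

tokens = "the model predicts the next word".split()
for context, target in next_token_pairs(tokens, context_size=3):
    print(context, "->", target)
```

No human annotated any of these pairs. The "label" for each context is simply the next word in the original text, which is why this setup scales to very large text collections.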
These models are gaining attention and traction because they can automate many language-based tasks, such as a customer communicating with a company through a chatbot. At the time of writing, GPT-3 is in a private beta, which means that people must join a waitlist if they wish to use the model. It is offered as an API accessible through the cloud. For now, running GPT models of this size seems feasible only for individuals and companies with substantial computing resources.
Credit: BecomingHuman By: James Montantes