
GPT-3 Explained in Under 3 Minutes

So you've seen some incredible GPT-3 demos on Twitter (if you haven't, where have you been?). OpenAI's massive machine learning model is capable of writing its own op-eds, poetry, essays, and even working code.

To use GPT-3 right now, you must first apply to be whitelisted by OpenAI. Even so, the model's possibilities seem limitless: you could presumably use it to query a SQL database in plain English, automatically comment code, automatically generate code, write catchy article headlines, compose viral tweets, and much more.

But what exactly is going on under the hood of this remarkable machine? Here's a (brief) peek inside.

GPT-3 is a neural-network-based language model. A language model is a model that predicts how likely a sentence is to exist in the real world. For example, a language model can judge the sentence "I take my dog for a walk" as more likely to exist (i.e., on the Internet) than the sentence "I take my banana for a stroll." This is true not only for sentences but also for phrases and, more generally, for any sequence of characters.
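To make the idea concrete, here's a toy sketch (nothing like GPT-3's internals; the corpus and sentences are made up purely for illustration) that scores a sentence by how often its word pairs appear in a tiny training corpus:

```python
# A toy bigram language model: scores a sentence by how often its
# word pairs appear in a (tiny, made-up) training corpus.
from collections import Counter

corpus = "i take my dog for a walk . i walk my dog . my dog likes a walk .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence):
    """Product of conditional bigram probabilities, with add-one smoothing."""
    words = sentence.lower().split()
    vocab = len(unigrams)
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
    return p

print(score("i take my dog for a walk"))       # relatively likely
print(score("i take my banana for a stroll"))  # far less likely
```

A real language model replaces the bigram counts with a neural network, but the job is the same: assign higher probability to sequences that look like real text.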

Like other language models, GPT-3 is trained on an unlabeled text dataset (in this case, the training data includes, among others, Common Crawl and Wikipedia). Words or phrases are removed at random from the text, and the model must learn to fill them in using only the surrounding words as context. It's a straightforward training objective that yields a powerful and generalizable model.
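The objective itself is easy to sketch. In GPT-3's case the gap to fill is always the next word: every prefix of the text becomes an input, and the word that follows becomes the label. A minimal illustration (the sentence is arbitrary):

```python
# Turning raw text into self-supervised training examples:
# each prefix of the text becomes an input, the next word the label.
text = "the quick brown fox jumps over the lazy dog".split()

examples = [(text[:i], text[i]) for i in range(1, len(text))]
for context, target in examples[:3]:
    print(f"context={context!r} -> predict {target!r}")
# context=['the'] -> predict 'quick'
# context=['the', 'quick'] -> predict 'brown'
# context=['the', 'quick', 'brown'] -> predict 'fox'
```

No human labeling is needed: the text is its own supervision, which is why the model can be trained on essentially the whole Internet.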

The GPT-3 model itself is a transformer-based neural network. This architecture, which became popular roughly 2–3 years ago, is also the basis of the prominent NLP model BERT and of GPT-3's predecessor, GPT-2. In terms of architecture, GPT-3 isn't actually very novel! So what makes it so magical?
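At the heart of the transformer is self-attention: each word in a sequence looks at every other word and decides how much weight to give it. A bare-bones NumPy sketch (single head, no learned weight matrices, purely illustrative):

```python
# Scaled dot-product self-attention, the core operation of the transformer.
import numpy as np

def self_attention(X):
    """X: (sequence_length, d_model). Single head, no learned projections."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # how strongly each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X  # each output is a weighted mix of all token representations

X = np.random.randn(5, 8)       # 5 tokens, 8-dimensional embeddings
print(self_attention(X).shape)  # (5, 8)
```

The full model stacks dozens of these layers (with learned projections and feed-forward blocks in between), but the operation above is the reason transformers handle long-range context so well.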

It's very, very big. I'm talking huge. It's the largest language model ever constructed, with 175 billion parameters (an order of magnitude more than its nearest competitor!), and it was trained on the largest dataset of any language model. This appears to be the primary reason GPT-3 sounds so intelligent and human.
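Some back-of-the-envelope arithmetic shows what "huge" means in practice. Assuming two bytes per parameter (half precision, a common choice for inference), the weights alone won't fit on any single GPU:

```python
# Rough memory footprint of the weights alone
# (ignores activations, optimizer state, and serving overhead).
params = 175e9
bytes_per_param = 2  # fp16, an assumption for illustration
print(f"{params * bytes_per_param / 1e9:.0f} GB")  # ~350 GB
```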

But now comes the truly wonderful part. Thanks to its enormous size, GPT-3 can do what no other model can (well): perform specific tasks without any special tuning. You can ask GPT-3 to be a translator, a programmer, a poet, or a famous novelist, and it can do it with fewer than 10 training examples provided by the user (you). Damn.
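What do those "training examples" look like? They're just lines in the prompt. Here's a sketch of a few-shot translation prompt (the example pairs are my own, purely illustrative):

```python
# Few-shot prompting: the "training data" is embedded directly in the prompt.
prompt = """English: Where is the library?
French: Où est la bibliothèque ?

English: I would like a coffee, please.
French: Je voudrais un café, s'il vous plaît.

English: The weather is nice today.
French:"""
# GPT-3 simply continues the pattern and emits the French translation.
# No gradient updates, no fine-tuning, just pattern completion.
```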

This is why machine learning practitioners are so enthusiastic about GPT-3. Other language models, such as BERT, require a lengthy fine-tuning process: to teach BERT how to translate, for example, you must collect thousands of French-English sentence pairs. To adapt BERT to any given task (such as translation, summarization, or spam detection), you first have to find a large training dataset (on the order of thousands or tens of thousands of examples), which can be difficult or even impossible depending on the task. With GPT-3, you don't need to do any fine-tuning. This is the crux of the matter. People are thrilled about GPT-3 because it lets them tackle new language tasks without needing any training data.

Today, GPT-3 is in private beta; you can get an API key by applying here: OpenAI GPT-3
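Once approved, a call looked roughly like this with the beta-era openai Python client (the engine name and parameters are as I recall the beta docs; treat this as a sketch rather than gospel):

```python
# Minimal completion call with the beta-era OpenAI Python client.
import openai

openai.api_key = "YOUR_API_KEY"  # issued once your application is approved

response = openai.Completion.create(
    engine="davinci",  # the largest GPT-3 model at launch
    prompt="Translate English to French:\n\nEnglish: Hello, world!\nFrench:",
    max_tokens=32,
    temperature=0.3,
)
print(response.choices[0].text.strip())
```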
