There are a couple of things to note:
- There is a significant improvement in the cohesiveness and the structure of the output as the model is fine-tuned.
- After many iterations, you start to see the model output <|startoftext|> and <|endoftext|> at the beginning and end of each sample. This shows that the model has learned to produce distinct descriptions using the rules we discussed in the previous section.
After the model has finished training, the final weights are available in the “checkpoints” directory.
Now for the fun part: generating brand-new movie/TV descriptions! First, here is the code:
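A minimal sketch of the generation step, assuming the model was fine-tuned with gpt-2-simple under the run name "run1" (adjust to match your own run):

```python
import gpt_2_simple as gpt2

# Restore the fine-tuned weights from the "checkpoints" directory.
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="run1")

# Generate brand new descriptions from the fine-tuned model.
gpt2.generate(
    sess,
    run_name="run1",           # must match the run_name used in finetune
    temperature=0.8,           # 0.7 to 1.2 is a reasonable creative range
    nsamples=5,                # number of descriptions to generate
    prefix="<|startoftext|>",  # matches the start token used in training
    include_prefix=False,      # drop the prefix from the output
    truncate="<|endoftext|>",  # stop once the end token appears
)
```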
Thanks to Max Woolf’s work, we are able to generate samples with one simple, elegant function. With that in mind, the best way to explain it is to dive into the function’s hyperparameters. All of their definitions can be found on the gpt-2-simple GitHub page, but I wanted to include a few key parameters here as well:
- run_name: This must match the run_name passed to the finetune function. A folder with this name will be referenced, so it is key to make sure this is correct.
- temperature: This determines how “creative” the output will be. The optimal range is roughly 0.7 to 1.2: lower values produce tamer output, higher values more creative output.
- nsamples: The number of outputs to generate.
- prefix: The text prompt used to generate new text. Since every training sample starts with “<|startoftext|>”, the model learns to begin a new content description after this prompt.
- include_prefix: When False, the output won’t include the prefix above. You could use this, or strip the prefix with a regex afterward.
- truncate: This stops generation once the truncate text appears. Since each training sample ends with <|endoftext|>, truncating on that token makes the model stop at the end of a complete description.
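If you do keep the delimiter tokens in the output, a small regex pass can strip them afterward. A sketch using only the standard library (the sample string is illustrative):

```python
import re

def clean_sample(raw: str) -> str:
    """Strip the <|startoftext|> / <|endoftext|> delimiters from a generated sample."""
    text = re.sub(r"<\|startoftext\|>", "", raw)
    # Drop the end token and anything the model produced after it.
    text = re.sub(r"<\|endoftext\|>.*", "", text, flags=re.DOTALL)
    return text.strip()

sample = "<|startoftext|>A detective returns home to solve one last case.<|endoftext|>"
print(clean_sample(sample))
# A detective returns home to solve one last case.
```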
Now, let’s take a look at an example. I’ve taken a screenshot from my website, thismoviedoesnotexist.co, to showcase what GPT-2 can generate. In my opinion, this could actually pass for something produced by Netflix!
This site is still up and running so I encourage you to explore and find more promising and/or ridiculous content.
Ultimately, I want this article to be another resource that lowers the barrier to entry to cutting-edge NLP. With just a few lines of code, you can prepare your data, fine-tune a GPT-2 model, and generate brand new content. For me, the natural next step would be to leverage some of the larger GPT-2 models to see how the results vary.
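The whole pipeline really is just a few lines. A sketch with gpt-2-simple, assuming the 124M base model and a prepared training file named descriptions.txt (both names are illustrative):

```python
import gpt_2_simple as gpt2

# Download the pretrained 124M GPT-2 model (a one-time step).
gpt2.download_gpt2(model_name="124M")

# Fine-tune on your own text file; weights land in the "checkpoints" directory.
sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset="descriptions.txt",  # illustrative filename: your prepared training data
    model_name="124M",
    run_name="run1",
    steps=1000,                  # more steps generally means more cohesive output
)

# Generate brand new content from the fine-tuned weights.
gpt2.generate(sess, run_name="run1", prefix="<|startoftext|>", truncate="<|endoftext|>")
```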
Now, this article focused on the data science portion of this project, but I wanted to highlight that this was the first time I took a model and exposed it on my very own server/website. As a next step, I’ll be sharing similar instructions on how to do just that.