Automation is the process of making a system operate without human intervention. As life gets busier, automated systems attract growing interest. Automation can take over unsafe and repetitive tasks, saving two of the most precious resources in a person's life: time and money. Because many tasks are still labour-intensive and time-consuming, automated systems have improved efficiency and led to better quality control. Automation does not have to be based on artificial intelligence, but both have risen together over the last decade, and combining them may well be the next big thing. One of the most notable recent developments in AI-driven automation is Natural Language Generation.
What is Natural Language Generation?
Natural Language Generation (NLG) uses artificial intelligence to produce written or spoken text. It is a subfield of artificial intelligence that automatically transforms input data into plain-English content. The fascinating thing about NLG is that it can tell a story in a human-like way, writing long sentences and whole paragraphs for you. Typical uses include generating product or service descriptions, curating content, creating portfolio summaries, and handling customer communications through chatbots. NLG can be complicated and requires several layers of language knowledge to work. These days, it is being integrated into content-strategy tools to speed up writing and increase productivity.
About Hugging Face
Hugging Face is an NLP-focused startup with a large open-source community that provides an open-source library for Natural Language Processing. Its core library, Transformers, is Python-based and exposes an API to many well-known architectures that achieve state-of-the-art results on NLP tasks such as text classification, information extraction, question answering, and text generation. Each architecture comes with pre-trained weights, so the models can be used without training them from scratch. The models come in different sizes, and each has its own tokenizer for preparing input. A tokenizer splits input text into tokens (words or pieces of words) and maps each token to an integer ID, which is the form the model expects as input.
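To make the idea of tokenization concrete, here is a minimal sketch of a word-level tokenizer. It is a toy illustration only: the function names and the vocabulary are assumptions made for this example, and GPT-2's real tokenizer uses byte-pair encoding over sub-word pieces rather than whole words.

```python
# Toy word-level tokenizer: an illustration only, not GPT-2's byte-pair encoding.

def build_vocab(corpus):
    """Assign each distinct lowercase word an integer id, in order of first appearance."""
    vocab = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Map each word to its id; raises KeyError for words outside the vocabulary."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Map ids back to words and join them with spaces."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = build_vocab("you will always succeed in life")
ids = encode("you will succeed", vocab)
print(ids)                 # [0, 1, 3]
print(decode(ids, vocab))  # you will succeed
```

The model only ever sees the list of integers; decoding turns the integers it produces back into readable text.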
Getting Started with Creating a Paragraph Auto Generator
In this article, we will implement a natural language generator that produces a paragraph from a single line of input text. We will first set up our dependencies using Hugging Face Transformers, then load the pre-trained GPT-2 model, which can generate coherent paragraphs of text. Finally, we will encode our input, generate new tokens from it, and decode the output into a paragraph.
So let’s get started with it!
The following code implementation is inspired by the official implementation, whose video tutorial you can find here.
Installing our Libraries
The first step is to install the libraries this model depends on, starting with Hugging Face Transformers. You can install it with the following command:
!pip install transformers #install the library from hugging face
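Next, we will import the GPT-2 model and tokenizer. GPT2LMHeadModel is the PyTorch version of the model and the rest of the code works with PyTorch tensors, so PyTorch must be available in your environment; TensorFlow is not needed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer  # importing the main model and tokenizer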
We will first encode the input sentence into tokens using the tokenizer, then generate a new sequence of tokens from the GPT-2 model and then decode the generated tokens into a sequence of words using the tokenizer again, which will provide us with our output.
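The encode, generate, decode flow described above can be sketched without downloading any model. In this toy version, the vocabulary and the NEXT_ID lookup table are hypothetical stand-ins for GPT-2 and its tokenizer; only the shape of the pipeline is meant to carry over.

```python
# Toy encode -> generate -> decode loop. The lookup table stands in for GPT-2.
VOCAB = ["<eos>", "you", "will", "always", "succeed"]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}

# Hypothetical "model": for each token id, the single most likely next id.
NEXT_ID = {1: 2, 2: 3, 3: 4, 4: 0}


def encode(text):
    """Turn a sentence into a list of token ids."""
    return [WORD_TO_ID[word] for word in text.lower().split()]


def generate(input_ids, max_length):
    """Append one token at a time until <eos> or max_length (prompt included)."""
    ids = list(input_ids)
    while len(ids) < max_length:
        next_id = NEXT_ID[ids[-1]]
        if next_id == WORD_TO_ID["<eos>"]:
            break
        ids.append(next_id)
    return ids


def decode(ids):
    """Turn token ids back into a sentence."""
    return " ".join(VOCAB[i] for i in ids)


input_ids = encode("you will")
output_ids = generate(input_ids, max_length=10)
print(decode(output_ids))  # you will always succeed
```

As in the real library, the generated sequence starts with the prompt tokens and the new tokens are appended after them.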
Loading our Model
Create a new variable for the tokenizer and load its pre-trained vocabulary by passing in the name of the GPT-2 checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")  # "gpt2-large" selects the large GPT-2 checkpoint (a bigger model, not longer output)
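Instantiate the pre-trained model, using the tokenizer's end-of-sequence token as the padding token.
model = GPT2LMHeadModel.from_pretrained("gpt2-large", pad_token_id=tokenizer.eos_token_id)  # GPT-2 has no padding token of its own, so reuse EOS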
Testing the model by tokenizing our first sentence
Now that the model has been created, we will test it by providing our first input sentence to tokenize.
sentence = 'You will always succeed in Life' #input sentence
Encode it into a sequence of token IDs and return them as a PyTorch tensor.
input_ids = tokenizer.encode(sentence, return_tensors='pt')  # 'pt' returns a PyTorch tensor
Checking current progress
input_ids  # checking the tensor returned
The sentence is now represented as a tensor of token IDs, with one row per input sentence:
tensor([[1639, 481, 1464, 6758, 287, 5155]])
Decoding the text and Generating the Output
Create a new variable called output to hold the generated tokens, and set the generation hyperparameters:
output = model.generate(input_ids, max_length=50, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)
This call passes in the encoded input and caps the total length (prompt plus generated text) at 50 tokens, which is usually somewhat fewer than 50 words. Instead of greedily picking the single most likely next token, beam search with num_beams=5 keeps the five most probable partial sequences at each step and returns the best one overall. Setting no_repeat_ngram_size=2 prevents any two-token sequence from appearing more than once, which curbs repetitive loops. With early_stopping=True, the search ends as soon as five complete candidates (sequences that have reached the end-of-sequence token) have been found.
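To make beam search and the no-repeat rule concrete, here is a minimal sketch on a toy model. The NEXT_PROBS table, the function names, and the scoring by summed log-probabilities are assumptions for illustration; the Transformers implementation is more elaborate (for example, it also applies a length penalty).

```python
import math

# Hypothetical toy model: next-token probabilities depend only on the last token.
NEXT_PROBS = {
    "a": {"b": 0.6, "c": 0.4},
    "b": {"a": 0.7, "<eos>": 0.3},
    "c": {"<eos>": 1.0},
}


def repeats_ngram(tokens, n):
    """Return True if any n-gram occurs more than once in tokens."""
    seen = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            return True
        seen.add(gram)
    return False


def beam_search(start, num_beams, max_length, no_repeat_ngram_size):
    """Keep the num_beams best partial sequences at each step; return the best finished one."""
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    finished = []
    while beams:
        candidates = []
        for tokens, score in beams:
            extended = False
            for token, prob in NEXT_PROBS[tokens[-1]].items():
                new_tokens = tokens + [token]
                if repeats_ngram(new_tokens, no_repeat_ngram_size):
                    continue  # skip continuations that would repeat an n-gram
                candidates.append((new_tokens, score + math.log(prob)))
                extended = True
            if not extended:
                finished.append((tokens, score))  # dead end: keep it as a final candidate
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == "<eos>" or len(tokens) >= max_length:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
    return max(finished, key=lambda c: c[1])[0]


best = beam_search("a", num_beams=2, max_length=6, no_repeat_ngram_size=2)
print(best)  # ['a', 'c', '<eos>']
```

A greedy decoder would pick "b" first because it is locally more likely. Keeping two beams lets the search find "a c <eos>", whose overall probability (0.4) is higher than that of any sequence starting with "a b".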
Printing our results:
print(tokenizer.decode(output[0], skip_special_tokens=True))  # output holds one row per sequence, so decode the first (and only) row
We got the following output:
You will always succeed in Life, but you will never be successful in Death." "I am not afraid of death, because I know that I am going to be with you when you die. I will be waiting for you, and I.
Generating a Longer Paragraph
We can repeat the process with a new sentence and a larger max_length to generate a longer paragraph. Be aware that this takes noticeably longer to run.
sentence = 'Artificial intelligence is the key'
input_ids = tokenizer.encode(sentence, return_tensors='pt')
output = model.generate(input_ids, max_length=500, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)  # setting max_length to 500 tokens to generate a longer text
print(tokenizer.decode(output[0], skip_special_tokens=True))
We will get the following as output:
Artificial intelligence is the key to unlocking the mysteries of the universe, but it's also the source of a lot of our problems. In a new paper published in the journal Science Advances, a team of researchers from the University of California, Berkeley, and the National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland, describes a way to create an artificial intelligence (AI) system that can learn from its mistakes and improve its performance over time. The system, which they call a "neural network," is capable of learning to recognize patterns in images, recognize objects in a video, or even learn how to play a musical instrument. In the paper, the researchers describe how they created the neural network and how it can be used to train an AI system to perform a variety of tasks, such as recognizing objects and playing musical instruments. Neural networks, also known as deep neural networks or deep learning, are a type of machine learning algorithm that is based on the idea that a network of neurons is like a computer's processor. Each neuron is connected to a number of other neurons to form a larger network. When a neuron receives an input, it sends a signal to the next neuron, who in turn sends an output to another neuron. This process continues until all the neurons have received the input and have processed it. As a result, each neuron has its own unique set of inputs and outputs, making it possible for the network to learn and adapt to changes in its environment. Neural networks have been used for decades to solve a wide range of problems,including image recognition, speech recognition and natural language processing. However, they have also been criticized for their poor performance when it comes to learning from their own mistakes. 
For example, in 2013, researchers at Google's DeepMind AI research lab published a paper in Nature that showed that they were unable to improve the performance of their network when they made a series of mistakes while training it on images of human faces. They also found that the system was not able to distinguish between a human face and a dog face, even though the images were similar in terms of size and shape. These problems have led some researchers to argue that neural nets are not as effective as they are made out to be. But the new study suggests that this is not necessarily the case. "We show that it is possible to build a neural net that learns from mistakes," said lead author and UC Berkeley professor of electrical engineering and computer.
The difference from raising max_length is clear: the model now produces several paragraphs instead of a few sentences.
You can save the generated text to a file using the following lines of code:
text = tokenizer.decode(output[0], skip_special_tokens=True)
with open('AIBLOG.txt', 'w') as f:
    f.write(text)
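The saving step can also be wrapped in a small helper that sets the file encoding explicitly, which avoids surprises if the generated text contains non-ASCII characters. This is only a sketch: save_text is a hypothetical name, the string below is a placeholder for the decoded model output, and the file goes to the system's temporary directory so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def save_text(text, path):
    """Write text to path as UTF-8 and return the number of characters written."""
    return Path(path).write_text(text, encoding="utf-8")


# Placeholder standing in for the decoded model output.
text = "Artificial intelligence is the key to unlocking the mysteries of the universe."

out_path = Path(tempfile.gettempdir()) / "AIBLOG.txt"
written = save_text(text, out_path)
print(written == len(text))                           # True
print(out_path.read_text(encoding="utf-8") == text)   # True
```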
We have now learned how to generate long passages of text from a single sentence using a pre-trained GPT-2 model and the Hugging Face library by following the steps above. You can tune the generation hyperparameters further to improve the quality of the generated text. The full Colab file can be accessed from here.
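As one possible direction for that tuning, the sketch below collects two sets of arguments for model.generate(): the beam search settings used in this article and a sampling-based alternative. The arguments do_sample, top_k, top_p and temperature are real generate() parameters, but the values chosen here are illustrative assumptions rather than tuned recommendations.

```python
# Two hypothetical settings for model.generate(); the values are illustrative, not tuned.
beam_search_kwargs = dict(
    max_length=50,
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True,
)

sampling_kwargs = dict(
    max_length=50,
    do_sample=True,          # draw the next token at random instead of always taking the most likely one
    top_k=50,                # only consider the 50 most likely tokens at each step
    top_p=0.95,              # ...and only the smallest set whose probabilities sum to 0.95
    temperature=0.8,         # values below 1 make the distribution sharper (less random)
    no_repeat_ngram_size=2,
)

# With the model and input_ids from this article loaded, either set could be passed as:
# output = model.generate(input_ids, **sampling_kwargs)
for name, kwargs in [("beam search", beam_search_kwargs), ("sampling", sampling_kwargs)]:
    print(name, kwargs)
```

Beam search tends to give the same, fairly safe text on every run, while sampling gives different and often more varied text each time, so it is worth trying both on your own prompts.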
Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.