
How To Scrape, Summarize & Convert News Articles Into Text Files




A wide variety of news articles is available online. Short articles are easy to read, but longer ones take time and are often left unread. A tool that condenses a long-form article into a single paragraph with keywords would make it much quicker to grasp what the article is about.

In this post, we will discuss a very basic approach to scraping a news article from a web page and summarizing it, along with extracting a few more key pieces of information. We will also explore how to save the scraped and summarized result to a text file for future study or research.

It is expected that you have basic knowledge of web scraping and natural language processing (NLP). For more information, you may refer to the following articles:

  1. Guide to Web Scraping with Python Libraries Selenium and Beautifulsoup
  2. Natural Language Processing Vs Natural Language Understanding: What’s the Difference

The task discussed above is implemented in Python. The steps below walk through the implementation.

1. To scrape and download content from a news website, install the newspaper library. On Python 3 it is published on PyPI as newspaper3k, so use ‘pip install newspaper3k’ at the command prompt; the plain ‘newspaper’ package name refers to the older Python 2 release. In Anaconda, ‘conda install -c conda-forge newspaper3k’ should work, assuming the package is available on the conda-forge channel. Once it is installed, import the required libraries. The task also involves several natural language processing steps, so the nltk library is required as well.

from newspaper import Article
import nltk

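2. The punkt sentence tokenizer from nltk splits text into sentences for the NLP step, so we need to download it first. Recent NLTK releases may ask for the ‘punkt_tab’ resource instead; if the NLP step reports it as missing, download that resource in the same way.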

nltk.download('punkt')

3. Pass the URL of the news article you want to scrape and summarize.

url = 'https://timesofindia.indiatimes.com/business/india-business/rbi-reduces-repo-rate-rate-by-75-basis-points-to-4-4-key-points/articleshow/74840356.cms'
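Before handing the URL to the library, it can help to check that it is a well-formed absolute web address. The following is a minimal sketch using only the standard library; the helper name looks_like_article_url is hypothetical and is not part of newspaper.

```python
from urllib.parse import urlparse


def looks_like_article_url(url: str) -> bool:
    """Return True if url is an absolute http(s) URL with a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


url = 'https://timesofindia.indiatimes.com/business/india-business/rbi-reduces-repo-rate-rate-by-75-basis-points-to-4-4-key-points/articleshow/74840356.cms'

# Fail early with a clear message instead of a confusing download error later
if not looks_like_article_url(url):
    raise ValueError(f"Not a usable article URL: {url!r}")
```

This only checks the shape of the address; it does not confirm that the page exists or that it contains an article.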

4. Create an Article object for the URL and set the language of the article to be scraped and summarized. This object is used in all the following steps.

article = Article(url, language="en") # en for English 

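5. Download the news article, parse it, and run NLP on it. The calls must be made in this order; article.nlp() is the step that produces the summary and the keywords.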

article.download() 
article.parse()
article.nlp()
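To give a feel for what the NLP step produces, the sketch below implements a deliberately simplified, frequency-based keyword extractor and extractive summarizer using only the standard library. It is not the newspaper library's actual algorithm; the function names, the tiny stop-word list, and the scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list for illustration only (an assumption);
# real libraries ship much larger, language-specific lists.
STOP_WORDS = {"a", "an", "and", "by", "for", "from", "has", "in", "is",
              "of", "on", "so", "that", "the", "to", "was", "were"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_keywords(text, limit=5):
    """Return up to `limit` non-stop-words, most frequent first."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


def summarize(text, max_sentences=2):
    """Keep the sentences richest in keywords, in their original order."""
    sentences = split_sentences(text)
    keywords = set(top_keywords(text))

    def score(sentence):
        # Count how many words of the sentence are keywords
        return sum(w in keywords for w in re.findall(r"[a-z0-9]+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

The idea is that words occurring often (after common words are ignored) are treated as keywords, and sentences containing many of them are kept as the summary. The library applies its own, more elaborate scoring, so its output will differ from this sketch.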

6. The article is now downloaded, parsed and analyzed. We can print the useful information to the console.

print("Article Title:") 
print(article.title) #prints the title of the article
print("\n")
print("Article Text:")
print(article.text) #prints the entire text of the article
print("\n")
print("Article Summary:")
print(article.summary) #prints the summary of the article
print("\n")
print("Article Keywords:")
print(article.keywords) #prints the keywords of the article
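Besides the title, text, summary and keywords, a parsed Article object also carries a few metadata attributes such as authors, publish_date and top_image. The helper below (describe is a hypothetical name) gathers them defensively, since any of them may be empty for a given site. It is demonstrated on a stand-in object with made-up placeholder values, so the sketch runs without network access.

```python
from datetime import datetime
from types import SimpleNamespace


def describe(article):
    """Collect a few extra fields from a parsed article-like object."""
    authors = getattr(article, "authors", None) or []
    publish_date = getattr(article, "publish_date", None)
    top_image = getattr(article, "top_image", None)
    return {
        "authors": ", ".join(authors) or "unknown",
        "published": publish_date.isoformat() if publish_date else "unknown",
        "top_image": top_image or "none",
    }


# Stand-in for a parsed Article; these values are placeholders, not real data
demo = SimpleNamespace(
    authors=["Jane Doe"],
    publish_date=datetime(2020, 1, 2, 9, 30),
    top_image="",
)

for field, value in describe(demo).items():
    print(f"{field}: {value}")
```

With a real article, calling describe(article) after article.parse() should work the same way.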

7. The above result can also be written to a text file. The following lines of code write it to a file named NewsFile.txt.

# Open the file in a with-block so it is closed even if a write fails
with open("NewsFile.txt", "w", encoding="utf-8") as file1:
    file1.write("Title:\n")
    file1.write(article.title)
    file1.write("\n\nArticle Text:\n")
    file1.write(article.text)
    file1.write("\n\nArticle Summary:\n")
    file1.write(article.summary)
    file1.write("\n\n\nArticle Keywords:\n")
    # keywords is a list, so join it into one keyword per line
    file1.write('\n'.join(article.keywords))
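If several articles are to be saved, the file-writing step can be wrapped in a reusable function. The following is a minimal sketch under that assumption; format_report and save_report are hypothetical names, and the layout simply mirrors the one used in this step.

```python
from pathlib import Path


def format_report(title, text, summary, keywords):
    """Build the report string in the same layout as the file written in this step."""
    return (
        "Title:\n" + title
        + "\n\nArticle Text:\n" + text
        + "\n\nArticle Summary:\n" + summary
        + "\n\n\nArticle Keywords:\n" + "\n".join(keywords)
    )


def save_report(path, title, text, summary, keywords):
    """Write the report as UTF-8 and return the number of characters written."""
    report = format_report(title, text, summary, keywords)
    return Path(path).write_text(report, encoding="utf-8")
```

With the article object from the earlier steps, the call would be save_report("NewsFile.txt", article.title, article.text, article.summary, article.keywords). Keeping the formatting separate from the writing makes it easy to check the layout without touching the disk.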

8. Finally, for the URL used in this example, the text file contains the following result.

Title:
RBI rate cut: RBI reduces repo rate by 75 basis points to 4.4%: Key points


Article Text:
RBI governor Shaktikanta Das (File photo)


Here are key points from Das's announcements:

*

*

More on Covid-19

Download The Times of India News App for Latest Business News

Subscribe Start Your Daily Mornings with Times of India Newspaper! Order Now


NEW DELHI: RBI governor Shaktikanta Das on Friday announced a series of steps to boost liquidity in a stimulus worth 3.2% of GDP to counter the economic impact of the coronavirus outbreak.All lending institutions can allow three-month moratorium on EMI payments.Deferment on loan and interest repayments will not be classified as defaults and will not impact credit history of borrowers.* Policy repo rate has been reduced by 75 basis points from 5.15% to 4.4%.* Reverse repo rate reduced by 90 basis points to 4%.* Monetary Policy meet scheduled for March 31-April 3 was advanced to March 25-27.* Monetary policy committee voted 4:2 majority to cut repo rate by 75 basis points.* Reverse repo rate cut more so that banks are incentivised to lend, RBI governor said.* Cash Reserve Ratio (CRR) of all banks have been reduced by 100 basis points to 3 per cent of net demand and time liabilities with effect from the fortnight beginning March 28 for a period of 1 year.* RBI to inject liquidity worth Rs 3.74 lakh crore into the system.* Banking system in India safe; deposits safe in private bank; public should not resort to panic withdrawal, Das said.* Monetary policy committee refrained from giving out growth, inflation outlook for coming fiscal on uncertain outlook.* India has locked down economic activity and financial markets are under severe stress.* Global slowdown can deepen with adverse implications for the country, Das said.* Slump in crude oil prices upside for India; foodgrain prices may soften further on back of record production, RBI governor said.* COVID-19 related volatility in stock market has impacted share prices of banks as well resulting in some panic withdrawal of deposits from a few private sector banks.* It would be fallacious to link share prices to the safety of deposits. 
Depositors of commercial banks including private sector banks need not worry on the safety of their funds, the RBI governor said.* RBI governor said all instruments -- conventional and unconventional -- are on table to support financial stability and revive growth.



Article Summary:
* Policy repo rate has been reduced by 75 basis points from 5.15% to 4.4%.
* Monetary policy committee voted 4:2 majority to cut repo rate by 75 basis points.
* Reverse repo rate cut more so that banks are incentivised to lend, RBI governor said.
Depositors of commercial banks including private sector banks need not worry on the safety of their funds, the RBI governor said.
* RBI governor said all instruments -- conventional and unconventional -- are on table to support financial stability and revive growth.


Article Keywords:
repo
basis
44
governor
das
key
cut
prices
banks
points
india
policy
reduces
75
rate
rbi
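
The keyword list above includes bare numbers such as 44 and 75 (44 presumably being 4.4 with its punctuation stripped), which carry little meaning on their own. Whether to keep them depends on the use case; if they are not wanted, a small post-processing step can drop them. The helper name drop_numeric_keywords is hypothetical, and the sample list is a subset of the keywords shown above.

```python
def drop_numeric_keywords(keywords):
    """Return the keywords that are not made up solely of digits, order kept."""
    return [word for word in keywords if not word.isdigit()]


keywords = ["repo", "basis", "44", "governor", "75", "rate", "rbi"]
print(drop_numeric_keywords(keywords))
# prints ['repo', 'basis', 'governor', 'rate', 'rbi']
```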


Dr. Vaibhav Kumar

Dr. Vaibhav Kumar is a seasoned data science professional with great exposure to machine learning and deep learning. He has good exposure to research, where he has published several research papers in reputed international journals and presented papers at reputed international conferences. He has worked across industry and academia and has led many research and development projects in AI and machine learning. Along with his current role, he has also been associated with many reputed research labs and universities where he contributes as visiting researcher and professor.