newsfetch.py

Python scraper that builds a single HTML file from a newspaper's RSS feed.

Dependencies

sudo apt-get install libxml2-dev libxslt-dev
sudo pip install bs4 feedparser lxml slimmer

Usage

newsfetch.py -u <rss url> -o <output filename>

Default Parameters

  • url : http://www.lemonde.fr/rss/une.xml
  • output : default.html
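
As an illustration of how the two options and their defaults fit together, here is a minimal sketch using `argparse`. It assumes the script parses its command line this way, which this README does not confirm; `parse_args`, `DEFAULT_URL` and `DEFAULT_OUTPUT` are hypothetical names chosen for the example.

```python
import argparse

# Defaults taken from the "Default Parameters" list above.
DEFAULT_URL = "http://www.lemonde.fr/rss/une.xml"
DEFAULT_OUTPUT = "default.html"


def parse_args(argv=None):
    """Parse -u/-o, falling back to the documented defaults (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="newsfetch.py",
        description="Build a single HTML file from an RSS feed.",
    )
    parser.add_argument("-u", dest="url", default=DEFAULT_URL,
                        help="RSS feed URL")
    parser.add_argument("-o", dest="output", default=DEFAULT_OUTPUT,
                        help="output HTML filename")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.url, args.output)
```

Running the script with no arguments would then be equivalent to passing both defaults explicitly.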

How it works

The feed is parsed and a list of available articles is created. For each article, the page behind the feed link is fetched automatically and its content is extracted from:

  • <article>...</article>
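
The steps above can be sketched roughly as follows. To keep the example runnable without the dependencies listed earlier, it uses only the Python standard library (`xml.etree.ElementTree` and `html.parser`) in place of feedparser and bs4, so it is not the script's actual code. All function and class names are hypothetical, and the UTF-8 decoding in `fetch` is an assumption.

```python
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.request import urlopen


def list_article_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("link", "").strip() for item in root.iter("item")]


class ArticleExtractor(HTMLParser):
    """Collect the markup found between <article> and </article>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside an <article> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
        if tag == "article":
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1
        if self.depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again before re-emitting.
        if self.depth:
            self.parts.append(html.escape(data, quote=False))


def extract_article(page_html):
    """Return the inner markup of the page's <article> element(s)."""
    parser = ArticleExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)


def fetch(url):
    """Download a page; assumes UTF-8, which real sites may not use."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_html(rss_xml, fetch_page=fetch):
    """Concatenate every article of the feed into one HTML document."""
    bodies = (extract_article(fetch_page(link))
              for link in list_article_links(rss_xml))
    joined = "".join(f"<article>{body}</article>" for body in bodies)
    return f"<html><body>{joined}</body></html>"
```

Passing `fetch_page` as a parameter keeps the network access replaceable, which makes the extraction logic easy to try on local HTML strings.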