newsfetch.py

Python scraper that builds a single HTML file from a newspaper's RSS feed.

Dependencies

sudo apt-get install libxml2-dev libxslt-dev
sudo pip install bs4 feedparser lxml slimmer

Usage

newsfetch.py -u <rss url> -o <output filename>

Default Parameters

  • url : http://www.lemonde.fr/rss/une.xml
  • output : default.html
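As an illustration of how these defaults could be wired to the command line, here is a minimal sketch using the standard library's argparse. It is not taken from newsfetch.py itself: the function name parse_args and the long option --output are assumptions (the README only shows -u, --url and -o).

```python
import argparse

# Defaults as documented in this README.
DEFAULT_URL = "http://www.lemonde.fr/rss/une.xml"
DEFAULT_OUTPUT = "default.html"


def parse_args(argv=None):
    """Parse command-line options; argv=None means 'use sys.argv[1:]'."""
    parser = argparse.ArgumentParser(
        prog="newsfetch.py",
        description="Build a single HTML file from an RSS feed.",
    )
    parser.add_argument("-u", "--url", default=DEFAULT_URL,
                        help="RSS feed URL")
    # The long form "--output" is an assumption; only "-o" appears in the README.
    parser.add_argument("-o", "--output", default=DEFAULT_OUTPUT,
                        help="output HTML filename")
    return parser.parse_args(argv)


# Only the output file is given, so the URL falls back to its default.
args = parse_args(["-o", "lemonde.html"])
print(args.url, args.output)
```

Passing an explicit list to parse_args keeps the sketch easy to try from an interpreter; a real script would call it with no argument.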

How it works

The feed is parsed and a list of available articles is created. Each article's page (i.e. the link given in the feed entry) is fetched automatically, and its content is extracted from the <article>...</article> element.
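The extraction step can be sketched as follows. This is an illustration under assumptions, not the script's actual code: it uses the standard library's html.parser in place of bs4 so that it runs without extra dependencies, it leaves out feed parsing and downloading, and the names ArticleExtractor, extract_articles and build_page are hypothetical.

```python
import html
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the markup of every top-level <article>...</article> element."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.depth = 0       # how many <article> tags are currently open
        self.articles = []   # one HTML string per top-level <article>
        self._parts = []     # pieces of the article being collected

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1
            if self.depth == 1:
                self._parts = []
        if self.depth:
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self._parts.append(f"</{tag}>")
            if tag == "article":
                self.depth -= 1
                if self.depth == 0:
                    self.articles.append("".join(self._parts))

    def handle_data(self, data):
        if self.depth:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self._parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self._parts.append(f"&#{name};")


def extract_articles(page_html):
    """Return the <article> elements found in one downloaded page."""
    parser = ArticleExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.articles


def build_page(title, article_html_list):
    """Join the extracted articles into one HTML document."""
    body = "\n<hr>\n".join(article_html_list)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )


# A made-up page standing in for one fetched feed link.
sample = ("<html><body><nav>menu</nav>"
          "<article><h1>Title</h1><p>Body &amp; more</p></article>"
          "<footer>x</footer></body></html>")
print(build_page("Sample feed", extract_articles(sample)))
```

Navigation, footers and anything else outside the <article> element are dropped, which is what keeps the combined file readable.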

Examples

./newsfetch.py --url http://www.vice.com/fr/rss -o vice.fr.html
./newsfetch.py --url http://www.lemonde.fr/rss/une.xml -o lemonde.html
./newsfetch.py --url https://www.slate.fr/rss.xml -o slate.fr.html
./newsfetch.py --url http://www.lesinrocks.com/feeds/feed-a-la-une/ -o lesinrocks.html
./newsfetch.py --url http://www.numerama.com/rss/news.rss -o numerama.html