Name: news-please download for Linux
Brand: OnWorks

This is the Linux app named news-please, whose latest release can be downloaded as news-pleasesourcecode.zip. It can be run online with OnWorks, a free hosting provider for workstations.

Download news-please and run it online with OnWorks for free.

Follow these steps to run this app:

- 1. Download this application to your PC.

- 2. Open our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username you want.

- 3. Upload the application to that file manager.

- 4. Start the OnWorks Linux online, Windows online, or macOS online emulator from this website.

- 5. From the OnWorks Linux OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username you want.

- 6. Download the application, install it and run it.

DESCRIPTION

news-please is an open source news crawler and information extraction tool that collects articles from online news websites and turns them into structured data. Its integrated pipeline crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. Given only the root URL of a news outlet, news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles. It builds on established libraries, such as Scrapy for crawling and Newspaper and readability for content extraction, so that it can handle a wide range of news sources. Developers can run it as a standalone command-line application or call it from their own Python code through its library interface. Extracted article data can be stored as JSON files or in database-backed storage.
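
The recursive discovery described above can be illustrated with a small sketch that uses only the Python standard library. news-please builds its crawlers on Scrapy, so this is not its implementation; it only shows the general idea of a breadth-first crawl that starts at a root URL, seeds its queue with links announced in RSS 2.0 feeds, and follows only links on the same host. The names `discover`, `rss_links`, and `LinkCollector`, the `fetch` callback (URL in, HTML or feed XML out, `None` on failure), and the `max_pages` limit are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def rss_links(feed_xml):
    """Return the <link> text of every <item> in an RSS 2.0 feed (no Atom support)."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            links.append(link)
    return links


def discover(root_url, fetch, feed_urls=(), max_pages=100):
    """Breadth-first crawl restricted to the host of root_url.

    fetch(url) is a hypothetical callback returning the page (or feed) as a
    string, or None if the URL cannot be retrieved.
    Returns the URLs of successfully fetched pages in visiting order.
    """
    host = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([root_url])

    def enqueue(url):
        # Drop #fragments so the same page is not queued twice.
        url, _ = urldefrag(url)
        if urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            queue.append(url)

    # Seed the queue with article links announced in RSS feeds.
    for feed_url in feed_urls:
        feed_xml = fetch(feed_url)
        if feed_xml:
            for link in rss_links(feed_xml):
                enqueue(link)

    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            # Resolve relative links against the current page.
            enqueue(urljoin(url, href))
    return visited
```

Because `fetch` is injected, the traversal can be exercised offline by passing the `get` method of a dictionary that maps URLs to HTML strings. A real crawler would also need politeness controls (robots.txt, rate limiting) and sitemap handling, which this sketch deliberately omits.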

Features

Crawls news websites and extracts structured article information
Recursively follows internal links and RSS feeds to discover articles
Extracts metadata such as headline, authors, language, images, and dates
Supports command line usage and integration as a Python library
Can retrieve and process large news archives from Common Crawl datasets
Stores extracted data in formats such as JSON or database backends
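
The metadata extraction and JSON storage features can be illustrated with a simplified, standard-library-only sketch. It reads a handful of common HTML metadata tags (Open Graph `og:title` and `og:image`, `article:published_time`, `author`, and the `<html lang>` attribute) and writes one JSON file per article. This is a hypothetical stand-in, not news-please's extraction logic or output schema; the class and function names, the field names, and the hash-based file naming are assumptions made for illustration.

```python
import hashlib
import json
from html.parser import HTMLParser
from pathlib import Path


class MetaExtractor(HTMLParser):
    """Read a few common metadata fields from an article's HTML."""

    def __init__(self):
        super().__init__()
        self.record = {
            "headline": None,
            "authors": [],
            "language": None,
            "image_url": None,
            "date_published": None,
        }
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html" and a.get("lang"):
            self.record["language"] = a["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property"; classic meta tags use "name".
            key = a.get("property") or a.get("name")
            value = a.get("content")
            if not key or value is None:
                return
            if key == "og:title":
                self.record["headline"] = value
            elif key == "og:image":
                self.record["image_url"] = value
            elif key == "article:published_time":
                self.record["date_published"] = value
            elif key in ("author", "article:author") and value not in self.record["authors"]:
                self.record["authors"].append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract(html):
    """Return a dict of metadata; falls back to <title> if og:title is missing."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    record = parser.record
    if record["headline"] is None and parser.title_parts:
        record["headline"] = "".join(parser.title_parts).strip()
    return record


def save_json(record, url, out_dir):
    """Write one article per JSON file, named by the SHA-1 hash of its URL."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / (hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json")
    payload = {"url": url, **record}
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Real news pages are far less regular than these few tags suggest, which is why a production tool combines several extractors instead of relying on one set of heuristics. Writing one file per article keeps each write independent; a database backend would instead store the same record keyed on the article URL.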

Programming Language

Python
