This is the Linux app named mlscraper, whose latest release can be downloaded as mlscraperv1.0.0rc3sourcecode.zip. It can be run online on OnWorks, a free hosting provider for workstations.
Download mlscraper and run it online with OnWorks for free.
Follow these instructions in order to run this app:
- 1. Download this application to your PC.
- 2. Open our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.
- 3. Upload this application to that file manager.
- 4. Start the OnWorks Linux online, Windows online, or macOS online emulator from this website.
- 5. From the OnWorks Linux OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.
- 6. Download the application, install it, and run it.
mlscraper
DESCRIPTION
mlscraper is a Python library that automatically extracts structured data from HTML pages without requiring developers to write CSS selectors or XPath rules by hand. Instead of defining extraction logic manually, users provide a few examples of the data they want to retrieve from a webpage. The library locates those examples within the HTML document and derives patterns or rules that can extract the same type of information from similar pages. Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. This approach simplifies web scraping by shifting the focus from rule-writing to example-based training. Internally, the project processes HTML documents, identifies relevant elements in the DOM, and builds extraction logic based on statistical or heuristic analysis of the training samples. The result is a developer-oriented tool that aims to automate common scraping workflows.
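The example-based workflow described above can be illustrated with a small, self-contained sketch. This is not mlscraper's API or its actual algorithm: it is a toy stand-in that uses only Python's standard `html.parser` module, and the names `train`, `extract`, `collect`, `TRAIN_PAGE`, and `NEW_PAGE` are hypothetical, chosen for illustration. The sketch assumes well-formed HTML and learns one rule per field: the chain of tags and classes leading to the element whose text matches the example value.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are ignored when tracking paths.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class _TextPathCollector(HTMLParser):
    """Record (path, text) for every non-empty text node.

    A path is a tuple of "tag" or "tag.class" steps from the root
    down to the element that directly contains the text.
    """

    def __init__(self):
        super().__init__()
        self._stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class") or ""
        self._stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        # Assumes well-formed HTML: each end tag closes the innermost element.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self._stack), text))


def collect(html):
    """Parse an HTML string and return its list of (path, text) pairs."""
    parser = _TextPathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def train(html, examples):
    """Learn one path rule per field from example values found in the page."""
    nodes = collect(html)
    rules = {}
    for field, value in examples.items():
        matches = [path for path, text in nodes if text == value]
        if not matches:
            raise ValueError(f"example value {value!r} not found in page")
        rules[field] = matches[0]
    return rules


def extract(html, rules):
    """Apply learned rules to a new page; fields with no match map to None."""
    nodes = collect(html)
    return {
        field: next((text for p, text in nodes if p == path), None)
        for field, path in rules.items()
    }


# Hypothetical sample pages that share the same layout.
TRAIN_PAGE = """
<html><body>
  <h3 class="author-title">Albert Einstein</h3>
  <span class="author-born-date">March 14, 1879</span>
</body></html>
"""
NEW_PAGE = """
<html><body>
  <h3 class="author-title">J.K. Rowling</h3>
  <span class="author-born-date">July 31, 1965</span>
</body></html>
"""

rules = train(TRAIN_PAGE, {"name": "Albert Einstein", "born": "March 14, 1879"})
print(extract(NEW_PAGE, rules))
# {'name': 'J.K. Rowling', 'born': 'July 31, 1965'}
```

The user supplies only the expected values, and the rule (here a tag and class path) is derived from where those values occur. A real tool would need to generalize across several samples and tolerate layout variation, which this sketch does not attempt.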
Features
- Learns how to extract data from HTML pages using example outputs
- Automatically identifies relevant nodes within the HTML DOM
- Generates reusable scraping rules after a training phase
- Extracts structured data such as dictionaries, lists, or values
- Works with common HTML parsing libraries for document processing
- Designed for integration into Python-based data collection workflows
Programming Language
Python
This application can also be fetched from https://sourceforge.net/projects/mlscraper.mirror/. It is hosted on OnWorks so that it can be run online in the easiest way from one of our free operating systems.