This is the Linux app named unfluff, whose latest release can be downloaded as node-unfluffv3.2.0sourcecode.tar.gz. It can be run online through OnWorks, a free hosting provider for workstations.
Download and run unfluff online with OnWorks for free.
Follow these instructions to run this app:
- 1. Download this application to your PC.
- 2. Enter our file manager at https://www.onworks.net/myfiles.php?username=XXXXX with the username of your choice.
- 3. Upload this application to that file manager.
- 4. Start the OnWorks Linux online, Windows online, or macOS online emulator from this website.
- 5. From the OnWorks Linux OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX with the username of your choice.
- 6. Download the application, install it, and run it.
PRODUCT DESCRIPTION
unfluff is a Node.js library that automatically extracts the main content from an HTML document, stripping away navigation bars, ads, footers and other boilerplate to leave the body text, metadata (title, author, date) and other useful fields. It is aimed at content analysis, web scraping, dataset building, and repurposing article text for downstream processing such as machine learning or summarization. The API is simple: you pass in raw HTML and it returns a structured object with the extracted text and other fields. It can cache internal representations to speed up repeated extractions. Language support is best for English, and the repository notes that languages such as Chinese, Arabic and Korean may not be well supported; the library is nonetheless still used in web content processing pipelines. Because of its simplicity and narrow focus, it can serve as a reliable building block in backend services or CLI tools.
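The call pattern described above can be sketched as follows. This is a minimal illustration, assuming the function exported by the unfluff package is called as `extract(html, language)` and returns a plain object with fields such as `title`, `author`, `date` and `text`. The helper name `summarizeArticle` and the `wordCount` field are made up for this example, and the extractor is passed in as an argument so the snippet does not depend on the package being installed.

```javascript
// Sketch only: condense an unfluff-style result into a small summary object.
// `extract` is assumed to be the function exported by the unfluff package;
// `summarizeArticle` and `wordCount` are hypothetical names for this example.
function summarizeArticle(extract, html, language = 'en') {
  const data = extract(html, language);
  const text = (data.text || '').trim();
  return {
    title: data.title || null,
    // Assumption: authors are returned as an array of names.
    author: Array.isArray(data.author) ? data.author : [],
    date: data.date || null,
    // Count whitespace-separated words in the extracted body text.
    wordCount: text ? text.split(/\s+/).length : 0,
  };
}

// Typical use, assuming the package has been installed with `npm install unfluff`:
//   const unfluff = require('unfluff');
//   const summary = summarizeArticle(unfluff, htmlString);
```

Passing the extractor in as a parameter also makes it easy to substitute a stub in unit tests.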
Features
- Extracts main textual content (body) from an HTML document
- Parses and returns metadata (title, author, date, detected language, etc.)
- Caches intermediate representations for performance when extracting multiple fields
- CLI / module support: can be installed globally or used programmatically
- Suitable for building datasets, article-scraping, republishing workflows
- Open source under the Apache-2.0 license; easy to integrate into Node.js stacks
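The caching feature in the list above appears to correspond to a lazy mode in which each field is a function computed on demand. The following is a hedged sketch, assuming the package exposes `lazy(html, language)` and that the returned object offers `title()` and `text()` methods. `readTitleAndText` is a hypothetical helper name, and the module is injected as a parameter instead of being required directly.

```javascript
// Sketch only: ask a lazy unfluff-style extractor for just the fields needed.
// `unfluff` is assumed to be the module returned by require('unfluff');
// `readTitleAndText` is a hypothetical helper name for this example.
function readTitleAndText(unfluff, html, language = 'en') {
  const doc = unfluff.lazy(html, language);
  // Assumption: in lazy mode each field is a function, so fields that are
  // never called (for example the description) are never computed.
  return {
    title: doc.title(),
    text: doc.text(),
  };
}

// Typical use, assuming the package has been installed with `npm install unfluff`:
//   const result = readTitleAndText(require('unfluff'), htmlString);
```

This style suits pipelines that need only one or two fields per page and want to skip the cost of extracting the rest.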
This application can also be fetched from https://sourceforge.net/projects/unfluff.mirror/. It is hosted on OnWorks so that it can be run online in the easiest way from one of our free operating systems.
