unfluff is a Linux app whose latest release can be downloaded as node-unfluffv3.2.0sourcecode.tar.gz. It can be run online through OnWorks, a free hosting provider for workstations.
Download unfluff and run it online with OnWorks for free.
Follow these instructions to run this app:
- 1. Download this application to your PC.
- 2. Enter our file manager at https://www.onworks.net/myfiles.php?username=XXXXX with the username that you want.
- 3. Upload this application to that file manager.
- 4. Start the OnWorks Linux online, Windows online emulator, or MACOS online emulator from this website.
- 5. From the OnWorks Linux OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX with the username that you want.
- 6. Download the application, install it, and run it.
Screenshots:
unfluff
Description:
unfluff is a Node.js library that automatically extracts the main content from an HTML document. It strips away navigation bars, ads, footers and other boilerplate, leaving the body text, metadata (title, author, date) and other useful fields. It is aimed at content analysis, web scraping, dataset building, and repurposing article text for downstream processing such as machine learning or summarization. The API is simple: you pass in raw HTML and it returns a structured object with the extracted text and other fields. It can also cache its internal representations, which speeds up extracting several fields from the same document. Language support is best for English, and the repository notes that languages such as Chinese, Arabic and Korean may not be well supported. Even so, it is widely used in web-content-processing pipelines, and its simplicity and narrow focus make it a reliable building block for backend services and CLI tools.
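The "raw HTML in, structured object out" call described above can be sketched as follows. This is a minimal illustration, assuming the `unfluff` package is installed from npm and that the returned object exposes `title`, `author`, `date` and `text` as its README describes. The helper name `toArticleRecord` and the shape of the record it returns are hypothetical and not part of the library.

```javascript
// Hypothetical helper (not part of unfluff): normalise the extractor's output
// into a small record with predictable types.
// `extract` defaults to the unfluff module, which is loaded only when the
// caller does not supply a replacement (for example, a stub in unit tests).
function toArticleRecord(html, extract = require('unfluff')) {
  // The second argument is the language code; 'en' is assumed here.
  const data = extract(html, 'en');
  return {
    title: data.title || null,
    // unfluff's README documents `author` as an array; guard in case it is missing.
    authors: Array.isArray(data.author) ? data.author : [],
    date: data.date || null,
    text: (data.text || '').trim(),
  };
}
```

With the package installed, `toArticleRecord(htmlString)` returns the record for a page already fetched into `htmlString`. Accepting the extractor as a parameter is a design choice for this sketch only: it keeps the helper testable without network access or the real library.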
Features:
- Extracts main textual content (body) from an HTML document
- Parses and returns metadata (title, author, date, detected language, etc.)
- Caches intermediate representations for performance when extracting multiple fields
- CLI / module support: can be installed globally or used programmatically
- Suitable for building datasets, article-scraping, republishing workflows
- Open-source under Apache-2.0 license, easy to integrate in Node.js stacks
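The caching feature listed above corresponds to the library's lazy extractor. The sketch below assumes, based on the unfluff README, that `unfluff.lazy(html, language)` returns an object whose fields are functions evaluated on demand. The helper `pickFields` is a hypothetical name introduced for illustration, and the exact field names should be checked against the installed version.

```javascript
// Hypothetical helper (not part of unfluff): compute only the requested fields
// using the lazy extractor, so unused fields are never evaluated.
// `unfluff` defaults to the real module and can be replaced with a stub.
function pickFields(html, fields, unfluff = require('unfluff')) {
  // Assumed API: lazy() returns an object of functions, one per field.
  const lazy = unfluff.lazy(html, 'en');
  const out = {};
  for (const name of fields) {
    // Skip names the extractor does not provide instead of throwing.
    if (typeof lazy[name] === 'function') {
      out[name] = lazy[name]();
    }
  }
  return out;
}
```

For example, `pickFields(htmlString, ['title', 'text'])` would compute only the title and body text, which suits pipelines that need one or two fields from each of many pages.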
Categories
This application can also be fetched from https://sourceforge.net/projects/unfluff.mirror/. It is hosted on OnWorks so that it can be run online in the easiest way from one of our free operating systems.