NeMo Curator download for Windows

This is the Windows app named NeMo Curator, whose latest release can be downloaded as NVIDIANeMoCurator0.9.0sourcecode.zip. It can be run online on OnWorks, a free hosting provider for workstations.

 
 

Download this app named NeMo Curator and run it online with OnWorks for free.

Follow these instructions in order to run this app:

- 1. Download this application to your PC.

- 2. Open our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.

- 3. Upload this application to that file manager.

- 4. Start any OnWorks online OS emulator from this website; the Windows online emulator is the better choice.

- 5. From the OnWorks Windows OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.

- 6. Download the application and install it.

- 7. Download Wine from your Linux distribution's software repositories. Once it is installed, you can double-click the app to run it with Wine. You can also try PlayOnLinux, a graphical front end for Wine that helps you install popular Windows programs and games.

Wine is a way to run Windows software on Linux, but with no Windows required. Wine is an open-source Windows compatibility layer that can run Windows programs directly on any Linux desktop. Essentially, Wine is trying to re-implement enough of Windows from scratch so that it can run all those Windows applications without actually needing Windows.


DESCRIPTION:

NeMo Curator is a Python library designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). It accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, which simplifies pipeline expansion and accelerates model convergence through the preparation of high-quality tokens. At the core of NeMo Curator is the DocumentDataset, which serves as the main dataset class. It acts as a straightforward wrapper around a Dask DataFrame. The library offers easy-to-use methods for expanding the functionality of your curation pipeline while eliminating scalability concerns.



Features

  • Data download and text extraction
  • Language identification and separation with fastText and pycld2
  • Text reformatting and cleaning to fix unicode decoding errors via ftfy
  • Document-level deduplication
  • Multilingual heuristic-based filtering
  • Distributed data classification
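
To make two of the features above more concrete, here is a minimal sketch of heuristic filtering followed by exact document-level deduplication. It uses only the Python standard library and does not call NeMo Curator: every function name (`normalize`, `word_count_filter`, `exact_dedup`, `curate`), the `min_words` threshold and the sample documents are hypothetical. NeMo Curator itself runs steps of this kind at scale on Dask, so this shows the idea rather than the library's API.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def word_count_filter(text: str, min_words: int = 3) -> bool:
    """Heuristic filter (assumed rule): keep texts with at least min_words words."""
    return len(text.split()) >= min_words


def exact_dedup(docs):
    """Keep the first document seen for each distinct hash of its normalized text."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def curate(docs, min_words: int = 3):
    """Drop short documents first, then remove exact duplicates."""
    filtered = [d for d in docs if word_count_filter(d["text"], min_words)]
    return exact_dedup(filtered)


# Hypothetical sample data: document 2 differs from document 1 only in spacing.
docs = [
    {"id": 1, "text": "The quick brown fox jumps."},
    {"id": 2, "text": "The  quick brown fox   jumps."},
    {"id": 3, "text": "Too short"},
    {"id": 4, "text": "A different document with enough words."},
]
curated = curate(docs)
print([d["id"] for d in curated])  # prints [1, 4]
```

Hashing the normalized text rather than the raw string lets documents that differ only in whitespace count as duplicates. Near-duplicate detection would need a different technique, such as MinHash, which this sketch does not attempt.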


Programming Language

Python


Categories

Large Language Models (LLM)

This application can also be fetched from https://sourceforge.net/projects/nemo-curator.mirror/. It is hosted on OnWorks so that it can be run online in the easiest way from one of our free operating systems.


