This is the Linux app named KSUCCA Corpus, whose latest release can be downloaded as Corpus.zip. It can be run online through OnWorks, a free hosting provider for workstations.
Download this app, KSUCCA Corpus, and run it online with OnWorks for free.
Follow these instructions to run this app:
- 1. Download this application to your PC.
- 2. Open our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.
- 3. Upload this application to that file manager.
- 4. Start the OnWorks Linux online, Windows online, or macOS online emulator from this website.
- 5. From the OnWorks Linux OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.
- 6. Download the application, install it, and run it.
DESCRIPTION:
The King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts dating from the pre-Islamic era to the fourth Hijri century (equivalent to the period from the seventh to the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes, such as:
•	Arabic linguistics, which includes: lexical, morphological, syntactic, semantic and pragmatic research.
•	Arabic computational linguistics, which includes: lexical, morphological, syntactic, semantic and pragmatic research including their various applications.
•	Arabic language teaching for both Arabs and non-Arabs.
•	Artificial intelligence.
•	Natural language processing.
•	Information retrieval.
•	Question answering.
•	Machine translation.
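As a rough illustration of what studying distributional lexical semantics involves, the sketch below counts which words occur near each other and compares two words by the cosine similarity of their context counts. It is a minimal sketch, not the method used by the KSUCCA authors: the function names, the window size, and the placeholder tokens are all assumptions made for illustration, and real work on the corpus would need proper Arabic tokenization.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count, for each word, the words found within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(vec_a[key] * vec_b[key] for key in vec_a if key in vec_b)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder tokens standing in for corpus words (not taken from KSUCCA).
sample_tokens = "w1 w2 w3 w1 w2".split()
vectors = cooccurrence_counts(sample_tokens, window=1)
similarity = cosine_similarity(vectors["w1"], vectors["w2"])
```

Words that share many context words receive a higher similarity score, which is the basic intuition behind distributional approaches.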
Features
- An electronic corpus: allowing faster and more accurate investigation of written Arabic.
- A synchronic corpus: including Arabic texts dating from the pre-Islamic era to the fourth Hijri century (equivalent to the period from the seventh to the early eleventh century CE), which is the period of pure Classical Arabic.
- A general corpus: covering a wide range of genres making it suitable for various research subjects.
- A representative corpus: it can be used as the basis for generalizations concerning Classical Arabic.
- A balanced corpus: the number of text samples taken from each genre is proportional to the size of that genre.
- A monolingual corpus: containing written text in Classical Arabic only.
- An unvowelized corpus: only the words of the holy Quran are vowelized.
- A raw corpus: containing no tagging, lemmatization, or any other type of annotation, just plain text.
- An automatically annotated version of the corpus with lemma, stem, POS tag, gender and number annotations is also available.
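Because the raw corpus is plain text, a first look at it can be as simple as counting word frequencies. The sketch below assumes that the downloaded archive contains UTF-8 encoded `.txt` files and that splitting on whitespace is an acceptable first approximation of tokenization. Neither assumption is confirmed by this page, and the function name `token_frequencies` is hypothetical.

```python
import zipfile
from collections import Counter


def token_frequencies(zip_source, encoding="utf-8"):
    """Count whitespace-separated tokens across all .txt files in a zip archive.

    `zip_source` may be a file path or a binary file-like object.
    The .txt extension and the encoding are assumptions about the archive.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip directories and non-text members
            text = archive.read(name).decode(encoding, errors="replace")
            counts.update(text.split())
    return counts
```

Calling `token_frequencies("Corpus.zip").most_common(20)` would then list the twenty most frequent tokens, provided the archive matches the assumptions above.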
This application can also be fetched from https://sourceforge.net/projects/ksucca-corpus/. It is hosted on OnWorks so that it can be run online in the easiest way from one of our free operating systems.