
 
dc.contributor.author Sarma, Prof. Shikhar Kr.
dc.date.accessioned 2019-01-08T11:05:51Z
dc.date.available 2019-01-08T11:05:51Z
dc.date.issued 2019-01-08
dc.identifier.uri http://hdl.handle.net/11321/619
dc.description The Assamese Corpus was developed in the NLP Lab of Gauhati University. It contains about 1.6 million words (1,613,551 words). The corpus was prepared following the guidelines of the Corpus Encoding Standard and is Unicode encoded. Its design took the following issues into account: size of the corpus, genre or domain selection, range of writers, data collection, computerization of data, and validation of the corpus. The genres/domains covered are Literature, Learned Material, and Media, which includes newspapers.
-------
1. These Assamese NLP resources, including the tools and applications, were developed during research and development projects as well as Master's and Ph.D. thesis work.
2. They were mainly developed or generated at the Department of Computer Science and the Department of Information Technology, Gauhati University.
3. These resources are used by students and researchers for further study and research, as well as for the design and development of tools and applications.
4. Computational linguistics resources for Assamese are limited. Natural language processing work has mainly started during the last two decades, and most of the resources are first-generation resources with ample scope for upgrading, enriching, and refining.
5. These are valuable and essential resources for researchers in Assamese NLP, as the language needs much more NLP work to make Assamese a rich medium for the digital world.
6. Anyone interested in or in need of these resources may express their interest in the required resources and will be advised on how they can be made available.
7. These are purely research materials and may be used for further research only.
8. Researchers may visit the NLP Lab of the Department of Information Technology, Gauhati University, Guwahati, India, or contact us.
9. Researchers interested in collaborative work, and students looking for project work, are welcome.
10. The contact person is Professor Shikhar Kr. Sarma, Department of Information Technology, Gauhati University, Guwahati 781014, Assam, India. Email: sks@gauhati.ac.in
dc.language.iso asm
dc.publisher Department of Information Technology, Gauhati University, Assam, India
dc.subject Assamese NLP
dc.subject Assamese Corpus
dc.subject Assamese Corpora
dc.subject Gauhati University
dc.title Assamese Corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files no
branding CLARIN-PL
contact.person Prof. Shikhar Kr. Sarma sks@gauhati.ac.in Gauhati University
sponsor Department of Electronics and IT, Govt. of India NE-LTDP NE Language Technology Development Project nationalFunds
size.info 1600000 words
files.size 0
files.count 0

