<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>Assamese NLP Resources</title>
<link>http://hdl.handle.net/11321/610</link>
<description>These Assamese NLP resources, including tools and applications, were developed during research and development projects as well as Master's and Ph.D. thesis work. They were mainly developed or generated at the Department of Computer Science and the Department of Information Technology, Gauhati University.</description>
<pubDate>Sun, 19 Apr 2026 14:46:05 GMT</pubDate>
<dc:date>2026-04-19T14:46:05Z</dc:date>
<item>
<title>Assamese POS-Tagged Text</title>
<link>http://hdl.handle.net/11321/621</link>
<description>Assamese POS-Tagged Text
Sarma, Prof. Shikhar Kr.
This text was tagged with the Assamese POS tagger, which is based on CRF++: raw text is given to the tagger to obtain POS-tagged data. A standard POS tagset is used.

---

1. These Assamese NLP resources, including the tools and applications, were developed during research and development projects as well as Master's and Ph.D. thesis work.
2. They were mainly developed or generated at the Department of Computer Science and the Department of Information Technology, Gauhati University.
3. These resources are used by students and researchers for further study and research, as well as for the design and development of tools and applications.
4. Computational linguistics work on Assamese is still limited. Natural Language Processing work on the language mainly began during the last two decades, and most of the resources are first-generation resources with ample scope for upgrading, enriching, and refining.
5. These are essential resources for researchers in Assamese NLP, as much more NLP work is needed to make Assamese a rich medium in the digital world.
6. Anyone interested in or in need of these resources may express interest in the specific resources required, and will be advised how they can be made available.
7. These are purely research materials and may be used only for further research.
8. Researchers may visit the NLP Lab of the Department of Information Technology, Gauhati University, Guwahati, India, or contact us.
9. Researchers interested in collaborative work, and students looking for project work, are welcome.
10. Contact person: Professor Shikhar Kr. Sarma, Department of Information Technology, Gauhati University, Guwahati 781014, Assam, India. Email: sks@gauhati.ac.in
</description>
<pubDate>Tue, 15 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/621</guid>
<dc:date>2019-01-15T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese POS Tagger</title>
<link>http://hdl.handle.net/11321/620</link>
<description>Assamese POS Tagger
Sarma, Prof. Shikhar Kr.
The Assamese POS tagger is based on CRF++, a customizable, open-source implementation of Conditional Random Fields (CRFs) for tagging and labeling sequential text. CRF++ is designed for general-purpose use and can be applied to any natural language, given a tagset. It is written in C++.
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/620</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese Corpus</title>
<link>http://hdl.handle.net/11321/619</link>
<description>Assamese Corpus
Sarma, Prof. Shikhar Kr.
The Assamese Corpus was developed in the NLP Lab of Gauhati University. It contains about 1.6 million words (1,613,551 words). The corpus follows the guidelines of the Corpus Encoding Standard (CES) and is Unicode-encoded. Its design took into account corpus size, genre or domain selection, range of writers, data collection, computerization of data, and corpus validation. The genres/domains covered are Literature, Learned Material, and Media (including newspapers).
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/619</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese Root Words</title>
<link>http://hdl.handle.net/11321/618</link>
<description>Assamese Root Words
Sarma, Prof. Shikhar Kr.
This is a list of Assamese root words. The Assamese Root Word List contains 15,750 words.
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/618</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese-English Bilingual Dictionary</title>
<link>http://hdl.handle.net/11321/617</link>
<description>Assamese-English Bilingual Dictionary
Sarma, Prof. Shikhar Kr.
This is an Assamese-English bilingual dictionary. It gives the English meanings of Assamese words along with their parts of speech (POS).
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/617</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese Multi Word Expressions</title>
<link>http://hdl.handle.net/11321/616</link>
<description>Assamese Multi Word Expressions
Sarma, Prof. Shikhar Kr.
Multiword expressions (MWEs) are sequences of words, separated by spaces (or another delimiter), that together convey a single meaning distinct from the individual meanings of the words. A list of 927 multiword expressions has been compiled for Assamese. Examples of Assamese MWEs include “মাটিৰ মানুহ” and “খটক খটক”.
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/616</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese Named Entities</title>
<link>http://hdl.handle.net/11321/615</link>
<description>Assamese Named Entities
Sarma, Prof. Shikhar Kr.
A list of 104,138 Assamese named entities (NEs) was developed. The list includes NEs categorized as Organization (সদৌ অসম ছাত্ৰ সন্থা), Person Name (পঙ্কজ), Festival (দুৰ্গা পূজা), Flower (গোলাপফুল), Folk Instrument (বাঁহী), Food Habit (ভাত), Game (ঢোপ খেল), Honorific Title (জয়াল), Measurement (যোগ), Place Name (তেজপুৰ), Plant (আঁহত গছ), Bird (ভাটৌ), Religious Place (পোৱা মক্কা), Tourist Place (কাজিৰঙা), and Institution Name (গুৱাহাটী বিশ্ববিদ্যালয়).
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/615</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese Stopwords</title>
<link>http://hdl.handle.net/11321/614</link>
<description>Assamese Stopwords
Sarma, Prof. Shikhar Kr.
Stopwords are the most frequently occurring words in a text. Because they contribute little information to the context, they play no important role in information retrieval, so they are removed before further processing. These words have very low discrimination value and are sometimes called noise words. The Assamese stopword list contains 264 words; examples include যেতিয়া, যেন, যেনিবা, যেনে, যোগে, লগ, and লৈ.
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/614</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese spell variation list</title>
<link>http://hdl.handle.net/11321/613</link>
<description>Assamese spell variation list
Sarma, Prof. Shikhar Kr.
A word has spelling variants when it does not have a single correct spelling but can be written in several different ways. A spelling variation list of 5,000 words, mainly named entities, was compiled for Assamese.
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/613</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
<item>
<title>Assamese WSD List</title>
<link>http://hdl.handle.net/11321/612</link>
<description>Assamese WSD List
Sarma, Prof. Shikhar Kr.; Sarma, Jumi
Word sense disambiguation (WSD) is the process of identifying the correct sense of an ambiguous word in a particular context. The Assamese WSD list contains more than 100 words with their multiple senses, and gives the English meaning of each sense.
</description>
<pubDate>Tue, 08 Jan 2019 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11321/612</guid>
<dc:date>2019-01-08T00:00:00Z</dc:date>
</item>
</channel>
</rss>
