Uzbek spellchecker

Murzintcev Nikita¹, Allayarov Piratdin¹, Yuldasheva Shahlo², Kurbaniyazov Gulmurza², Primbetov Abbaz³

¹ Tashkent State University of Economics, Uzbekistan

² Nukus State Pedagogical Institute, Uzbekistan

³ Hunan University, China

Overview

This page presents the results of research aimed at developing general NLP tools and spellchecking software for the Uzbek language. The repository contains dictionary files in Hunspell format and an add-on for MS Office. See the paper for a description of Uzbek morphology.

Install

Warning: the provided software is a proof of concept, and it is not recommended for end-users.

Download and install the add-on for Microsoft Office, then use it like any other proofing tool. The software was tested for compatibility with the 64-bit Windows versions of MS Office 2016-2019.

To add Uzbek spellchecking to applications that support Hunspell dictionaries, such as LibreOffice, copy the two files uz_Latn_UZ.aff and uz_Latn_UZ.dic to the following folders:

Some applications, for example Adobe InDesign, may require additional steps; see the corresponding documentation.

Lemmatization and PoS-tagging

The provided dictionaries are fully compatible with Hunspell and can be used with the standard tools or incorporated into your own code (see supported interfaces).

Download the two files uz_Latn_UZ.aff and uz_Latn_UZ.dic, and run the following examples:

Interactive analysis, providing lemmas and morphological information:

> hunspell -d uz_Latn_UZ -m

Lemmatization:

> hunspell -d uz_Latn_UZ -s
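
For batch processing, the lemmatization command can also read words from standard input instead of running interactively. The following is a minimal sketch, not part of the repository. The helper names stems_only and lemmatize_file are hypothetical. It assumes that both dictionary files are in the current directory, that the input file holds one word per line, and that hunspell -s prints one "word stem" pair per line for each recognized word and only the word itself when no stem is found.

```shell
# Hypothetical helper: keep only the stem column of "word stem" lines.
# Lines with a single field (assumed to be words without a known stem)
# and blank separator lines are skipped.
stems_only() {
    awk 'NF >= 2 { print $2 }'
}

# Hypothetical helper: lemmatize a word list non-interactively.
# $1 is a text file with one word per line.
lemmatize_file() {
    hunspell -d uz_Latn_UZ -s < "$1" | stems_only
}

# Usage (words.txt is a placeholder file name):
#   lemmatize_file words.txt
```

Filtering on the field count keeps the helper independent of the dictionary contents. If the output of your Hunspell version differs from the assumed format, adjust the awk expression accordingly.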

Acknowledgements

This material is based upon work supported by the Modernizing Uzbekistan National Innovation System (MUNIS) project under Grant REP-2/6 in 2022-2024.

The website template was adapted from Brent Yi's project page for TILTED.