[OPEN-ILS-DEV] Issues on Thai Localization

Nanthapume Toonkam poom.happy at hotmail.com
Fri Oct 26 03:59:59 EDT 2018


Hello everyone,


My name is Nanthapume (Poom). I'm an undergraduate student working part-time as an assistant librarian at a primary school in Thailand. Over the past few weeks I have installed Evergreen ILS and have been testing it at 'library.panyasakbangbon.ac.th', and I have found some technical problems with i18n for the Thai language.


1) Thai orthography and collation do not match perfectly. As a result, when UTF-8 is used, word lists tend to come out in the wrong order and do not follow dictionary order. For simple programs the traditional solution is to convert strings into another encoding (usually 'TIS-620', maybe also 'ISO 8859-11') before sorting, and then convert them back afterwards so that the text displays normally on users' screens.


2) The Thai writing system doesn't put spaces between words in the same sentence; instead, we usually use a space to mark the end of a sentence where English would use a full stop ".". This is like writing "The apple is red. The bird is flying." as "theappleisred thebirdisflying". This causes a problem in searching: when I tried to find 'พระคริสตธรรมคัมภีร์ภาคพันธสัญญาใหม่' by searching for 'พระคริสตธรรมคัมภีร์', the search returned no results even though I typed that part of the title in full. Normally developers write a simple program that uses a concise dictionary (and a list of common names) to tokenize words, so that when we search for something the query is split into words before it is matched against the stored strings (which were also written contiguously).


3) Thai users normally mix Thai fragments with English when searching, so queries look like 'ภาษา C++' or 'Bible พันธสัญญาใหม่' instead of using a single writing system. I think a set of romanization and language-detection programs may sometimes be required.
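For issue 1), the dictionary ordering can be approximated without changing encodings. The following is only a minimal sketch using the Python standard library, not how Evergreen or PostgreSQL sorts: it assumes that the main mismatch is the five leading vowels (which are written before a consonant but filed after it) and that tone marks can be ignored on the first pass. The name thai_sort_key and the sample words are made up for illustration; a complete solution would use real Thai collation rules such as those shipped with ICU.

```python
import re

# Leading vowels are written before their consonant, but dictionaries
# file the word under the consonant first.
LEADING_VOWELS = "เแโใไ"
# Maitaikhu, the four tone marks and thanthakhat (U+0E47..U+0E4C).
TONE_MARKS = "\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c"

_swap = re.compile(f"([{LEADING_VOWELS}])([ก-ฮ])")
_tones = re.compile(f"[{TONE_MARKS}]")


def thai_sort_key(word):
    """Simplified key: move leading vowels behind their consonant, drop tone marks."""
    primary = _tones.sub("", _swap.sub(r"\2\1", word))
    # The original word is kept as a tie-breaker for words that differ only in tone.
    return (primary, word)


words = ["เกม", "ขาว", "กา", "ไก่", "แมว"]
print(sorted(words))                     # plain code point order
print(sorted(words, key=thai_sort_key))  # approximate dictionary order
```

With plain code point order, every word that starts with a leading vowel lands after all the consonant-initial words; with the key above, 'เกม' and 'ไก่' are filed under ก as a printed dictionary would do.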


So I searched Google for a solution and found that issues 2) and 3) might be fixed with open-source Python libraries available online, e.g. "https://github.com/PyThaiNLP/pythainlp" and "https://pypi.org/project/PyICU/". This might work, but I haven't tested it yet. (However, I could not find Evergreen documentation on search configuration and matching criteria.)
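To make issues 2) and 3) concrete, here is a rough sketch of the dictionary approach using only the Python standard library. It is a stand-in for what a library such as PyThaiNLP would do much better, and it is not Evergreen's search code. The word list WORDS and the function names are hypothetical; the sketch first separates Thai runs from other text, then cuts each Thai run by greedy longest match against the word list.

```python
import re

# Hypothetical mini-dictionary; a real one would be loaded from a full word list.
WORDS = {"พระคริสตธรรมคัมภีร์", "ภาค", "พันธสัญญา", "ใหม่", "ภาษา"}

# A run of Thai characters (Unicode block U+0E00..U+0E7F) or a run of anything else.
_RUN = re.compile(r"(?P<thai>[\u0e00-\u0e7f]+)|(?P<other>[^\s\u0e00-\u0e7f]+)")


def split_by_script(text):
    """Return (script, run) pairs, where script is 'thai' or 'other'."""
    return [(m.lastgroup, m.group()) for m in _RUN.finditer(text)]


def segment(run, words=WORDS):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max(map(len, words))
    tokens, i = [], 0
    while i < len(run):
        for j in range(min(len(run), i + max_len), i, -1):
            if run[i:j] in words:
                break
        else:
            j = i + 1  # no dictionary word starts at position i
        tokens.append(run[i:j])
        i = j
    return tokens


def tokenize(text, words=WORDS):
    """Segment Thai runs with the dictionary; keep other runs (English, 'C++') whole."""
    tokens = []
    for script, run in split_by_script(text):
        tokens.extend(segment(run, words) if script == "thai" else [run])
    return tokens


def matches(query, title):
    """True if every token of the query also occurs among the title's tokens."""
    return set(tokenize(query)) <= set(tokenize(title))


title = "พระคริสตธรรมคัมภีร์ภาคพันธสัญญาใหม่"
print(tokenize(title))
print(matches("พระคริสตธรรมคัมภีร์", title))
print(tokenize("Bible พันธสัญญาใหม่"))
```

Greedy longest match is the simplest possible strategy and can cut ambiguous text wrongly, which is one reason to prefer a maintained tokenizer over a home-made one once the idea has been tested.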


I'm trying to find a way forward. I now have a word list from the 'Royal Institute Dictionary' (old free version) and a concise TH/EN dictionary found online. On my side, I have translated 70% of the OPAC .po files and 90% of the MARC keys, including the list of countries (mostly taken from Wikipedia) and some extras (those data fields can be found in the en_GB seed file). If you have any suggestions for me, please reply; I look forward to them.


Regards,
Nanthapume Toonkam (Poom)


PS #1: I have a project to make a print page for DATE DUE slips, which will be printed on small cards. If you have any technical advice, please let me know.


PS #2: Before the semester break I gave the students some hints about the coming changes in our library. The children were very curious about it. I hope this turns out fine. Thank you all, especially those who helped me on the Evergreen IRC channel.


PS #3: I've fixed my previous installation problem by setting a new password for PostgreSQL. It seems that when the password is a number starting with 0, it is not recognized properly.