Dear Stojan,

On 26 September 2017 at 09:44, Стоян Димитров wrote:
> Here is the information I could gather so far:
>
> The author of the proposed patterns is the renown dr. Anton Zinoviev [1]. In
> a private communication I was assured that his work covers the official
> hyphenation algorithm as published from Institute for Bulgarian Language [2]
> in Official Spelling Dictionary [3] which is the official normative
> reference book on spelling Bulgarian language. Additionally I was assured
> that full coverage of the algorithm is possible for Bulgarian without
> defects because of the simple nature of the hyphenation rules.
>
> Links to sources:
>
> upstream (supplied by the author) [4]
> converted dictionary for Open/LibreOffice [5]
>
> ___
> [1] http://lml.bas.bg/staff.html
> [2] http://ibl.bas.bg/en/
> [3] http://ibl.bas.bg/en/struktura/savremenen-balgarski-ezik/publikatsii/
> [4] http://logic.fmi.uni-sofia.bg/zinoviev/bgtex-v3.tgz
> [5] https://sourceforge.net/p/bgoffice/code/HEAD/tree/trunk/OOo-hyph-bg/

Thank you very much.

I had a quick look. Both sets of patterns (the ones we currently use and
the ones from your link) seem to have been generated by a script rather
than by patgen. That should make them much easier to compare.

For the patterns from your link it would help to:
- convert them to UTF-8
- create a script that generates those patterns (we should be able to
  help with that if help is needed, but I assume the author already has
  some script unless the set was assembled manually; ideally this would
  be done for the existing patterns as well)
- (ideally) get an agreement to use the MIT licence?

Once we have a generating script (or rather, the exact rules), it should
be straightforward to compare both sets and point out the exact
differences. I would then suggest contacting both authors to comment on
the differences and, ideally, to agree on which patterns are better and
why. We could then "discard" one of the two sets and keep the better one.
Given that this is not a random set from patgen, this should be doable.

(It would also be nice to publish an article in English describing those
rules.)

Mojca
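
For illustration only, the UTF-8 conversion and the comparison of the two
sets suggested above could be sketched roughly as follows. This is not
either author's script. It assumes the upstream file uses a legacy
Cyrillic encoding (windows-1251 is a guess, the actual encoding would
need to be checked) and that each file is a plain whitespace-separated
list of patterns with `%` comments. The function names and the sample
patterns are hypothetical.

```python
def decode_legacy(raw, encoding="cp1251"):
    """Decode raw bytes from a legacy encoding (cp1251 is an assumption)."""
    return raw.decode(encoding)


def parse_patterns(text):
    """Return the set of whitespace-separated patterns, ignoring % comments.

    Assumes a plain pattern list; TeX wrappers such as \\patterns{...}
    would have to be stripped beforehand.
    """
    patterns = set()
    for line in text.splitlines():
        line = line.split("%", 1)[0]  # drop everything after a TeX comment
        patterns.update(line.split())
    return patterns


def compare(current, proposed):
    """Report which patterns are unique to each set and which are shared."""
    return {
        "only_current": sorted(current - proposed),
        "only_proposed": sorted(proposed - current),
        "common": sorted(current & proposed),
    }


if __name__ == "__main__":
    # Made-up sample data, not taken from either real pattern set.
    current = parse_patterns("1ба 1ва\n1га % a comment\n")
    legacy_bytes = "1ба 1га 2б1в".encode("cp1251")
    proposed = parse_patterns(decode_legacy(legacy_bytes))
    for key, values in compare(current, proposed).items():
        print(key, " ".join(values))
```

Once the text is decoded, writing it back out with
`text.encode("utf-8")` gives the UTF-8 version of the file.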
