Re: Tagging text as being in arbitrary complex-script languages

2019-04-23 Thread Richard Wordingham
On Tue, 23 Apr 2019 17:35:10 +0200 Eike Rathke wrote: > Hi Richard, > > On Thursday, 2019-04-18 20:40:01 +0100, Richard Wordingham wrote: > > It sounds as though one has to specify the script where there is > > doubt as to what type of script will dominate. Is it an issue

Re: Tagging text as being in arbitrary complex-script languages

2019-04-23 Thread Richard Wordingham
On Tue, 23 Apr 2019 18:00:22 +0200 Eike Rathke wrote: > On Friday, 2019-04-19 03:32:34 +0100, Richard Wordingham wrote: > > In answer to what was intended to be a rhetorical question, I > > suppose und-Latn-t-sa-m0-iast and und-Latn-t-sa-m0-iso would work > > fo

Re: Tagging text as being in arbitrary complex-script languages

2019-04-18 Thread Richard Wordingham
On Thu, 18 Apr 2019 20:40:01 +0100 Richard Wordingham wrote: > On Thu, 18 Apr 2019 12:25:11 +0200 > Eike Rathke wrote: > > Though with sa-Latn > > I doubt there's a use case, so I wouldn't call that "correct" in > > common sense. > > So

Re: Tagging text as being in arbitrary complex-script languages

2019-04-18 Thread Richard Wordingham
On Thu, 18 Apr 2019 12:25:11 +0200 Eike Rathke wrote: > What I usually did is, lookup the language at SIL and the Ethnologue > and use the most prevalent script as implied default script. Which > here https://www.ethnologue.com/language/san would lead to > Devanagari, but in this case more import

Re: Tagging text as being in arbitrary complex-script languages

2019-04-17 Thread Richard Wordingham
On Wed, 17 Apr 2019 13:53:25 +0200 Eike Rathke wrote: > > > On 4/15/19 12:26 PM, Eike Rathke wrote: > > > > Adding arbitrary dictionary languages (as long as they strictly > > > > follow the BCP 47 language tag specification) works since quite > > > > a while (2014?) already. > > An interest

Re: Tagging text as being in arbitrary complex-script languages

2019-04-16 Thread Richard Wordingham
On Mon, 15 Apr 2019 15:14:49 +0000 jonathon wrote: > On 4/15/19 12:26 PM, Eike Rathke wrote: > > Adding arbitrary dictionary languages (as long as they strictly > > follow the BCP 47 language tag specification) works since quite a > > while (2014?) already. Only if you hacked the text to declar

Re: Tagging text as being in arbitrary complex-script languages

2019-04-10 Thread Richard Wordingham
On Wed, 10 Apr 2019 15:13:52 +0200 Eike Rathke wrote: > Hi Richard, > > On Wednesday, 2019-04-10 04:02:53 +0100, Richard Wordingham wrote: > > > I was also able to get SIL's oxttools to work sufficiently > > What are those oxttools and where to get them? Tools

Re: Tagging text as being in arbitrary complex-script languages

2019-04-09 Thread Richard Wordingham
On Mon, 8 Apr 2019 16:17:38 +0200 Eike Rathke wrote: > ScriptType value 3 here means CTL. The values are explained in > officecfg/registry/schema/org/openoffice/VCL.xcs under > Thank you for the information, and thanks to Stephan Bergmann for the localisation information. For plodders like me,

Tagging text as being in arbitrary complex-script languages

2019-04-06 Thread Richard Wordingham
https://wiki.documentfoundation.org/ReleaseNotes/5.4 says, "The language list for text attribution now also displays BCP47 language tags provided by dictionaries if a language is not known in the predefined set of languages. (Eike Rathke (Red Hat, Inc.)) Such additional language tags are plac

Special Fonts for Spell Checking Northern Thai in Lanna Script

2017-10-15 Thread Richard Wordingham
I am trying to put together a workable solution for spell-checking Northern Thai in the Lanna (a.k.a. Tai Tham) script. I have a good idea how to do it, and it is already working in Firefox. The solution may not be suitable for run of the mill users, but I don't believe run of the mill users need

Version of gcc for LibreOffice

2015-10-09 Thread Richard Wordingham
On Wed, 07 Oct 2015 11:10:08 +0200 Jan-Marek Glogowski wrote: (when topic was 'Can't track flow of characters in from Input Method Editor') > Am 06.10.2015 um 23:51 schrieb Richard Wordingham: > > I think my compiler (gcc > > Version 4.6.3) is too old to compile

Re: Can't track flow of characters in from Input Method Editor

2015-10-08 Thread Richard Wordingham
On Thu, 08 Oct 2015 10:18:15 +0100 Caolán McNamara wrote: > On Thu, 2015-10-08 at 08:52 +0100, Richard Wordingham wrote: > > The intent of the call is to delete one Unicode character; On reading the GTK documentation, it is clear that the arguments are in terms of Unicode characters
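
The point about counting in Unicode characters rather than UTF-16 code units can be illustrated with a small sketch. This is not LibreOffice's or GTK's code; the helper names below are hypothetical, and the buffer is modelled as a plain list of UTF-16 code units. It shows how a request to delete one character before the cursor has to remove both halves of a surrogate pair, since removing a single code unit would leave a lone surrogate behind.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def units_to_str(units):
    """Inverse of utf16_units (assumes well-formed UTF-16)."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def units_for_chars_before(units, cursor, n_chars):
    """Count the code units covering the n_chars characters that end at cursor."""
    pos = cursor
    for _ in range(n_chars):
        if pos == 0:
            break
        pos -= 1
        # A low surrogate preceded by a high surrogate is one character:
        # step back over the high half as well.
        if (0xDC00 <= units[pos] <= 0xDFFF and pos > 0
                and 0xD800 <= units[pos - 1] <= 0xDBFF):
            pos -= 1
    return cursor - pos


def delete_chars_before(units, cursor, n_chars):
    """Delete n_chars whole characters before cursor; return the new unit list."""
    n_units = units_for_chars_before(units, cursor, n_chars)
    return units[:cursor - n_units] + units[cursor:]
```

For example, with the buffer `utf16_units("a\U00010C80")` (three code units) and the cursor at the end, deleting one character removes two code units and leaves just `"a"`.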

Re: Can't track flow of characters in from Input Method Editor

2015-10-08 Thread Richard Wordingham
On Thu, 8 Oct 2015 01:17:14 +0100 Richard Wordingham wrote: > Thank you all for your inputs. I've finally found where the problem materialises. There is a callback of GtkSalFrame::IMHandler::signalIMDeleteSurrounding() to delete one 'character'. I now need to work out where

Re: Can't track flow of characters in from Input Method Editor

2015-10-07 Thread Richard Wordingham
Thank you all for your inputs. On Wed, 7 Oct 2015 09:57:14 +0200 Miklos Vajna wrote: > Writer "main text" gets all keyboard input in SwEditWin::KeyInput(), > sw/source/uibase/docvw/edtwin.cxx. It's VCL that calls that member > function, and in your case it's probably the VCL KDE backend in > par

Can't track flow of characters in from Input Method Editor

2015-10-06 Thread Richard Wordingham
On Sunday I raised bug report 94753 about the apparent generation of lone surrogates in response to the use of Keyman for Linux under ibus as the input method editor. I have compiled Version 4.4.4.3.0+ with debug to facilitate my investigation; I think my compiler (gcc Version 4.6.3) is too old to

Re: Unicode 8.0?

2015-07-16 Thread Richard Wordingham
On Thu, 16 Jul 2015 17:40:06 +0100 Caolán McNamara wrote: > On Thu, 2015-07-16 at 11:53 +0200, Viktor Kovács wrote: > > I would like to ask when will be adopted Old Hungarian fonts. It is > > defined in the UNICODE 8.0, central-europe subgroup, and it must be > > typed right to left writing. > >

Re: Univerbation

2015-07-07 Thread Richard Wordingham
On Tue, 07 Jul 2015 09:55:38 +0100 Caolán McNamara wrote: > On Mon, 2015-07-06 at 09:13 +0100, Richard Wordingham wrote: > > What mechanisms does ODF have to indicate that a sequence of word > > characters constitutes a word? > But generally we follow the rules of the unde

Univerbation

2015-07-06 Thread Richard Wordingham
What mechanisms does ODF have to indicate that a sequence of word characters constitutes a word? Having such a mechanism is useful for spell-checking Thai and other languages where the boundaries between words are not marked. At present, one can cancel spurious boundaries by inserting U+2060 WORD

Re: Adding Languages to Writer's Character, Font Menu

2015-07-02 Thread Richard Wordingham
On Wed, 24 Jun 2015 23:40:10 +0200 Michael Stahl wrote: > On 24.06.2015 23:26, toki wrote: > > That is part of the reason why I think the whole Western/CJKV/CTL > > split should be thrown out, and replaced with language/writing > > system, supplemented by locale data. > that's a great idea in t

Re: Adding Languages to Writer's Character, Font Menu

2015-06-30 Thread Richard Wordingham
On Tue, 30 Jun 2015 17:48:05 +0200 Eike Rathke wrote: > On Monday, 2015-06-29 20:40:46 +0200, Khaled Hosny wrote: > > We already handle this at the text shaping level in VCL for > > platforms where HarfBuzz is used. > I think we talk about two different things here. Yes. Khaled and I are foc

Licence to Convert Dictionary to Spell-Checker Dictionary

2015-06-29 Thread Richard Wordingham
One way of producing a spelling dictionary is to take the words from a near-normal dictionary and use them. Does publishing such a dictionary require the permission of the dictionary's copyright holder? If it's relevant, the dictionary was published in Thailand. I appreciate that one ought to do

Re: Adding Languages to Writer's Character, Font Menu

2015-06-29 Thread Richard Wordingham
On Wed, 24 Jun 2015 21:26:50 +0000 toki wrote: > I'll simply point to the current version of Microsoft Office, which is > claimed, by Microsoft, to support more than 7,000 languages. > > As far as UI design goes, there are at least four options. > 1) Offer everything, listed alphabetically; > 2)

Re: Adding Languages to Writer's Character, Font Menu

2015-06-29 Thread Richard Wordingham
On Mon, 29 Jun 2015 20:40:46 +0200 Khaled Hosny wrote: > On Mon, Jun 29, 2015 at 12:14:44PM +0200, Eike Rathke wrote: > > Hi Richard, > > > > On Wednesday, 2015-06-24 20:54:54 +0100, Richard Wordingham wrote: > > > > > The script is generally implicit in th

Re: Adding Languages to Writer's Character, Font Menu

2015-06-25 Thread Richard Wordingham
On Wed, 24 Jun 2015 20:54:54 +0100 Richard Wordingham wrote: > On Wed, 24 Jun 2015 12:31:16 +0200 > Eike Rathke wrote: > > Simply in a css::lang::Locale set the Language field to "qlt" and in > > the Variant have the language tag, see > > http:
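
The convention quoted here (Language set to "qlt", with the full language tag in Variant) can be sketched as follows. This is only an illustration in Python, not the UNO API itself: the `Locale` class below is a stand-in with the same three field names as css::lang::Locale, and `locale_from_tag` is a hypothetical helper. The rule used to decide when a tag fits in the plain Language and Country fields is an assumption, as is leaving Country empty in the "qlt" case; the message does not say either way.

```python
import re
from dataclasses import dataclass


@dataclass
class Locale:
    """Stand-in for the three string fields of css::lang::Locale."""
    Language: str = ""
    Country: str = ""
    Variant: str = ""


# Assumption: a tag made of just a 2-3 letter language code and an optional
# 2-letter region fits in Language/Country without needing "qlt".
_SIMPLE_TAG = re.compile(r"^([a-z]{2,3})(?:-([A-Z]{2}))?$")


def locale_from_tag(tag):
    """Map a BCP 47 tag to a Locale, using 'qlt' plus Variant when needed."""
    match = _SIMPLE_TAG.match(tag)
    if match:
        return Locale(Language=match.group(1), Country=match.group(2) or "")
    # Anything with a script, variant or extension goes into Variant whole.
    return Locale(Language="qlt", Variant=tag)
```

Under these assumptions, `locale_from_tag("khb-CN")` gives Language "khb" and Country "CN", while `locale_from_tag("sa-Latn")` gives Language "qlt" with Variant "sa-Latn".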

Re: Adding Languages to Writer's Character, Font Menu

2015-06-24 Thread Richard Wordingham
On Wed, 24 Jun 2015 12:31:16 +0200 Eike Rathke wrote: > > >* Allow arbitrary lang tags to be used in a text anywhere > > OpenDocument allows these - it is just a question of how much > > LibreOffice supports this. > It does. > > I believe the UNO interface supports this, > > but I won't be s

Re: Adding Languages to Writer's Character, Font Menu

2015-06-24 Thread Richard Wordingham
On Wed, 24 Jun 2015 11:52:49 +0200 Eike Rathke wrote: > > If I have some text with khb-CN as the language and > > region and then try to set the language for a greater expanse of > > text, khb-CN does not come up in the menu. N.B. By 'language' and 'region', I mean language and region for comple

Re: Adding Languages to Writer's Character, Font Menu

2015-06-24 Thread Richard Wordingham
On Tue, 23 Jun 2015 21:07:12 +0000 toki wrote: > On 06/22/2015 07:30 PM, Richard Wordingham wrote: > > > How do I add a language to this menu so that fonts that can will > > render text in the style appropriate to the language? I've been getting a fair bit of informatio

Re: Adding Languages to Writer's Character, Font Menu

2015-06-23 Thread Richard Wordingham
(Copy to list for reference - I accidentally replied to Caolán alone.) On Tue, 23 Jun 2015 08:59:04 +0100 Caolán McNamara wrote: > The language combo-box allows you to enter arbitrary language tags. > What happens if you just enter khb-CN in there. Using vanilla Version: 4.3.3.2, Build ID: 9bb7

Adding Languages to Writer's Character, Font Menu

2015-06-22 Thread Richard Wordingham
How do I add a language to this menu so that fonts that can will render text in the style appropriate to the language? I am reconciled to having to create a bespoke version of LibreOffice, though I'd rather not. Manually editing a document's XML files would be the last resort - it seems to work!

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Richard Wordingham
On Thu, 27 Sep 2012 21:08:13 +0700 Nathan Wells wrote: > Firstly, you are right, I was mistaken about ICU and the breakiterator > working for sentences (I just tried it right now and it does work, > but just not with the normal "khan" or "period" of Khmer rather it > works with Latin sentence mar

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Richard Wordingham
On Thu, 27 Sep 2012 11:52:26 +0700 Nathan Wells wrote: >> 1. If you are shutting off the ICU breakiterator for text following, >> we >> should probably also do it for text preceding. Thus if there is a >> ZWSP or ZWNBSP (U+2060 WJ) anywhere in a text then ICU break >> iteration is disabled for th

Re: Adding Extension for Experimental Thai Spelling

2012-07-27 Thread Richard Wordingham
On Thu, 26 Jul 2012 16:33:00 +0700 Martin Hosken wrote: > 1. use of U+2060 makes string searching and spell checking harder > (unless WJ chars are stripped for searching and spell checking). They > are not part of the spelling of a word, so their introduction in the > underlying text stream is pr
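
The stripping that the quoted objection mentions (removing WJ characters before searching or spell checking) is simple to sketch. The helper names below are hypothetical and this is not how LibreOffice or any spell checker is known to do it; it only shows that U+2060 can be dropped before comparison so that it is not treated as part of a word's spelling.

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def strip_word_joiners(text):
    """Return text with every U+2060 removed."""
    return text.replace(WORD_JOINER, "")


def find_ignoring_wj(haystack, needle):
    """Find needle in haystack, ignoring U+2060 in both.

    The returned index refers to the stripped haystack, not the original,
    so a real implementation would also need to map positions back.
    """
    return strip_word_joiners(haystack).find(strip_word_joiners(needle))
```

The position mapping noted in the docstring is the part that makes this harder in practice: once characters are removed, offsets in the stripped text no longer match offsets in the document.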

Re: Adding Extension for Experimental Thai Spelling

2012-02-17 Thread Richard Wordingham
On Fri, 17 Feb 2012 14:10:21 +0000 Caolán McNamara wrote: > On Thu, 2012-02-16 at 23:24 +0000, Richard Wordingham wrote: > Indeed, yeah, I suppose, assuming its as complicated as "Thai", that > the right direction would be for someone to write for icu new > dictionary-ba

Re: Adding Extension for Experimental Thai Spelling

2012-02-16 Thread Richard Wordingham
On Tue, 14 Feb 2012 16:19:17 +0000 Caolán McNamara wrote: > I think this change: > http://cgit.freedesktop.org/libreoffice/core/commit/?id=475d0c59c66fb7752d230f76130b17145aad0c12 > should improve matters a lot. It's a vast improvement - it gives LibreOffice a real Thai spell-checker. Thank you

Re: Adding Extension for Experimental Thai Spelling

2012-02-13 Thread Richard Wordingham
Thank you to everyone who's offered me advice. On Mon, 13 Feb 2012 15:08:20 +0000 Caolán McNamara wrote: > I don't think we have any way to override our breakiterators from > extensions. Ah well, I'll just have to try to get Thai spell-checking working for myself and then worry about sharing m

Adding Extension for Experimental Thai Spelling

2012-02-11 Thread Richard Wordingham
As I understand it, the lack of a usable Thai spell-checker for LibreOffice (unlike, say, a Khmer spell-checker) is due to the Thai break iterator. (I had expected Thai and Khmer to face similar problems, for neither has a visible word separator and syllable boundaries are often unclear in both.)