Re: [sword-devel] Improvements in dictionary collation. was Re: AbbottSmith module question

David Troidl Fri, 15 Jan 2016 08:42:15 -0800

Hi,

I've been working on a rudimentary Greek lexicon, covering both the New Testament and the Septuagint. In the process, I was faced with this issue. After all the discussion and work in the Hebrew lexicon, I reevaluated my approach. What I finally decided on was using unaccented, lower case forms of the lemmas in the lexicon. This then automatically sorts properly. In the relatively few cases of duplication, I append a .1, .2, etc. This represents a small percentage of the entries in the lexicon. So the keys of the lexicon are related to the lemmas, lexically, but the lemmas retain the form to be displayed in listing the lexicon.
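
A minimal sketch of that key scheme in Python, for illustration only. The function names sort_key and build_keys are hypothetical, and it assumes that the first lemma with a given key stays unsuffixed while later duplicates get .1, .2, and so on; the actual lexicon may number them differently.

```python
import unicodedata
from collections import defaultdict


def sort_key(lemma: str) -> str:
    """Lower-case a lemma and drop its combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", lemma)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped.lower())


def build_keys(lemmas):
    """Return sorted (key, lemma) pairs; repeated keys get .1, .2, ... appended.

    Assumption: the first occurrence keeps the bare key.
    """
    seen = defaultdict(int)
    pairs = []
    for lemma in lemmas:
        base = sort_key(lemma)
        count = seen[base]
        seen[base] += 1
        key = base if count == 0 else f"{base}.{count}"
        pairs.append((key, lemma))
    return sorted(pairs)


if __name__ == "__main__":
    for key, lemma in build_keys(["λόγος", "εἰ", "εἶ", "Ἀβραάμ"]):
        print(key, lemma)
```

The key is used only for storage and sorting; the lemma paired with it keeps its accents and capitalization for display.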


Hope this helps.

Peace,

David

On 1/15/2016 9:36 AM, DM Smith wrote:

On Jan 15, 2016, at 8:01 AM, Jonathan Morgan <jonmmor...@gmail.com> wrote:
Hi DM,
On Fri, Jan 15, 2016 at 1:40 AM, DM Smith <dmsm...@crosswire.org> wrote:
    I’ve been trawling through the code. Seems that there is support
    for Strong’s Numbers that are not padded. If a module contains
    Strong’s Numbers that are not padded, it is to use
    StrongsPadding=false. (Actually any value other than “true” will
    be false. TRUE is false.) This module does not have it.

    Not having StrongsPadding in a conf is the same as
    StrongsPadding=true. There’s a note in the wiki that says that
    we’ll probably reverse that in the future. I doubt it. We still
    have LZSS as the default compression though no module has used it
    for years (other than experimental modules).

    I’m not sure how a Bible with a reference to G0001 will find G1
    as it doesn’t unpad the user’s input. But at least the dictionary
    should work. BTW, there’s a missing “if (strongsPadding)” in
    rawLD. It is present in zLD. I think this is a bug. Need to
    verify, report and submit a patch for it. (BTW, I don’t have
    write permissions either on the main repo, but I’m not
    discouraged in contributing and submitting patches.)
Sorry if I'm missing something, but surely keys without padding wouldn't appear in the correct (numeric) order in the dictionary?
Jon
Jon,
Right. They will be in collation order, not numerical order. It doesn’t work as a SWORD module for that reason and was my primary motivation for moving it to the Experimental repository. The tei2mod program needs to add support for Strong’s numbers as imp2ld has. It doesn’t pad the values as it puts them into the module.
The ordering problem is a more general problem. Our collation order is good for ASCII. It is not good for Latin-1 as the byte value for accented letters is not adjacent to unaccented counterparts.
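
A small Python illustration of the Latin-1 point, sorting by raw byte values. The helper name byte_order_sort and the word list are made up for the example; this is not how SWORD itself compares keys.

```python
def byte_order_sort(words, encoding="latin-1"):
    """Sort words by the byte values of their encoded form."""
    return sorted(words, key=lambda w: w.encode(encoding))


if __name__ == "__main__":
    # In Latin-1, "é" is byte 0xE9, which is greater than "z" (0x7A),
    # so "école" lands after "zebra" instead of next to "eagle".
    print(byte_order_sort(["zebra", "école", "eagle", "etude"]))
```
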
Each language, script combination has its own collation order. Some languages use multiple glyphs for a single letter. This was noted earlier this month on this mailing list.
In a past job, I had to implement a sort routine that would account for numbers occurring anywhere in a string. What I discovered in the process of doing this was that there is a need for an internal representation that differs from an external representation and routines that would normalize an external representation to an internal representation. Basically that routine would look at a string as an alternating sequence of numbers and non-numbers. The routine external2internal would create a string where numbers were zero padded to 10 digits. (It also did other things like strip noise words from the string, normalize dotted acronyms, normalize casing, …).
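
A rough Python sketch of such a routine, not the original code. It covers only the zero padding and case normalization; noise-word stripping and acronym handling are left out. PAD_WIDTH is taken from the 10 digits mentioned above, and treating only ASCII digits as numbers is an assumption.

```python
import re

PAD_WIDTH = 10  # numbers are zero padded to this many digits


def external2internal(text: str) -> str:
    """Normalize an external key into a form that sorts correctly byte by byte."""
    # Splitting on a capturing group yields alternating non-number / number
    # pieces; the pieces at odd indexes are the digit runs.
    pieces = re.split(r"([0-9]+)", text)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            out.append(piece.zfill(PAD_WIDTH))
        else:
            out.append(piece.casefold())
    return "".join(out)


if __name__ == "__main__":
    keys = ["G10", "G2", "G1"]
    print(sorted(keys))                         # plain string order
    print(sorted(keys, key=external2internal))  # numeric-aware order
```

Numbers longer than PAD_WIDTH digits are passed through unpadded by zfill, so the width has to exceed the longest number expected in the keys.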
Also in an earlier posting this month, I mentioned that ICU has collation routines that are language and script sensitive. The collation values that these produce are good for byte-order sorting, but are not intended for external use.
What we need is a dictionary that stores the case-insensitive keys and that the frontend can collate as it sees fit. That collation order would be used to sort and show the case-insensitive keys. Basically another layer of indirection with a mapping from external presentation to the internal storage of the module.
We’ve talked about this before. I think Troy suggested a mechanism.
I’m going to survey the lexdict modules in all the repos in the Master list (and a few others) to see where we stand with those modules and the StrongsPadding flag. If any key starts with a number and isn’t zero padded, it will have difficulty if StrongsPadding=false is not in the conf. If a module has some that are zero padded and others that are not, this also is a problem.
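
One way such a check could be sketched in Python, given a list of keys already read out of a module. The function padding_status is hypothetical and the rule is only a heuristic: keys count as padded when every leading number has the same width and at least one starts with a zero. It does not use the SWORD API and does not read the conf.

```python
import re

LEADING_NUMBER = re.compile(r"[0-9]+")


def padding_status(keys):
    """Classify keys as 'none', 'padded', 'unpadded' or 'mixed' (heuristic)."""
    widths = set()
    leading_zero = False
    for key in keys:
        match = LEADING_NUMBER.match(key)
        if match:  # only keys that start with a number are considered
            digits = match.group()
            widths.add(len(digits))
            leading_zero = leading_zero or digits.startswith("0")
    if not widths:
        return "none"
    if not leading_zero:
        return "unpadded"
    return "padded" if len(widths) == 1 else "mixed"


if __name__ == "__main__":
    print(padding_status(["00001", "00010", "10000"]))
    print(padding_status(["1", "10", "100"]))
    print(padding_status(["00001", "12"]))
```

A module reported as unpadded would need StrongsPadding=false in its conf, and one reported as mixed would need its keys fixed.
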
DM



_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page



