Re: [sword-devel] TEI formatting, duplicated key (BDB Glosses)

DM Smith Mon, 30 Apr 2012 10:26:18 -0700

On 04/30/2012 10:36 AM, Jonathan Morgan wrote:

Hi DM,

On Tue, May 1, 2012 at 12:00 AM, DM Smith <dmsm...@crosswire.org> wrote:



    On 04/30/2012 09:37 AM, Daniel Owens wrote:



        On 04/30/2012 06:54 AM, Chris Little wrote:

            On 4/30/2012 4:39 AM, David Troidl wrote:

                Hi Chris,

                I'm certainly no expert on your TEI dictionaries, but
                wouldn't it make
                sense to have the first key be one that would sort
                properly, and present
                the dictionary in true alphabetical order? I'm
                thinking of Middle
                Liddell, as well as the Hebrew. This key wouldn't even
                necessarily have
                to be shown to the user. The second key, the title,
                could then maintain
                the proper accents for display, without hindering
                sorting, searching or
                navigation.


            I confess, I don't understand what you're proposing this
            as an alternative to.

            In the example Karl cites, there's just one actual key per
            entry. It is an uppercased version of the entryFree's n
            attribute. This is the key that is sorted.

            The un-uppercased version from the n attribute is being
            rendered as part of the entry text via the TEI filters.
            This is the part I'm proposing we retain, but render
            somewhere else, e.g. right-justified at the bottom of the
            entry.

            We also render all the text of the entry, which in these
            cases includes the text from a title element.

            I don't know what 'true alphabetical order' means, but if
            you mean localized sort order, it's not possible with the
            current implementation of this module type.

            --Chris


        I think David's concern is something that needs to be dealt
        with. A number of possibilities could be pursued, some of them
        together:

           1. The current implementation is to sort by Unicode code
        points. This works particularly well with numeric keys. A
        quick solution for languages for which such sorting is not
        alphabetical would be to follow David's suggestion of using
        keys that the user does not even see. This has the advantage
        of providing a workable solution right away, but there are
        some problems with this. First, we could create a new
        "strongs" standard because the current implementation does not
        actually hide keys. That could be solved by making the keys so
        obscure that no one would remember them. Second, any future,
        more robust solution would require reworking all modules keyed
        to it. I have toyed with this solution, and it might be the
        pragmatic way forward, but it is not ideal.

           2. A localized sort order, which I think is what David
        means by true alphabetical order, would be a better long-term
        solution.

           3. In addition, using genbooks for lexica would work for
        lexica that are sorted by root, with subentries nested in a
        hierarchy, just like in the Hesychius module and BDB. I have
        been working with Troy on this. Unfortunately, front-ends do
        not recognize the Feature=HebrewDef option in the conf file
        and allow genbooks as lexica. I can send anyone an example
        lexicon if you are interested in working on this. In that
        case, instead of @n as the key, */x-entry/@osisID would be the
        key.

        Any thoughts?


    I think there is a problem with the sorting of entries in
    dictionaries where the keys are not ASCII. I don't remember the
    details, but I seem to remember it having been discussed here.

    For JSword, we'll be building a Lucene search index for the key,
    the term and the whole entry. A user lookup will be normalized and
    the search will return the key with which lookup will proceed
    internally as it does today. ICU provides the ability to create a
    localized sort key (not at all suitable for display) that can be
    used to sort dictionary entries for the end user's locale. I'm
    thinking that for TEI dictionaries the representation of the key
    should not be shown at all.
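
To make the idea of a sort key that is separate from the displayed term concrete, here is a minimal sketch in Python. It is not JSword code and it does not use ICU: the helper sort_key is a hypothetical stand-in that only strips accents and case, whereas a real ICU collation key would also apply locale-specific rules. The sample titles are made up for illustration.

```python
import unicodedata


def sort_key(title: str) -> str:
    """Hypothetical hidden key: decompose, drop combining marks, casefold.

    This is a crude stand-in for a real collation key; it is meant for
    sorting only and is not suitable for display.
    """
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Illustrative titles, not taken from any module.
titles = ["zebra", "\u00c9clair", "apple", "\u00e9cole", "Eclipse"]

# Plain sorted() compares code points: uppercase ASCII first, accented last.
by_codepoint = sorted(titles)

# Sorting on the hidden key gives dictionary-like order; titles keep accents.
by_key = sorted(titles, key=sort_key)

print(by_codepoint)
print(by_key)
```

The displayed titles are untouched; only the ordering changes, which is the separation David asked about.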

BPBible, and I believe some other frontends as well use binary search on the original module order to locate a key in a virtual list. This provides very noticeable speedups on large dictionaries like ISBE. I think this would require the original module creation to place a module in localised key order if we really wanted to order by that, not just have a lookup which as I understand it would only be done when actually looking for a key? It also really means that a module can be sorted in one and only one way.

Then again, I'm not even sure we can guarantee any kind of binary search on localised keys.
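
The constraint can be illustrated with a small Python sketch. It is not BPBible or SWORD code; find and fold are hypothetical helpers, and fold is only an accent-stripping approximation of a localised order. The point it shows is that a binary search is reliable only when the stored order matches the comparison used during the search.

```python
import unicodedata


def fold(key: str) -> str:
    """Hypothetical 'localised' order: compare keys with accents removed."""
    decomposed = unicodedata.normalize("NFD", key)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find(keys, wanted, order=lambda k: k):
    """Binary search; correct only if `keys` is already sorted by `order`."""
    probe = order(wanted)
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if order(keys[mid]) < probe:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(keys) and order(keys[lo]) == probe:
        return lo
    return -1


# Illustrative keys stored in code point order (U+00C9 sorts after 'Z').
stored = ["APPLE", "ZEBRA", "\u00c9CLAIR"]

hit = find(stored, "\u00c9CLAIR")               # same order as storage
miss = find(stored, "\u00c9CLAIR", order=fold)  # mismatched order

# Re-sorting the keys by the folded order makes the folded search work,
# but the list can then only be searched in that one order.
resorted = sorted(stored, key=fold)
hit_again = find(resorted, "\u00c9CLAIR", order=fold)

print(hit, miss, hit_again)
```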

A related issue for English dictionaries is allowing mixed-case dictionary keys (and I think I have heard similar comments about Greek and maybe other languages). At the moment I think SWORD requires dictionary keys to be upper-case to ensure that they sort correctly, but really "Aaron's Rod" looks much better than "AARON'S ROD". BPBible now attempts to automatically and heuristically turn keys to mixed case, which I think looks a lot better, but ideally this would be done in the same way as for other languages: separating sort order from codepoint order in some way.
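
As a rough illustration of what such a heuristic might look like, here is a short Python sketch. It is an assumption for illustration only and is not BPBible's actual code; display_form is a hypothetical name. It also shows why the heuristic is fragile: the obvious str.title() mishandles apostrophes, and any such rule flattens acronyms.

```python
import string


def display_form(key: str) -> str:
    """Hypothetical heuristic: capitalise each whitespace-separated word."""
    return string.capwords(key)


pretty = display_form("AARON'S ROD")

# str.title() treats the apostrophe as a word boundary.
naive = "AARON'S ROD".title()

# Acronyms lose their casing, so the heuristic cannot be fully trusted.
acronym = display_form("ISBE")

print(pretty, "|", naive, "|", acronym)
```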

The idea given above is to have an index to the SWORD index. It can be built to be ordered and accessed in whatever way is needed to solve the problems.

As you note, the problem is that SWORD makes severe assumptions about the order and nature of the keys. Unless care is taken uppercasing is not always appropriate. For example in Turkish the uppercase of 'i' is not 'I'.
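
A small Python sketch of both points, under stated assumptions: build_index and turkish_upper are hypothetical helpers, not SWORD or JSword APIs, and the two Turkish words are chosen only for illustration. The secondary index maps a normalized lookup form to the keys stored in the module; with locale-blind uppercasing, dotted 'i' and dotless U+0131 both become 'I', so two distinct entries collide.

```python
def build_index(keys, normalize):
    """Hypothetical secondary index: normalized form -> stored module keys."""
    index = {}
    for key in keys:
        index.setdefault(normalize(key), []).append(key)
    return index


def turkish_upper(text: str) -> str:
    """Sketch of Turkish-aware uppercasing: i -> U+0130, U+0131 -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


# Two different Turkish words: "kil" and "k\u0131l" (dotless i).
keys = ["kil", "k\u0131l"]

# Locale-blind uppercasing maps both to "KIL", so they share one slot.
naive_index = build_index(keys, str.upper)

# Turkish-aware uppercasing keeps them apart.
turkish_index = build_index(keys, turkish_upper)

print(len(naive_index), len(turkish_index))
```

Because the index is built outside the module, the normalization can be chosen per language without changing the order in which the module itself stores its keys.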


In Him,
    DM

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
