Re: [sword-devel] Creating a version of the BSB module with interlinear support

Timothy Allen Mon, 02 Oct 2023 00:39:13 -0700

Ah, thanks. I did look at that page when I started making my module, but I'd forgotten about it by the time I needed this more detailed advice. Thanks for reminding me! Using this to update the guesses from my original message:


gloss
   I *might* be able to try grabbing the first word from the BDB/Thayer
   gloss, but that seems error-prone and I probably won't bother unless
   somebody really wants it
lemma
   This should be used for Strongs numbers, marked up as "strong:G123"
   or "strong:H123", but could also be used for storing the original
   source text as "lemma.BSB:בְּרֵאשִׁ֖ית" if we assume a hypothetical
   lexicon that indexes all the words in the BSB.
morph
   This should be used for Robinson morphology codes, so I should not
   bother with this until I can figure out how to translate the BSB's
   codes to Robinson ones. The wiki page also has "strongMorph" codes
   in its examples, but I can't find any extra information on what
   system this might refer to. Apparently there aren't any Hebrew
   morphology lexicons available for SWORD; maybe someday I could make one?
POS
   Still unclear to me, it's not mentioned on the wiki page
src
   Apparently this is for word order in the source language, but it's
   not at all clear where "word 1" is. The start of the <w> element?
   The start of the verse? The start of the chapter? The start of the
   book? The start of the Bible? Does it not matter, because front-ends
   are intended to just sort the words they have?
xlit
   Still for the transliteration, simply enough.
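
A minimal Python sketch of how the guesses above could be turned into a single <w> element for one English span. The function name `make_w` and the dict keys are made up for illustration. The "strong:" and "lemma.BSB:" markup follows the guesses above and is not a confirmed spec, and passing the transliteration through unchanged as xlit is an assumption. When a span covers several source words, the values are joined with spaces.

```python
import xml.etree.ElementTree as ET


def make_w(text, words):
    """Build one <w> element for an English span.

    `words` is a list of dicts, one per source word, with the hypothetical
    keys "language", "strongs", "source_word" and "xlit".
    """
    lemma_parts = []
    xlit_parts = []
    for word in words:
        # Hebrew and Aramaic both use "H" numbers; Greek uses "G".
        prefix = "G" if word["language"] == "Greek" else "H"
        lemma_parts.append(f"strong:{prefix}{word['strongs']}")
        # Assumption: a hypothetical "BSB" lexicon keyed by the source word.
        lemma_parts.append(f"lemma.BSB:{word['source_word']}")
        xlit_parts.append(word["xlit"])
    elem = ET.Element("w")
    elem.set("lemma", " ".join(lemma_parts))
    elem.set("xlit", " ".join(xlit_parts))
    elem.text = text
    return elem


w = make_w(
    "In the beginning",
    [{"language": "Hebrew", "strongs": 7225,
      "source_word": "בְּרֵאשִׁ֖ית", "xlit": "bə·rê·šîṯ"}],
)
print(ET.tostring(w, encoding="unicode"))
```

Using ElementTree rather than string concatenation means attribute values get escaped automatically if a source word or transliteration ever contains a quote or ampersand.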

According to the wiki page, there's also an "n" attribute not mentioned in the official OSIS docs, which is for "marking enumerated words". I don't know what this means, and the wiki page doesn't include any examples. I'm going to guess I don't need it.



Do I have all that right? Is there anything I've misunderstood?

Also, would it be better to have "lemma.BSB:בְּרֵאשִׁ֖ית" and use the same "BSB" lexicon for every word in the entire text, or would it be more appropriate to use "lemma.WLC:בְּרֵאשִׁ֖ית" and use different lexicons to indicate the different sources used for the translation (Nestle1904, TR, NA, SBL, etc.)?
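
To make the two options concrete, here is a rough Python sketch that produces either form from a spreadsheet source word. It assumes the bracket markers listed in the original message wrap the whole word, and that unbracketed words come from WLC (Hebrew/Aramaic) or Nestle 1904 (Greek). The names `BRACKETS`, `split_source` and `lemma_value` are hypothetical, and the "lemma.X:" form itself is still only a guess.

```python
# Bracket pairs and the source each one marks. "[[" must be tested
# before "[" so that [[ECM]] is not mistaken for [NA].
BRACKETS = [
    ("[[", "]]", "ECM"),
    ("{", "}", "TR"),
    ("⧼", "⧽", "RP"),
    ("(", ")", "WH"),
    ("〈", "〉", "NE"),
    ("[", "]", "NA"),
    ("‹", "›", "SBL"),
]


def split_source(source_word, language):
    """Return (source_name, bare_word) for one spreadsheet source word."""
    word = source_word.strip()
    for opener, closer, name in BRACKETS:
        if word.startswith(opener) and word.endswith(closer):
            return name, word[len(opener):-len(closer)]
    # Assumption: unbracketed words come from the default source text.
    default = "Nestle1904" if language == "Greek" else "WLC"
    return default, word


def lemma_value(source_word, language, per_source=True):
    """Format a lemma value with either a per-source or a single BSB lexicon."""
    source, bare = split_source(source_word, language)
    lexicon = source if per_source else "BSB"
    return f"lemma.{lexicon}:{bare}"


print(lemma_value("בְּרֵאשִׁ֖ית", "Hebrew"))
print(lemma_value("בְּרֵאשִׁ֖ית", "Hebrew", per_source=False))
```

Either way the bracket characters are stripped from the stored word, so the only difference between the two options is the lexicon name in front of the colon.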



Timothy


On 30/9/23 20:00, David Haslam wrote:

Hi Timothy,

Please consult the developers’ wiki

https://wiki.crosswire.org/

And consult the page about OSIS Bibles.

David

Sent from Proton Mail <https://proton.me/mail/home> for iOS
On Sat, Sep 30, 2023 at 10:54, Timothy Allen <thrist...@gmail.com> wrote:
The Berean Standard Bible is available in two machine-readable formats: USFM, and "translation tables", a 40MB Excel spreadsheet with a row for every Hebrew or Greek word in their chosen source texts with the English text it's translated to. I would like to make one module with the nice formatting of the USFM sources and the metadata from the spreadsheet, so I've spent the last few weeks writing a script that runs through them both in parallel and makes sure everything lines up, so I'm now confident that I have an accurate mapping between them.
My question now is, how can I translate the data from the spreadsheet into OSIS?
Here's the information the spreadsheet gives me:

he_ordinal
   Example: 1
   Notes: "Hebrew Ordinal", increments for each spreadsheet row in the Old Testament, set to 999999 for each row in the New Testament
el_ordinal
   Example: 0
   Notes: "Greek Ordinal", set to 0 for each row in the Old Testament, increments for each row in the New Testament, except for Mark 1:1 which has a word with the number 18379.5 (presumably something needed to be inserted and they didn't want to renumber everything else)
en_ordinal
   Example: 1
   Notes: "English Ordinal", increments for each spreadsheet row (except for that word in Mark 1:1)
language
   Example: Hebrew
   Notes: "Hebrew", "Greek", or sometimes "Aramaic"
verse_ordinal
   Example: 1
   Notes: Increments for each verse in the Bible, so every word in Genesis 1:1 has "1", etc.
source_word
   Example: בְּרֵאשִׁ֖ית
   Notes: The word in the original source text. Sometimes includes fancy brackets to mark sources other than WLC or Nestle 1904: {TR} ⧼RP⧽ (WH) 〈NE〉 [NA] ‹SBL› [[ECM]]
transliteration
   Example: bə·rê·šîṯ
   Notes: A transliteration of the source word into the Latin alphabet
grammar_code
   Example: Prep-b | N-fs
   Notes: A code describing the grammatical form of the word; these don't appear to be Robinson codes, but their own custom thing for Hebrew (https://biblehub.com/hebrewparse.htm) and Greek (https://biblehub.com/abbrev.htm)
grammar_description
   Example: Preposition-b | Noun - feminine singular
   Notes: The grammar code, unabbreviated
strongs_number
   Example: 7225
   Notes: The Strongs number of the basic form of this word
translation
   Example: In the beginning
   Notes: The English text that appears in the BSB
gloss
   Example:
      1) first, beginning, best, chief
      1a) beginning
      1b) first
      1c) chief
      1d) choice part
   Notes: A definition from the Brown-Driver-Briggs Hebrew Lexicon, or Thayer's Greek Definitions, as appropriate
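
The multi-line gloss example is the awkward column to carry into an attribute. One way to reduce it to a single space-free token is sketched below in Python; the function name `first_gloss_word` and the assumption that every gloss starts with a sense marker like "1)" or "1a)" are illustrative guesses based only on the example shown here.

```python
import re


def first_gloss_word(gloss):
    """Return the first word of the first sense in a BDB/Thayer-style gloss."""
    lines = gloss.splitlines()
    if not lines:
        return None
    # Drop a leading sense marker such as "1)" or "1a)".
    first_line = re.sub(r"^\s*\d+[a-z]*\)\s*", "", lines[0])
    # Keep the text before the first comma or semicolon, then its first token.
    head = re.split(r"[,;]", first_line, maxsplit=1)[0].split()
    return head[0] if head else None


example = (
    "1) first, beginning, best, chief\n"
    "1a) beginning\n"
    "1b) first\n"
    "1c) chief\n"
    "1d) choice part"
)
print(first_gloss_word(example))
```

A sense such as "choice part" would be cut down to "choice", which shows how lossy this approach is for multi-word definitions.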
Looking at the OSIS 2.1.1 User's Manual (and sniffing around in the KJVA module), to represent this information in OSIS I should use the <w> element, which supports the following attributes (copy/pasted from the Manual):
  * *gloss* Record comments on a particular word or its usage.
  * *lemma* Use to record the base form of a word.
  * *morph* Use to record grammatical information for a word.
  * *POS* Use to record the function of a word according to a
    particular view of the language's syntax.
  * *src* Use to record origin of the word.
  * *xlit* Use to record a transliteration of a word.
The first problem is that sometimes multiple source words are translated into a single English span, and it's not made clear how to express that in these attributes. From poking around in the KJVA module, I get the impression these are supposed to be space-delimited lists. Is that correct?
Assuming that's the case, here's my guesses at how to fill out these attributes for each span:
  * *gloss* can't be done, because each gloss contains spaces which
    means the displaying app can't figure out which part of the gloss
    goes with which word
  * *lemma* is where Strongs numbers go; Greek Strongs numbers should
    be prefixed with "G" and Hebrew/Aramaic ones with "H0"
  * *morph* might be used for the "grammar code" content, but I would
    probably need to figure out how to translate them into Robinson
    codes first, since that seems to be the only morphological
    dictionary module in the Crosswire repositories
  * *POS* is unclear to me, I don't see how it differs from the
    "morph" attribute
  * *src* is also unclear: is this for the word order (he_ordinal or
    el_ordinal, possibly numbered from the beginning of the verse
    rather than the beginning of the entire Bible) or the actual
    choice of source text (Nestle1904, TR, NA, SBL, etc.)?
  * *xlit* clearly comes from the "transliteration" field
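
If src does turn out to mean word order counted from the start of each verse (which is exactly the open question in the list above), the renumbering could look roughly like this Python sketch. The row dicts reuse the spreadsheet column names; the function names and the sample numbers, apart from 18379.5, are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def source_ordinal(row):
    """Pick the ordinal that applies to this row's language."""
    key = "el_ordinal" if row["language"] == "Greek" else "he_ordinal"
    # Decimal keeps fractional values such as 18379.5 exact.
    return Decimal(str(row[key]))


def per_verse_positions(rows):
    """Map (verse_ordinal, source ordinal) to a 1-based position in the verse."""
    by_verse = defaultdict(list)
    for row in rows:
        by_verse[row["verse_ordinal"]].append(source_ordinal(row))
    positions = {}
    for verse, ordinals in by_verse.items():
        for position, ordinal in enumerate(sorted(ordinals), start=1):
            positions[(verse, ordinal)] = position
    return positions


# Hypothetical rows; only the fractional ordinal comes from the spreadsheet.
rows = [
    {"language": "Greek", "verse_ordinal": 100,
     "he_ordinal": 999999, "el_ordinal": 18379},
    {"language": "Greek", "verse_ordinal": 100,
     "he_ordinal": 999999, "el_ordinal": 18380},
    {"language": "Greek", "verse_ordinal": 100,
     "he_ordinal": 999999, "el_ordinal": 18379.5},
]
positions = per_verse_positions(rows)
print(positions[(100, Decimal("18379.5"))])
```

Sorting before numbering means the inserted word in Mark 1:1 simply takes its place between its neighbours, so the fractional ordinal never has to appear in the OSIS output.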
One thing that's clearly missing is where to put the source word. How does that work?
Is there another way to represent information that doesn't fit into the <w> element? I'd like this module to be as useful as possible, so I'm hesitant to toss out any information that can be usefully represented.
Is there anything else I've missed or misunderstood?


Timothy.
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
