"Bernard Chardonneau" <bechapert...@free.fr>

> Hey everybody.
> After 10 days mostly out in nature without a computer, and just before
> another 8 weeks without a permanent internet connection (largely by choice),
> I want to give my opinion, as a new pair developer, on the discussion
> about what dictionaries should contain.
> 1) For monodixes, I fully agree with Fran and some others that all
> useful information should be there, even if several pairs do not use
> it.
> Since doing that generally means writing a complete paradigm once and
> then reusing it hundreds or thousands of times for the main ones, it
> is not a big problem.
> 2) For bidixes, the most natural way to build them is to write something
> like:
> <e><p><l>my_word<s n="kind1"/></l><r>my_translation<s n="kind2"/></r></p></e>
> where kind1 and kind2 are often the same and can be built from the
> name of the paradigm used in the monodix.
> I say that because I quickly realised that adding a new line by typing
> the right XML syntax into a file with more than 40 000 other lines
> quickly becomes painful.
> So I wrote a 4-parameter shell script to generate new lines, and another
> one to put these lines in the right place.
> I think a lot of pair developers have their own shell scripts that do
> the same, or something similar, to build a bidix when monodixes are
> available.
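The scripts themselves are not shown in the message. As a hedged illustration only, here is a minimal Python sketch of the same idea, assuming the four parameters are the left lemma, the left monodix paradigm, the right lemma and the right paradigm, and assuming paradigm names follow the "something__kind" convention, so that the tag is whatever follows the last "__". The names `kind_from_paradigm`, `make_entry` and `insert_sorted` are hypothetical.

```python
import bisect
import re

# Matches the left-hand lemma of a bidix line: the text right after "<l>".
LEFT_LEMMA = re.compile(r"<l>([^<]*)")


def kind_from_paradigm(paradigm: str) -> str:
    """Return the tag part of a "something__kind" paradigm name (assumed convention)."""
    return paradigm.rsplit("__", 1)[-1]


def make_entry(left: str, left_par: str, right: str, right_par: str) -> str:
    """Build one bidix <e> line from four parameters (hypothetical interface)."""
    kind1 = kind_from_paradigm(left_par)
    kind2 = kind_from_paradigm(right_par)
    return (f'<e><p><l>{left}<s n="{kind1}"/></l>'
            f'<r>{right}<s n="{kind2}"/></r></p></e>')


def insert_sorted(lines: list[str], entry: str) -> None:
    """Insert entry into lines, which are assumed to be <e> lines sorted by left lemma."""
    keys = [LEFT_LEMMA.search(line).group(1) for line in lines]
    key = LEFT_LEMMA.search(entry).group(1)
    lines.insert(bisect.bisect_right(keys, key), entry)


section = [
    make_entry("abeille", "abeille__n", "abeja", "abeja__n"),
    make_entry("virgule", "abeille__n", "coma", "abeja__n"),
]
insert_sorted(section, make_entry("livre", "livre__n", "libro", "abismo__n"))
print("\n".join(section))
```

The ordering here is plain Python string order, which may not match the order a given bidix actually uses, and multiword lemmas (with <b/>) are not handled; a real script would have to follow the file's own conventions.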
> So, to keep bidix lines like the one above, it would be better if other
> <s n="something"/> tags were not needed.
> Of course, there are exceptions where the extra tags give nice results,
> as in the fr-es pair:
> <e><p><l>coma<s n="n"/><s n="m"/></l><r>coma<s n="n"/><s n="m"/></r></p></e>
> <e><p><l>virgule<s n="n"/><s n="f"/></l><r>coma<s n="n"/><s n="f"/></r></p></e>
> or
> <e><p><l>composant<s n="n"/><s n="m"/></l><r>componente<s n="n"/><s n="m"/></r></p></e>
> <e><p><l>composante<s n="n"/><s n="f"/></l><r>componente<s n="n"/><s n="f"/></r></p></e>
> But having to write (in the eo-fr pair)
> <e><p><l>ABC<s n="np"/><s n="al"/></l><r>ABC<s n="np"/><s n="al"/><s n="mf"/></r></p></e>
> without forgetting any <s n="al"/>, or the <s n="mf"/> that prevents
> a # in the translation, is not a very nice way to work.
> There is of course the problem of beginners not doing that and
> asking on the list why it does not work, but that can be learned
> quickly.
> But the main problem is being forced to do that almost all the time,
> which ends up making the bidix lines longer and a little less readable.
> I think that even in this case, where the gender changes:
> <e><p><l>ajout<s n="n"/><s n="m"/></l><r>adición<s n="n"/><s n="f"/></r></p></e>
> there should be no need to give the gender if neither language has a
> word ambiguity (as there is for coma and componente in Spanish).
> And of course something like:
> <e r="LR"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="GD"/></r></p></e>
> <e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="f"/></r></p></e>
> <e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="m"/></r></p></e>
> would become simpler, in a single line.
> So the question is how to achieve that without breaking things.
> Solution 1: paradigm
> Several people mentioned it, but without going into details.
> I notice that the <s n="kind"/> information in bidixes can generally
> be generated from the name of the paradigm used in the monodix,
> which looks like "something__kind" (or "foo__bar" if you prefer).
> But of course, there is less information in "kind" than in
> "something__kind".
> So a nice approach would be, for each paradigm of every monodix, to
> build a paradigm with the same name in the bidix containing just
> an invariant list of tags, like:
> <s n="thing1"/><s n="thing2"/>
> That way, even gender ambiguities like the one for the Spanish word
> coma could be solved elegantly:
> <e><p><l>coma<s n="livre__n"/></l><r>coma<s n="abismo__n"/></r></p></e>
> <e><p><l>virgule<s n="abeille__n"/></l><r>coma<s n="abeja__n"/></r></p></e>

Didn't Jacob Nordfalk and Michael Kristensen make a script to do that
kind of thing with sv-da? I.e., automatically create bidix pardefs based
on monodix pardefs.
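One possible way to compute the "invariant list of tags" for each monodix paradigm automatically: take the longest common prefix of the tag sequences on the tag side of all its entries (for livre__n, every form starts with n, m, and only sg/pl varies). This is an assumption about how such a script could work, not a description of the sv-da script; the names `common_prefix` and `invariant_tags` are hypothetical, and nested <par> references inside pardefs are simply ignored.

```python
import xml.etree.ElementTree as ET


def common_prefix(seqs: list[list[str]]) -> list[str]:
    """Longest list of tags shared at the start of every sequence."""
    prefix = seqs[0]
    for seq in seqs[1:]:
        n = 0
        while n < min(len(prefix), len(seq)) and prefix[n] == seq[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def invariant_tags(monodix_xml: str) -> dict[str, str]:
    """Map each pardef name to the tags shared by all of its entries."""
    root = ET.fromstring(monodix_xml)
    result = {}
    for pardef in root.iter("pardef"):
        seqs = []
        for e in pardef.findall("e"):
            side = e.find("p/r")  # lexical (tag) side of a <p> entry
            if side is None:
                side = e.find("i")  # identity entries keep their tags in <i>
            if side is None:
                continue  # e.g. an entry holding only a nested <par>: ignored here
            seqs.append([s.get("n") for s in side.iter("s")])
        if seqs:
            tags = common_prefix(seqs)
            result[pardef.get("n")] = "".join(f'<s n="{t}"/>' for t in tags)
    return result


MONODIX = """<dictionary><pardefs>
  <pardef n="livre__n">
    <e><p><l></l><r><s n="n"/><s n="m"/><s n="sg"/></r></p></e>
    <e><p><l>s</l><r><s n="n"/><s n="m"/><s n="pl"/></r></p></e>
  </pardef>
  <pardef n="abeille__n">
    <e><p><l></l><r><s n="n"/><s n="f"/><s n="sg"/></r></p></e>
    <e><p><l>s</l><r><s n="n"/><s n="f"/><s n="pl"/></r></p></e>
  </pardef>
</pardefs></dictionary>"""

for name, tags in invariant_tags(MONODIX).items():
    print(name, tags)
```

The resulting tag strings are what a same-named bidix paradigm would have to contain under Solution 1; how to wrap them as bidix pardefs for each side is left open here.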

> Solution 2: during compilation
> That's another approach. When compiling bidix files, there are two cases:
> - the information is given in an <s n="thing"/>, so just use it;
> - the information is not given, so it is taken from the
>   monodix.
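A minimal sketch of the lookup rule in Solution 2, written as a plain Python function rather than as part of any existing Apertium compiler; the name `resolve_tags` and the lemma-to-tag-lists mapping it takes are assumptions for illustration. It makes explicit one case the two bullet points leave implicit: when a lemma is ambiguous in the monodix (like Spanish coma, masculine or feminine), nothing can be inferred, so the bidix must give the tags.

```python
def resolve_tags(lemma: str, given: list[str],
                 monodix: dict[str, list[list[str]]]) -> list[str]:
    """Tags for one side of a bidix entry, following the two cases above.

    monodix is a hypothetical lookup: lemma -> every invariant tag list
    the monodix allows for that lemma.
    """
    if given:
        return list(given)  # case 1: the bidix gives the tags, use them
    candidates = monodix.get(lemma, [])
    if len(candidates) == 1:
        return list(candidates[0])  # case 2: unambiguous, take it from the monodix
    if not candidates:
        raise KeyError(f"{lemma!r} is not in the monodix")
    # Ambiguous lemma (e.g. "coma", n.m or n.f): the bidix must say which.
    raise ValueError(f"{lemma!r} is ambiguous {candidates}; give the tags in the bidix")


ES = {"adición": [["n", "f"]], "coma": [["n", "m"], ["n", "f"]]}
print(resolve_tags("adición", [], ES))        # gender taken from the monodix
print(resolve_tags("coma", ["n", "f"], ES))   # explicit tags win
```

With this rule, the ajout/adición entry could omit the gender, while the coma entries would still need it, which matches the distinction drawn earlier in the message.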
> Have a good summer.

You too :-)

Apertium-stuff mailing list

