Hi all, I made a thing: https://apertium.trigram.no/?dir=nob-nno&q=Vi%20liker%20enten%20%C3%A5%20fortsette%20%C3%A5%20bygge%20n%C3%A5r%20vi%20blant%20annet%20s%C3%B8ker%20forskjellen%20mens%20dere%20er%20uenige.#translation (try toggling the various "Style preferences")
Norway in general has a very positive view of dialects, and there are quite a lot of accepted spelling/word alternatives in both the Bokmål and Nynorsk variants of Norwegian. People also have style preferences that don't map cleanly into disjoint sets: some people want {me,a,kj}, others want {me,e,kj} or {vi,a,k} etc.

The current system of alt-attributes and multiple binaries doesn't scale here. With just two options, you have to have four alt-values (and lots of duplication in .dix) and four generators. We'd like dozens of options, but without compiling 2^dozens of generators.

So instead, why not just generate everything and disambiguate? CG can read variables inserted as blanks in the stream, and then use those to decide which style preferences to keep or throw away. Apparently others already do this.[1]

I've got some branches of apertium-nno-nob/apertium-nno where it's implemented. We put a cg-proc command after both bidix and the generator, and change the generator to use the bilingual format (lt-proc -b).[2]

For bidix-specified preferences, when translating from right to left, we remove the r="LR" on the entries in question, and match on lemmas in the biprefs.rlx CG file:[3]

    SELECT ("skilnad"i) IF (0 ("forskjell"i) + (VAR:forskjell_skilnad));
    REMOVE ("skilnad"i) IF (0 ("forskjell"i));

The preference variable here is named "forskjell_skilnad" since the default is "forskjell", but if that option is ticked / variable is set, we choose "skilnad".

For generator-specified preferences, we remove the LR's and instead add a tag when generating, typically through a pardef, e.g. this one is used for the set of words where 'kj' can be written as 'k':

    <pardef n="v:kj_k">
      <e r="LR"> <p><l></l> <r></r></p></e>
      <e r="RL"> <p><l><s n="v:kj_k"/></l> <r></r></p></e>
    </pardef>

I gave cg-proc a new switch -g/--generation that outputs lexical units without tags (unless --trace is also given) and without surrounding ^$.
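To make the "generate everything and disambiguate" idea a bit more concrete, here is a toy Python model of the selection step. It is only a sketch, not the actual CG rules in the .rlx files: it assumes that a set variable keeps the forms carrying the matching <v:...> tag, that an unset variable drops them (as long as something else is left), and that tags are stripped on output. The function name choose is hypothetical, and the tagged forms for "spøke" are invented for illustration; only the "spøkelse" forms come from real generator output.

```python
import re

# Matches preference tags such as <v:kj_k> embedded in a generated form.
TAG = re.compile(r"<v:([^>]+)>")

def choose(forms, known_prefs, enabled):
    """Pick surface forms given the set of enabled preference variables.

    Toy semantics (an assumption, not the real CG grammar):
    - variable set:   keep only forms tagged with it (SELECT-like)
    - variable unset: drop forms tagged with it, unless that would
                      leave nothing (REMOVE-like)
    """
    candidates = list(forms)
    for pref in known_prefs:
        tagged = [f for f in candidates if pref in TAG.findall(f)]
        if not tagged:
            continue
        if pref in enabled:
            candidates = tagged
        else:
            untagged = [f for f in candidates if f not in tagged]
            if untagged:
                candidates = untagged
    # Strip the tags, roughly what cg-proc -g does on output.
    return [TAG.sub("", f) for f in candidates]

# Forms as emitted by the generator for "spøkelse".
spokelse = ["spø<v:kj_k>kelse", "spøkjelse"]

# Hypothetical tagged forms for "spøke" (invented for this example).
spoke = ["spøkja", "spø<v:kj_k>ka", "spøkj<v:infa_infe>e",
         "spø<v:kj_k>k<v:infa_infe>e"]

if __name__ == "__main__":
    prefs = ["kj_k", "infa_infe"]
    for enabled in (set(), {"kj_k"}, {"kj_k", "infa_infe"}, {"infa_infe"}):
        print(sorted(enabled), choose(spoke, prefs, enabled))
```

The point is that one list of tagged forms serves all four combinations of the two options, so adding a preference means adding a tag and a rule rather than doubling the number of generators.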
The generator (running with lt-proc -b) now gives

    $ echo spøkelse | apertium -d . nob-nno-dgen
    ^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$^.<sent><clb>/.$

and yeah, maybe the tag is at a weird spot, but cg-proc doesn't mind:

    $ echo '^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$' | cg-proc -n -g -w nob-nno.genprefs.rlx.bin
    spøkjelse
    $ echo '[<STREAMCMD:SETVAR:kj_k>]^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$' | cg-proc -n -g -w nob-nno.genprefs.rlx.bin
    [<STREAMCMD:SETVAR:kj_k>]spøkelse

Those STREAMCMD's are hard to do manually, so /usr/bin/apertium can now insert and strip them by reading AP_SETVAR:

    $ export AP_SETVAR
    $ for AP_SETVAR in "" "kj_k" "kj_k,infa_infe" "infa_infe"; do echo spøke| apertium -d . nob-nno; done
    spøkja
    spøka
    spøke
    spøkje

The changes to apertium and vislcg3 are merged, but the html-tools changes may need deuglifying and testing. I also haven't merged the changes to apy[6] yet, since we need to bikeshed how to get the list of possible options from the language pair (which html-tools shows). Currently the apy branch just hardcodes the list for nob→nno[7] :)

Some preferences are in the monolingual packages and some in the pair, so the preferences for a pair need to include both. Perhaps each package includes one or more preferences.xml files, and then modes.xml can do

    <mode name="nob-nno">
      <pipeline>…</pipeline>
      <preferences>
        <pref path="nob-nno.preferences.xml"/>
        <pref path="nno.preferences.xml"/>
      </preferences>
    </mode>

where nno.preferences.xml is copied by make from $(LANG1)/preferences.xml and contains something like

    <preference id="kj_k">
      <description lang="nno">søkje → søke</description>
      <description lang="nob">søkje → søke</description>
    </preference>

(then apy would have to parse modes.xml and the files listed there)

Or is there a better way?
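For the modes.xml idea, the parsing on the apy side could look roughly like the following. This is a minimal sketch that assumes the <preferences>/<pref path="..."/> layout proposed above, that paths are relative to the directory of modes.xml, and that each preferences file has <preference id="..."> elements somewhere under its root (the root element itself is left open). The function name collect_preferences is hypothetical and not an existing apy API.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def collect_preferences(modes_path, mode_name):
    """Return {preference id: {lang: description}} for one mode.

    Reads the <preferences> block of the named mode in modes.xml and
    then every preferences file it lists (the proposed, not yet
    existing, format).
    """
    modes_path = Path(modes_path)
    prefs = {}
    for mode in ET.parse(modes_path).getroot().iter("mode"):
        if mode.get("name") != mode_name:
            continue
        for pref_file in mode.iterfind("preferences/pref"):
            # Assumption: paths are relative to the modes.xml directory.
            path = modes_path.parent / pref_file.get("path")
            for pref in ET.parse(path).getroot().iter("preference"):
                descriptions = {
                    d.get("lang"): (d.text or "").strip()
                    for d in pref.iterfind("description")
                }
                prefs[pref.get("id")] = descriptions
    return prefs
```

With this shape, preferences from the monolingual package and from the pair end up in one dictionary, and if two files define the same id, the later one wins. Whether that is the right merge behaviour is part of what would need bikeshedding.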
[1] https://github.com/TinoDidriksen/cg3/issues/68#issuecomment-736571504
[2] https://github.com/apertium/apertium-nno-nob/compare/biprefs#diff-94c6b34f4d7517dc0915b07677ec9a8656b559e0bbdb683a66d4270790b88812R21-L39
[3] https://github.com/apertium/apertium-nno-nob/compare/biprefs#diff-4e8f9b0972e6a59cd53d18a476d8d01bbe772f29a5c481659b24185d6839cd1dR14-R15
[4] https://github.com/apertium/apertium-nno/compare/biprefs#diff-4e49c13e8aa1b44221621ad844edbd9be60169e8baa78ed7bbb80b4237865573R638-R641
[5] https://github.com/apertium/apertium-html-tools/tree/biprefs
[6] https://github.com/apertium/apertium-apy/tree/biprefs
[7] https://github.com/apertium/apertium-apy/blob/c16902972097d4e9dd9af1d14413562f96d32604/apertium_apy/handlers/translate.py#L219