Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-04 Thread Tanmai Khanna
In response to this issue, empty blanks are considered blanks again, so a <b/> or a <b pos="N"/> in the output will put an empty blank if the input had an empty blank in that position. Once again, <b/> and <b pos="N"/> now do the same thing. Earlier, <b pos="N"/> was used to print a blank from the input and <b/> was used to print a litera

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Hèctor Alòs i Font
Message from Tanmai Khanna on Fri, 4 Sep 2020 at 9:22: > Hèctor, > Yes, the new improvements aren't backwards compatible but that's because > they're better than the system we had earlier. Here are the changes: > > So, you are saying that the new stuff is not backwards compatible, aren't

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hèctor, Yes, the new improvements aren't backwards compatible but that's because they're better than the system we had earlier. Here are the changes: So, you are saying that the new stuff is not backwards compatible, aren't > you? There aren't any in the rule, but , which is not > the same. Until n

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Hèctor Alòs i Font
Message from Tanmai Khanna on Thu, 3 Sep 2020 at 23:10: > Hèctor, > The extra blank there is because there's a blank in your rule output. See: > > $ echo "^052/052$^F/F$" | apertium-transfer > -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin' > > ^num_n{^052$ ^F$}$ > > > The rule

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hèctor, The extra blank there is because there's a blank in your rule output. See: $ echo "^052/052$^F/F$" | apertium-transfer -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin' ^num_n{^052$ ^F$}$ The rule for num_n has a <b/> in the rule output and hence there's a space. The reason earlier the

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Hèctor Alòs i Font
Hi Tanmai, Yes, hyphens and quotes (") seem to be solved. But the system still adds blanks where there were none. For instance, we now get strange Unicode codes: 05076. Table des caractères Unicode U+0500 à U+052F. < 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F. ---

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hèctor, can you check the page on beta now? The hyphen and the superscript issues are solved. Of course, there's now a space between l and ér. If that's a big problem we can discuss other solutions. *तन्मय खन्ना * *Tanmai Khanna* On Thu, Sep 3, 2020 at 8:09 PM Tino Didriksen wrote: > I have adj

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tino Didriksen
I have adjusted Transfuse with how spaces are treated for Apertium, and implemented adding temporary spaces around and . Changes are deployed on beta. I repeat my plea that all symbols should have an analysis. It breaks markup that things like - and : are not tokens. -- Tino Didriksen On Wed,

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: >> So currently if I have the multiword "i dag", it'll recognize > "idag" but it won't recognize "i dag"? (And I suppose if > I have the non-multiword "today" it won't recognize "today".) > > Exactly, but even when it recognises "idag", the will probably > be lost because it

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
> So currently if I have the multiword "i dag", it'll recognize "idag" but it won't recognize "i dag"? (And I suppose if I have the non-multiword "today" it won't recognize "today".) Exactly, but even when it recognises "idag", the will probably be lost because it's being seen as a normal blank.

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > the analyser sees wordbound blanks as normal blanks, So currently if I have the multiword "i dag", it'll recognize "idag" but it won't recognize "i dag"? (And I suppose if I have the non-multiword "today" it won't recognize "today".) One possibility might be to have wordb

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Oh I see the hyphen thing. That should've been fixed after the latest commit. Will check it out. *तन्मय खन्ना * *Tanmai Khanna* On Thu, Sep 3, 2020 at 12:34 PM Tanmai Khanna wrote: > Hey, > As of now, the analyser sees wordbound blanks as normal blanks, and so > when they occur, the dictionary

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hey, As of now, the analyser sees wordbound blanks as normal blanks, and so when they occur, the dictionary will often not recognise multiwords. This was done because we are offloading multiwords to apertium-separable anyway. As for Iér, given that Tino Didriksen is able to fix this

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-02 Thread Tino Didriksen
That's not something the pipe ever sees - you can't fix it on your end. It's something I have to adjust in Transfuse. https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604 and L629 expands inline tags to encompass surrounding plain text, because it is unfortunately common for for

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-02 Thread Hèctor Alòs i Font
I'm taking a look at this list of names on Wikipedia: https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8 and at how it is translated in beta.apertium: https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Sa

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Unhammer, I think I've implemented this in: https://github.com/apertium/apertium/pull/102 . If it looks good I can implement it in interchunk and postchunk as well. The blanks are stored as a queue and output in available <b/>s in the rule output. If any are remaining they're output after the rule output.
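The queue behaviour described in this message can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the names `BLANK_SLOT` and `render_rule_output` are hypothetical, the strings stand in for lexical units and blanks, and falling back to a single space when the queue runs dry is an assumption that the message does not confirm.

```python
from collections import deque

# Hypothetical marker standing in for a blank slot in a rule's output.
BLANK_SLOT = object()


def render_rule_output(template, input_blanks):
    """Fill blank slots from a FIFO queue of input blanks.

    template     -- list of output strings and BLANK_SLOT markers
    input_blanks -- blanks collected from the matched input, in order
    Blanks still queued after the template is exhausted are appended
    after the rule output, so none of them is thrown away.
    """
    queue = deque(input_blanks)
    parts = []
    for item in template:
        if item is BLANK_SLOT:
            # Assumption: use a plain space when no input blank is left.
            parts.append(queue.popleft() if queue else " ")
        else:
            parts.append(item)
    # Flush whatever is still queued after the rule output.
    parts.extend(queue)
    return "".join(parts)


if __name__ == "__main__":
    # Two input blanks, one slot: the second blank is flushed at the end.
    print(render_rule_output(["^a$", BLANK_SLOT, "^b$"], [" [x] ", " [y] "]))
```

An empty input blank stays empty when it fills a slot, which matches the later note in this thread that empty blanks are considered blanks again.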

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > So what I'll try to do, is after the blanks are collected, let's say X is > the number of source LUs in the pattern and Y is the number of output LUs. > If X = Y then we can keep them in the same place, if X < Y, then we can > keep them in the first X gaps the rest can be sp

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Again, remember these aren't wordbound blanks or block tags, just superblanks, like , or other tags that aren't hard breaks or wordbound. *तन्मय खन्ना * *Tanmai Khanna* On Thu, Aug 27, 2020 at 2:33 PM Tanmai Khanna wrote: > Or, if we want to give the user complete control of the blanks between

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Or, if we want to give the user complete control of the blanks between the output LUs, i.e. if they want a space or not, we can just flush all the blanks in the patterns before any LUs are output (or after). It's considerably easier to implement and gives the user complete control over the spaces b

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
So what I'll try to do is, after the blanks are collected, let's say X is the number of source LUs in the pattern and Y is the number of output LUs. If X = Y then we can keep them in the same place; if X < Y, then we can keep them in the first X gaps, and the rest can be spaces or whatever the user denot

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Francis Tyers
On 2020-08-27 09:54, Kevin Brubeck Unhammer wrote: Tanmai Khanna wrote: Hmm, well then I guess for now the small list of tags that still exist as blanks have to be printed using the option. I could change it so that it flushes all blanks, or keeps them in their position if possible. The

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > Hmm, well then I guess for now the small list of tags that still exist as > blanks have to be printed using the option. I could change it > so that it flushes all blanks, or keeps them in their position if possible. > The good thing is that these aren't wordbound semantica

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Hmm, well then I guess for now the small list of tags that still exist as blanks have to be printed using the option. I could change it so that it flushes all blanks, or keeps them in their position if possible. The good thing is that these aren't wordbound semantically so them not being in their

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > I always thought that's the default behaviour. That if some blanks aren't > explicitly printed in the transfer rules then they're flushed. I'll check > it out, but it should be that. The old behaviour has been to just throw away anything that's eaten by a rule but not expl

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Tanmai Khanna
I always thought that's the default behaviour. That if some blanks aren't explicitly printed in the transfer rules then they're flushed. I'll check it out, but it should be that. *तन्मय खन्ना * *Tanmai Khanna* On Thu, Aug 27, 2020 at 1:27 AM Kevin Brubeck Unhammer wrote: > Tanmai Khanna > čál

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote: > Thanks Unhammer! > So now we have three kinds of units: block tags, superblanks, and wordbound > blanks. Block tags are hard breaks in the text, wordbound blanks move > around with words, and superblanks are tags that aren't hard breaks but not > attached to words (such as

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Tanmai Khanna
Thanks Unhammer! So now we have three kinds of units: block tags, superblanks, and wordbound blanks. Block tags are hard breaks in the text, wordbound blanks move around with words, and superblanks are tags that aren't hard breaks but aren't attached to words either (such as ). Tino can give you a list of ta
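As a rough way to keep the three kinds of units apart, the distinction drawn in this message could be modelled like this. The sketch is purely illustrative: `UnitKind`, `is_hard_break` and `moves_with_word` are hypothetical names, not part of Apertium or Transfuse, and no actual tag lists are assumed.

```python
from enum import Enum


class UnitKind(Enum):
    """The three kinds of markup units named in the message above."""
    BLOCK_TAG = "block tag"              # hard break in the text
    SUPERBLANK = "superblank"            # neither a hard break nor tied to a word
    WORDBOUND_BLANK = "wordbound blank"  # moves around with its word


def is_hard_break(kind: UnitKind) -> bool:
    """Only block tags split the text into separate stretches."""
    return kind is UnitKind.BLOCK_TAG


def moves_with_word(kind: UnitKind) -> bool:
    """Only wordbound blanks follow their word when rules reorder it."""
    return kind is UnitKind.WORDBOUND_BLANK


if __name__ == "__main__":
    for kind in UnitKind:
        print(kind.value, is_hard_break(kind), moves_with_word(kind))
```

Superblanks are the only kind for which both predicates are false, which is why their placement after reordering is the open question in the rest of the thread.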

Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Kevin Brubeck Unhammer
Woohoo congrats and thanks for all the hard work Tanmai and Tino =D The superblank issues have been a pain for quite some time. How does it work with transfer now, what are the semantics of things like or just ?

[Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Tanmai Khanna
Hey guys! The markup handling project reached all of its goals this week. While it will continue to be improved, it is in a state that’s ready to be tested with real world data now. *We have updated https://beta.apertium.org so that if you translate a document/webpage o