Re: [Apertium-stuff] Please review my proposal draft

2020-03-27 Thread Tommi A Pirinen
how apertium would currently tokenise any Chinese language and how that would be improved. If/when there is no existing apertium dictionary you can make a toy example with just a handful of words, this would be very interesting. -- Doktor Tommi A Pirinen, Computational Linguist, <https://fl

Re: [Apertium-stuff] Please review my proposal draft

2020-03-26 Thread Tommi A Pirinen
c. Do you have the feeling that you know the parts of the apertium pipeline to modify for the project? As I don't have such in-depth knowledge of the apertium codebase, it'd be of high importance to get feedback or a co-mentor with that knowledge. -- Doktor Tommi A Pirinen, Computational Ling

[Apertium-stuff] Fwd: [Corpora-List] Postdoc fellowship in Indigenous Language Documentation and Technology

2020-03-23 Thread Tommi A Pirinen
i Arppe (ar...@ualberta.ca) - End forwarded message - -- Doktor Tommi A Pirinen, Computational Linguist,

Re: [Apertium-stuff] Working around monodix trimming

2020-03-23 Thread Tommi A Pirinen
guessify / affix-guessify), it's a bit of a prototype and has issues with efficiency and stability, but the FSA algebra should be correct if the underlying FSA library's understanding of unknown alphabets works. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie

Re: [Apertium-stuff] GSoC 2020

2020-02-25 Thread Tommi A Pirinen

Re: [Apertium-stuff] Apertium GSoC 2020? Deadline Feb 5th

2020-02-06 Thread Tommi A Pirinen
On Tue, Feb 04, 2020 at 01:51:38PM -0500, Jonathan Washington wrote: > Everyone else interested in mentoring, please let me know. I can co-mentor this year again but with a very randomly varying schedule while I'm possibly moving between jobs or places. -- Doktor Tommi A Pirinen, Compu

Re: [Apertium-stuff] Lexd: a transducer compiler for prefixes and stuff

2020-02-04 Thread Tommi A Pirinen
n of the morphotactics with named patterns and all. Cool stuff, it'd be interesting to see something like this in standard language packages -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum f

[Apertium-stuff] Apertiumers at FOSDEM

2019-12-20 Thread Tommi A Pirinen
Hi yall, I think at least some of the apertiumers are at fosdem[1] every year, would it be of interest for apertiumers to meet at some point? I'll be there for the whole weekend; you'll probably find me at the cafeteria. [1] <https://fosdem.org/2020/> -- Doktor Tommi A Pirine

Re: [Apertium-stuff] Separate Corpus Repos

2019-12-11 Thread Tommi A Pirinen
think this would also be a doable option; in the end the testing, development and CI can be scripted to fetch one extra repo with relatively small effort too. Perhaps one of the arguments to keep at least a selection of annotated and hand-disambiguated gold corpora in the same repo with diction

Re: [Apertium-stuff] Fwd: [Mt-list] Special Issue on Machine Translation for Low-Resource Languages

2019-11-04 Thread Tommi A Pirinen
it and comment of course. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler. President of ACL SIGUR SIG for Uralic languages <http://gtweb.

Re: [Apertium-stuff] Fwd: [Mt-list] Special Issue on Machine Translation for Low-Resource Languages

2019-10-23 Thread Tommi A Pirinen
ice comparison between apertium and state-of-the-art NMT that should be replicated in this article. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D

Re: [Apertium-stuff] Fwd: [Mt-list] Special Issue on Machine Translation for Low-Resource Languages

2019-10-21 Thread Tommi A Pirinen
rowing things together? -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler. President of ACL SIGUR SIG for Uralic languages <http://gt

Re: [Apertium-stuff] GSoC19 - Unsupervised weighting of automata progress update

2019-07-25 Thread Tommi A Pirinen
ted segments to specific tags to help weighting or voting and get better scores; for example, linguistically we know that all inessives in Finnish will be ssa or ssä and all elatives will be sta or stä, so some relatively simple counting might work. Perhaps this is already in some more advanced morfessing models like fla
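
The "relatively simple counting" idea could be sketched roughly as follows: tally how often each candidate segment co-occurs with each tag and keep the most frequent pairings. This is only an illustration; the toy observations, the tag names `ine` and `ela`, and the helper names `count_segment_tags` and `top_segments` are assumptions, not anything taken from the GSoC project.

```python
from collections import Counter, defaultdict


def count_segment_tags(pairs):
    """Count how often each segment co-occurs with each tag."""
    counts = defaultdict(Counter)
    for segment, tag in pairs:
        counts[tag][segment] += 1
    return counts


def top_segments(counts, n=2):
    """Keep the n most frequent segments for every tag."""
    return {tag: [seg for seg, _ in c.most_common(n)]
            for tag, c in counts.items()}


# Hypothetical (segment, tag) observations, e.g. from aligning
# unsupervised segmentations with analyser output.
observations = [
    ("ssa", "ine"), ("ssä", "ine"), ("ssa", "ine"),
    ("sta", "ela"), ("stä", "ela"), ("sta", "ela"), ("stä", "ela"),
    ("ssa", "ela"),  # one noisy pairing that counting should outvote
]

counts = count_segment_tags(observations)
best = top_segments(counts)
```

With enough observations the noisy pairing is outvoted; a real experiment would need some smoothing or thresholding on top of the raw counts.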

Re: [Apertium-stuff] GSoC19 - Unsupervised weighting of automata progress update

2019-07-24 Thread Tommi A Pirinen
can sketch and build an experiment in few days it's worth looking into. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler. President of

Re: [Apertium-stuff] Apertium and ICU

2019-05-27 Thread Tommi A Pirinen
seen in issues recently that outweighs it. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler. President of ACL SIGUR SIG for Uralic langu

Re: [Apertium-stuff] Ubuntu 14.04 EOL. And 32bit EOL?

2019-05-21 Thread Tommi A Pirinen
needs to be replaced with dist: xenial now. I do not remember why I had to hard-code the dist originally but I'm guessing we'll find out soon. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum f

Re: [Apertium-stuff] GSoC 2019 project discussion - Unsupervised weighting of automata

2019-04-05 Thread Tommi A Pirinen
e throughout summer * I don't have any good pointers for the background, maybe check through what other fst folk have done: http://www.opengrm.org/twiki/bin/view/GRM/WebHome https://aclweb.org/aclwiki/SIGFSM -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie

Re: [Apertium-stuff] OpenNMT

2019-03-11 Thread Tommi A Pirinen
penNMT-py#quickstart I have not gotten it to produce the bleu points that it should though... -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entw

Re: [Apertium-stuff] phonology across word boundaries for HFST generator?

2019-03-01 Thread Tommi A Pirinen
y the same and the apertium ~ feature might be better for that. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler. President of ACL SIGUR SI

[Apertium-stuff] CfP: 5th IWCLUL, deadline extension (was: Re: CfP: 5th IWCLUL (International Workshop for Computational Linguistics of Uralic Languages))

2018-11-12 Thread Tommi A Pirinen
wrote: > [Sorry for multiposting, please consider submitting your Uralic > apertiums to us :-] > > https://sisu.ut.ee/iwclul2019: -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für S

Re: [Apertium-stuff] New lttoolbox binary format

2018-08-14 Thread Tommi A Pirinen
s later. > [...] > This means we can add features, fluff, and versioning to the binary format > later with less pain. > > Anyone got any bikeshedding for the format break? Seems perfect, actually what I intended HFST format to be like as well but we didn't quite get there.

Re: [Apertium-stuff] Implementation of weights in lttoolbox

2018-08-07 Thread Tommi A Pirinen
ept. > We just need to make sure that this is properly documented. Exactly, and for a GSOC project, we should probably have documentation as a part of final things to do, I believe there is some already in the wiki even? -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie

[Apertium-stuff] Apertiums at European Summer University for Digital Humanities

2018-08-06 Thread Tommi A Pirinen
pertium/apertium-bas> [2] <https://github.com/apertium/apertium-btc> [3] <https://github.com/apertium/apertium-bas-btc> [4] <https://github.com/apertium/apertium-kar> -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Univ

Re: [Apertium-stuff] Release lttoolbox 3.4.2 & apertium-separable 0.3.1

2018-06-15 Thread Tommi A Pirinen
uild :-) Thanks, [0] <https://github.com/flammie/flammie-overlay> -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler. Presid

Re: [Apertium-stuff] enquiry for python API for Apertium

2018-04-18 Thread Tommi A Pirinen
re unimportant to me, but it should be simple and not require messing about with irrelevant datatypes (headers/streams/alphabets/pipelines/...) and underlying complexities like many c++-to-python bindings tend to be. [1] <https://pyfst.github.io/> -- Doktor Tommi A Pirinen, Comput

Re: [Apertium-stuff] How to Release & New Versioning Scheme

2018-03-15 Thread Tommi A Pirinen
files were not added to the dist target. This is what the autotools make target distcheck is for: it makes a tarball, unpacks it, builds (with VPATH and a read-only srcdir), checks and tests all sorts of stuff. It can be extra-annoying with some picky checks but if you pass them you should

[Apertium-stuff] Release fin-deu-0.0.1

2018-03-06 Thread Tommi A Pirinen
(Packaging could be made with the release tag https://github.com/flammie/apertium-fin-deu/releases/tag/0.0.1 ) -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://h

Re: [Apertium-stuff] Lttoolbox tasks - use HFST instead?

2018-02-18 Thread Tommi A Pirinen
e tasks can be used to submit a gsoc application that includes 3 months of work (or combined into one). -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D E

[Apertium-stuff] Can we have apertium shirts and hoodies for next fosdem?

2018-02-06 Thread Tommi A Pirinen
12:07 < Flammie> spectei: was there some apertium shirt design somewhere on sale? it would be cool for next fosdem :-D 12:07 < spectei> hmm, i don't think so but it's a good idea just an idea, I guess this needs someone who guards the apertium logo etc. rights to do. --

Re: [Apertium-stuff] Proposal: Move Apertium to Github

2018-02-05 Thread Tommi A Pirinen
tion of like CVS would've got ten years ago :-D [0] <https://github.com/flammie/apertium-fin-deu> -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>

Re: [Apertium-stuff] GSOC flyer translation

2018-01-15 Thread Tommi A Pirinen
icipants receive a stipend, enabling them to focus on their programming projects for three months." heh well, this looks like something I might write in a conference paper... is it meant to be (machine) untranslatable? ;-) -- Doktor Tommi A Pirinen, Computational Linguist, <

Re: [Apertium-stuff] 2nd CfP IWCLUL 2018

2017-12-15 Thread Tommi A Pirinen
ic languages missing apertium pairs! Please > submit your resources to us. Take a note of new LaTeX stylesheets here > [1]. Deadline coming up in less than two months > > [1] > <https://github.com/acl-sigur/iwclul-latex/releases/tag/iwclul-2018> > > 2017-07-11, Tommi A Pirinen s

Re: [Apertium-stuff] about comment/c attribute again

2017-12-11 Thread Tommi A Pirinen
idity error : No declaration for attribute c of element r Document apertium-fin-deu.fin-deu.dix does not validate against /usr/share/lttoolbox/dix.dtd But now I fixed the DTD and it works for me. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwas

[Apertium-stuff] about comment/c attribute again

2017-12-11 Thread Tommi A Pirinen
is is result of wrong de-compounding", "bar surface is in old orthography". Would it be ok to actually just have comment attributes for the l and r elements just as well? How do other language pairs use @c? -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie

Re: [Apertium-stuff] is treating newlines the same as spaces the right thing to do ?

2017-11-08 Thread Tommi A Pirinen
ML, cause in reality we do often need to sneak in metadata for most of the texts, it'd be a good thing to have a proper way of doing this instead of relying on filenames and magic characters. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwas

Re: [Apertium-stuff] separable words module -- call for requests

2017-08-11 Thread Tommi A Pirinen
a build system. [1] <https://pastebin.com/xWS18fBx> [2] <https://github.com/flammie/apertium-fin-deu/commit/925f212e7effef2fae84e4c7967018b6f100b962> -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Ham

Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-25 Thread Tommi A Pirinen
> > > This should do the trick. Yep, works neatly now, thanks again. I had used lu stuff earlier; I probably had difficulties with + or something and thought it was magical. -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher

Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-25 Thread Tommi A Pirinen
kissa. + pojan kolli. [1] <http://wiki.apertium.org/wiki/Apertium-fin-eng/Pending_tests#Genitives> -- Doktor Tommi A Pirinen, Computational Linguist, <https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de

Re: [Apertium-stuff] Toponims and gender/number in (some) monolingual dictionaries

2017-05-31 Thread Tommi A Pirinen
that provides a bit of protection against such changes but only for one-direction of the translation pair and breaks the other direction. This can also be combined with gradually changing the things and

Re: [Apertium-stuff] .lexc structure

2017-03-16 Thread Tommi A Pirinen
pairs[4] using these language modules instead of apertium ones. [1] <http://giellatekno.uit.no/doc/infra/infraremake/> [2] <http://giellatekno.uit.no/doc/infra/infraremake/NewInfraOverview.html> [3] <https://gtsvn.uit.no/langtech/trunk/langs/fin/src/morphology/> [4]

Re: [Apertium-stuff] avoiding forced-uppercase generation when identical dictionary-uppercase exists

2017-02-28 Thread Tommi A Pirinen
PC-ane/PC-ANE > > Would it make sense for lt-proc to not output forced-uppercase > analyses if there's an otherwise identical dictionary-uppercase > analysis? (Would it be easily implementable?) I have no idea of the implementability of this, but it affects most fin- pairs, so i

Re: [Apertium-stuff] Greedy hfst-fst2strings

2017-01-18 Thread Tommi A Pirinen
as an argument against the word-form list / database morphology approach. Btw, the above experiment generated 9 megabytes of word-forms from 4 noun lexemes; maybe they aren't what would generally be wanted for "all word-forms", but it is likely apertium-tat won't

Re: [Apertium-stuff] Greedy hfst-fst2strings

2017-01-17 Thread Tommi A Pirinen
ut I haven't tested it. Using Finnish morphology this can yield ~200 gigabytes to multiple terabytes of word-forms. I don't know if that is also the case for Tatar, but it is something you should generally keep in mind when processing languages like this that are not your average Indo-European ones. -- Doktor Tom
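
A back-of-the-envelope sketch of why enumerating every word-form blows up: if each morphological slot can be filled independently, the number of forms per lexeme is the product of the slot sizes. The slot names, slot sizes, lexeme count and average form length below are illustrative assumptions only, not real Finnish or Tatar figures.

```python
from math import prod


def forms_per_lexeme(slot_sizes):
    """Number of word-forms one lexeme yields if slots combine freely."""
    return prod(slot_sizes.values())


def estimated_bytes(n_lexemes, slot_sizes, avg_form_bytes):
    """Rough size of a full word-form list, ignoring analysis strings."""
    return n_lexemes * forms_per_lexeme(slot_sizes) * avg_form_bytes


# Hypothetical slot inventory for a noun paradigm.
slots = {"number": 2, "case": 15, "possessive": 7, "clitic": 30}

per_lexeme = forms_per_lexeme(slots)          # 2 * 15 * 7 * 30
total = estimated_bytes(100_000, slots, 20)   # size in bytes for 100k lexemes
```

This ignores compounding and derivation, which can only make the list larger, so a quick estimate like this is worth doing before running a string enumeration over a whole transducer.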

[Apertium-stuff] Final CfP and Extension: IWCLUL 2017

2016-11-08 Thread Tommi A Pirinen
> Travel > >Participants from outside Russia area may require a visa to visit >Russia. If you require an invitation letter confirming your >participation, please get in contact with the local organising >committee. > >A small number of travel stipends will be availab

[Apertium-stuff] Third International Workshop for Computational Linguistics of Uralic Languages

2016-07-04 Thread Tommi A Pirinen
ravel stipends will be available for authors of accepted papers. After submitting your paper please contact the organising committee to request consideration. Invited speaker * Heiki-Jaan Kaalep Organisers * Tommi A. Pirinen, Universität Hamburg * Francis M. Tyers, UiT N

[Apertium-stuff] IWCLUL final CfP / deadline extension

2014-10-29 Thread Tommi A Pirinen
Organisers * Tommi A. Pirinen, Dublin City University * Francis M. Tyers, UiT Norgga árktalaš universitehta * Trond Trosterud, UiT Norgga árktalaš universitehta Programme committee * Тимофей Архангельский, Национальный исследовательский университет "Высшая школа экономи

[Apertium-stuff] 2nd CfP (FIWCLUL): First International Workshop on Computational Linguistics for Uralic Languages

2014-08-20 Thread Tommi A Pirinen
organising committee to request consideration. Organisers * Tommi A. Pirinen, Dublin City University * Francis M. Tyers, UiT Norgga árktalaš universitehta * Trond Trosterud, UiT Norgga árktalaš universitehta Programme committee * Тимофей Архангельский, Национальный

[Apertium-stuff] CfP: First International Workshop on Computational Linguistics for Uralic Languages (FIWCLUL)

2014-07-01 Thread Tommi A Pirinen
-fakultetet at UiT The Arctic University of Norway in Tromsø, Norway. Organisers * Tommi A. Pirinen, Dublin City University * Francis M. Tyers, UiT Norgga árktalaš universitehta * Trond Trosterud, UiT Norgga árktalaš universitehta Programme committee * Тимофей