how
apertium would currently tokenise any Chinese language and how that
would be improved. If/when there is no existing apertium dictionary you
can make a toy example with just a handful of words; this would be very
interesting.
--
Doktor Tommi A Pirinen, Computational Linguist,
<https://fl
c.
Do you have the feeling that you know the parts of the apertium pipeline to
modify for the project? As I don't have such in-depth knowledge of the
apertium codebase, it'd be of high importance to get feedback from, or a
co-mentor with, that knowledge.
--
i Arppe (ar...@ualberta.ca)
- End forwarded message -
--
guessify / affix-guessify), it's a bit of
a prototype and has issues with efficiency and stability, but the FSA
algebra should be correct if the underlying FSA library's understanding
of unknown alphabets works.
--
On Tue, Feb 04, 2020 at 01:51:38PM -0500, Jonathan Washington wrote:
> Everyone else interested in mentoring, please let me know.
I can co-mentor again this year, but with a very randomly varying schedule,
as I'm possibly moving between jobs or places.
--
n of
the morphotactics with named patterns and all.
Cool stuff, it'd be interesting to see something like this in standard
language packages.
--
Hi yall,
I think at least some of the apertiumers are at FOSDEM[1] every year;
would it be of interest for apertiumers to meet at some point? I'll be
there for the whole weekend; you'll probably find me at the cafeteria.
[1] <https://fosdem.org/2020/>
--
think this would also be a doable option; in the end the testing,
development and CI can be scripted to fetch one extra repo with
relatively small effort too.
Perhaps one of the arguments to keep at least a selection of annotated
and hand-disambiguated gold-corpora in the same repo with diction
it and comment of course.
--
Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
Entwickler. President of ACL SIGUR SIG for Uralic languages
<http://gtweb.
ice
comparison between apertium and state-of-the-art NMT that should be
replicated in this article.
--
rowing things together?
--
ted
segments to specific tags to help weighting or voting and get
better scores. For example, linguistically we know that all inessives in
Finnish will be ssa or ssä and all elatives will be sta or stä, so some
relatively simple counting might work. Perhaps this is already in some more
advanced morfessing models like fla
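The "relatively simple counting" idea could be sketched roughly as below. This is only an illustration under assumptions: the observation list is hand-made toy data, the tag labels `ine` and `ela` stand for inessive and elative, and the function names are hypothetical rather than part of Morfessor or any apertium tool.

```python
from collections import Counter, defaultdict


def count_segments_per_tag(pairs):
    """Count how often each surface segment is seen with each tag."""
    counts = defaultdict(Counter)
    for segment, tag in pairs:
        counts[tag][segment] += 1
    return counts


def vote(counts, segment):
    """Return the tag that most often co-occurred with the segment, or None."""
    best_tag, best_count = None, 0
    for tag, segments in counts.items():
        if segments[segment] > best_count:
            best_tag, best_count = tag, segments[segment]
    return best_tag


# Hypothetical toy observations of (segment, tag) pairs.
observations = [
    ("ssa", "ine"), ("ssä", "ine"), ("ssa", "ine"),
    ("sta", "ela"), ("stä", "ela"), ("sta", "ela"),
    ("sta", "ine"),  # a deliberately noisy observation
]
counts = count_segments_per_tag(observations)
```

With counts like these, a segment seen mostly with one tag is voted to that tag even when a few observations are noisy; a segment never seen at all yields `None`.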
can
sketch and build an experiment in a few days, it's worth looking into.
--
seen in issues
recently that outweighs it.
--
needs to be replaced with dist: xenial now. I do not remember why I had to
hard-code the dist originally but I'm guessing we'll find out soon.
--
e throughout summer
* I don't have any good pointers for the background, maybe check through
what other fst folk have done:
http://www.opengrm.org/twiki/bin/view/GRM/WebHome
https://aclweb.org/aclwiki/SIGFSM
--
penNMT-py#quickstart
I have not gotten it to produce the bleu points that it should though...
--
y the same and the apertium ~
feature might be better for that.
--
wrote:
> [Sorry for multiposting, please consider submitting your Uralic
> apertiums to us :-]
>
> https://sisu.ut.ee/iwclul2019:
--
s later.
> [...]
> This means we can add features, fluff, and versioning to the binary format
> later with less pain.
>
> Anyone got any bikeshedding for the format break?
Seems perfect; it's actually what I intended the HFST format to be like as
well, but we didn't quite get there.
ept.
> We just need to make sure that this is properly documented.
Exactly, and for a GSoC project we should probably have documentation
as a part of the final things to do; I believe there is some already in the
wiki even?
--
pertium/apertium-bas>
[2] <https://github.com/apertium/apertium-btc>
[3] <https://github.com/apertium/apertium-bas-btc>
[4] <https://github.com/apertium/apertium-kar>
--
uild :-)
Thanks,
[0] <https://github.com/flammie/flammie-overlay>
--
re unimportant to
me, but it should be simple and not require messing about with
irrelevant datatypes (headers/streams/alphabets/pipelines/...) and
underlying complexities like many c++-to-python bindings tend to be.
[1] <https://pyfst.github.io/>
--
files were not added to the dist target.
This is what the autotools make target distcheck is for: it makes a
tarball, unpacks it, builds (with VPATH and a read-only srcdir), checks and
tests all sorts of stuff. It can be extra-annoying with some picky
checks but if you pass them you should
(Packaging could be made with the release tag
https://github.com/flammie/apertium-fin-deu/releases/tag/0.0.1 )
--
e tasks can be used to submit a gsoc application that
includes 3 months of work (or combined into one).
--
12:07 < Flammie> spectei: was there some apertium shirt design somewhere on
sale? it would be cool for next fosdem :-D
12:07 < spectei> hmm, i don't think so but it's a good idea
Just an idea; I guess this needs to be done by someone who guards the
rights to the apertium logo etc.
--
tion
of like CVS would've got ten years ago :-D
[0] <https://github.com/flammie/apertium-fin-deu>
--
icipants receive a stipend,
enabling them to focus on their programming projects for three months."
heh well, this looks like something I might write in a conference
paper... is it meant to be (machine) untranslatable? ;-)
--
ic languages missing apertium pairs! Please
> submit your resources to us. Take a note of new LaTeX stylesheets here
> [1]. Deadline coming up in less than two months
>
> [1]
> <https://github.com/acl-sigur/iwclul-latex/releases/tag/iwclul-2018>
>
> 2017-07-11, Tommi A Pirinen s
idity error : No declaration for attribute c of element r
Document apertium-fin-deu.fin-deu.dix does not validate against /usr/share/lttoolbox/dix.dtd
But now I fixed the DTD and it works for me.
--
is is result of
wrong de-compounding", "bar surface is in old orthography". Would it be
ok to actually just have comment attributes for the l and r elements
just as well? How do other language pairs use @c?
--
ML, because in reality we do often need to
sneak in metadata for most of the texts; it'd be a good thing to have a
proper way of doing this instead of relying on filenames and magic
characters.
--
a build system.
[1] <https://pastebin.com/xWS18fBx>
[2]
<https://github.com/flammie/apertium-fin-deu/commit/925f212e7effef2fae84e4c7967018b6f100b962>
--
>
>
> This should do the trick.
Yep, works neatly now, thanks again. I had used the lu stuff earlier; I
probably had difficulties with + or something and thought it was
magical.
--
kissa.
+ pojan kolli.
[1]
<http://wiki.apertium.org/wiki/Apertium-fin-eng/Pending_tests#Genitives>
--
that provides a bit of protection against such changes but only for
one-direction of the translation pair and breaks the other direction.
This can also be combined with gradually changing the things and
pairs[4] using these language
modules instead of apertium ones.
[1] <http://giellatekno.uit.no/doc/infra/infraremake/>
[2]
<http://giellatekno.uit.no/doc/infra/infraremake/NewInfraOverview.html>
[3] <https://gtsvn.uit.no/langtech/trunk/langs/fin/src/morphology/>
[4] <
PC-ane/PC-ANE
>
> Would it make sense for lt-proc to not output forced-uppercase
> analyses if there's an otherwise identical dictionary-uppercase
> analysis? (Would it be easily implementable?)
I have no idea of the implementability of this, but it affects most
fin- pairs, so i
as an argument against the
word-form list / database morphology approach. Btw, the above
experiment generated 9 megabytes of word-forms from 4 noun lexemes,
maybe they aren't what would generally be wanted for "all
word-forms", but it is likely apertium-tat won't
ut I haven't tested it.
Using Finnish morphology this can yield ~200 gigabytes to multiple
terabytes of word-forms. I don't know if that is also the case for
Tatar, but it is something you should generally keep in mind when
processing languages like this that are not your average Indo-European ones.
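A back-of-the-envelope sketch of why full word-form lists explode: the number of forms per lexeme multiplies across independent inflectional slots. The slot sizes and the average form length below are rough illustrative assumptions for a Finnish-like noun, not figures taken from any actual analyser, and the function names are hypothetical.

```python
from math import prod

# Assumed, illustrative slot sizes for a Finnish-like noun paradigm.
SLOTS = {"number": 2, "case": 15, "possessive": 7, "clitic": 10}


def forms_per_lexeme(slots):
    """Multiply the slot sizes to get the number of forms for one lexeme."""
    return prod(slots.values())


def estimated_bytes(n_lexemes, slots, avg_form_bytes=20):
    """Rough size of a plain word-form list; avg_form_bytes is an assumption."""
    return n_lexemes * forms_per_lexeme(slots) * avg_form_bytes


per_lexeme = forms_per_lexeme(SLOTS)
size_for_four = estimated_bytes(4, SLOTS)
```

Derivation and productive compounding are not modelled here, and they multiply the count further, so the list grows far faster than the lexicon does.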
--
> Travel
>
>Participants from outside Russia may require a visa to visit
>Russia. If you require an invitation letter confirming your
>participation, please get in contact with the local organising
>committee.
>
>A small number of travel stipends will be availab
ravel stipends will be available for authors of
accepted papers. After submitting your paper please contact the
organising committee to request consideration.
Invited speaker
* Heiki-Jaan Kaalep
Organisers
* Tommi A. Pirinen, Universität Hamburg
* Francis M. Tyers, UiT N
Organisers
* Tommi A. Pirinen, Dublin City University
* Francis M. Tyers, UiT Norgga árktalaš universitehta
* Trond Trosterud, UiT Norgga árktalaš universitehta
Programme committee
* Тимофей Архангельский, Национальный исследовательский университет
"Высшая школа экономи
-fakultetet at UiT The Arctic
University of Norway in Tromsø, Norway.