Hi,
what should the text files look like before starting the tagger
training? One sentence per line? Something else?
Is a text formatted like below OK:
Antingen genom att gå in under rätt rubrik ovan och lägga till ditt
bidrag eller lägg ditt bidrag i bufferten om du inte vet var eller hur
det
(English gloss: "Either by going in under the right heading above and adding your contribution, or put your contribution in the buffer if you don't know where or how it ...")
On 23 September 2013 15:45, Per Tunedal per.tune...@operamail.com wrote:
Hi,
Thanks!
I noticed your tool, but unfortunately I'm not sure how to use it!
SYNOPSIS
apertium-tsx-lint tsx-file [DIC]
[DIC] is the 'dictionary' generated during tagger training (not an
actual dictionary!). It'll run
Hi Jimmy,
Interesting, keep me informed! I might have use for your work when I'm
ready to start the training.
Yours,
Per Tunedal
Hi,
thank you. Works like a charm for Wikipedia, Wikivoyage and Wikibooks,
as far as I can see.
But no, it doesn't work for Wiktionary. I get output that looks
OK, but it doesn't include the full articles. Further, it includes
explanations for foreign words as well.
I tried:
bzcat
Hi,
the cleaned Danish Wikipedia file contained this unwanted string:
__NOTOC__
on a separate line somewhere in the middle of the text. It ought to be
discarded in the cleaning script.
Yours,
Per Tunedal
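The fix Per asks for can be sketched as a one-line filter. This is a minimal stand-alone illustration, not the actual cleaning script: the grep pattern and file names are assumptions, and the real change would go inside cleanHTML.py itself.

```shell
# Sketch: drop MediaWiki "magic words" such as __NOTOC__ that sit alone
# on a line. Pattern and file name are illustrative assumptions.
printf 'some text\n__NOTOC__\nmore text\n' > sample.text
grep -v '^__[A-Z]*__$' sample.text
```

The pattern only matches whole lines consisting of a double-underscored keyword, so ordinary text containing underscores is left untouched.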
Lars Aronsson l...@aronsson.se writes:
On 09/13/2013 02:54 AM, Gang Chen wrote:
1) Is it possible to make some kind of Wikipedia dump?
This tool works fine for extracting the main text from Wikipedia,
http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor
Wikipedia very rarely
Hi again,
the extractor is already finished.
I overlooked a line in your instructions (maybe I'm too tired):
cat output/*/* > svwiktionary.text
Now I'm running the cleaning script:
python cleanHTML.py svwiktionary.text
I will give you a report when it has finished.
Yours,
Per Tunedal
Hi,
Thank you! Your Wikipedia Extractor is running right now. I will look
for the result in an hour.
How do I use the script for filtering out tags? I've saved it as a
Python file. Do I have to run it separately for every single file in the
output directory? Can't I just take every file in the
Hi,
The script needs an input redirect (<) from a file instead of the stdin
and an output redirect (>) to a file instead of the stdout. The following
will do the work:
python cleanHTML.py < svwiktionary.text > svwiktionary.filter.text
Btw, I only tested it on Wikipedia, but I'm not sure whether it
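The two steps above (concatenate all extractor output, then clean via stdin/stdout redirection) can be sketched end to end. Since cleanHTML.py isn't reproduced here, `tr -d '\r'` stands in as a hypothetical placeholder filter so the example is self-contained; the directory layout mimics the extractor's output/*/* convention.

```shell
# Mimic the extractor's output tree with a tiny sample file.
mkdir -p output/AA
printf 'line one\r\nline two\r\n' > output/AA/wiki_00

# Step 1: concatenate every extractor output file into one corpus.
cat output/*/* > corpus.text

# Step 2: clean via stdin/stdout redirection. `tr -d '\r'` is only a
# stand-in for: python cleanHTML.py < corpus.text > corpus.filter.text
tr -d '\r' < corpus.text > corpus.filter.text
cat corpus.filter.text
```

Because the cleaner reads stdin and writes stdout, there is no need to run it once per file: concatenating first and filtering once is enough.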
Thanks everyone.
Just using -e indeed solved the problem. :)
I noticed the usage information was different from the man page of
lt-proc. I updated this in r35321. Could someone who knows the options
check whether I did so correctly?
Pim
On Thu, Dec 15, 2011 at 3:28 PM, Kevin Brubeck Unhammer
Hiya everyone,
I'm trying to retrain the af-nl POS tagger, since I'm attempting to fix
some things using .tsx rules, but it isn't going very well. I get
this error:
pim-oneiric@pim-K53SV:~/source/apertium-af-nl$ apertium-tagger -t 1
dev/apertium-af-nl.af.exp af-tagger-data/af.smaller.crp
I'm not sure how I should get the output of the analyser.
This section of af-nl-unsupervised.make seems relevant, but running
it gives an empty apertium-af-nl/af.dic:
lt-expand $(BASENAME).$(LANG1).dix | grep -v __REGEXP__ |\
	grep -v :: | grep -v 'DUE_TO_LT_PROC_HANG' |\
awk
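The grep chain in that rule can be exercised in isolation on a tiny stand-in, which helps check that the filtering itself isn't what empties af.dic. The sample lines below are invented for illustration; the real input is lt-expand's output on the .dix file.

```shell
# Invented sample of lt-expand-style output: one normal entry, one
# __REGEXP__ entry, and one line containing '::'.
printf 'hond:hond<n><sg>\n__REGEXP__foo\nword:word<adv>:: extra\n' \
    > af.dic.expand.sample

# The same filtering steps as in the make rule: both special lines
# are dropped, the ordinary entry survives.
grep -v __REGEXP__ af.dic.expand.sample | grep -v ::
```

If the real af.dic.expand comes out empty before these filters even run, the problem is upstream in the lt-expand invocation rather than in the grep chain.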
On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote:
I'm not sure how I should get the output of the analyser.
but running the makefile itself results in an empty af-tagger-data/af.dic
Running this line after creating af.dic.expand gives the usage message for lt-proc:
usage: lt-proc -e -w -a