[Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Per Tunedal
Hi, what should the text-files look like before starting the tagger training? One sentence a line? Something else? Is a text formatted like below OK: Antingen genom att gå in under rätt rubrik ovan och lägga till ditt bidrag eller lägg ditt bidrag i bufferten om du inte vet var eller hur det

Re: [Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Jimmy O'Regan
On 23 September 2013 08:17, Per Tunedal per.tune...@operamail.com wrote: Hi, what should the text-files look like before starting the tagger training? One sentence a line? Something else? Is a text formatted like below OK: Antingen genom att gå in under rätt rubrik ovan och lägga till ditt

Re: [Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Jimmy O'Regan
On 23 September 2013 15:45, Per Tunedal per.tune...@operamail.com wrote: Hi, Thanks! I noticed your tool, but unfortunately I'm not sure how to use it! SYNOPSIS apertium-tsx-lint tsx-file [DIC] [DIC] is the 'dictionary' generated during tagger training (not an actual dictionary!). It'll run

Re: [Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Per Tunedal
Hi Jimmy, Interesting, keep me informed! I might have use for your work when I'm ready to start the training. Yours, Per Tunedal On Mon, Sep 23, 2013, at 18:30, Jimmy O'Regan wrote: On 23 September 2013 17:03, Jimmy O'Regan jore...@gmail.com wrote: On 23 September 2013 15:45, Per Tunedal

Re: [Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Per Tunedal
Hi, Thanks! I noticed your tool, but unfortunately I'm not sure how to use it! Yours, Per Tunedal On Mon, Sep 23, 2013, at 11:26, Jimmy O'Regan wrote: On 23 September 2013 08:17, Per Tunedal per.tune...@operamail.com wrote: Hi, what should the text-files look like before starting the

Re: [Apertium-stuff] Tagger training sv-da

2013-09-14 Thread Per Tunedal
Hi, thank you. Works like a charm for Wikipedia, Wikivoyage and Wikibooks, as far as I can see. But, NO, it doesn't work for the Wiktionary. I get output that looks OK, but it doesn't include the full articles. Further, it includes explanations for foreign words as well. I tried: bzcat

Re: [Apertium-stuff] Tagger training sv-da

2013-09-14 Thread Per Tunedal
Hi, the cleaned Danish Wikipedia file contained these unwanted characters: __NOTOC__ on a separate line somewhere in the middle of the text. It ought to be discarded in the cleaning script. Yours, Per Tunedal On Sat, Sep 14, 2013, at 15:18, Per Tunedal wrote: Hi, thank you. Works like a charm for
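
A minimal sketch of the kind of filter Per asks for here, not the actual cleaning script from the thread. It assumes that an unwanted line is one holding nothing but MediaWiki-style magic words such as __NOTOC__ (double underscores around capital letters); the function name `drop_magic_word_lines` and the sample sentences are hypothetical.

```python
import re

# Assumption: a line to discard contains only tokens of the form __WORD__
# (for example __NOTOC__), optionally surrounded by whitespace.
MAGIC_WORD_LINE = re.compile(r"^\s*(?:__[A-Z]+__\s*)+$")


def drop_magic_word_lines(lines):
    """Yield only the lines that are not bare magic-word lines."""
    for line in lines:
        if not MAGIC_WORD_LINE.match(line):
            yield line


# Hypothetical sample resembling the situation described in the message.
sample = ["Første linje.\n", "__NOTOC__\n", "Anden linje.\n"]
cleaned = list(drop_magic_word_lines(sample))
print("".join(cleaned), end="")
```

Lines that merely mention such a token inside running text are kept, since only whole-line matches are dropped.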

Re: [Apertium-stuff] Tagger training sv-da

2013-09-13 Thread Kevin Brubeck Unhammer
Lars Aronsson l...@aronsson.se writes: On 09/13/2013 02:54 AM, Gang Chen wrote: 1) Is it possible to make some kind of Wikipedia dump? This tool works fine for extracting the main text from Wikipedia, http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor Wikipedia very rarely

Re: [Apertium-stuff] Tagger training sv-da

2013-09-13 Thread Per Tunedal
Hi again, the extractor is already finished. I overlooked a line in your instructions (maybe I'm too tired): cat output/*/* > svwiktionary.text Now I'm running the cleaning script: python cleanHTML.py svwiktionary.text I will give you a report when it has finished. Yours, Per Tunedal On

Re: [Apertium-stuff] Tagger training sv-da

2013-09-13 Thread Per Tunedal
Hi, Thank you! Your Wikipedia Extractor is running right now. I will look for the result in an hour. How do I use the script for filtering out tags? I've saved it as a Python file. Do I have to run it separately for every single file in the output directory? Can't I just take every file in the

Re: [Apertium-stuff] Tagger training sv-da

2013-09-13 Thread Gang Chen
Hi, The script needs an input redirect (<) from a file instead of the stdin and an output redirect (>) to a file instead of the stdout. The following will do the work: python cleanHTML.py < svwiktionary.text > svwiktionary.filter.text Btw, I only tested it on Wikipedia, but I'm not sure whether it
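
To illustrate why the redirects matter: a filter of this kind reads standard input and writes standard output, so the shell has to connect both ends to files. The sketch below is an assumption about the general shape of such a script, not the contents of cleanHTML.py; the regular expression, `strip_tags` and `main` are hypothetical names, and the tag pattern is deliberately naive.

```python
import io
import re
import sys

# Assumption: a "tag" is anything between < and > on a single line.
TAG = re.compile(r"<[^>]+>")


def strip_tags(source, sink):
    """Copy lines from source to sink with tag-like spans removed."""
    for line in source:
        sink.write(TAG.sub("", line))


def main():
    # When run as a script, the shell supplies the files via redirection,
    # e.g. the input after "<" and the output after ">".
    strip_tags(sys.stdin, sys.stdout)


# Demonstration with in-memory streams standing in for the redirected files.
demo_in = io.StringIO("<b>Hej</b> världen\n")
demo_out = io.StringIO()
strip_tags(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the function only needs something iterable to read from and something with `write` to write to, the same code works on one concatenated file or on several files fed one after another.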

Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Kevin Brubeck Unhammer
Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how i should get the output of the analyser. but running the makefile itself results in an empty af-tagger-data/af.dic running this line: after creating af.dic.expand gives

Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Kevin Brubeck Unhammer
Francis Tyers fty...@prompsit.com writes: El dj 15 de 12 de 2011 a les 10:42 +0100, en/na Kevin Brubeck Unhammer va escriure: Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how i should get the output of the analyser.

Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Jimmy O'Regan
On 15 December 2011 10:13, Francis Tyers fty...@prompsit.com wrote: El dj 15 de 12 de 2011 a les 10:42 +0100, en/na Kevin Brubeck Unhammer va escriure: Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how i should get the

Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Kevin Brubeck Unhammer
Jimmy O'Regan jore...@gmail.com writes: On 15 December 2011 10:13, Francis Tyers fty...@prompsit.com wrote: El dj 15 de 12 de 2011 a les 10:42 +0100, en/na Kevin Brubeck Unhammer va escriure: Jimmy O'Regan jore...@gmail.com writes: On 14 December 2011 20:19, Pim Otte otte@gmail.com

Re: [Apertium-stuff] Tagger training

2011-12-15 Thread Pim Otte
Thanks everyone. Just using -e indeed solved the problem. :) I noticed the usage information was different from the man-page of lt-proc. I updated this in r35321. Could someone who knows the options check if I did so correctly? Pim On Thu, Dec 15, 2011 at 3:28 PM, Kevin Brubeck Unhammer

[Apertium-stuff] Tagger training

2011-12-14 Thread Pim Otte
Hiya everyone, I'm trying to retrain the af-nl pos-tagger since I'm attempting to fix some things by using .tsx rules, but it isn't going very well. I get this error: pim-oneiric@pim-K53SV:~/source/apertium-af-nl$ apertium-tagger -t 1 dev/apertium-af-nl.af.exp af-tagger-data/af.smaller.crp

Re: [Apertium-stuff] Tagger training

2011-12-14 Thread Pim Otte
I'm not sure how i should get the output of the analyser. This section of af-nl-unsupervised.make seems relevant, but running that gives an empty apertium-af-nl/af.dic lt-expand $(BASENAME).$(LANG1).dix | grep -v __REGEXP__ | grep -v :: | grep -v 'DUE_TO_LT_PROC_HANG' |\ awk

Re: [Apertium-stuff] Tagger training

2011-12-14 Thread Jimmy O'Regan
On 14 December 2011 20:19, Pim Otte otte@gmail.com wrote: I'm not sure how i should get the output of the analyser. but running the makefile itself results in an empty af-tagger-data/af.dic running this line: after creating af.dic.expand gives usage on lt-proc usage lt-proc -e -w -a