I am currently exporting word frequencies for all languages I have
collected over the years.
These frequency lists are 'dirty', which means no check has been done on
whether the words are correct.
That will be handled by the speller anyway. Spell checker maintainers
could also use them as input.
The frequency lists are now available.
You can find yours here:
the 'gaia' format:
www.spellonit.com/downloads/frequencies/language code_gaia.xml.zip
the plain csv:
www.spellonit.com/downloads/frequencies/language code_wordfreqs.csv.zip
Ruud
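For anyone who wants to try the plain CSV lists, here is a minimal Python sketch for reading one straight from the downloaded zip. The column layout (word first, integer count second, comma-separated, UTF-8) is an assumption and not confirmed above, and `load_wordfreqs` is just a hypothetical helper name, so check an actual file before relying on it.

```python
import csv
import io
import zipfile


def load_wordfreqs(zip_path, member=None):
    """Return {word: count} from a zipped CSV frequency list.

    Assumed layout (not confirmed by the list's author): UTF-8,
    comma-separated, word in column 1, integer count in column 2.
    """
    freqs = {}
    with zipfile.ZipFile(zip_path) as archive:
        # Default to the first file in the archive.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if len(row) < 2:
                    continue  # skip blank or malformed rows
                try:
                    count = int(row[1])
                except ValueError:
                    continue  # skip header lines or non-numeric counts
                # The lists are 'dirty', so the same word may appear twice.
                freqs[row[0]] = freqs.get(row[0], 0) + count
    return freqs
```

Rows that do not parse are skipped rather than raising an error, since the lists are unchecked.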
On 2014-10-13 23:09, Juan Martorell wrote:
java.io.IOException: Cannot load or parse input stream of
'/org/languagetool/rules/es/grammar.xml'
You have local changes in your grammar.xml, don't you? This exception
indicates the 'name' attribute isn't set for a rule/rulegroup. It's not
related
On 2014-10-14 08:26, R.J. Baars wrote:
the 'gaia' format:
www.spellonit.com/downloads/frequencies/language code_gaia.xml.zip
Could you list which ones are available? (Or configure the server so
that it lists the directory when
www.spellonit.com/downloads/frequencies/ is opened?)
As this is
Daniel Naber daniel.na...@languagetool.org wrote:
Hi,
to provide LT as a 100% pure Java software, I'd like to switch from
Hunspell (native code) to Morfologik (Java-based). For that, I think the
following languages are easy to switch:
Asturian
Galician
Khmer
Spanish
I could list them, but I don't want to yet. I first want to resolve the
license.
"Just data"? It is a lot more work to collect data like this than it is
to write a little program. I don't see the difference. It is the effort and
ingenuity that count.
It is not a plain collection, but picking
Hi,
Wikicheck is not working now for articles with titles that include some
diacritic. See, for example, [1]. It used to work well.
Regards,
Jaume Ortolà
[1]
http://tools.wmflabs.org/languagetool/pageCheck/index?lang=caurl=Llista_dels_rius_m%C3%A9s_llargs
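For reference, the title in [1] is the UTF-8 percent-encoded form of "Llista_dels_rius_més_llargs". Below is a small Python sketch of that round trip. The Latin-1 decode only illustrates one way such a title can get garbled if a server assumes the wrong charset; it is not a diagnosis of the Wikicheck problem.

```python
from urllib.parse import quote, unquote

title = "Llista_dels_rius_més_llargs"

# Percent-encode the title as UTF-8, the form used in the link above.
encoded = quote(title, encoding="utf-8")

# Decoding with the same charset restores the title.
decoded_ok = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as Latin-1 garbles the diacritic (illustration only).
decoded_bad = unquote(encoded, encoding="latin-1")

print(encoded)
print(decoded_ok)
print(decoded_bad)
```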
On 2014-10-14 08:49, R.J. Baars wrote:
I would even rather exclude commercial use without written consent of
the owner (me). In fact, I would object to any use except for open and free
purposes.
Is there a license that fits that?
Creative Commons has a non-commercial option, but then we
On 14 October 2014 08:35, Daniel Naber daniel.na...@languagetool.org
wrote:
On 2014-10-13 23:09, Juan Martorell wrote:
java.io.IOException: Cannot load or parse input stream of
'/org/languagetool/rules/es/grammar.xml'
You have local changes in your grammar.xml, don't you? This exception
On 2014-10-14 08:56, Jaume Ortolà i Font wrote:
Wikicheck is not working now for articles with titles that include
some diacritic. See, for example, [1]. It used to work well.
I know... I don't know how to solve this, everything works fine locally
and I also don't see what has changed. It
On 2014-10-11 12:00, Daniel Naber wrote:
to provide LT as a 100% pure Java software, I'd like to switch from
Hunspell (native code) to Morfologik (Java-based). For that, I think the
following languages are easy to switch:
Asturian
I've switched over Asturian now, would be nice if
Hi,
I did some internal cleanup to the pom.xml files so that they contain
less duplication. It shouldn't make a difference for anyone, but if you
have problems building LT with Maven, let me know.
Regards
Daniel
After conferring a bit more with Daniel, I decided to have my company
publish the top 30% of the frequency lists free and open under CC-BY.
This should be enough for LT.
If you want to add frequencies to the morfologik speller, the frequency
list for your language could be in the complete set
2014-10-14 8:49 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl:
I could list them, but I don't want to yet. I first want to resolve the
license.
"Just data"? It is a lot more work to collect data like this than it is
to write a little program. I don't see the difference. It is the effort and