Re: Regional variants of Catalan (ca-ES-valencia)

2013-10-07 Thread Ruud Baars
Marcin, I am not sure about English, since there seem to be British, US, Canadian, New Zealand, and Australian variants that could be considered local/regional variants. A country is just a region of the world, right? Or is this too philosophical? Ruud On 07-10-13 16:03, Marcin Miłkowski

Re: Modules for individual supported languages?

2013-10-05 Thread Ruud Baars
No, those are the postag dictionaries! Ruud On 05-10-13 13:15, Stefan Lotties wrote: LO/OO provides dictionaries when installing it. Can't we use them instead of our own hunspell dictionaries? Languages such as German still have a huge grammar.xml beside the dictionary, but leaving out the

Re: dump of LT command line

2013-10-02 Thread Ruud Baars
Right. It crashed again with the big batch. Some solution would be nice. Ruud On 02-10-13 12:00, Stefan Lotties wrote: On Wed, Oct 2, 2013 at 11:34 AM, Daniel Naber list2...@danielnaber.de wrote: On 2013-10-02 11:14, Stefan Lotties wrote: Sounds right. The fix should look very similar to

Re: dump of LT command line

2013-10-01 Thread Ruud Baars
- with it It could be the platform, but I am not the only one on (K)ubuntu using the JDK. Ruud On 2013-09-30 18:17, Ruud Baars wrote: For reproducing exactly: remove the line feeds from the sentence. They were introduced by the e-mail. That and using your disambiguation file didn't help, I

Re: dump of LT command line

2013-10-01 Thread Ruud Baars
Pellé wrote: Ruud Baars wrote: I retraced the steps with 2.3, and the error does not reproduce. It must be a snapshot thing. No matter, case closed. Ruud Hopefully it's fixed indeed... or it's a rare multi-threading bug and those are painful bugs to debug and reproduce. In fact I just saw

Re: dump of LT command line

2013-09-29 Thread Ruud Baars
I just added a bug for this on Sourceforge. I was wondering what all those fields the creator is able to set are doing there, there is no explanation at all. What is Milestone? Owner? Which priority is highest? Should all these items not be set by the 'moderator' of the bugs list? Ruud

Re: dump of LT command line

2013-09-29 Thread Ruud Baars
, echoing the file done into an admin file, so I can find out in what file it crashes. Ruud On 29-09-13 12:32, Daniel Naber wrote: On 2013-09-29 11:02, Ruud Baars wrote: But I will have to redo it all (it happened after 14 GB of output ..) , and there is no indication of the file it was in, nor

Re: SV: Two strings missing i Transifex

2013-09-29 Thread Ruud Baars
deletion or Undo addition. Regards, Panagiotis Minos On 29/09/2013 01:04 ??, Ruud Baars wrote: I guess this is the same thing I reported, and is caused by missing translations of Java itself. There apparently is a trick to work around it, but it is too late, because of feature freeze .. Ruud

Disambiguation

2013-09-25 Thread Ruud Baars
Can or should I add an exception to this disambiguation rule (in fact a postag assignment) to prevent a word from getting the postag NN1r twice, or is that not necessary? Can I add exceptions to the disambig rule? Ruud rule name=onbekend verkleinwoord enkelvoud id=UNKNOWN_NN1r pattern

Re: Translations for Dutch

2013-09-25 Thread Ruud Baars
Might be a good idea to add after the release .. Ruud On 25-09-13 08:45, Daniel Naber wrote: On 2013-09-25 07:38, Ruud Baars wrote: In the menu: Undo addition (What is it?) In the file open dialog: Look in, File Name, Files of type etc. Are these hardcoded? Yes, but not by LT

LT start problem. (trying to check translations)

2013-09-24 Thread Ruud Baars
Daniel, this is what I get when trying to start LT: OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1ubuntu0.12.04.2) OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) ruud@ruud-desktop:~/Bureaublad/LanguageTool-2.3-SNAPSHOT$ java -jar languagetool-standalone.jar Exception in

Translations for Dutch

2013-09-24 Thread Ruud Baars
Daniel, I have been able to get Java working (removed all java-likes, and installed 7). But the Dutch menus have not been adjusted. I redid all of it in Transifex, hoping this time it will be saved ... Ruud

Re: Translations for Dutch

2013-09-24 Thread Ruud Baars
Sorry, I still haven't mastered coding on the level required, nor do I have any knowledge about checking in/out. I will try the snapshot tomorrow. Ruud On 24-09-13 20:09, Daniel Naber wrote: On 2013-09-24 18:31, Ruud Baars wrote: I redid all of it in Transifex, hoping this time it will be saved

Re: tooltips in stand-alone version

2013-09-24 Thread Ruud Baars
I see them too. Kubuntu. Ruud On 24-09-13 20:22, gulp21 wrote: does anybody see tooltips when hovering over the icons in the stand-alone version? They seem to be in the code, but they don't show up for me. I see them. LT from 28/08/2013 on Kubuntu 12.04 Regards Markus

Re: Translations for Dutch

2013-09-24 Thread Ruud Baars
? Ruud On 24-09-13 20:43, Ruud Baars wrote: Sorry, I still haven't mastered coding on the level required, nor do I have any knowledge about checking in/out. I will try the snapshot tomorrow. Ruud On 24-09-13 20:09, Daniel Naber wrote: On 2013-09-24 18:31, Ruud Baars wrote: I redid all

Maintenance for Dutch

2013-09-22 Thread Ruud Baars
I will start improving Dutch again after the coming release. This will include redoing postags all over, and adapting and improving rules. Ruud

Translation

2013-09-22 Thread Ruud Baars
I tried to translate all untranslated Dutch interface elements using Transifex. It is not clear where all options are used, so I just tried to keep it consistent. Ruud

Re: New Language Turkish

2013-07-20 Thread Ruud Baars
As long as there is a Hunspell one, and it is of good quality. On 20-07-13 08:51, Daniel Naber wrote: On 20.07.2013 00:19, Daniel Naber wrote: http://code.google.com/p/zemberek/ Yes, if the MPL license of the project also applies to the word list we could use that. Actually, the

Re: New Language Turkish

2013-07-20 Thread Ruud Baars
What about abbreviations? Hunspell accepts abbreviations without the . even when these are actually incorrect. LT knows about abbreviations, using srx for sentence endings at least, but what about in-sentence abbreviations? Ruud On 20-07-13 08:51, Daniel Naber wrote: On 20.07.2013 00:19,

Re: suggestions in Morfologik spelling rule

2013-07-16 Thread Ruud Baars
By the way, I could help with word frequencies for some languages, e.g. Portuguese, Spanish, Dutch. Ruud On 16-07-13 14:20, R.J. Baars wrote: Coding word frequencies as a character is fine. I think it would be classes, logarithmic as far as I am concerned. Ruud On 2013-07-16 00:03,

Re: Detect Spaces Before Words?

2013-05-23 Thread Ruud Baars
Might a change to the spacebefore-detection be an option? Like specifying space type instead of just No or Yes? Ruud On 23-05-13 12:58, Marcin Miłkowski wrote: On 2013-05-23 11:32, Nathan Wells wrote: So I just confirmed with some further testing that spacebefore considers a zero-width

Re: simple compound words with hyphen support in speller

2013-05-15 Thread Ruud Baars
Would this touch all languages? In which way then? For Dutch it is very likely that splitting at a dash will end up accepting wrong words.. Ruud On 15-05-13 18:04, Daniel Naber wrote: On 15.05.2013 05:15, Andriy Rysin wrote: As this change touches the core code I wanted to review it here first

Re: simple compound words with hyphen support in speller

2013-05-14 Thread Ruud Baars
Marcin did a change for the Dutch tokenizer to support the - as part of the token, not as separator. Ruud On 15-05-13 05:15, Andriy Rysin wrote: Hi all I had some requests for Ukrainian module to support hyphenated words better in our spellchecker. In general Ukrainian has a lot of special

Re: Hunspell expansion to words list

2013-05-09 Thread Ruud Baars
). The first is ideal for number generation e.g.; the second more useful for normal words. Ruud On 09-05-13 15:57, Marcin Miłkowski wrote: On 2013-05-08 19:46, Daniel Naber wrote: On 08.05.2013 19:34, Ruud Baars wrote: Daniel, maybe it is an idea to get a page somewhere to get the info

Re: Hunspell expansion to words list

2013-05-08 Thread Ruud Baars
Daniel, maybe it is an idea to get a page somewhere to get the info together on requirements for compounding in the spell checker. A wiki page? I have been contributing a lot of ideas to the most recent version of Hunspell, required things for good Dutch support. Ruud On 08-05-13 18:08,

Re: LT as a Java spell checker

2013-05-07 Thread Ruud Baars
The current hyphenation mechanisms are not able to support Dutch very well. The number of levels is too small, and so is the possible pattern length; there is no concept of compounding, nor of correctly changing a word when hyphenating. Ruud On 07-05-13 16:36, Marcin Miłkowski wrote: Ah, this of

Re: Hunspell expansion to words list

2013-05-04 Thread Ruud Baars
Schreiber wrote: The problem with the compounds in Hunspell that Ruud described exists for German as well. Just saying. On 03.05.2013 13:07, Ruud Baars wrote: Hi. Finally I have a full keyboard, to elaborate a bit on the expansion issue. Spell checking is supposed to signal any incorrect word

Hunspell expansion to words list

2013-05-03 Thread Ruud Baars
Hi. Finally I have a full keyboard, to elaborate a bit on the expansion issue. Spell checking is supposed to signal any incorrect word. So most correct words should be accepted. There are words in between though. Words that are technically correct, but in everyday language use most commonly a

Re: Improving suggestions in speller rules

2013-04-25 Thread Ruud Baars
Since Dutch and German are very much alike, we might team up on thinking about what we need for that. I noticed German compounding has been implemented in Hunspell quite differently from my Dutch solution. When compounding, Dutch needs: - border-detection of case and letter combinations, forcing a

Mulit-language spell checking

2013-04-17 Thread Ruud Baars
For some purpose, I might need a spell checker covering all languages. Word and expected language in, word and alternatives out, including languages: example for Dutch : sex (NL) = sex (EN), seks (NL), sexy (NL/EN) Maybe this is a way to detect language better than Tika does, maybe detect

Re: Morfologik speller

2013-04-11 Thread Ruud Baars
You can assume it is a replacement. I never registered a bug. Where do I do that? On 11-04-13 16:36, Daniel Naber wrote: On 10.04.2013, 21:32:19 Ruud Baars wrote: data.taaltik.nl/LanguageTool/Accepted_by_OpenTaal_210.txt.zip , 3.5 MB So this is the list of words that should be accepted, isn't

Re: Morfologik speller

2013-04-11 Thread Ruud Baars
Forget that, I found it. On 11-04-13 16:41, Ruud Baars wrote: You can assume it is a replacement. I never registered a bug. Where do I do that? On 11-04-13 16:36, Daniel Naber wrote: On 10.04.2013, 21:32:19 Ruud Baars wrote: data.taaltik.nl/LanguageTool/Accepted_by_OpenTaal_210.txt.zip

Morfologik speller

2013-04-10 Thread Ruud Baars
What I could do for the Dutch speller to improve it (it is just too limited now) is run my millions of words through it, finding out which ones are accepted, thus effectively detecting all accepted compounds. How would I from that plain list generate the spelling list, or could I just post the

Re: Improving suggestions in speller rules

2013-04-07 Thread Ruud Baars
There is a relatively fast way to deal with alternatives. For every word one could compute a number, using this recipe: - lowercase the character - unaccent the character - take the ASCII value of the letter, raise it to the fifth power - add all these numbers This makes a 'fast lookup'-number
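
The recipe above could be sketched roughly as follows. This is only an illustration in Python, not code from LanguageTool or Hunspell; the function name anagram_key and the use of Unicode decomposition for "unaccenting" are assumptions. Because the sum ignores letter order, words made of the same letters share a number, which is presumably what makes it usable as a lookup key for alternatives.

```python
import unicodedata


def anagram_key(word: str) -> int:
    """Sum of (code point ** 5) over the lowercased, unaccented letters.

    Hypothetical helper illustrating the recipe from the message above.
    """
    total = 0
    for ch in word.lower():
        # "Unaccent": decompose the character and drop combining marks.
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(part):
                continue
            total += ord(part) ** 5
    return total


# Index a (toy) word list by key, so all words with the same letters
# can be retrieved with a single dictionary lookup.
def build_index(words):
    index = {}
    for w in words:
        index.setdefault(anagram_key(w), []).append(w)
    return index
```

A misspelling with swapped letters then maps to the same key as the correct word, so build_index(words).get(anagram_key(typo), []) returns the candidates without scanning the whole list.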

Re: LanguageTool release 2.1 in progress

2013-04-01 Thread Ruud Baars
Works fine for me. Ruud On 01-04-13 00:34, Daniel Naber wrote: Hi, the LanguageTool 2.1 release has now been uploaded to http://www.languagetool.org/download/?C=M;O=D It's not yet announced, so please take the time to test it. Unless some serious problem shows up, I will announce and link

Re: LanguageTool release 2.1 in progress

2013-04-01 Thread Ruud Baars
One setback is the spell checker. It has the same issue most English-based spellcheckers have: suggest a space, splitting up words. Since the words list is far from complete, it suggests a lot of wrong split-ups. Either the word splitting should be disabled, or the words list enhanced quite a

Re: finding English phrases

2013-03-01 Thread Ruud Baars
The university of Nijmegen (The Netherlands) has been working on tool combinations that do exactly that, use machine learning from large corpora, Dutch as well as English. The Dutch front-end is called valkuil.net. More info on the tools and usage is here:

Translations of Dutch

2013-02-16 Thread Ruud Baars
In case you need it, here is the official translation of Dutch in a lot of languages: (Unfortunately, the language it is in, is mentioned in Dutch...) Ruud Afrikaans: Nederlands Albanees:

Re: Translations of Dutch

2013-02-16 Thread Ruud Baars
of language names in Slovenian is used, so I hope you are not suggesting to replace the existing Slovenian name for Dutch with the one on your list? Lp, m. 2013/2/16 Ruud Baars baar...@xs4all.nl In case you need it, here is the official

Re: Uncompounding words

2013-02-06 Thread Ruud Baars
If regexp is good enough, you could do that in disambiguation. But no certainty that the word as a whole is correct. Ruud On 06-02-13 13:05, Jaume Ortolà i Font wrote: 2013/2/6 Ruud Baars baar...@xs4all.nl Jaume, I was not working on compounding; that has

Re: Uncompounding words

2013-02-06 Thread Ruud Baars
Naber wrote: On 06.02.2013, 08:05:16 Ruud Baars wrote: When I know a bit more, I could try to adjust the prototype code to support multiple languages by design. jwordsplitter can be extended to support languages other than German. You need to implement a class that extends

Re: Uncompounding words

2013-02-06 Thread Ruud Baars
, 19:41:26 Ruud Baars wrote: I have had a look at that code a long time ago. I found it hard to understand. The most recent code on github (not in the latest release yet) has been rewritten and should be easier to follow. Help for checking out the code is on github (basically just git clone

Uncompounding words

2013-02-05 Thread Ruud Baars
A long time ago I prototyped a word uncompounder for Dutch. Though it worked, it was far from elegant and supported only Dutch. Earlier this week I found a more elegant solution, able to uncompound words like 'langetermijnplanning' into 'lange termijn planning'. In Dutch there are 4 possible
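
The message is cut off before the approach is described, so the following is not the solution mentioned above. It is only a generic, dictionary-driven sketch of what uncompounding means, assuming a toy lexicon and a hypothetical function name split_compound; real Dutch compounding (linking letters, hyphens, case) needs more than this.

```python
def split_compound(word, lexicon, min_len=3):
    """Split word into known parts, preferring the longest leading part.

    Returns a list of parts, or None when no full split is found.
    Purely illustrative: no linking elements or hyphens are handled.
    """
    for end in range(len(word), min_len - 1, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        if end == len(word):
            return [head]
        rest = split_compound(word[end:], lexicon, min_len)
        if rest is not None:
            return [head] + rest
    return None


# Toy lexicon, assumed for the example only.
LEXICON = {"lange", "termijn", "planning"}
parts = split_compound("langetermijnplanning", LEXICON)
```

With this toy lexicon, parts holds the three components named in the message; a word that cannot be fully covered by lexicon entries yields None.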

Re: Simplify directory structure? (language appear twice in directories)

2013-01-25 Thread Ruud Baars
+1 On 25-01-13 18:01, Dominique Pellé wrote: Hi The directory tree of LanguageTool source and resource files is quite deep. I understand that this is mostly due to Java conventions. But now that languages are separated in modules, I don't think we need to have 2 directories with the name

Re: switching to Maven - done!

2013-01-24 Thread Ruud Baars
As far as I am concerned, I think it is very technical. I will work from the nightly builds. On 24-01-13 13:15, Jaume Ortolà i Font wrote: 2013/1/23 Daniel Naber list2...@danielnaber.de Don't hesitate to ask if you have questions. Hi Daniel, I miss an

Re: Morfologik spelling and replacement patterns

2013-01-24 Thread Ruud Baars
Common errors in Dutch are: kado where cadeau is right: a k-c switch, combined with an eau-o switch. 2 replacements in one word. To be honest, Hunspell applies only 1. A different example is the combination stofgezogen-gestofzuigd. These are all not typos, but corrections are desired
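
To make the "2 replacements in one word" point concrete, here is a small sketch that applies replacement pairs repeatedly, up to a chosen number of rounds. It is an illustration only, not how Hunspell or Morfologik implement replacements; the pair list and the function name replacement_candidates are assumptions made for the example.

```python
# Replacement pairs (wrong fragment -> intended fragment), taken from the
# kado/cadeau example in the message; the list itself is illustrative.
REPLACEMENTS = [("k", "c"), ("o", "eau")]


def replacement_candidates(word, pairs, max_replacements=2):
    """Return all variants reachable with 1..max_replacements replacements."""
    seen = {word}
    frontier = {word}
    for _ in range(max_replacements):
        next_frontier = set()
        for current in frontier:
            for src, dst in pairs:
                pos = current.find(src)
                while pos != -1:
                    variant = current[:pos] + dst + current[pos + len(src):]
                    if variant not in seen:
                        seen.add(variant)
                        next_frontier.add(variant)
                    pos = current.find(src, pos + 1)
        frontier = next_frontier
    seen.discard(word)
    return seen
```

With one replacement allowed, 'kado' only reaches 'cado' and 'kadeau'; allowing two also reaches 'cadeau'. In practice the candidates would still be filtered against the word list before being suggested.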

Re: adding suggestions to a pattern rule

2013-01-16 Thread Ruud Baars
If suggestions are listed in a suggestion list, why have suggestions in the message at all? There is a short message, a long message with suggestions, and then a suggestion list. Maybe it is possible to reduce the messages to 1, and leave the suggestion out. Ruud On 16-01-13 18:44, Daniel

Re: getting LT online

2013-01-16 Thread Ruud Baars
Daniel, I am working on just that for Dutch. Ruud On 17-01-13 00:35, Daniel Naber wrote: Now that we have a nice online check, we might contact website owners who offer automatic spell or grammar checking, asking them if they would like to integrate LT into their service. For example,

multi-word spell checking

2013-01-05 Thread Ruud Baars
I am considering this wild idea: - ignore traditional spell checking of just single words - start spellchecking word groups from 1 to n (3 to start with) words. - use an engine to suggest known correct alternatives for known erroneous word groups - suggest more common much alike word groups. A
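
The first steps of the idea above (word groups of 1 to n words, looked up against known erroneous groups) might look like this. It is a sketch under assumptions: the function names are hypothetical, and the table of known errors holds a single made-up entry based on the 'inter net' example that appears elsewhere on this list.

```python
def word_groups(tokens, max_len=3):
    """Yield every group of 1..max_len consecutive tokens, shortest first."""
    for size in range(1, max_len + 1):
        for start in range(len(tokens) - size + 1):
            yield tuple(tokens[start:start + size])


# Hypothetical table of known erroneous word groups and their corrections.
KNOWN_ERRORS = {("inter", "net"): "internet"}


def suggest(text, max_len=3):
    """Return (erroneous group, correction) pairs found in text."""
    tokens = text.split()
    return [
        (group, KNOWN_ERRORS[group])
        for group in word_groups(tokens, max_len)
        if group in KNOWN_ERRORS
    ]
```

Calling suggest("het inter net is traag") reports the group ('inter', 'net') with the correction 'internet'. The last step of the idea, suggesting more common look-alike groups, would need frequency data and is not covered here.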

Re: compounds.txt

2013-01-01 Thread Ruud Baars
-12 21:30, Ruud Baars wrote: Thanks. Ruud On 30-12-12 21:23, Daniel Naber wrote: On 29.12.2012, 21:28:22 Ruud Baars wrote: I added land-in-waarts+ to compounds.txt. The very latest snapshot at http://www.languagetool.org/download/snapshots now contains a fix for that. Regards Daniel

Re: new disambiguation action implemented (patch included)

2013-01-01 Thread Ruud Baars
unless you add a disambiguator, this is not useful to you anyway. To learn more about the disambiguator in LT, see: http://wiki.languagetool.org/developing-a-disambiguator Regards -- Dominique Ruud Baars baar...@xs4all.nl wrote: Could you make clear what

Re: compounds.txt

2012-12-31 Thread Ruud Baars
Is this rule case sensitive? That is relevant sometimes (Aids Fonds [=organisation] vs. aidsfonds [ any fund]) On 30-12-12 21:30, Ruud Baars wrote: Thanks. Ruud On 30-12-12 21:23, Daniel Naber wrote: On 29.12.2012, 21:28:22 Ruud Baars wrote: I added land-in-waarts+ to compounds.txt

Re: compounds.txt

2012-12-31 Thread Ruud Baars
Could you add case sensitivity to the wish list? Ruud On 31-12-12 16:01, Daniel Naber wrote: On 31.12.2012, 15:51:37 Ruud Baars wrote: Is this rule case sensitive? That is relevant sometimes (Aids Fonds [=organisation] vs. aidsfonds [ any fund]) It's currently case-insensitive. Regards

Re: compounds.txt

2012-12-31 Thread Ruud Baars
Or, more simply, add the exclamation mark at the start for case-sensitivity. Ruud On 31-12-12 16:18, Daniel Naber wrote: On 31.12.2012, 16:12:18 Ruud Baars wrote: Could you add case sensitivity to the wish list? It's already on the wish list for a long time, as it's also needed (rarely

Re: making XML rules more compact?

2012-12-30 Thread Ruud Baars
Please don't lock things to Eclipse. Eclipse is a tough thing to learn for non-developers like me. I would rather stick to my plain text editor to edit rules and check them from the command line. Ruud On 30-12-12 23:01, Mauro Condarelli wrote: Just my 2c... IMHO there's no gain (aside from

switching of rule in interface

2012-12-29 Thread Ruud Baars
I started LT GUI as a server, and was using that. In the GUI, I disabled a rule using the link. But after that, the rule was still applied by the server. Ruud

compounds.txt

2012-12-29 Thread Ruud Baars
Could the rule that uses compounds.txt be adjusted to do a suggestion? Ruud

Re: compounds.txt

2012-12-29 Thread Ruud Baars
regels - klik aan om weer in te schakelen: Waarschijnlijk een spelfout Klaar, 1 aandachtspunten Could it be a string translation issue in the where I missed a parameter? Ruud On 29-12-12 20:12, Daniel Naber wrote: On 29.12.2012, 20:04:36 Ruud Baars wrote: Could the rule that uses

quote error

2012-12-28 Thread Ruud Baars
There apparently is no rule checking for tv , e double quote, caught between 2 letters. A single quote is possible in Dutch, a double never. Is this true for all languages? Would a rule for this be a small change in one of the Java rules, or should I make an XML rule for this? Ruud

spell checker

2012-12-28 Thread Ruud Baars
There is a problem with Hunspell. It looks like you have not implemented the latest version. The call from LT for the word 'wulfsite' results in the erroneous suggestion 'wulf site'. This is quite serious for Dutch! Using the incorporated .dic and .aff and entering the same word, results in

java rule issue

2012-12-28 Thread Ruud Baars
I hit an example that is quite strange: Een week geleden zei de brancheorganisatie rekening te houden met een extra omzet van 230 miljoen euro in de dagen voor Pasen. ,Die inschatting lijkt aan de voorzichtige kant te zijn geweest. Vermoedelijk komt dat bedrag hoger uit. It is quite clear

Re: spell checker

2012-12-28 Thread Ruud Baars
How do I switch off the spell-checking rule in the GUI/options? On 28-12-12 13:48, Ruud Baars wrote: There is a problem with Hunspell. It looks like you have not implemented the latest version. The call from LT for the word 'wulfsite' results in the erroneous suggestion 'wulf site

Re: quote error

2012-12-28 Thread Ruud Baars
This is not 2 single quotes, but 1 double. The error is that there is no space where there should be one, left or right. Ruud On 28-12-12 18:44, Marcin Miłkowski wrote: On 2012-12-28 13:39, Ruud Baars wrote: There apparently is no rule checking for tv , e double quote, caught between 2 letters

Re: [suggestion] valid words specified with regexp in LanguageTool spelling checker

2012-12-27 Thread Ruud Baars
both. Ruud On 27-12-12 19:53, Dominique Pellé wrote: Ruud Baars baar...@xs4all.nl wrote: You don't have to use compounding. add a flag for the 'affix' 're' etc. Like SFX X J 2 SFX X e . SFX X ere . (or something better) and add numbers

Re: Api variables

2012-12-27 Thread Ruud Baars
But what is offset then? Ruud On 27-12-12 22:26, Dominique Pellé wrote: Ruud Baars baar...@xs4all.nl wrote: Can anyone please explain what the added value is of the api values of: contextoffset and offset in: error fromy=0 fromx=0 toy=0 tox=5

Re: Firefox extension

2012-12-14 Thread Ruud Baars
Ah, I found the add-on-balk (which is off by default) You would need that on for the icon to be visible. Ruud

Re: Firefox extension

2012-12-12 Thread Ruud Baars
The texts are very detailed, and might be tailored for Windows, since I cannot see FF-bars anywhere in my FF on Kubuntu. Some terminology should be following Mozilla terminology, which is unfamiliar to me. So the translation is there mostly, but not all of it, and certainly not

Re: [Languagetool] subject prefix on this mailing list

2012-11-22 Thread Ruud Baars
Reduce it to LT ? On 22-11-12 16:11, Daniel Naber wrote: Hi, does anybody have a problem with the [Languagetool] prefix removed from the mails on this list? I think there are other ways to filter mails in all mail clients and it just takes up space better used for the subject. Regards

Re: [Languagetool] Performance

2012-11-21 Thread Ruud Baars
Number of reported errors (rules) will structurally grow; length of text is out of our field of influence. So there is only speeding things up left. Apart from profiling the code, one could consider profiling the individual rules and the amount of found hits, related to the seriousness of the

[Languagetool] Wikipedia text

2012-11-19 Thread Ruud Baars
I am looking for a tool to dump (paragraph data of articles in) a Wikipedia dump to plain UTF-8 text. Any advice? Ruud

[Languagetool] word confusion

2012-11-17 Thread Ruud Baars
In Dutch, there is a lot of word confusion. An example: paardenwagen = a vehicle (wagen) to transport horses (paarden) paard-en-wagen = a wagon and horse combination It is possible to make rules for these kinds of confusions, but neither of the two is really wrong. One would just want to inform

Re: [Languagetool] Language detection

2012-11-17 Thread Ruud Baars
Daniel, that won't do. Dutch, even with spelling errors, should be distinguished from Afrikaans. I'll keep you informed. On 17-11-12 12:36, Daniel Naber wrote: On 17.11.2012, 07:49:29 Ruud Baars wrote: I need to find out the quality of distinction possible between old-fashioned Dutch

Re: [Languagetool] word confusion

2012-11-17 Thread Ruud Baars
Is there any info on how that works? It is not in the wiki. Ruud On 17-11-12 15:42, Daniel Naber wrote: On 17.11.2012, 12:05:14 Ruud Baars wrote: Would it be an idea to add a function for this, a bit like compoundrule, having its own data file? We already have de/wrongWordInContext.txt

Re: [Languagetool] Language detection

2012-11-17 Thread Ruud Baars
right from the start. So I am curious how it is trained and used. Ruud On 17-11-12 18:05, Susana Sotelo Docio wrote: Ruud Baars wrote: Hi I tried to read the documentation, but that is very technical. Not a word about what it is really able to do, and how it is trained. Would you know where I

[Languagetool] Language detection

2012-11-16 Thread Ruud Baars
Is LT using language detection from : https://code.google.com/p/language-detection/ Or something else? I am currently experimenting with this Google stuff, but the profiles for anything other than English seem to be relatively poor. Any opinions on this? Ruud

Re: [Languagetool] Language detection

2012-11-16 Thread Ruud Baars
, German, Afrikaans, Frisian etc., for better filtering of a corpus. Ruud On 17-11-12 00:16, Daniel Naber wrote: On 16.11.2012, 18:14:50 Ruud Baars wrote: Is LT using language detection from : https://code.google.com/p/language-detection/ We're using http://tika.apache.org Regards Daniel

Re: [Languagetool] Firefox extension

2012-11-13 Thread Ruud Baars
Daniel, I once started the server by starting the GUI from the command line, after having the server switched on. By testing every once in a while, I was able to kill the process entirely, and start the GUI (and server) again. The problem is getting the port freed entirely. Maybe it is of help?

[Languagetool] Runons, runoffs and runwrongs

2012-11-12 Thread Ruud Baars
Does anyone have a nice solution for multi-word problems like /inter net/ instead of /internet/ (This is nicely covered by the compoundRule) /wordproblem/ instead of /word problem/ I know spellchecking can do that, but it is dangerous to have a space suggested in compounding languages like

Re: [Languagetool] bug: rules that never match

2012-11-12 Thread Ruud Baars
Daniel. For Dutch, the rule 813 is ok; it is the sentence splitter that needs an exception for these incorrect abbreviations. It is currently tuned for only correct abbreviations. ).</token> <rule break="no"> <beforebreak>\b(mm|cm|km|mg|kg|h|kW|mW)\.\s</beforebreak>
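
The effect of such a no-break exception can be tried outside LT with a few lines of Python. This is a rough sketch, not the SRX engine: only the regular expression is taken from the message; the naive splitter around it (break after '.', '!' or '?' followed by whitespace) and the name split_sentences are assumptions.

```python
import re

# The "beforebreak" pattern from the message, anchored at the break position.
NO_BREAK = re.compile(r"\b(mm|cm|km|mg|kg|h|kW|mW)\.\s$")
# Naive break candidates: sentence punctuation followed by whitespace.
BREAK = re.compile(r"[.!?]\s")


def split_sentences(text):
    sentences, start = [], 0
    for match in BREAK.finditer(text):
        before = text[:match.end()]
        if NO_BREAK.search(before):
            # A unit written with a dot: do not break here.
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For the input "Het is 5 km. verder. Daarna links." the splitter keeps "km." inside the first sentence and breaks only after "verder.".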

Re: [Languagetool] compounds for Dutch

2012-11-11 Thread Ruud Baars
Done. On 11-11-12 12:40, Daniel Naber wrote: On 11.11.2012, 07:03:08 Ruud Baars wrote: The * is good, but indeed; we still miss the difference like Jan mentioned. I see - could you open a bug report / feature request for that? Regards Daniel

Re: [Languagetool] compounds for Dutch

2012-11-10 Thread Ruud Baars
Daniel, others, I added to compound.txt: van-waarde-verklaring+ Having /van waarde verklaring/ leads to the suggestion of writing things together; /vanwaarde verklaring/ does not, however, nor does / van waardeverklaring/. Is that by design? Luckily, most of these are reported by the spell

Re: [Languagetool] compounds for Dutch

2012-11-10 Thread Ruud Baars
Daniel, would you be willing to think about adding the opposite of the + at the end: # + at the end of the line will turn off the suggestion that # uses a hyphen In some cases we would want to only suggest the option with the hyphen: # - at the end of the line will turn off the suggestion that #

Re: [Languagetool] compounds for Dutch

2012-11-10 Thread Ruud Baars
The * is good, but indeed; we still miss the difference like Jan mentioned. I don't know the impact, but it would be nice to have. Ruud On 11-11-12 01:35, Jan Schreiber wrote: Daniel Naber wrote: On 10.11.2012, 18:37:36 Ruud Baars wrote: In some cases we would want to only suggest

Re: [Languagetool] Firefox extension

2012-11-09 Thread Ruud Baars
Both tools require or request a local server. Is that caused by not having (scalable) capacity? Any idea what would be needed for a central server (farm). Ruud On 09-11-12 12:15, Daniel Naber wrote: On 09.11.2012, 11:00:54 Marco A.G.Pinto wrote: What I would like to have would be a

Re: [Languagetool] compounds for Dutch

2012-11-09 Thread Ruud Baars
Then the translations might be: Koppeltekenprobleem Woorden die aaneen horen met koppeltekens, bijvoorbeeld 'zee-egel' i.p.v. 'zee egel'. Ruud On 09-11-12 19:53, Daniel Naber wrote: On 09.11.2012, 19:37:53 Ruud Baars wrote: Hyphenation problem A short error description used in LO/OO's

[Languagetool] getting fsa operational

2012-10-16 Thread Ruud Baars
Guys, How do I setup the software to generate postag dictionaries? Marcin once handed me a set, but that got lost over the years, changing computers etc. Ruud

[Languagetool] Postags for compunds with -

2012-10-14 Thread Ruud Baars
Though the input for fsa surely has 13-jarig as a base word for 13-jarige, when I look at the tag output of LT, I see jarig used as the base word for 13-jarige. That is not exact. It is close, but not correct when using the relations to suggest alternatives using the postag. (13-jarig would

Re: [Languagetool] Postags for compunds with -

2012-10-14 Thread Ruud Baars
Watch the difference between 2 examples, very much alike, apart from the number-letter difference: 16-jarige[jarig/AJe] 17e-eeuwse[17e-eeuws/AJe] Ruud On 14-10-12 21:44, Ruud Baars wrote: Though the input for fsa surely has 13-jarig as a base word for 13-jarige, when I look at the tag output

Re: [Languagetool] Breton speller (and general tokenization issue)

2012-07-01 Thread Ruud Baars
Marcin, For Dutch, tokenisation on - would be really wrong, since it is a real word character, sometimes required, sometimes optional. Dutch has the phenomenon of 'klinkerbotsing' (sonant collision) when two single sonants get glued together, and can be mistaken for one of the two-character

Re: [Languagetool] Release LanguageTool 1.8 in progress

2012-06-30 Thread Ruud Baars
There is no Oracle Java package available. But adding the libreoffice-java-common solved it. Ruud On 30-06-12 16:47, Daniel Naber wrote: On Saturday, 30 June 2012, Ruud Baars wrote: I also get a java error when trying to install it. Ubuntu, installed default Java runtime yesterday. What

Re: [Languagetool] SimpleReplaceRule as spelling rule?

2012-06-29 Thread Ruud Baars
How do I invoke this rule from XML or otherwise (not Java code)? Ruud On 29-06-12 15:53, Marcin Miłkowski wrote: Hi all, shouldn't we make also SimpleReplaceRule a spelling rule and display its results underlined in red? Regards, Marcin

Re: [Languagetool] Hunspell spellcheck performance

2012-06-23 Thread Ruud Baars
When words get longer, edit distance should increase too. A % maybe. In the matter of 'rep', the same issue is present. 1 rep is not enough: e.g. the k/c mutation for Dutch: kokos vs cocos. Multiple replacements in one word could be needed. (It is a change request for Hunspell too). Also.
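
A length-dependent edit distance limit, as suggested above, could be sketched like this. The percentage values, the minimum of 1, and the function names are assumptions chosen for the example; the Levenshtein routine is the standard textbook dynamic-programming version, not the one used by Hunspell or LT.

```python
def max_edit_distance(word, percent=25, minimum=1):
    """Allowed edit distance grows with word length (integer percentage)."""
    return max(minimum, len(word) * percent // 100)


def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def within_limit(word, candidate, percent=25):
    return levenshtein(word, candidate) <= max_edit_distance(word, percent)
```

With these example settings, 'kokos' (5 letters) gets a limit of 1 at 25 percent, so 'cocos' (distance 2) is rejected; at 40 percent the limit becomes 2 and it is accepted. That mirrors the point in the message: a single edit or a single replacement is not always enough.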

Re: [Languagetool] Wordlists for spellers?

2012-06-22 Thread Ruud Baars
There is a words list for Dutch at www.opentaal.org. Relatively small, just 450.000 words, including proper names, but it is the best open one there is. Ruud Marcin Miłkowski list-addr...@wp.pl wrote: Hi all, considering that hunspell is so slow and that for many languages, we already have

Re: [Languagetool] Wordlists for spellers?

2012-06-22 Thread Ruud Baars
Aspell for Debian is generated from the same source. But other aspell versions are not. Better use the real, officially approved list; it has the certificate of the Dutch Language Union; the government, in fact! Marcin Miłkowski list-addr...@wp.pl wrote: Ruud, what about aspell-nl? Is that the same?

Re: [Languagetool] Wordlists for spellers?

2012-06-22 Thread Ruud Baars
, I might be the only one using it as a service or from the command line, later this month. Ruud Marcin On 2012-06-22 14:05, Ruud Baars wrote: Aspell for Debian is generated from the same source. But other aspell versions are not. Better use the real, officially approved list; it has the

Re: [Languagetool] Hunspell spellcheck performance

2012-06-22 Thread Ruud Baars
Dominique, Marcin once did that trick for Dutch, allowing for single quotes within a word. Might check that part of the code. Dutch even has the /Jans'/ (owned by Jans) as a word and /'s morgens/; in both cases these are not quotes, but apostrophes. Ruud On 22-06-12 20:31, Dominique Pellé

Re: [Languagetool] Hunspell spellcheck performance

2012-06-22 Thread Ruud Baars
This might also be different for words right in the Hunspell dictionary, versus words formed by affixes, and more so, words recognized using compounding. So be careful with these results ... Ruud On 22-06-12 21:46, Marcin Miłkowski wrote: On 2012-06-22 20:31, Dominique Pellé wrote:

Re: [Languagetool] How to enable spellchecking?

2012-06-05 Thread Ruud Baars
I am willing to help start a better dic. Time is limited though. Since we are both in the Netherlands (?), getting together is the fast option. Ruud Marcin Miłkowski list-addr...@wp.pl wrote: On 2012-06-04 22:51, Dominique Pellé wrote: Hi Another problem with spell checking, is the

Re: [Languagetool] How to enable spellchecking?

2012-06-05 Thread Ruud Baars
For Europe, one should use British. For worldwide: US. I think this is a government viewpoint, at least in Holland. Daniel Naber list2...@danielnaber.de wrote: On Sunday, 3 June 2012, Marcin Miłkowski wrote: I only added de-DE dictionary (frami version) because adding Swiss and Austrian

Re: [Languagetool] How to enable spellchecking?

2012-06-04 Thread Ruud Baars
...@gmail.com wrote: There is no command line for hunspell library. It's not executed as standalone program. On 04-06-2012 11:44, Ruud Baars baar...@xs4all.nl wrote: Command line hunspell needs utf8 to be specified. Ruud Marcin Miłkowski list-addr...@wp.pl wrote: On 2012-06
