Re: Korean Tokenizer in solr

2014-07-14 Thread Poornima Jay
I have upgraded Solr to version 4.8.1. But after making changes in the schema file I am getting the error below: Error instantiating class: 'org.apache.lucene.analysis.cjk.CJKBigramFilterFactory'. I assume CJKBigramFilterFactory and CJKFoldingFilterFactory are supported in 4.8.1. Do I need to

Re: Korean Tokenizer in solr

2014-07-14 Thread Alexandre Rafalovitch
Are you sure it's not a spelling error or something else weird like that? Because Solr ships with that filter in its example schema: <filter class="solr.CJKBigramFilterFactory"/> So you can compare what you are doing differently with that. Regards, Alex. Personal:

Re: Korean Tokenizer in solr

2014-07-14 Thread Poornima Jay
Yes. Below is my defined fieldType: <fieldType name="text_match_phrase_cjk" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.ICUTokenizerFactory"/> <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/> <filter

Re: Korean Tokenizer in solr

2014-07-14 Thread Alexandre Rafalovitch
What happens if you create a new collection with the absolute minimum in it and then add the definition? Start from something like: https://github.com/arafalov/simplest-solr-config . Also, is there a long exception earlier in the log? It may have more clues. Regards, Alex. Personal:

Re: Korean Tokenizer in solr

2014-07-14 Thread Poornima Jay
When I try to index, the error below appears: java.io.FileNotFoundException: /home/searchuser/multicore/apac_content/data/tlog/tlog.000 (No such file or directory) On Monday, 14 July 2014 2:07 PM, Poornima Jay poornima...@rocketmail.com wrote: Yes, Below is my

Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem

2014-07-14 Thread Harald Kirsch
Thanks, IJ, for the link. I am not sure this can solve my problem, because I have only one machine in play anyway. Harald. On 12.07.2014 20:49, IJ wrote: Guess - I had the same issues as you. Was resolved

Re: Reference numbers for major page fauls per seconds, index size, query throughput

2014-07-14 Thread Harald Kirsch
Hello Erik, thanks for the reply. Indeed the CPUs are kind of idling during the load test. They are not 20% but clearly don't get far beyond 40%. Changing the number of threads in jmeter has minor effects only on the qps, but increases the average latency, as soon as the threads outnumber

Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem

2014-07-14 Thread Harald Kirsch
This problem seems to completely disappear under load. I started making load tests despite fearing them to be useless. It turns out that there are no more 5 ms delays under load. Harald. On 09.07.2014 09:50, Harald Kirsch wrote: Good point. I will see if I can get the necessary access

Of, To, and Other Small Words

2014-07-14 Thread Teague James
Hello all, I am working with Solr 4.9.0 and am searching for phrases that contain words like "of" or "to" that Solr seems to be ignoring at index time. Here's what I tried: curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">100</field><field

Re: Of, To, and Other Small Words

2014-07-14 Thread Anshum Gupta
Hi Teague, The StopFilterFactory (which I think you're using) by default uses lang/stopwords_en.txt (which wouldn't be empty if you check). What you're looking at is the stopword.txt. You could either empty that file out or change the field type for your field. On Mon, Jul 14, 2014 at 12:53 PM,

Re: Of, To, and Other Small Words

2014-07-14 Thread Jack Krupansky
Or, if you happen to leave off the words attribute of the stop filter (or misspell the attribute name), it will use the internal Lucene hardwired list of stop words. -- Jack Krupansky -Original Message- From: Anshum Gupta Sent: Monday, July 14, 2014 4:03 PM To:
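The fallback described here can be modelled in a few lines. This is only a sketch of the idea, not Solr's code: `make_stop_filter` and `ILLUSTRATIVE_DEFAULTS` are hypothetical names, and the default set below is a small illustrative subset, not the actual built-in list.

```python
# Toy model of a stop filter that falls back to a built-in list
# when no word list is supplied. All names here are hypothetical.

# Illustrative subset only; the real built-in list is longer.
ILLUSTRATIVE_DEFAULTS = frozenset({"a", "an", "and", "of", "the", "to"})


def make_stop_filter(words=None, ignore_case=True):
    """Return a function that removes stop words from a token list.

    words=None models a missing `words` attribute: the built-in list is used.
    words=[] models an explicitly configured but empty stop word file.
    """
    stop = ILLUSTRATIVE_DEFAULTS if words is None else frozenset(words)
    if ignore_case:
        stop = frozenset(w.lower() for w in stop)

    def apply(tokens):
        kept = []
        for token in tokens:
            key = token.lower() if ignore_case else token
            if key not in stop:
                kept.append(token)
        return kept

    return apply


if __name__ == "__main__":
    tokens = ["Bill", "of", "Rights"]
    print(make_stop_filter()(tokens))          # no list given: fallback applies
    print(make_stop_filter(words=[])(tokens))  # empty list: nothing removed
```

The point of the sketch is the difference between "no list configured" and "an empty list configured": only the second keeps every token.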

Strategies for effective prefix queries?

2014-07-14 Thread Hayden Muhl
I'm working on using Solr for autocompleting usernames. I'm running into a problem with the wildcard queries (e.g. username:al*). We are tokenizing usernames so that a username like "solr-user" will be tokenized into "solr" and "user", and will match both "sol" and "use" prefixes. The problem is when we

RE: Of, To, and Other Small Words

2014-07-14 Thread Teague James
Hi Anshum, Thanks for replying and suggesting this, but the field type I am using (a modified text_general) in my schema has the file set to 'stopwords.txt'. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index">

Re: Of, To, and Other Small Words

2014-07-14 Thread Alexandre Rafalovitch
Have you tried the Admin UI's Analysis screen? It will show you what happens to the text as it progresses through the tokenizers and filters. No need to reindex. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and

Re: Strategies for effective prefix queries?

2014-07-14 Thread Alexandre Rafalovitch
Search against both fields (one split, one not split)? Keep original and tokenized form? I am doing something similar with class name autocompletes here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Regards, Alex. Personal:
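One way to read the "keep original and tokenized form" suggestion is sketched below in plain Python rather than Solr configuration. It is an illustration under assumptions: the function names, the splitting rule (any non-alphanumeric character), and the ranking (whole-name matches first) are all hypothetical and are not taken from the linked schema.

```python
import re


def tokenize(username):
    """Split a username on non-alphanumeric characters, lowercased."""
    return [t for t in re.split(r"[^a-z0-9]+", username.lower()) if t]


def build_index(usernames):
    """Keep both forms per user: the whole lowercased name and its tokens."""
    return [(name, name.lower(), tokenize(name)) for name in usernames]


def autocomplete(index, prefix):
    """Match the prefix against the unsplit form first, then against tokens."""
    p = prefix.lower()
    whole = [name for name, low, _ in index if low.startswith(p)]
    by_token = [
        name
        for name, low, tokens in index
        if not low.startswith(p) and any(t.startswith(p) for t in tokens)
    ]
    return whole + by_token


if __name__ == "__main__":
    index = build_index(["solr-user", "alice", "user-admin"])
    print(autocomplete(index, "use"))     # token match and whole-name match
    print(autocomplete(index, "solr-u"))  # only the unsplit form can match this
```

A prefix such as "solr-u" spans the separator, so the tokenized form alone cannot match it; the unsplit form covers that case, while the tokens still allow "use" to find "solr-user".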

RE: Of, To, and Other Small Words

2014-07-14 Thread Teague James
Jack, Thanks for replying and for the suggestion. I replied to another suggestion with my field type and I do have <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />. There's nothing in the stopwords.txt. I even cleaned out stopwords_en.txt just to be certain. Any other

RE: Of, To, and Other Small Words

2014-07-14 Thread Teague James
Alex, Thanks! Great suggestion. I figured out that it was the EdgeNGramFilterFactory. Taking that out of the mix did it. -Teague -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Monday, July 14, 2014 9:14 PM To: solr-user Subject: Re: Of, To, and Other
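A plausible explanation for why an edge n-gram filter would swallow words like "of" and "to" is that a token shorter than the minimum gram size yields no grams at all. The sketch below models that behaviour in plain Python. The minimum of 3 is an assumption for illustration; the thread does not show the actual minGramSize or maxGramSize settings, and the function names are hypothetical.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the leading substrings of token with length min_gram..max_gram.

    A token shorter than min_gram produces an empty list.
    """
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=3, max_gram=15):
    """Whitespace-tokenize, lowercase, and expand each token into edge n-grams."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


if __name__ == "__main__":
    # With the assumed minimum of 3, "of" contributes nothing to the index.
    print(analyze("Bill of Rights", min_gram=3, max_gram=5))
    # Lowering the minimum to 2 keeps it.
    print(analyze("Bill of Rights", min_gram=2, max_gram=5))
```

Under this model the short words never reach the index, so phrase queries containing them cannot match, which is consistent with the symptom described in the thread.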

Re: Of, To, and Other Small Words

2014-07-14 Thread Alexandre Rafalovitch
You could try experimenting with CommonGramsFilterFactory and CommonGramsQueryFilter (slightly different). There are actually a lot of cool analyzers bundled with Solr. You can find the full list on my site at: http://www.solr-start.com/info/analyzers Regards, Alex. Personal:
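The common-grams idea can be illustrated with a simplified model: every token is kept, and a token that is a common word, or that sits next to one, is additionally joined with its neighbour into a two-word gram. The sketch below is not Solr's implementation; it ignores token positions, the function name is hypothetical, and the underscore separator and the one-word common list are assumptions for illustration.

```python
def common_grams(tokens, common_words, sep="_"):
    """Emit each token, plus a joined pair whenever either side is a common word."""
    out = []
    for i, token in enumerate(tokens):
        out.append(token)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if token in common_words or nxt in common_words:
                out.append(token + sep + nxt)
    return out


if __name__ == "__main__":
    print(common_grams(["bill", "of", "rights"], {"of"}))
    print(common_grams(["solr", "user", "list"], {"of"}))  # no common words: unchanged
```

The appeal of this approach is that small words stay searchable inside phrases through the joined grams instead of being dropped.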

Re: External File Field eating memory

2014-07-14 Thread Apoorva Gaurav
Hey Kamal, What config changes have you made to establish replication of external files, and how have you disabled role reloading? On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi All, It was found that external file, which was getting replicated