Re: Don't snowball depending on terms

2011-11-29 Thread Tomas Zerolo
On Tue, Nov 29, 2011 at 01:53:44PM -0500, François Schiettecatte wrote: > It won't and depending on how your analyzer is set up the terms are most > likely stemmed at index time. > > You could create a separate field for unstemmed terms though, or use a less > aggressive stemmer such as EnglishM

Re: Number of threads increased after restart

2011-11-29 Thread Samarendra Pratap
Hi, could the index have been in an unstable state when I restarted the server? On Tue, Nov 29, 2011 at 5:21 PM, Samarendra Pratap wrote: > Hi, > I restarted solr server with minor configuration changes and I/O and load > on server increased a lot. > > Details:- > Solr version:

Re: how index words with their perfix in solr?

2011-11-29 Thread Erick Erickson
Stemming is imperfect since it's algorithmic. If you look at the admin/analysis page you can see the effects of the various steps (check the verbose box)... rainy stems to raini so it's not a match for rain. There will always be anomalies like this, you either have to handle them with exceptions
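To illustrate the "rainy stems to raini" anomaly mentioned above, here is a minimal Python sketch of a single Porter-style rule (step 1c: a trailing "y" becomes "i" when the rest of the word contains a vowel). It is an assumption-laden toy, not the stemmer Solr ships: the function names are made up for illustration and every other step of the algorithm is omitted.

```python
def is_vowel(word, i):
    """Porter-style vowel test: a, e, i, o, u, or a 'y' preceded by a consonant."""
    c = word[i]
    if c in "aeiou":
        return True
    if c == "y" and i > 0:
        return not is_vowel(word, i - 1)
    return False


def step1c(word):
    """Toy version of Porter step 1c only: trailing 'y' -> 'i' if the stem has a vowel."""
    if word.endswith("y") and any(is_vowel(word, i) for i in range(len(word) - 1)):
        return word[:-1] + "i"
    return word


for w in ("rainy", "rain", "sky"):
    print(w, "->", step1c(w))
```

Because "rainy" becomes "raini" while "rain" is left alone, the two words end up as different index terms, which is why one does not match the other.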

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Chris Hostetter
: : On both - solr 1.4 and 3.5 configuration of JVM is the same, and the : : same servlet container ... jetty-6 I forgot to ask: when you say it's jetty-6 in both cases, are you running the exact same set of jetty configs for both Solr 1.4 and Solr 3.5, or is each instance using the sample jett

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Chris Hostetter
: > previous Solr 1.4 index?  what does a directory listing (including file : > sizes) look like for both your old and new indexes? : : Yes, both indexes have same data. Indexes are built using some C++ : program which reads data from database and inserts it into Solr : (using XML). Both indexes

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Jan Høydahl
Hi, Perhaps you could try to remove the luceneMatchVersion from the 3.5 solrconfig again and use same schema version as for 1.4; to more closely emulate behavior of 1.4. LuceneMatchVersion will modify several defaults. Quick way to see if any of these new defaults make a change. -- Jan Høydahl

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Pawel Rog
IO waits are about 0-2%. I didn't see any suspicious activity in the logs, but I can check again. On Tue, Nov 29, 2011 at 11:40 PM, Darren Govoni wrote: > Any suspicious activity in the logs? what about disk activity? > > > On 11/29/2011 05:22 PM, Pawel Rog wrote: >> >> On Tue, Nov 29, 2011 at 9:13 PM, Chr

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Darren Govoni
Any suspicious activity in the logs? What about disk activity? On 11/29/2011 05:22 PM, Pawel Rog wrote: On Tue, Nov 29, 2011 at 9:13 PM, Chris Hostetter wrote: Let's back up a minute and cover some basics... 1) You said that you built a brand new index on a brand new master server, using Solr

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Pawel Rog
On Tue, Nov 29, 2011 at 9:13 PM, Chris Hostetter wrote: > > Let's back up a minute and cover some basics... > > 1) You said that you built a brand new index on a brand new master server, > using Solr 3.5 -- how do you build your indexes?  did the source data > change at all? does your new index ha

Re: Returning and faceting on some of the field's values

2011-11-29 Thread Jeff Schmidt
This does appear to work well. It seems there are not many people interested in this particular problem now, but I figured I'd just complete the story in case it helps somebody in the future. With the neighbor node ID prefixes, I'm getting the facet values and counts as I require. Since I ha

Re: Don't snowball depending on terms

2011-11-29 Thread Rob Brown
Yes, it looks like I'll have to do some pre-processing outside of Solr. I don't mind giving users the option to query a differently indexed field, ie, same content, but not stemmed, although this would apply to all keywords they enter, so they couldn't allow stemming on one keyword, but not anothe
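The pre-processing outside of Solr mentioned above could look roughly like the following Python sketch, which routes quoted keywords to an unstemmed field and everything else to the stemmed one, so stemming is chosen per keyword. The field names `content` and `content_exact` are hypothetical, and the sketch does no escaping of query-syntax characters; it is not code from this thread.

```python
import re


def build_query(user_input, stemmed="content", exact="content_exact"):
    """Send quoted keywords to the unstemmed field, bare keywords to the stemmed one.

    Both field names are placeholders for whatever the schema actually defines.
    """
    clauses = []
    # Each match is either a "quoted phrase" (group 1) or a bare token (group 2).
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', user_input):
        if quoted:
            clauses.append(f'{exact}:"{quoted}"')
        else:
            clauses.append(f"{stemmed}:{bare}")
    return " ".join(clauses)


print(build_query('"manage" teams'))
```

This relies on the same content being indexed twice, once with and once without the stemming filter, as suggested elsewhere in the thread.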

Re: solr - http error 404 when requesting solrconfig.xml or schema.xml

2011-11-29 Thread Chris Hostetter
: To answer myself (sorry for the noise) - removed accidentally the admin handler section (only : ping was there) and that's causing the issue, after fixing this error, : all is fine again. Right, this handler... ...automatically registers most of the /admin/* handlers

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Chris Hostetter
Let's back up a minute and cover some basics... 1) You said that you built a brand new index on a brand new master server, using Solr 3.5 -- how do you build your indexes? did the source data change at all? does your new index have the same number of docs as your previous Solr 1.4 index? wha

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Pawel Rog
In my last post I meant that the default operator is AND. Field types: promoted - int, ending - int, b_count - int, name - text, cat1 - int, cat2 - int. On Tue, Nov 29, 2011 at 7:54 PM, Pawel Rog wrote: > examples > > facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=0&q=name:(kurtka+skóry+brazowe42)&f

Re: how index words with their perfix in solr?

2011-11-29 Thread François Schiettecatte
You might try the snowball stemmer too, I am not sure how closely that will fit your requirements though. Alternatively you could use synonyms. François On Nov 29, 2011, at 1:08 AM, mina wrote: > thank you for your answer.i read it and i use this filter in my schema.xml in > solr: > > > > b

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Pawel Rog
examples facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=0&q=name:(kurtka+skóry+brazowe42)&facet.limit=500&facet.field=cat1&facet.field=cat2&wt=json&rows=50 facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=1350&q=name:naczepa&facet.limit=50

Re: Don't snowball depending on terms

2011-11-29 Thread François Schiettecatte
It won't and depending on how your analyzer is set up the terms are most likely stemmed at index time. You could create a separate field for unstemmed terms though, or use a less aggressive stemmer such as EnglishMinimalStemFilterFactory. François On Nov 29, 2011, at 12:33 PM, Robert Brown wro

Splitting Words but retaining offsets

2011-11-29 Thread Geetu Ambwani
Does solr include the functionality to split words but give the same offsets to both words. So for instance if I have a word like "pop-dop", and I split on "-", I get two words "pop" and "dop" but both have offsets start and end 0,3 instead of 0,2 followed by (4,6) Thanks ! Geetu
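For readers trying to picture the behavior being asked about, here is a small Python sketch of the two offset schemes: each part carrying its own offsets, versus every part inheriting the offsets of the whole original word. It only illustrates the idea and says nothing about what Solr provides; the function name and the `shared` flag are invented for this example, and end offsets are exclusive, so the numbers differ slightly from those in the question.

```python
import re


def split_with_offsets(text, sep="-", shared=False):
    """Split whitespace-delimited words on `sep`, returning (token, start, end).

    shared=False: each part keeps its own character offsets.
    shared=True:  each part reports the offsets of the whole original word.
    End offsets are exclusive.
    """
    tokens = []
    for match in re.finditer(r"\S+", text):
        word_start, word_end = match.span()
        pos = word_start
        for part in match.group().split(sep):
            if part:
                if shared:
                    tokens.append((part, word_start, word_end))
                else:
                    tokens.append((part, pos, pos + len(part)))
            pos += len(part) + len(sep)
    return tokens


print(split_with_offsets("pop-dop"))
print(split_with_offsets("pop-dop", shared=True))
```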

Re: How to configure /select handler ?

2011-11-29 Thread Chris Hostetter
: Another newbie question here : Browse handler works perfect. Now I want to configure my "/select" handler : so that I perform ajax-solr on it. : How to perform it. The website : https://github.com/evolvingweb/ajax-solr : explains how t

Re: Solr 3.5 very slow (performance)

2011-11-29 Thread Yonik Seeley
On Tue, Nov 29, 2011 at 12:25 PM, Pawel wrote: > I built an index on solr 1.4 some time ago (about 18 million documents, > about 8GB). I need new features from a newer version of solr, so I > decided to upgrade solr version from 1.4 to 3.5. > > * I created new solr master on new physical machine > *

Don't snowball depending on terms

2011-11-29 Thread Robert Brown
Is it possible to search a field but not be affected by the snowball filter? ie, searching for "manage" is matching "management", but a user may want to restrict results to only containing "manage". I was hoping that simply quoting the term would do this, but it doesn't appear to make any di

Solr 3.5 very slow (performance)

2011-11-29 Thread Pawel
I built an index on solr 1.4 some time ago (about 18 million documents, about 8GB). I need new features from a newer version of solr, so I decided to upgrade solr version from 1.4 to 3.5. * I created new solr master on new physical machine * then I created new index using the same schema as in earlie

Re: Seek past EOF

2011-11-29 Thread Mark Miller
Hmm...I've seen a bug like this, but I don't think it would be tickled if you are replicating config files... It def looks related though ... I'll try to dig around. Next time it happens, take a look on the slave for 0 size files - also if the index dir on the slave is plain 'index' or has a time

Re: Seek past EOF

2011-11-29 Thread Ruben Chadien
Hi, for the moment there are no 0 sized files, but all indexes are working now. I will have to look next time it breaks. Yes, the directory name is "index" and it replicates the schema and a synonyms file. /Ruben Chadien On 29 November 2011 15:29, Mark Miller wrote: > Also, on your master, what

Re: Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Vadim Kisselmann
Hi, the quick and dirty way sounds good:) It would be great if you could send me a patch for 1.4.1. By the way, I tested Solr 3.5 with my 1.4.1 test index. I can search and optimize, but clustering doesn't work (java.lang.Integer cannot be cast to java.lang.String) My uniqueKey for my docs is the "

Re: PatternTokenizer failure

2011-11-29 Thread Michael Kuhlmann
On 29.11.2011 15:20, Erick Erickson wrote: Hmmm, I tried this in straight Java, no Solr/Lucene involved and the behavior I'm seeing is that no example works if it has more than one whitespace character after the hyphen, including your failure example. I haven't lived inside regexes for long en

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread Robert Muir
On Tue, Nov 29, 2011 at 9:21 AM, elisabeth benoit wrote: > ok, thanks. > > I think it would be a nice improvement to consider inversion as distance = > 1, since it's such a common mistake. The distance = 2 makes it difficult to > correct transpositions on small words (for instance, the DirectSpellChe

Re: conditionally update document on unique id

2011-11-29 Thread Erick Erickson
Note that the original question was when working from a custom update request handler, you're not doing that at all... It's not clear to me whether doing queries ahead of time like you're doing is more or less speedy than a custom update request handler given that in the one case you're querying a

Re: Seek past EOF

2011-11-29 Thread Mark Miller
Also, on your master, what is the name of the index directory? Just 'index'? And are you replicating config files as well or no? On Nov 29, 2011, at 9:23 AM, Mark Miller wrote: > Does the problem index have any 0 size files in it? > > On Nov 29, 2011, at 2:54 AM, Ruben Chadien wrote: > >> HI

Re: Seek past EOF

2011-11-29 Thread Mark Miller
Does the problem index have any 0 size files in it? On Nov 29, 2011, at 2:54 AM, Ruben Chadien wrote: > Hi all > > After upgrading to Solr 3.4 we are having trouble with the replication. > The setup is one indexing master with a few slaves that replicate the > indexes once every night. > The la

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread elisabeth benoit
ok, thanks. I think it would be a nice improvement to consider inversion as distance = 1, since it's such a common mistake. The distance = 2 makes it difficult to correct transpositions on small words (for instance, the DirectSpellChecker couldn't make the right suggestion for "joile" given for 'joli

Re: PatternTokenizer failure

2011-11-29 Thread Erick Erickson
Hmmm, I tried this in straight Java, no Solr/Lucene involved and the behavior I'm seeing is that no example works if it has more than one whitespace character after the hyphen, including your failure example. I haven't lived inside regexes for long enough that I don't know what the right regex sho

Re: Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Stanislaw Osinski
> > But my actual live system works on solr 1.4.1. i can only change my > solrconfig.xml and integrate new packages... > i check the possibility to upgrade from 1.4.1 to 3.5 with the same index > (without reindex) with luceneMatchVersion 2.9. > i hope it works... > Another option would be to chec

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread Robert Muir
On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit wrote: > Hello, > > I'd like to know if the Levenshtein distance algorithm used by Solr 4.0 > DirectSpellChecker (working quite well I must say) is considering an > inversion as distance = 1 or distance = 2? > > For instance, if I write Monteruil a

Re: Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Vadim Kisselmann
Hello Staszek, thanks for testing:) I think the same (serialization issue -> int to string). This config works fine with solr 4.0 in my test cluster, I think with 3.5 too, without problems. But my actual live system works on solr 1.4.1. i can only change my solrconfig.xml and integrate new packages

Error running ComplexPhraseQueryParser

2011-11-29 Thread meghana
Hi all, I want to use wildcard queries and fuzzy search together in my query. I installed ComplexPhraseQueryParser for that, but when running a url with "defType=complexphrase", I get the error below. HTTP Status 500 - luceneMatchVersion java.lang.NoSuchFieldError: luceneMatchVersion at org.apache.solr.sear

Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread elisabeth benoit
Hello, I'd like to know if the Levenshtein distance algorithm used by Solr 4.0 DirectSpellChecker (working quite well I must say) is considering an inversion as distance = 1 or distance = 2? For instance, if I write Monteruil and I meant Montreuil, is the distance 1 or 2? Thanks, Elisabeth
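To make the 1-versus-2 question concrete, here is a minimal Python sketch comparing classic Levenshtein distance with a variant that also counts an adjacent transposition as one edit (optimal string alignment, a restricted form of Damerau-Levenshtein). It is a generic textbook implementation, not the code DirectSpellChecker uses; the function names are chosen for this example only.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Optimal string alignment: like Levenshtein, plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


for typed, meant in (("Monteruil", "Montreuil"), ("joile", "jolie")):
    print(typed, meant, levenshtein(typed, meant), osa_distance(typed, meant))
```

Under the classic definition both pairs come out at distance 2; with transpositions counted as a single edit they come out at 1, which matters most for short words where a distance of 2 is a large fraction of the word.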

Re: solr - http error 404 when requesting solrconfig.xml or schema.xml

2011-11-29 Thread Torsten Krah
To answer myself (sorry for the noise): I accidentally removed the admin handler section (only ping was there) and that's what caused the issue. After fixing this error, all is fine again. Torsten

Re: Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Stanislaw Osinski
Hi, It looks like some serialization issue related to writing integer ids to the output. I've just tried a similar configuration on Solr 3.5 and the integer identifiers looked fine. Can you try the same configuration on Solr 3.5? Thanks, Staszek On Tue, Nov 29, 2011 at 12:03, Vadim Kisselmann

Re: solr - http error 404 when requesting solrconfig.xml or schema.xml

2011-11-29 Thread Chantal Ackermann
Hi Torsten, some more information would help us to help you: - does calling /apps/solrslave/admin/ return the Admin Homepage? - what is the path to your SOLR_HOME - where in the filesystem are solrconfig.xml and schema.xml (even if this sounds redundant, maybe they are just misplaced) - their read

Number of threads increased after restart

2011-11-29 Thread Samarendra Pratap
Hi, I restarted solr server with minor configuration changes and I/O and load on server increased a lot. Details:- Solr version: 3.4 Jetty version: 6.1 Index Size: 30 GB Total document: 14.6 mn Request / second :20 (in day time) Last restart: 15 days ago Open threads: 50 - 60 (at any moment) A f

Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Vadim Kisselmann
Hi folks, I've installed the clustering component in solr 1.4.1 and it works, but not really:) You can see that the doc id is corrupt. Euro-Krise ½Íџ ¾౥ͽ ¿)ై ˆ࡯׸ my fields: and my config-snippets: title id text i changed my config snippets (carrot.url=id, url, title..) but the

Re: IllegalStateException, response already committed - replication related

2011-11-29 Thread Torsten Krah
Anyone an idea? regards

solr - http error 404 when requesting solrconfig.xml or schema.xml

2011-11-29 Thread Torsten Krah
Hi, I've got an interesting problem and don't know how to debug further. I am using an external solr home configured via jndi. Deployed my war file (context is /apps/solrslave/) and if I want to look at the schema: /apps/solrslave/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml the res