How to index large set data

2009-05-21 Thread Jianbin Dai
Hi, I have about 45GB of XML files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running. My computer has 4GB memory. Any suggestions for a solution? Thanks! JB

Phrase Search Issue

2009-05-21 Thread dabboo
Hi, I am facing one issue with phrase queries. I am entering 'Top of the world' as my search criteria. I am expecting it to return all the records in which one field contains all these words in any order. But it is treated as OR and returns all the records which have either of these

Re: Phrase Search Issue

2009-05-21 Thread dabboo
This problem is related to the default operator in dismax. Currently OR is the default operator and it is behaving perfectly fine. I have changed the default operator in schema.xml to AND, and I have also changed the minimum match to 100%. But it seems like AND as default operator doesn't work with

what does the version parameter in the query mean?

2009-05-21 Thread Anshuman Manur
Hello all, I'm using Solr 1.3.0, and when I query my index for solr using the admin page, the query string in the address bar of my browser reads like this: http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on Now, I don't know what version=2.2 means, and the wiki or the

Re: How to change the weight of the fields ?

2009-05-21 Thread Vincent Pérès
It seems I can only search on the field 'text'. With the following url: http://localhost:8983/solr/select/?q=novel&qt=dismax&fl=title_s,id&version=2.2&start=0&rows=10&indent=on&debugQuery=on I get answers, but on the debug area, it seems it's only searching on the 'text' field (with or without 'qt'

Strange Phrase Query Issue with Dismax

2009-05-21 Thread dabboo
Hi, I am facing a very strange issue in Solr, not sure if it is already a known bug. If I am searching for 'Top 500' then it returns all the records which contain either of these anywhere, which is fine. But if I search for 'Top 500 Companies' in any order, it gives me all those records, which

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hi Mike, thanks for the quick response: $ java -version java version 1.6.0_11 Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode) I hadn't

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything? Mike On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hi Mike, thanks for the quick response: $ java -version java version 1.6.0_11 Java(TM)

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
If you're able to run a patched version of Lucene, can you apply the attached patch, run it, get the issue to happen again, and post back the resulting exception? It only adds further diagnostics to that RuntimeException you're hitting. Another thing to try is turning on assertions, which may

Re: best way to cache base queries (before application of filters)

2009-05-21 Thread Yonik Seeley
On Thu, May 21, 2009 at 3:30 AM, Kent Fitch kent.fi...@gmail.com wrote: #2) Your problem might be able to be solved with field collapsing on the category field in the future (but it's not in Solr yet). Sorry - I didnt understand this A single relevancy search, but group or collapse results

Re: master/slave failure scenario

2009-05-21 Thread nk 11
Just curious. What would be the disadvantages of a no replication / multi master (no slave) setup? The client code should do the updates for every master of course, but if one machine would fail then I can immediately continue the indexing process and also I can query the index on any machine for a valid

Customizing SOLR-236 field collapsing

2009-05-21 Thread Marc Sturlese
Hey there, I have been testing the last adjacent field collapsing patch in trunk and it seems to work perfectly. I am trying to modify its behavior but don't know exactly how to do it. What I would like to do is, instead of collapsing the results, send them to the end of the results queue. Apparently

Re: How to index large set data

2009-05-21 Thread Erick Erickson
This isn't much data to go on. Do you have any idea what your throughput is? How many documents are you indexing? One 45G doc or 4.5 billion 10 character docs? Have you looked at any profiling data to see how much memory is being consumed? Are you IO bound or CPU bound? Best Erick On Thu, May 21,

Re: Plugin Not Found

2009-05-21 Thread Jeff Newburn
Nothing else is in the lib directory but this one jar. Additionally, the logs seem to say that it finds the lib as shown below INFO: Solr home set to '/home/zetasolr/' May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding

Re: master/slave failure scenario

2009-05-21 Thread Bryan Talbot
Indexing is usually much more expensive than replication so it won't scale well as you add more servers. Also, what would a client do if it was able to send the update to only some of the servers because others were down (for maintenance, etc)? -Bryan On May 21, 2009, at May 21,

RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu
I was looking for an answer to the same question, and have a similar concern. It looks like any serious customization work requires developing a custom SearchComponent, but it's not clear to me how the Solr designers wanted this to be done. I would be more confident either doing it at the Lucene level, or staying on

Re: Plugin Not Found

2009-05-21 Thread Mark Miller
Jeff Newburn wrote: Nothing else is in the lib directory but this one jar. Additionally, the logs seem to say that it finds the lib as shown below INFO: Solr home set to '/home/zetasolr/' May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding

Re: Creating a distributed search in a searchComponent

2009-05-21 Thread Shalin Shekhar Mangar
On Wed, May 20, 2009 at 10:59 PM, Nick Bailey nicholas.bai...@rackspace.com wrote: Hi, I am wondering if it is possible to basically add the distributed portion of a search query inside of a searchComponent. I am hoping to build my own component and add it as a first-component to the

Re: Creating a distributed search in a searchComponent

2009-05-21 Thread Shalin Shekhar Mangar
Also look at SOLR-565 and see if that helps you. https://issues.apache.org/jira/browse/SOLR-565 On Thu, May 21, 2009 at 9:58 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 20, 2009 at 10:59 PM, Nick Bailey nicholas.bai...@rackspace.com wrote: Hi, I am wondering if

Re: what does the version parameter in the query mean?

2009-05-21 Thread Jay Hill
I was interested in this recently and also couldn't find anything on the wiki. I found this in the list archive: The version parameter determines the XML protocol used in the response. Clients are strongly encouraged to ''always'' specify the protocol version, so as to ensure that the format of
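
To illustrate the advice about always specifying the protocol version, here is a small Python sketch that builds a select URL with `version` set explicitly. The host, port, and parameter values are taken from the URL quoted in the original question; the helper name `build_select_url` is made up for this example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, version="2.2", start=0, rows=10):
    """Return a select URL that pins the response format version explicitly."""
    params = [
        ("q", query),
        ("version", version),
        ("start", start),
        ("rows", rows),
        ("indent", "on"),
    ]
    return base_url.rstrip("/") + "/select/?" + urlencode(params)


print(build_select_url("http://localhost:8080/solr", "solr"))
```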

No sanity checks before replicating files?

2009-05-21 Thread Damien Tournoud
Hi list, We have deployed an experimental Solr 1.4 cluster (a master/slave setup, with automatic promotion of the slave as a master in case of failure) on drupal.org, to manage our medium size index (3GB, about 400K documents). One of the problems we are facing is that there seems to be no sanity

Re: master/slave failure scenario

2009-05-21 Thread nk 11
You are right... I just don't like the idea of stopping the indexing process if the master fails until a new one is started (more or less by hand). On Thu, May 21, 2009 at 6:49 PM, Bryan Talbot btal...@aeriagames.com wrote: Indexing is usually much more expensive than replication so it won't

Re: Customizing SOLR-236 field collapsing

2009-05-21 Thread Marc Sturlese
Yes, I have tried it but I see a couple of problems doing that. I will have to do more searches so response time will increase. The second thing is that, imagine I show the results collapsed in page one and put a button to see the non collapsed results. If later results for the second page are

Re: master/slave failure scenario

2009-05-21 Thread Otis Gospodnetic
Hi, You should be able to do the following. Put masters behind a load balancer (LB). Create a LB VIP and a pool with 2 masters, masterA and masterB, with a rule that all requests always go to A unless A is down. If A is down they go to B. Bring up master instances A and B on 2 servers and make
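
The routing rule described above (always A, fall back to B only when A is down) can be sketched in a few lines of Python. This is only an illustration of the rule, not a load balancer configuration: `pick_master` and the `is_up` health-check callable are hypothetical names, and a real LB would implement the check itself.

```python
def pick_master(masters, is_up):
    """Return the first master in priority order that is up, or None.

    masters: list of master names, highest priority first.
    is_up:   hypothetical health-check callable, name -> bool.
    """
    for master in masters:
        if is_up(master):
            return master
    return None


# Stand-in health state for the example: masterA is down.
down = {"masterA"}
print(pick_master(["masterA", "masterB"], lambda m: m not in down))
```

Because the list is ordered, traffic returns to masterA as soon as its health check passes again, which is the behaviour the rule asks for.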

Re: No sanity checks before replicating files?

2009-05-21 Thread Otis Gospodnetic
Hi Damien, Interesting, this is similar to my suggestion to another person I just replied to here on solr-user. Have you actually run into this problem? I haven't tried it, but I'd think the first next replication (copying index from s1 to s2) would not necessarily fail, but would simply

clustering SOLR-769

2009-05-21 Thread Allahbaksh Asadullah
Hi, I built Solr from SVN this morning. I am using the Clustering example. I have added my own schema.xml. The problem is that even though I change the carrot.snippet field from features to filecontent, the clustering results are not changed a bit. Please note the features field is also there in my document.

Re: No sanity checks before replicating files?

2009-05-21 Thread Damien Tournoud
Hi Otis, Thanks for your answer. On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Interesting, this is similar to my suggestion to another person I just replied to here on solr-user. Have you actually run into this problem?  I haven't tried it, but I'd

Re: How to change the weight of the fields ?

2009-05-21 Thread Otis Gospodnetic
Hi, I'm not sure why the rest of the scoring explanation is not shown, but your query *was* expanded to search on text and title_s, and id fields, so I think that expanded/rewritten query is what went to the index. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -

Re: Phrase Search Issue

2009-05-21 Thread Otis Gospodnetic
Amit, Append debugQuery=true to the search request URL and you'll see how your query string was interpreted. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: dabboo ag...@sapient.com To: solr-user@lucene.apache.org Sent: Thursday, May
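
As a small illustration of the suggestion above, here is a Python sketch that appends `debugQuery=true` to an existing select URL using only the standard library. The helper name `with_debug_query` is made up for this example, and the URL in the usage line is modelled on the ones quoted elsewhere in this listing.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_query(url):
    """Return url with debugQuery=true set, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous debugQuery.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_debug_query("http://localhost:8983/solr/select/?q=novel&qt=dismax"))
```

Parsing and re-encoding the query string, rather than concatenating text, keeps the separators correct whether or not the URL already has parameters.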

Re: No sanity checks before replicating files?

2009-05-21 Thread Otis Gospodnetic
Aha, I see. Perhaps you can post the error message/stack trace? As for the sanity check, I bet a call to http://host:port/solr/replication?command=indexversion could be used to ensure only newer versions of the index are being pulled. We'll see what Paul says when he wakes up. :) Otis --
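
The check proposed above boils down to a comparison of two index versions. A minimal Python sketch follows, under the assumption that both versions have already been obtained as integers; how the `indexversion` response is fetched and parsed is not shown in the thread, so that part is left out, and `should_pull_index` is a hypothetical helper name.

```python
def should_pull_index(master_version, slave_version):
    """Return True only when the master's index is strictly newer.

    Both arguments are assumed to be integer index versions obtained
    elsewhere (for example from the replication handler on each host).
    """
    return master_version > slave_version


# A slave that is already ahead of the master should not replicate.
print(should_pull_index(master_version=1242900000000, slave_version=1242900000005))
```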

Re: Plugin Not Found

2009-05-21 Thread Jeff Newburn
One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it won't load since it obviously works fine if directly inserted into the WEB-INF directory. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Mark Miller markrmil...@gmail.com

Regarding Delta-Import Query in DIH

2009-05-21 Thread jayakeerthi s
Hi All, I understand from the details provided under http://wiki.apache.org/solr/DataImportHandler regarding Delta-import that there should be an additional column *last_modified* of timestamp type in the table. Is there any other way/method the same can be achieved without creating the

Re: Plugin Not Found

2009-05-21 Thread Grant Ingersoll
Can you share your full log (at least through startup) as well as the config for both the component and the ReqHandler that is using it? -Grant On May 21, 2009, at 3:37 PM, Jeff Newburn wrote: One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it won't load since it

Re: clustering SOLR-769

2009-05-21 Thread Stanislaw Osinski
Hi. I built Solr from SVN this morning. I am using the Clustering example. I have added my own schema.xml. The problem is that even though I change the carrot.snippet field from features to filecontent, the clustering results are not changed a bit. Please note the features field is also there in my

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread James X
Hi Mike, Documents are web pages, about 20 fields, mostly strings, a couple of integers, booleans and one html field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads. The only

Re: Solr statistics of top searches and results returned

2009-05-21 Thread Grant Ingersoll
On May 20, 2009, at 4:33 AM, Shalin Shekhar Mangar wrote: On Wed, May 20, 2009 at 1:31 PM, Plaatje, Patrick patrick.plaa...@getronics.com wrote: At the moment Solr does not have such functionality. I have written a plugin for Solr though which uses a second Solr core to store/index the

Re: clustering SOLR-769

2009-05-21 Thread Allahbaksh Asadullah
Hi, I will try this, because when I tried it with a field declared by me there was no change. I will check this out and let you know. Is it possible to specify more than one snippet field, or should I use copy field to copy two or three fields into a single field and specify it as the snippet field?

getting all rows from SOLRJ client using setRows method

2009-05-21 Thread darniz
Hello, is there a way you can get all the results back from SOLR when querying with the solrJ client? My gut feeling was that this might work: query.setRows(-1). The other way is to change the configuration xml file, but that is like hard coding the configuration, and there also I have to set some valid number, i

Re: getting all rows from SOLRJ client using setRows method

2009-05-21 Thread Ryan McKinley
Careful what you ask for... what if you have a million docs? Will you get an OOM? Maybe a better solution is to run a loop where you grab a bunch of docs and then increase the start value. But you can always use: query.setRows( Integer.MAX_VALUE ) ryan On May 21, 2009, at 8:37 PM,
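
The loop suggested above (fetch a page, then advance the start value) can be sketched as follows. This is written in Python rather than SolrJ and talks to no real server: `fetch_page(start, rows)` is a hypothetical callable assumed to return the page of documents plus the total hit count, and `fake_fetch_page` is an in-memory stand-in so the example runs on its own.

```python
def fetch_all(fetch_page, page_size=100):
    """Yield every document by walking the result set page by page.

    fetch_page(start, rows) is a hypothetical callable that returns
    (docs, num_found) for the requested window.
    """
    start = 0
    while True:
        docs, num_found = fetch_page(start, page_size)
        for doc in docs:
            yield doc
        start += len(docs)
        # Stop on an empty page or once the window passes the total.
        if not docs or start >= num_found:
            break


# In-memory stand-in for a search server holding 250 documents.
INDEX = [{"id": i} for i in range(250)]


def fake_fetch_page(start, rows):
    return INDEX[start:start + rows], len(INDEX)


all_docs = list(fetch_all(fake_fetch_page, page_size=100))
print(len(all_docs))
```

Only one page is held in memory at a time inside the generator, which is the point of paging instead of requesting every row in a single call.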

Re: what does the version parameter in the query mean?

2009-05-21 Thread Anshuman Manur
Ah, I see! Thank you so much for the response! I'm using SolrJ, so I probably don't need to set the XML version since the wiki tells me that it uses binary as a default! On Thu, May 21, 2009 at 10:00 PM, Jay Hill jayallenh...@gmail.com wrote: I was interested in this recently and also couldn't

lock problem

2009-05-21 Thread Ashish P
Hi, The scenario is I have 2 different solr instances running at different locations concurrently. The data location for both instances is same: \\hostname\FileServer\CoreTeam\Research\data. Both instances use EmbeddedSolrServer and locktype at both instances is 'single'. I am getting

Re: Regarding Delta-Import Query in DIH

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
The last_modified column is just one way. The query has to be intelligent enough to detect the delta. It doesn't matter how you do it. On Fri, May 22, 2009 at 1:32 AM, jayakeerthi s mail2keer...@gmail.com wrote: Hi All, I understand from the details provided under

Re: No sanity checks before replicating files?

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Let us see what the desired behavior is. When s1 comes back up online, s2 must download a fresh copy of index from s1 because s1 is the slave and s2 has a newer version of index than s1. Are you suggesting that s2 downloads the index files and then commit fails? The code is written as follows

Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Check the status page of DIH and see if it is working properly, and if yes, what is the rate of indexing? On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, I have about 45GB xml files to be indexed. I am using DataImportHandler. I started the full import 4 hours

Re: How to index large set data

2009-05-21 Thread Jianbin Dai
Hi Paul, Thank you so much for answering my questions. It really helped. After some adjustment, basically setting mergeFactor to 1000 from the default value of 10, I could finish the whole job in 2.5 hours. I checked that during running time, only around 18% of memory is being used, and VIRT

Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
What is the total number of docs created? I guess it may not be memory bound. Indexing is mostly an IO-bound operation. You may be able to get better performance if an SSD (solid state disk) is used. On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai djian...@yahoo.com wrote: Hi Paul, Thank you so much