Search Issue with Indexed Docs

2010-03-06 Thread Devin Austin
Hi all, Solr newb here. I'm attempting to index some docs and then search for them, using the usual XML POSTs to send the document data to the app. The documents seem to be indexing, as numDocs under statistics seems to reflect the number of documents I've POSTed. However, through no medium a

Re: how to boost first token

2010-03-06 Thread Erik Hatcher
I don't think there's a way to do this out of the box, but if there were a SpanFirstQuery parser plugin, it could be used as a dismax boosting query (bq) or as an optional _query_ expression. Erik On Mar 5, 2010, at 11:56 AM, Сергей Кашин wrote: I have some documents in Solr index

Re: multiCore

2010-03-06 Thread Erick Erickson
I've seen similar errors happen if you delete the *contents* of your index directory but not the directory itself. Just to be sure, stop and restart your SOLR instance if you manually delete your index. But the error I've seen when doing the above usually doesn't mention a specific character, so I'd g

Filter Query or Main Query or facetting?

2010-03-06 Thread MitchK
Hello community, I am not sure what the best way is to handle the following problem: I have an index with, let's say, 2 million documents, and there is a check-field. The check-field contains only boolean values (TRUE/FALSE). What is the best way to query only documents with a TRUE check-value

test mail... my mails to solr-user@lucene.apache.org are bouncing ... sorry for any inconvenience

2010-03-06 Thread Mark Fletcher
Hi, users, please ignore this mail. I am just sending a test mail to check whether my user ID is okay. The mails I am sending to this group have been bouncing since yesterday. Please excuse me for any inconvenience. Thanks and regards, Mark

index merge

2010-03-06 Thread Mark Fletcher
Hi, I have a question regarding index merging: I have set up two cores, COREX and COREY. COREX always serves user requests; COREY gets updated with the latest values (its dataDir is in a different location from COREX's). I tried merging coreX and coreY at the end of COREY getting updated with the latest d

Re: Search Issue with Indexed Docs

2010-03-06 Thread Erick Erickson
At a guess, you're looking in the default field for the letter "i", which has probably been removed at indexing time because it is a stopword. Unless you specify a field (e.g. q=field:value), the search goes against your default field (specified in schema). Two very useful tools are: the solr adm
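To illustrate why a search for "i" can come back empty, here is a toy Python model of index-time stopword removal. It is a simplification, not Solr's analysis chain: the stopword list and the `analyze`/`build_index`/`search` helpers are hypothetical, and real behaviour depends on the analyzers configured in schema.xml.

```python
from __future__ import annotations

# Hypothetical stopword list; Solr typically reads its list from a file such as stopwords.txt.
STOPWORDS = {"a", "an", "and", "i", "is", "the"}


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace and drop stopwords (a rough stand-in
    for a tokenizer + lowercase filter + stop filter)."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each surviving term to the IDs of the documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Analyze the query the same way, then AND the posting sets together."""
    terms = analyze(query)
    if not terms:
        return set()  # every query term was a stopword, so nothing can match
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index({"1": "I like Solr", "2": "The index is small"})
print(search(index, "i"))     # set()
print(search(index, "solr"))  # {'1'}
```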

Re: Filter Query or Main Query or facetting?

2010-03-06 Thread Erick Erickson
Why isn't q helpful? You can specify field:value pairs in a q clause, so you can pretty easily tack on AND check:true. I'd try that and measure performance before trying more complex solutions. Or do I misunderstand the problem? HTH Erick On Sat, Mar 6, 2010 at 8:21 AM, MitchK wr

Re: facet on null value

2010-03-06 Thread Yonik Seeley
On Thu, Mar 4, 2010 at 10:33 PM, Lance Norskog wrote: > I have added facet.limit=5 to the above to make this easier. Here is the part of the response: > (facet listing; XML markup lost, counts shown: 0, 0, 0, 0, 0, 2) > (What is the 2?) That's what happens when you have a null name/key in

Re: Search Issue with Indexed Docs

2010-03-06 Thread Devin Austin
On Sat, Mar 6, 2010 at 7:34 AM, Erick Erickson wrote: > At a guess, you're looking in the default field for the letter "i", which has probably been removed at indexing time because it is a stopword. Unless you specify a field (e.g. q=field:value), the search goes against your default field (

SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread JavaGuy84
Hi, I am facing a performance issue in SOLR when indexing a large amount of data. Please find the stats below: 8:57:17.334 42778 273725 42775 0. Indexing 273725 rows is taking almost 9 hours. Please find my data config file below
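For a sense of scale, the figures above work out to under ten rows per second. This sketch assumes the first stat (8:57:17.334) is the elapsed time in H:MM:SS.mmm format and uses the 273725 rows mentioned in the post; `to_seconds` is a hypothetical helper.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an H:MM:SS.mmm duration string to seconds."""
    hours, minutes, seconds = elapsed.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


elapsed_s = to_seconds("8:57:17.334")  # assumed to be the elapsed-time stat
rows = 273725                          # rows reported in the post
rate = rows / elapsed_s
print(f"{elapsed_s:.3f} s, {rate:.2f} rows/s")  # 32237.334 s, 8.49 rows/s
```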

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread Shawn Heisey
At the 9+ hour mark, is your database server showing active connections that are sending data, or is all the activity local to SOLR? We have a 40 million row database in MySQL, with each row comprising more than 80 fields. I'm including the config from one of our shards. There are about 6.6

Re: Filter Query or Main Query or facetting?

2010-03-06 Thread MitchK
Yes, that's possible. However, I thought that the normal q param forces Solr to look up every check-field, whether it is true or false. So I am looking for something like a tree that divides the index into two pieces, true and false, so that Solr does not need to look up the check-field anymore, because

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread Shawn Heisey
I have now learned more about the CachedSqlEntityProcessor, so in theory it should be done with the server connections, but if it were me, I'd verify that. How big are the result sets from those queries? SOLR has to put all the table data into RAM, which is very likely going to take considerab

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread JavaGuy84
Shawn, thanks a lot for your response. Yes, the DB connection is still active; it is still fetching data from the DB. I am using Red Hat MetaMatrix DB as the backend and I am trying to find the parameter for setting the JDBC fetch size. Do you think that this problem will be mostly due t

Re: Search Issue with Indexed Docs

2010-03-06 Thread Erick Erickson
I think the root of your problem is the string type of your default field. That type is untokenized, so if you indexed "my name is erick", the *only* thing that would match is searching for exactly that. Searching for "erick" wouldn't match, nor anything besides the exact and entire value I su
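A minimal sketch of the difference described above, assuming a simplified tokenizer (lowercase plus whitespace split) in place of a real text field type; the helper names are hypothetical.

```python
from __future__ import annotations


def string_field_terms(value: str) -> set[str]:
    """An untokenized string field indexes the whole value as a single term."""
    return {value}


def text_field_terms(value: str) -> set[str]:
    """Simplified tokenized field: lowercase, then split on whitespace."""
    return set(value.lower().split())


def matches(indexed_terms: set[str], query_term: str) -> bool:
    """A single-term query matches only if that exact term was indexed."""
    return query_term in indexed_terms


value = "my name is erick"
print(matches(string_field_terms(value), "erick"))             # False
print(matches(string_field_terms(value), "my name is erick"))  # True
print(matches(text_field_terms(value), "erick"))               # True
```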

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread Shawn Heisey
Do keep looking into the batchSize, but I think I might have found the issue. If I understand things correctly, you will need to add processor="CachedSqlEntityProcessor" to your first entity. It's only specified on the other two. Assuming you have enough RAM and heap space available in your
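The idea behind caching sub-entity rows can be sketched as follows: instead of running one child query per parent row, load the child rows once, group them in memory by the join key, and look them up per parent. This illustrates the trade-off (fewer database round trips in exchange for holding the table in RAM), not DIH's actual implementation; `FakeDB` and the column names are hypothetical.

```python
from collections import defaultdict


class FakeDB:
    """Hypothetical stand-in for a JDBC data source that counts queries run."""

    def __init__(self, parents, children):
        self.parents = parents      # rows of the root entity, each with an "id"
        self.children = children    # rows of the sub-entity, each with a "parent_id"
        self.query_count = 0

    def all_parents(self):
        self.query_count += 1
        return list(self.parents)

    def children_of(self, parent_id):
        self.query_count += 1
        return [c for c in self.children if c["parent_id"] == parent_id]

    def all_children(self):
        self.query_count += 1
        return list(self.children)


def build_docs_uncached(db):
    """One child query per parent row: 1 + N database round trips."""
    return [{**p, "children": db.children_of(p["id"])} for p in db.all_parents()]


def build_docs_cached(db):
    """Load the child rows once, group them by join key in RAM, then look up."""
    by_parent = defaultdict(list)
    for child in db.all_children():
        by_parent[child["parent_id"]].append(child)
    return [{**p, "children": by_parent.get(p["id"], [])} for p in db.all_parents()]


db = FakeDB(
    parents=[{"id": 1}, {"id": 2}, {"id": 3}],
    children=[{"parent_id": 1, "v": "a"}, {"parent_id": 3, "v": "b"},
              {"parent_id": 1, "v": "c"}],
)
uncached = build_docs_uncached(db)
print("uncached queries:", db.query_count)  # uncached queries: 4
db.query_count = 0
cached = build_docs_cached(db)
print("cached queries:", db.query_count)    # cached queries: 2
```

With N parent rows the uncached version issues N + 1 queries, while the cached version always issues two, at the cost of keeping every child row in memory.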

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread JavaGuy84
Shawn, please find below the result set size of each query: select objectuid as uid, objectid, objecttype, objectname, repositoryname, a.lastupdateddate from MetaModel.POC.Object a, MetaModel.POC.Repository b where a.repositoryid = b.repositoryid ---> 30 rows ---> Query 1 select Objec

Re: Filter Query or Main Query or facetting?

2010-03-06 Thread Erick Erickson
The last thing I'd do is partition my index into two, unless and until I really *knew* I had speed problems. The added complexity isn't worth it and your index isn't huge, so search speed can probably be addressed without that complexity. Filter queries are probably your first choice here. Memory
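A sketch of the two request styles discussed in this thread, built with Python's standard `urllib.parse.urlencode`. The endpoint URL is a hypothetical default. Tacking `AND check:true` onto `q` and sending `fq=check:true` alongside `q` should match the same documents, but the filter query is cached separately in Solr's filterCache and does not influence relevance scores.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core path to your installation.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(user_query: str, only_checked: bool, use_fq: bool = True) -> str:
    """Build a /select URL, optionally restricted to check:true documents.

    use_fq=True sends the restriction as a filter query (fq);
    use_fq=False ANDs it into the main query (q) instead.
    """
    params = {"q": user_query}
    if only_checked:
        if use_fq:
            params["fq"] = "check:true"
        else:
            params["q"] = f"({user_query}) AND check:true"
    return SELECT_URL + "?" + urlencode(params)


print(select_url("title:solr", True))
# http://localhost:8983/solr/select?q=title%3Asolr&fq=check%3Atrue
print(select_url("title:solr", True, use_fq=False))
# http://localhost:8983/solr/select?q=%28title%3Asolr%29+AND+check%3Atrue
```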

Re: Filter Query or Main Query or facetting?

2010-03-06 Thread MitchK
Erick, your response was really helpful; the problem is solved for now. However, I have two questions: How do you know that the bit-vector has a maximum size of 250k? Did I overlook something (because I have an index of 2,000,000 documents)? Are there any theoretical docu

Re: Filter Query or Main Query or facetting?

2010-03-06 Thread Erick Erickson
The 250K is an approximation: (total number of docs)/8 bytes, as in one bit per document. Really, all a filter is, is a bit-vector where each bit represents whether the doc ID represented by that bit should be included in the results or not. Technically, it's (largest doc id)/8, where (largest doc id)
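The arithmetic and the data structure can be sketched in a few lines of Python, assuming one bit per document ID up to maxDoc. `DocBitSet` is a hypothetical toy class, not Lucene's or Solr's actual implementation, which differs in detail.

```python
import math


def filter_size_bytes(max_doc: int) -> int:
    """One bit per document ID, rounded up to whole bytes."""
    return math.ceil(max_doc / 8)


class DocBitSet:
    """Toy bit-vector filter: bit i is set when doc ID i passes the filter."""

    def __init__(self, max_doc: int):
        self.bits = bytearray(filter_size_bytes(max_doc))

    def add(self, doc_id: int) -> None:
        self.bits[doc_id >> 3] |= 1 << (doc_id & 7)

    def __contains__(self, doc_id: int) -> bool:
        return bool(self.bits[doc_id >> 3] & (1 << (doc_id & 7)))


print(filter_size_bytes(2_000_000))  # 250000, i.e. roughly 250K bytes

checked = DocBitSet(2_000_000)
for doc_id in (3, 42, 1_999_999):    # hypothetical doc IDs with check:true
    checked.add(doc_id)
candidates = [3, 7, 42, 100]         # hypothetical main-query hits
print([d for d in candidates if d in checked])  # [3, 42]
```

Applying the filter is then a constant-time bit test per candidate document, which is why a cached filter stays cheap regardless of how many documents it matches.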

Re: Search Issue with Indexed Docs

2010-03-06 Thread Devin Austin
On Sat, Mar 6, 2010 at 12:13 PM, Erick Erickson wrote: > I think the root of your problem is the string type of your default field. That type is untokenized, so if you indexed "my name is erick", the *only* thing that would match is searching for exactly that. Searching for "erick" wouldn't

Is it possible to use ODBC with DIH?

2010-03-06 Thread JavaGuy84
Hi, I have an ODBC driver for MetaMatrix DB (Red Hat). I am trying to figure out a way to use DIH with the DSN that has been created on my machine with that ODBC driver. Is it possible to specify a DSN in DIH and index the DB? If it's possible, can you please let me know the ODBC URL that I

Re: Search Issue with Indexed Docs

2010-03-06 Thread Devin Austin
On Sat, Mar 6, 2010 at 4:20 PM, Devin Austin wrote: > On Sat, Mar 6, 2010 at 12:13 PM, Erick Erickson wrote: >> I think the root of your problem is the string type of your default field. That type is untokenized, so if you indexed "my name is erick", the *only* thing that would mat

Re: Search Issue with Indexed Docs

2010-03-06 Thread Erick Erickson
Did you reindex all your data and commit it afterward? Erick On Sat, Mar 6, 2010 at 7:01 PM, Devin Austin wrote: > On Sat, Mar 6, 2010 at 4:20 PM, Devin Austin wrote: >> On Sat, Mar 6, 2010 at 12:13 PM, Erick Erickson wrote: >>> I think the root of your problem is the strin

Re: Search Issue with Indexed Docs

2010-03-06 Thread Devin Austin
On Sat, Mar 6, 2010 at 6:04 PM, Erick Erickson wrote: > Did you reindex all your data and commit it afterward? > Erick > On Sat, Mar 6, 2010 at 7:01 PM, Devin Austin wrote: >> On Sat, Mar 6, 2010 at 4:20 PM, Devin Austin wrote: >>> On Sat, Mar 6, 2010 at 12:13 PM,

Re: If you could have one feature in Solr...

2010-03-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Mar 5, 2010 at 4:34 AM, Mark Miller wrote: > On 03/04/2010 05:56 PM, Chris Hostetter wrote: >> : The ability to read solr configuration files from the classpath instead of solr.solr.home directory. >> Solr has always supported this. >> When SolrResourceLoader.openResourceL