Re: WELCOME to solr-user@lucene.apache.org

2008-01-08 Thread bjorkgre
There are some instructions about integrating Nutch with Solr here: http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html Joakim "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote on 9.1.2008: > Nutch and Solr work nicely in tandem. We've used Nutch for its distributed > fet

Re: WELCOME to solr-user@lucene.apache.org

2008-01-08 Thread Otis Gospodnetic
Nutch and Solr work nicely in tandem. We've used Nutch for its distributed fetching + parsing and related functionality and have used Solr to index the resulting text. What glued them together was Solrj, actually. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Origi
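
As a rough illustration of the SolrJ glue described above, the sketch below pushes text that Nutch has already fetched and parsed into Solr. The Solr URL and the field names (id, url, content) are illustrative assumptions, not taken from the thread:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class NutchToSolr {
      public static void main(String[] args) throws Exception {
        // Assumed Solr location (the stock example port); adjust as needed.
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One document per page fetched and parsed by Nutch (hypothetical values).
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "http://example.com/page.html");
        doc.addField("url", "http://example.com/page.html");
        doc.addField("content", "plain text extracted by the Nutch parser");

        solr.add(doc);
        solr.commit();
      }
    }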

Re: Best practice for storing relational data in Solr

2008-01-08 Thread Ryan Grange
I've found that Solr running on modest hardware (a 2.4 GHz PC running Windows XP Pro for testing changes) is able to index about 23,000 records in under three minutes. Assuming you aren't going to make too many typos in your naming, you should be fine just doing the re-indexing. Try timing yo

Re: JNDI and multiple installations in SOLR/Tomcat

2008-01-08 Thread Chris Hostetter
Marc: sorry that no one seems to have replied to your mail until now (holidays and all that). : I am trying to run three SOLR webapps. Tomcat starts up and the three SOLR instances : work perfectly well. However, in the logfiles I found a : : ... "INFO: no /solr/home in JNDI" (followed by a "SEVERE
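
For multiple Solr webapps under Tomcat, the usual fix is one context fragment per webapp, each declaring its own solr/home JNDI entry; a sketch with example file names and paths:

    <!-- e.g. $CATALINA_HOME/conf/Catalina/localhost/solr1.xml, one such file per webapp -->
    <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String" value="/opt/solr/home1" override="true"/>
    </Context>

Without such an entry (or a solr.solr.home system property), Solr logs the "INFO: no /solr/home in JNDI" line and falls back to its default home, which is likely what makes three webapps collide.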

Re: Min-Score Filter

2008-01-08 Thread Chris Hostetter
: Is there a way or a point in filtering all results below a certain score? : e.g. exclude all results below score Y. Thanks Not out of the box, you'd have to write a custom request handler to do this ... but unless your custom request handler also changes the scoring formula drastically, filteri
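
If a cutoff is still wanted despite the caveat about unnormalized scores, the simplest place to apply it is on the client after asking Solr to return the score. A hypothetical SolrJ sketch (URL, query, and threshold are made up for illustration):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class MinScoreFilter {
      public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("ipod");
        q.setFields("id", "score");   // ask for the score along with each document
        SolrDocumentList hits = solr.query(q).getResults();

        float minScore = 0.5f;        // arbitrary example threshold
        for (SolrDocument d : hits) {
          float score = ((Number) d.getFieldValue("score")).floatValue();
          if (score >= minScore) {
            System.out.println(d.getFieldValue("id") + " " + score);
          }
        }
      }
    }

A custom request handler could do the same server-side, but the threshold would still be an absolute score with no stable meaning across queries.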

FW: Tomcat and Solr - out of memory

2008-01-08 Thread Lance Norskog
On Tomcat 5.5, an OutOfMemory on a query leaves the server in an OK state, and future queries work. But a facet query that runs out of ram does not free its undone state and all future requests get OutOfMemory. I have not tried the Solr 'luke' handler since it took 5 minutes to run on our index wh

RE: Tomcat and Solr - out of memory

2008-01-08 Thread Norskog, Lance
On Tomcat, an OutOfMemory on a query leaves the server in an OK state, and future queries work. But a facet query that runs out of ram does not free its undone state and all future requests get OutOfMemory. Lance -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On B

Re: Luke response format explained

2008-01-08 Thread Ryan McKinley
Robert Young wrote: Thanks, that is very helpful. So, is there a way to find out the total number of distinct tokens, regardless of which field they're associated with? And to find which are most popular? Nothing standard does that... the semantics of what it would mean get a little weird -

Re: Luke response format explained

2008-01-08 Thread Yonik Seeley
On Jan 8, 2008 3:07 PM, Robert Young <[EMAIL PROTECTED]> wrote: > Thanks, that is very helpful. So, is there a way to find out the > total number of distinct tokens, regardless of which field they're > associated with? No. A term in Lucene consists of a field and value, so the same word in diff
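
Since every Lucene term is a (field, text) pair, counting distinct token texts across all fields means collapsing the term enumeration yourself; nothing in the Luke handler reports it. A rough sketch against the Lucene 2.x API of the time (the index path is an example, and the set can get large on big indexes):

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    public class DistinctTokenTexts {
      public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/solr/data/index"); // example path
        Set<String> texts = new HashSet<String>();
        TermEnum te = reader.terms();
        while (te.next()) {
          Term t = te.term();
          texts.add(t.text()); // drop the field, keep only the token text
        }
        te.close();
        reader.close();
        System.out.println("distinct token texts: " + texts.size());
      }
    }

Answering "which are most popular" would additionally mean summing docFreq per text across fields.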

Re: Luke response format explained

2008-01-08 Thread Robert Young
Thanks, that is very helpful. So, is there a way to find out the total number of distinct tokens, regardless of which field they're associated with? And to find which are most popular? Cheers Rob On Jan 8, 2008 5:04 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote: > numTerms counts the unique terms

Re: DisMax Syntax

2008-01-08 Thread s d
I may be mistaken, but this is not equivalent to my query. In my query I have matches for x1, matches for x2 without slop and/or boosting, and then a match for "x1 x2" (exact match) with slop (~) a and boost (b) in order to have results with an exact match score better. The total score is the sum of all

Re: DisMax Syntax

2008-01-08 Thread Chris Hostetter
: User Query: x1 x2 : Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b : : In the standard handler the only way I saw how to make this work was: : field:x1 field:x2 field:"x1 x2"!a^b : : Now that I want to try the DisMax, is there a way to implement this without : having duplicate fields? i.

Re: DisMax Syntax

2008-01-08 Thread Yonik Seeley
Please see http://wiki.apache.org/solr/DisMaxRequestHandler -Yonik On Jan 8, 2008 2:40 PM, s d <[EMAIL PROTECTED]> wrote: > User Query: x1 x2 > Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b > > In the standard handler the only way i saw how to make this work was: > field:x1 field:x2 field

Re: Solr Multicore

2008-01-08 Thread Ryan McKinley
The multicore stuff has changed around a bit since its debut (and may change some more before the final release) check: http://wiki.apache.org/solr/MultiCore There is no longer a 'SETASDEFAULT' action and all requests require the core name, so you will need: http://localhost:8983/solr/core0/
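
Concretely, both updates and queries need the core name in the path; for example (assuming the stock example port, an XML commit, and a simple query):

    curl http://localhost:8983/solr/core0/update -H 'Content-type:text/xml' --data-binary '<commit/>'
    http://localhost:8983/solr/core0/select?q=solr

Posting to plain /solr/update leaves the target core up to whichever mapping answers that path, which may be why the earlier updates landed in an unexpected core.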

Re: Multicore request

2008-01-08 Thread Ryan McKinley
Jae Joo wrote: I have built two cores - core0 and core1. Each core has a different index. I can access core0 and core1 via http://localhost:8983/solr/core[01]/admin/form.jsp. Is there any way to access multiple indexes with a single query? Nothing standard. From a custom RequestHandler,
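
Short of a custom RequestHandler, one workaround is to query each core from the client and combine the results yourself (scores from different cores are not directly comparable). A hypothetical SolrJ sketch:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocumentList;

    public class TwoCoreQuery {
      public static void main(String[] args) throws Exception {
        String[] cores = { "http://localhost:8983/solr/core0",
                           "http://localhost:8983/solr/core1" };
        SolrQuery q = new SolrQuery("ipod"); // example query
        for (String url : cores) {
          SolrServer solr = new CommonsHttpSolrServer(url);
          SolrDocumentList hits = solr.query(q).getResults();
          System.out.println(url + " -> " + hits.getNumFound() + " hits");
        }
      }
    }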

DisMax Syntax

2008-01-08 Thread s d
User Query: x1 x2 Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b In the standard handler the only way I saw how to make this work was: field:x1 field:x2 field:"x1 x2"!a^b Now that I want to try the DisMax, is there a way to implement this without having duplicate fields? i.e. since the fiel
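
For reference, DisMax expresses this pattern without repeating the field in the query string: qf supplies the per-term clauses and pf/ps supply the implicit phrase clause. An illustrative request, with a and b standing in for the slop and boost from the question (the handler name assumes the example solrconfig):

    http://localhost:8983/solr/select?qt=dismax&q=x1+x2&qf=field&pf=field^b&ps=a

Here pf builds the "x1 x2" phrase query over the listed field, ps sets its slop, and ^b weights that phrase clause relative to the individual qf matches.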

Re: parallelmultisearcher in solr

2008-01-08 Thread Mike Klaas
On 8-Jan-08, at 4:34 AM, Heba Farouk wrote: Hello there, I would like to use something like Lucene's ParallelMultiSearcher in Solr to search multiple indexes; does it exist? Nope. Instead, there is work progressing on more flexible query distribution: https://issues.apache.org

Multicore request

2008-01-08 Thread Jae Joo
I have built two cores - core0 and core1. Each core has a different index. I can access core0 and core1 via http://localhost:8983/solr/core[01]/admin/form.jsp. Is there any way to access multiple indexes with a single query? Thanks, Jae

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Michael Lackhoff
On 08.01.2008 19:09 Yonik Seeley wrote: There is no shorter way, but if you update to the latest solr-dev (changes I checked in today), the default will be no encapsulation for split fields. Many thanks, also for your patience! Do you think the dev-version is ready for production? -Michael

Solr Multicore

2008-01-08 Thread Jae Joo
I have set up two cores - core0 and core1, with core0 as the default. Once I update the index via http://localhost:8983/solr/update, it updates core1, not core0. Also, I tried to set the default core using SETASDEFAULT, but it is an "unknown action command". Can anyone help me? Thanks, Jae

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Yonik Seeley
On Jan 8, 2008 12:59 PM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > On 08.01.2008 16:55 Yonik Seeley wrote: > > >> - A literal encapsulator should be possible to add by doubling > >>it ' => '' but this gives the same error > > > > I think you would have to triple it (the first is the encaps

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Michael Lackhoff
On 08.01.2008 16:55 Yonik Seeley wrote: - A literal encapsulator should be possible to add by doubling it ' => '' but this gives the same error I think you would have to triple it (the first is the encapsulator). Regardless, don't use encapsulation on the split fields unless you have to.

Fuzziness with DisMaxRequestHandler

2008-01-08 Thread solruser2
Is there any way to make the DisMaxRequestHandler a bit more forgiving with user queries? I'm only getting results when the user enters a close-to-perfect match. I'd like to allow near matches if possible, but I'm not sure how to add something like this when special query syntax isn't allowed. --

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Kirk Beers
Brian Whitman wrote: I found that on the Wiki at http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef under the title: Updating a Data Record via curl. I removed it and now have the following: <int name="status">0</int><int name="QTime">122</int> This response format is experimental.

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Ryan McKinley
try it without and see how you do... I just updated the wiki Kirk Beers wrote: Brian Whitman wrote: On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote: curl http://localhost:8080/solr/update -H "Content-Type:text/xml" --data-binary '/<add overwritePending="true"><doc><field ...>0001</field><field name="title">Title</field><field ...>It was the be
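
For comparison, a cleaned-up pair of commands without the stray '/' (the field names follow the stock example schema and may differ from Kirk's customized schema.xml):

    curl http://localhost:8080/solr/update -H 'Content-type:text/xml' --data-binary '<add><doc><field name="id">0001</field><field name="title">Title</field></doc></add>'
    curl http://localhost:8080/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'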

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Brian Whitman
I found that on the Wiki at http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef under the title: Updating a Data Record via curl. I removed it and now have the following: <int name="status">0</int><int name="QTime">122</int> This response format is experimental. It is likely to cha

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Kirk Beers
Brian Whitman wrote: On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote: curl http://localhost:8080/solr/update -H "Content-Type:text/xml" --data-binary '/<add overwritePending="true"><doc><field ...>0001</field><field name="title">Title</field><field ...>It was the best of times it was the worst of times blah blah blah</field></doc></add>' Why the / after the first

Re: Luke response format explained

2008-01-08 Thread Ryan McKinley
numTerms counts the unique terms (field:value pairs) in the index. The source is:

    TermEnum te = reader.terms();
    int numTerms = 0;
    while (te.next()) {
      numTerms++;
    }
    indexInfo.add("numTerms", numTerms);

"distinct" is a similar calculation, but fo

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Brian Whitman
On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote: curl http://localhost:8080/solr/update -H "Content-Type:text/xml" --data-binary '/<add overwritePending="true"><doc><field ...>0001</field><field name="title">Title</field><field ...>It was the best of times it was the worst of times blah blah blah</field></doc></add>' Why the / after the first single quote?

Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Kirk Beers
Hi, I am new to Lucene and even newer to Solr, and I am attempting to set up Solr to fit our needs. I am using the Jan 7, 2008 build. I have updated the schema.xml fields to reflect our preferences: <field ... required="true"/> <field ... required="true"/> <field ... required="false"/> <field ... required="false"/> <field ... re

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Yonik Seeley
On Jan 8, 2008 10:32 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > On 08.01.2008 16:11 Yonik Seeley wrote: > > > Ahh, wait, it looks like a single quote is the encapsulator for split field > > values by default. > > Try adding f.PUBLPLACE.encapsulator=%00 > > to disable the encapsulation. > > Hmm. Y

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Michael Lackhoff
On 08.01.2008 16:11 Yonik Seeley wrote: Ahh, wait, it looks like a single quote is the encapsulator for split field values by default. Try adding f.PUBLPLACE.encapsulator=%00 to disable the encapsulation. Hmm. Yes, this works but: - I didn't find anything about it in the docs (wiki). On the contrar
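
Putting the suggestion together with the split parameters, the full CSV update request looks something like this (the separator and file name are assumptions; PUBLPLACE is the field from the thread):

    curl 'http://localhost:8983/solr/update/csv?commit=true&f.PUBLPLACE.split=true&f.PUBLPLACE.separator=%3B&f.PUBLPLACE.encapsulator=%00' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'

Setting the per-field encapsulator to %00 effectively turns off quote handling for that split field, so values with leading apostrophes such as 's-Gravenhage pass through untouched.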

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Yonik Seeley
On Jan 8, 2008 9:58 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Jan 8, 2008 3:07 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > > After a long weekend I could do a deeper look into this one and it looks > > as if the problem has to do with splitting. > > > > > This one works for me fine. >

Re: Tomcat and Solr - out of memory

2008-01-08 Thread Stuart Sierra
On Jan 7, 2008 12:08 PM, Jae Joo <[EMAIL PROTECTED]> wrote: > What happens if Solr application hit the max. memory of heap assigned? > > Will be die or just slow down? In my (limited) experience (with Jetty), Solr will not die but it will return HTTP 500 errors on all requests until it is restarte

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Yonik Seeley
On Jan 8, 2008 3:07 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > After a long weekend I could do a deeper look into this one and it looks > as if the problem has to do with splitting. > > > This one works for me fine. > > > > $ cat t2.csv > > id,name > > 12345,"'s-Gravenhage" > > 12345,'s-Grav

Re: Performance - FunctionQuery

2008-01-08 Thread Yonik Seeley
On Jan 8, 2008 9:30 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote: > what's the function? how many matching documents? Right... and remember that the first time you use a field in a function query, it has the same cost as sorting on that field (a cache entry needs to be generated). Try further queries or
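
One way to keep that first-use cost out of user-facing requests is a warming query in solrconfig.xml that touches the same field whenever a new searcher opens. A sketch with placeholder function and field names:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- hypothetical warming query: forces the cache entry for myfield to be built -->
        <lst><str name="q">_val_:"ord(myfield)"</str><str name="rows">0</str></lst>
      </arr>
    </listener>

A matching firstSearcher listener covers the very first searcher after startup.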

Re: Performance - FunctionQuery

2008-01-08 Thread Ryan McKinley
What's the function? How many matching documents? s d wrote: Adding a FunctionQuery made the query response time slower by ~300ms, adding a 2nd FunctionQuery added another ~300ms, so overall I got over 0.5sec response time (slow). Is this expected or am I doing something wrong? Thx

Re: WELCOME to solr-user@lucene.apache.org

2008-01-08 Thread Ryan McKinley
currently two approaches: http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html and: https://issues.apache.org/jira/browse/NUTCH-442 I have had experience with the former... you may have more luck on the nutch-user list for help ryan Jan Buelens wrote: Hi, We are c

parallelmultisearcher in solr

2008-01-08 Thread Heba Farouk
Hello there, I would like to use something like Lucene's ParallelMultiSearcher in Solr to search multiple indexes; does it exist? Thanks in advance

Luke response format explained

2008-01-08 Thread Robert Young
Hi, In the response for the LukeRequestHandler, what do the different fields mean? Some of them are obvious but some are less so. Is numTerms the total number of terms or the total number of unique terms (i.e. the dictionary)? If it is the former, how can I find the size of the dictionary across all f

Re: How do i normalize diff information (different type of documents) in the index ?

2008-01-08 Thread s d
Got it ( http://wiki.apache.org/solr/DisMaxRequestHandler#head-cfa8058622bce1baaf98607b197dc906a7f09590) . thx ! On Jan 8, 2008 12:11 AM, Chris Hostetter < [EMAIL PROTECTED]> wrote: > > : Isn't there a better way to take the information into account but still > : normalize? taking the score of on

Min-Score Filter

2008-01-08 Thread s d
Is there a way or a point in filtering all results below a certain score? e.g. exclude all results below score Y. Thanks

Performance - FunctionQuery

2008-01-08 Thread s d
Adding a FunctionQuery made the query response time slower by ~300ms, adding a 2nd FunctionQuery added another ~300ms, so overall I got over 0.5sec response time (slow). Is this expected or am I doing something wrong? Thx

Re: PhraseQuery and WildcardQuery

2008-01-08 Thread Chris Hostetter
: I've got this error when trying to search with a query like q=+myFiled:"some : value"* : : org.apache.solr.core.SolrException: Query parsing error: Cannot parse : '+myFiled:"some value"*': '*' or '?' not allowed as first character in : WildcardQuery ... ...this is where the subtleties of the L

Re: WELCOME to solr-user@lucene.apache.org

2008-01-08 Thread Jan Buelens
Hi, We are currently using Solr as our search engine. To add an existing website to our search engine, we are investigating Nutch. Does anyone have more information or experience with integrating Solr and Nutch? Thanks in advance! Best regards, Jan

Re: SOLR ON TOMCAT SERVER

2008-01-08 Thread Chris Hostetter
Naveen: it doesn't look like you ever got a reply (probably because of the holidays) ... did you ever manage to get things working? Looking over your question, the root problem seems to be related to the XPath factories that should come standard in Tomcat. I have never seen this error bef

Re: How do i normalize diff information (different type of documents) in the index ?

2008-01-08 Thread Chris Hostetter
: Isn't there a better way to take the information into account but still : normalize? Taking the score of only one of the fields doesn't sound like the : best thing to do (it's basically ignoring part of the information). Note the word "mostly" in Mike's response about dismax ... the "tie" param

Re: Another text I cannot get into SOLR with csv

2008-01-08 Thread Michael Lackhoff
After a long weekend I could do a deeper look into this one and it looks as if the problem has to do with splitting. This one works for me fine. $ cat t2.csv id,name 12345,"'s-Gravenhage" 12345,'s-Gravenhage 12345,"""s-Gravenhage" $ curl http://localhost:8983/solr/update/csv?commit=true --dat