Re: replication handler - compression

2008-10-29 Thread christophe
Hi, is the new replication feature based on HTTP requests between sites? If so, I guess it might be possible to configure an HTTP server with mod_deflate so the data is compressed on the fly. C. Simon Collins wrote: I have now optimized the index - down to 325 MB, it compresses down

Re: how to improve concurrent request performance and stress testing

2008-10-29 Thread sunnyfr
Hi, I'm also trying to stress test Solr and would love some advice on doing it properly. I'm using Solr 1.3 and Tomcat 5.5. Thanks a lot. zqzuk wrote: Hi, I am doing stress testing of my Solr application to see how many concurrent requests it can handle and how long it takes. But I m

Re: how to improve concurrent request performance and stress testing

2008-10-29 Thread zqzuk
Hi, first have a look at http://wiki.apache.org/solr/SolrCaching, especially the section on firstSearcher and warming. Search engines rely on caching, so the first searches will be slow. I think that for a fair test you need to warm up the search engine by sending the most frequently used and/or most
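A minimal sketch of choosing warm-up queries, assuming you have a plain list of past query strings (for example, pulled from the request log). The log source and the `pick_warming_queries` helper are hypothetical. The most frequent queries are the ones worth sending before timing starts, or listing in the firstSearcher/newSearcher warming configuration the wiki page describes.

```python
from collections import Counter


def pick_warming_queries(query_log, n):
    """Return the n most frequent non-empty queries, most frequent first."""
    counts = Counter(q.strip() for q in query_log if q.strip())
    return [query for query, _ in counts.most_common(n)]
```

For a log of `["car", "bus", "car", "truck", "car", "bus"]`, asking for two queries gives `["car", "bus"]`.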

Search by example

2008-10-29 Thread Marian Steinbach
Hi! Out of curiosity: how would one implement search by example with Solr? What I mean: say I have a result entry with these fields/attributes: id: 1; title: blue big slow car; color: blue; size: 30; maxspeed: 100; make: buses inc. What would I have to do in order to find similar items? Do a

Multicore

2008-10-29 Thread sanraj25
Hi, I am now using Solr to index and search two different types of data, for example 1) JobRec and 2) JobSel. While indexing, I store the data with type:JobRec, and similarly with type:JobSel. If I want to retrieve the data, I query with type:job rec. This is

Re: replication handler - compression

2008-10-29 Thread Bill Au
Do keep in mind that compression is a CPU-intensive process, so it is a trade-off between CPU utilization and network bandwidth. I have seen cases where compressing the data before a network transfer ended up being slower than sending it uncompressed, because the cost of compression and decompression
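The trade-off can be made concrete with a back-of-the-envelope model. This sketch assumes compress, transfer and decompress run one after another (no streaming overlap), and every rate and ratio in the example is a made-up illustration, not a measurement.

```python
def plain_seconds(size_mb, link_mb_s):
    """Time to ship size_mb uncompressed over a link of link_mb_s."""
    return size_mb / link_mb_s


def compressed_seconds(size_mb, link_mb_s, ratio, compress_mb_s, decompress_mb_s):
    """Time to compress, ship and decompress, assuming the steps run sequentially."""
    compress = size_mb / compress_mb_s
    ship = size_mb * ratio / link_mb_s  # ratio = compressed size / original size
    decompress = size_mb / decompress_mb_s
    return compress + ship + decompress
```

With a 325 MB index, a hypothetical ratio of 0.3, 50 MB/s compression and 200 MB/s decompression: on a 100 MB/s LAN the plain copy takes 3.25 s versus about 9.1 s compressed, while on a 1 MB/s WAN it is 325 s versus about 105.6 s. Compression only pays off when the link, not the CPU, is the bottleneck.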

Distributed search, standard request handler and more like this

2008-10-29 Thread Jaco
Hello, I'm doing some experiments with the MoreLikeThis functionality using the standard request handler to see if it also works with distributed search (I saw that it will not yet work with the MoreLikeThis handler, https://issues.apache.org/jira/browse/SOLR-788). As far as I can see, this also

Re: replication handler - compression

2008-10-29 Thread Walter Underwood
Why invent something when compression is standard in HTTP? --wunder On 10/29/08 4:35 AM, Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED] wrote: Open a JIRA issue. We will use gzip on both ends of the pipe. On the slave side you can say <str name="zip">true</str> as an extra option to compress and

Re: Search by example

2008-10-29 Thread Marian Steinbach
Awesome! Thanks for the pointer, I will check this out. Marian On Wed, Oct 29, 2008 at 1:52 PM, Jaco wrote: Hi, This can be done with 'more like this' functionality in Solr: http://wiki.apache.org/solr/MoreLikeThis
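A rough sketch of what a "more like this" request for Marian's example document might look like, assuming the standard request handler at a hypothetical `http://localhost:8983/solr` and the MoreLikeThis parameter names as I recall them from the wiki page above (`mlt`, `mlt.fl`, `mlt.mintf`, `mlt.mindf`, `mlt.count`); check that page before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def mlt_params(doc_id, fields, count=5):
    """Request parameters asking Solr for documents similar to doc_id."""
    return {
        "q": f"id:{doc_id}",         # the "example" document
        "mlt": "true",               # turn on the MoreLikeThis component
        "mlt.fl": ",".join(fields),  # fields whose terms define similarity
        "mlt.mintf": "1",            # keep terms occurring once (short fields)
        "mlt.mindf": "1",            # keep terms found in only one document
        "mlt.count": str(count),     # similar docs to return per result
    }


def select_url(base_url, params):
    """Append URL-encoded parameters to the /select handler of base_url."""
    return f"{base_url}/select?{urlencode(params)}"
```

Textual fields such as title, color and make suit term-based similarity better than numeric ones like size or maxspeed; the wiki also notes the `mlt.fl` fields need to be stored or have term vectors.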

RE: Best way to prevent max warmers error

2008-10-29 Thread Chris Hostetter
: As far as our application goes, commits and reads are done to the index during normal business hours. However, we observed the max warmers error happening during a nightly job where the only operation is 4 parallel threads committing data to the index and finally optimizing it. We

Re: how to improve concurrent request performance and stress testing

2008-10-29 Thread sunnyfr
Just a question about your httpstone configuration: how did you simulate searches on several different words? Did you create lots of different workers, each with lots of different search words? Thanks. zqzuk wrote: Hi, try to firstly have a look at
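This is not httpstone (its configuration isn't shown here), but a small Python stand-in illustrating the idea behind the question: several concurrent workers, each drawing a different random mix of search terms. `search` is a hypothetical callable you would supply, for example one that sends an HTTP GET to `/select?q=...`.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(search, queries, workers, requests_per_worker, seed=0):
    """Run `workers` threads sending random queries; return all latencies in seconds."""
    rng = random.Random(seed)
    # Draw each worker's query mix up front so runs are reproducible.
    plans = [
        [rng.choice(queries) for _ in range(requests_per_worker)]
        for _ in range(workers)
    ]

    def worker(plan):
        latencies = []
        for query in plan:
            start = time.perf_counter()
            search(query)
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(worker, plans))
    return [lat for batch in batches for lat in batch]
```

Sorting the returned latencies gives medians and percentiles; combine this with warming first so the first slow, uncached searches don't skew the numbers.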

Re: FileNotFoundException on slave after replication - script bug?

2008-10-29 Thread Chris Hostetter
I think you may be right; I've opened SOLR-830. : We may have identified the root cause but wanted to run it by the community. : We figure there is a bug in the snappuller shell script, line 181: -Hoss

Highlighting and fields

2008-10-29 Thread christophe
Hi, I'm doing the following query: q=text:abc AND type:typeA, and I ask for highlighting to be returned (query.setHighlight(true);). The search term for the type field (typeA) is also highlighted in the text field. Is there any way to avoid this? Thanks, Christophe

Re: Solr 1.3 Maven Artifact Problem

2008-10-29 Thread Chris Hostetter
: 1) solr-core artifact contains org.apache.solr.client.solrj packages, and at the same time, the solr-core artifact depends on the solr-solrj artifact. What you are seeing isn't specific to the Maven jars; that's the way it is in the standard release. I believe the inclusion of solrj code

Re: Solr 1.3 Maven Artifact Problem

2008-10-29 Thread Shalin Shekhar Mangar
On Wed, Oct 29, 2008 at 9:11 PM, Chris Hostetter [EMAIL PROTECTED] wrote: I believe the inclusion of solrj code in the core jar is intentional; the core jar is intended (as I understand it) to encapsulate everything needed to run Solr (and because of the built in distributed search features,

Re: Multicore

2008-10-29 Thread Mark Miller
Depends on your use cases. Having things in one index will generally make things easier in the long run, and generally shouldn't be a bottleneck. However, if the two types will be treated very differently it may make sense to have two cores - say one type is not changed very often, while the
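A sketch contrasting the two layouts for sanraj25's JobRec/JobSel data, assuming a Solr base URL of `http://localhost:8983/solr` and hypothetical core names such as `jobrec`. With one index, the type restriction can go in a filter query (`fq`), whose results Solr can keep in its filterCache; with multiple cores, the core name in the path selects the index.

```python
from urllib.parse import urlencode


def single_index_query(base_url, user_query, doc_type):
    """One index for both types; restrict by the type field with a filter query."""
    params = {"q": user_query, "fq": f"type:{doc_type}"}
    return f"{base_url}/select?{urlencode(params)}"


def per_core_query(base_url, core, user_query):
    """One core per type; the core name in the path picks the index."""
    return f"{base_url}/{core}/select?{urlencode({'q': user_query})}"
```

`single_index_query("http://localhost:8983/solr", "java developer", "JobRec")` yields `http://localhost:8983/solr/select?q=java+developer&fq=type%3AJobRec`.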

Re: Lucene project subprojects news RSS feed?

2008-10-29 Thread Chris Hostetter
: On the main Lucene web page: http://lucene.apache.org/index.html there is a list of news items spanning all the Lucene subprojects. Does FYI: that news section is just a manually maintained list of items as regular Forrest content (Forrest is the tool used to generate the site and build

Re: Highlighting and fields

2008-10-29 Thread Mark Miller
christophe wrote: Hi, I'm doing the following query: q=text:abc AND type:typeA And I ask to return highlighting (query.setHighlight(true);). The search term for field type (typeA) is also highlighted in the text field. Is there any way to avoid this? Thanks Christophe I haven't really used SolrJ,

Re: Solr 1.3 Maven Artifact Problem

2008-10-29 Thread Chris Hostetter
: I'm not sure if there's any reason for solr-core to declare a Maven dependency on solr-solrj. : When creating the POMs, I had (incorrectly) assumed that the core jar does not contain SolrJ classes, hence the dependency. I consider it a totally justifiable assumption. The current

Re: replication handler - compression

2008-10-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
We are not doing anything non-standard; GZIPInputStream/GZIPOutputStream are standard. But asking users to set up an extra Apache is not fair if we can manage it with, say, 5 lines of code. On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood [EMAIL PROTECTED] wrote: Why invent something when

Re: replication handler - compression

2008-10-29 Thread Walter Underwood
You propose to do compressed transfers over HTTP ignoring the standard support for compressed transfers in HTTP. Programming that with a library doesn't make it standard. In Ultraseek, we implemented index synchronization over HTTP with compression. It wasn't that hard. I doubt that compression

Re: Index partitioning

2008-10-29 Thread Chris Hostetter
: I want to partition my index based on category information. Also, while indexing I want to store each category's data in the corresponding index partition. In the same way, I need to search for category information on the corresponding partition. I found some information on wiki link :

exceeded limit of maxWarmingSearchers

2008-10-29 Thread Jon Drukman
I am getting this error quite frequently on my Solr installation: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=8, try again later. I've done some googling but the common explanation of it being related to autocommit doesn't

RE: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Feak, Todd
Have you looked at how long your warm-up is taking? If it takes longer to warm up a searcher than it does for you to do an update, you will be behind the curve and eventually run into this no matter how big that number is. -Original Message- From: news [mailto:[EMAIL PROTECTED] On
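Todd's point can be modelled as interval overlap: each commit opens a searcher that stays "warming" for some time, and the error fires when more than maxWarmingSearchers (8 in Jon's log) overlap. This is a simplified sketch that assumes every commit opens a searcher immediately and that warm-up time is constant; the commit schedule below is hypothetical.

```python
def max_overlapping_warmers(commit_times, warmup_seconds):
    """Peak number of searchers warming at the same moment.

    Assumes each commit at time t opens a searcher that warms over
    [t, t + warmup_seconds).
    """
    if warmup_seconds <= 0:
        return 0
    events = []
    for t in commit_times:
        events.append((t, 1))                    # a searcher starts warming
        events.append((t + warmup_seconds, -1))  # ...and later finishes
    # At equal timestamps -1 sorts before +1, so a searcher finishing exactly
    # when another starts is not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

One commit per second with a 10 s warm-up peaks at 10 warming searchers, above a limit of 8; with a 0.5 s warm-up the peak is 1. If warm-up really is a few tens of milliseconds, the model suggests the commits themselves would have to arrive faster than that to pile up.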

Re: Error in Integrating JBoss 4.2 and Solr-1.3.0:

2008-10-29 Thread sbutalia
I'm having the same issue.. have you had any progress with this? -- View this message in context: http://www.nabble.com/Error-in-Integrating-JBoss-4.2-and-Solr-1.3.0%3A-tp20202032p20234054.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Highlighting and fields

2008-10-29 Thread Lars Kotthoff
I'm doing the following query: q=text:abc AND type:typeA And I ask to return highlighting (query.setHighlight(true);). The search term for field type (typeA) is also highlighted in the text field. Is there any way to avoid this? Use setHighlightRequireFieldMatch(true) on the query object [1]. Lars
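For clients not using SolrJ, my understanding is that `setHighlightRequireFieldMatch(true)` corresponds to the `hl.requireFieldMatch` request parameter. This sketch builds the raw query string for Christophe's query; the `highlight_query` helper is hypothetical.

```python
from urllib.parse import urlencode


def highlight_query(q, hl_fields):
    """Query string that highlights only terms matched in the highlighted field itself."""
    params = {
        "q": q,
        "hl": "true",
        "hl.fl": ",".join(hl_fields),
        # Without this, a term such as typeA (matched in the type field) can
        # also be highlighted wherever it happens to appear in the text field.
        "hl.requireFieldMatch": "true",
    }
    return urlencode(params)
```

`highlight_query("text:abc AND type:typeA", ["text"])` produces a string that can be appended to `/select?`.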

Qsol (or surround or xmlqueryparser...) in Solr

2008-10-29 Thread Chris Harris
I was just looking at Mark Miller's Qsol parser for Lucene ( http://www.myhardshadow.com/qsol.php), and my users would really like to have a similar ability to combine proximity and boolean search in arbitrary, nested ways. The simplest use case I'm interested in is phrase proximity, where you say

date range query performance

2008-10-29 Thread Alok Dhir
Hi, using Solr 1.3, with roughly 11M docs on a 64 GB, 8-core machine. Fairly simple schema: no large text fields, standard request handler, 4 small facet fields. The index is an event log; a primary search/retrieval requirement is date range queries. A simple query without a date

Re: date range query performance

2008-10-29 Thread Chris Harris
Do you need to search down to the minutes and seconds level? If searching by date provides sufficient granularity, for instance, you can normalize all the time-of-day portions of the timestamps to midnight while indexing. (So index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.) That
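A minimal sketch of the normalization Chris describes, done on the client before indexing. It assumes timezone-aware timestamps and the `YYYY-MM-DDThh:mm:ssZ` UTC form shown above. The `unit` argument also covers Alok's follow-up that hours and minutes matter, by rounding to the minute instead of the day. The helper is hypothetical.

```python
from datetime import datetime, timezone

# Which datetime fields to zero for each rounding unit.
_ZERO = {
    "minute": dict(second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
}


def solr_date(ts: datetime, unit: str = "day") -> str:
    """Round an aware datetime down to `unit` in UTC and format it for Solr."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    utc = ts.astimezone(timezone.utc).replace(**_ZERO[unit])
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
```

An event at 2008-10-01 14:35:12 UTC indexes as `2008-10-01T00:00:00Z` with the default unit, or `2008-10-01T14:35:00Z` with `unit="minute"`. Query-side bounds should be rounded the same way; Solr date math such as `NOW/DAY` can do that.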

Re: date range query performance

2008-10-29 Thread Alok Dhir
Well, no - we don't care so much about the seconds, but hours and minutes are indeed crucial. --- Alok K. Dhir Symplicity Corporation www.symplicity.com (703) 351-0200 x 8080 [EMAIL PROTECTED] On Oct 29, 2008, at 4:41 PM, Chris Harris wrote: Do you need to search down to the minutes and seconds

Re: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Jon Drukman
Feak, Todd wrote: Have you looked at how long your warm up is taking? If it's taking longer to warm up a searcher than it does for you to do an update, you will be behind the curve and eventually run into this no matter how big that number is. Most of them say warmupTime=0. It ranges from 0 to

RE: date range query performance

2008-10-29 Thread Feak, Todd
It strikes me that removing just the seconds could very well reduce the overhead to 1/60 of the original: a 30-second query turns into a 500 ms query. Just a SWAG, though. -Todd -Original Message- From: Alok Dhir [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 29, 2008 1:48 PM To:
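The 1/60 figure comes from counting how many distinct date terms a range can cover at each precision. This sketch computes that upper bound. It is only a bound: the real term count is the smaller of this and the number of distinct timestamps actually indexed, so with 11M events not every second is populated and the real gain may be less than 60x.

```python
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def max_distinct_terms(range_seconds, unit):
    """Upper bound on distinct date values a range of this length spans at this precision."""
    return -(-range_seconds // SECONDS_PER_UNIT[unit])  # ceiling division
```

For a 30-day range (2,592,000 seconds) the bound is 2,592,000 terms at second precision, 43,200 at minute precision and 720 at hour precision.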

Re: replication handler - compression

2008-10-29 Thread Chris Hostetter
My understanding of Noble's comment (and I could be wrong, I'm reading between the lines) is that if you specify the new setting he's suggesting when initializing the replication handler on the slave, then the slave should start using an Accept-Encoding: gzip header when querying the master,
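A minimal sketch of the standard HTTP mechanism Hoss describes, seen from the slave's side: advertise gzip with `Accept-Encoding`, then undo the coding only if the response says `Content-Encoding: gzip` (which an Apache with mod_deflate in front of the master would add). This is not the replication handler's code; the helper names are hypothetical. Python's `urllib.request` does not decompress automatically, so something like `decode_body` is needed there.

```python
import gzip


def request_headers(want_gzip=True):
    """Headers a slave could send to say it accepts a gzip-coded response."""
    return {"Accept-Encoding": "gzip"} if want_gzip else {}


def decode_body(headers, body):
    """Undo gzip content coding if the server applied it."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("content-encoding", "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```

Because the server decides whether to compress and labels the response accordingly, a slave that sends no `Accept-Encoding` header simply gets the uncompressed bytes.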

Re: Qsol (or surround or xmlqueryparser...) in Solr

2008-10-29 Thread Mark Miller
Chris Harris wrote: I was just looking at Mark Miller's Qsol parser for Lucene ( http://www.myhardshadow.com/qsol.php), and my users would really like to have a similar ability to combine proximity and boolean search in arbitrary, nested ways. The simplest use case I'm interested in is phrase

Re: DocSet: BitDocSet or HashDocSet ?

2008-10-29 Thread Chris Hostetter
: The doc of HashDocSet says it can be a better choice if there are few docs in the set. What does 'few' mean in this context? It's relative to the total size of your index. If you have a million docs, but you are dealing with DocSets that are only going to contain 10 docs, then both the
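A rough memory comparison illustrating why "few" is relative to index size. Both layouts are assumptions for illustration: a bit set with one bit per document in the index, and a hash set of 4-byte doc ids in a power-of-two table kept under a 0.75 load factor. Solr's actual classes may size things differently.

```python
def bit_docset_bytes(max_doc):
    """One bit per document in the index, regardless of set size (assumed layout)."""
    return (max_doc + 7) // 8


def hash_docset_bytes(set_size, load_factor=0.75):
    """4-byte slots; table doubles until set_size fits within load_factor (assumed layout)."""
    slots = 1
    while slots * load_factor < set_size:
        slots *= 2
    return slots * 4
```

With a million-doc index, a bit set costs about 125,000 bytes whatever it holds, while a 10-doc hash set costs 64 bytes under these assumptions. A 100,000-doc hash set already costs over a megabyte, so the bit set wins once the set is a sizeable fraction of the index.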

RE: timeouts

2008-10-29 Thread Chris Hostetter
: Tomcat is using about 98 MB memory, MySQL is about 500 MB. Tomcat completely freezes up - can't do anything other than restart the service. A thread dump from the JVM running Tomcat would probably be helpful in figuring out what's going on. : timing out well before getting to the commit. As

Re: date range query performance

2008-10-29 Thread Erick Erickson
I've also seen the suggestion (more from a pure Lucene perspective) of breaking apart your dates. Remember that the time/space issues are due to the number of terms. So it's possible (although I haven't tried it) to index many fewer distinct terms, e.g. break your dates into some number of
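One way Erick's idea might look on the indexing side, as a sketch only: split each timestamp into coarse, zero-padded fields, each with few distinct terms. The field names `event_day`, `event_hour` and `event_minute` are hypothetical and assume string-typed fields in the schema.

```python
from datetime import datetime, timezone


def date_parts(ts: datetime) -> dict:
    """Split an aware timestamp into coarse, zero-padded UTC fields."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    utc = ts.astimezone(timezone.utc)
    return {
        "event_day": utc.strftime("%Y%m%d"),  # one new term per day
        "event_hour": utc.strftime("%H"),     # at most 24 distinct terms
        "event_minute": utc.strftime("%M"),   # at most 60 distinct terms
    }
```

A query for Oct 29 between 14:00 and 16:59 could then become `event_day:20081029 AND event_hour:[14 TO 16]`. The zero-padding matters because range queries on string fields compare lexically, so an unpadded "9" would sort after "14".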

where's the bottleneck

2008-10-29 Thread Barnett, Jeffrey
I saw a similar subject posted earlier. This is not a continuation of that thread, but the problem is similar. I have a large, fast, dedicated machine that, despite boosting various parameters in solrconfig.xml (attached) and in the JVM, utilizes at most 10% of the CPU while importing: (from

Re: where's the bottleneck

2008-10-29 Thread Yonik Seeley
On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey [EMAIL PROTECTED] wrote: Reported import rates start at 70 docs per second, and decrease as more records are added. It might just be segment merges (which take more time as segments grow in size). From the solrconfig.xml I see you have

Re: replication handler - compression

2008-10-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hoss, you are partially right. Instead of the HTTP header, we use a request parameter (RequestHandlers cannot read HTTP headers). If the param is present, it wraps the response in a zip output stream. It is configured on the slave because not every slave may want compression. Slaves which are
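A sketch of the request-parameter approach Noble describes, for comparison with the header-based one: the master gzips the payload only when the slave's request carries a flag. The parameter name `compression` is hypothetical, standing in for whatever the slave-side zip option actually sends, and this is not the replication handler's code.

```python
import gzip


def replication_body(params, payload):
    """Gzip the payload only if the slave asked for it via a request parameter."""
    # "compression" is a hypothetical parameter name for illustration.
    if params.get("compression") == "true":
        return gzip.compress(payload)
    return payload
```

The slave must remember that it asked and decompress accordingly, since nothing in the response itself says the body is compressed; that implicit contract is what Walter and Hoss object to.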

Re: replication handler - compression

2008-10-29 Thread Chris Hostetter
: You are partially right. Instead of the HTTP header, we use a request : parameter. (RequestHandlers cannot read HTTP headers). If the param is Hmmm, I'm with Walter: we shouldn't invent new mechanisms for clients to request compression over HTTP from servers. Replication is both special

RE: where's the bottleneck

2008-10-29 Thread Barnett, Jeffrey
I thought it was turned off already. (Lucene vs. Solr?) Where do I make this change? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Wednesday, October 29, 2008 11:28 PM To: solr-user@lucene.apache.org Subject: Re: where's the

Re: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Shalin Shekhar Mangar
On Thu, Oct 30, 2008 at 2:46 AM, Jon Drukman [EMAIL PROTECTED] wrote: Most of them say warmupTime=0. It ranges from 0 to 37. I hope that is msec and not seconds!! Correct, that is in milliseconds. -- Regards, Shalin Shekhar Mangar.