Hi,
Is the new replication feature based on HTTP requests between sites?
If yes, then I guess it might be possible to configure an HTTP server
with mod_deflate so the data is compressed on the fly.
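If it is, a minimal sketch of such a setup, assuming an Apache httpd fronting the master and that `/solr` is the path the slaves fetch from (both are assumptions):

```apache
# Assumed httpd.conf fragment: compress responses under /solr on the fly
LoadModule deflate_module modules/mod_deflate.so
<Location /solr>
    SetOutputFilter DEFLATE
</Location>
```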
C.
Simon Collins wrote:
I have now optimized the index - down to 325mb, it compresses down
Hi,
I'm also trying to stress test Solr. I would love some advice on managing
it properly.
I'm using solr 1.3 and tomcat55.
Thanks a lot,
zqzuk wrote:
Hi, I am doing a stress testing of my solr application to see how many
concurrent requests it can handle and how long it takes. But I m
Hi,
first, have a look at http://wiki.apache.org/solr/SolrCaching, the
section on firstSearcher and warming. Search engines rely on caching, so
first searches will be slow. To test fairly, I think it is necessary to
warm up the search engine by sending the most frequently used and/or most
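For reference, a minimal warming sketch in solrconfig.xml; the query strings are placeholders to replace with your own most frequent searches:

```xml
<!-- solrconfig.xml: run warming queries when the first searcher opens -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">your most frequent query</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```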
Hi!
Out of curiosity: How would one implement search by example with Solr?
What I mean:
Say I have a result entry with these fields/attributes:
id: 1
title: blue big slow car
color: blue
size: 30
maxspeed: 100
make: buses inc.
What would I have to do in order to find similar items? Do a
Hi
Now I am using Solr with two different types of data indexed and
searched. For example:
1) JobRec
2) JobSel
I stored the data by specifying type:JobRec, and similarly I specify
type:JobSel while indexing. If I want to retrieve the data, I get it by
querying with type:JobRec.
This is
Do keep in mind that compression is a CPU-intensive process, so it is a trade-off
between CPU utilization and network bandwidth. I have seen cases where
compressing the data before a network transfer ended up being slower than
without compression because the cost of compression and un-compression
Hello,
I'm doing some experiments with the morelikethis functionality using the
standard request handler to see if it also works with distributed search (I
saw that it will not yet work with the MoreLikeThis handler,
https://issues.apache.org/jira/browse/SOLR-788). As far as I can see, this
also
Why invent something when compression is standard in HTTP? --wunder
On 10/29/08 4:35 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED]
wrote:
open a JIRA issue. we will use gzip on both ends of the pipe. On
the slave
side you can say
<str name="zip">true</str>
as an extra option to compress and
Awesome! Thanks for the pointer, I will check this out.
Marian
On Wed, Oct 29, 2008 at 1:52 PM, Jaco wrote:
Hi,
This can be done with 'more like this' functionality in Solr:
http://wiki.apache.org/solr/MoreLikeThis
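For the search-by-example case above, a sketch of a request against the standard handler, using the field names from the example entry (host, port, and parameter values are assumptions to adapt):

```text
http://localhost:8983/solr/select?q=id:1&mlt=true&mlt.fl=title,color,make&mlt.mintf=1&mlt.mindf=1
```

This asks Solr to also return documents similar to each match, judging similarity on the title, color, and make fields.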
: As far as our application goes, Commits and reads are done to the index
: during the normal business hours. However, we observed the max warmers
: error happening during a nightly job when the only operation is 4
: parallel threads committing data to the index and finally optimizing it. We
:
just a question about your httpstone configuration:
I would like to know how you simulated several word searches.
Did you create a lot of different workers with lots of different word
searches?
Thanks,
zqzuk wrote:
Hi,
try to firstly have a look at
I think you may be right, I've opened SOLR-830
: We may have identified the root cause but wanted to run it by the community.
: We figure there is a bug in the snappuller shell script, line 181:
-Hoss
Hi,
I'm doing the following query:
q=text:abc AND type:typeA
And I ask to return highlighting (query.setHighlight(true);). The search
term for field type (typeA) is also highlighted in the text field.
Any way to avoid this?
Thanks
Christophe
: 1) solr-core artifact contains org.apache.solr.client.solrj packages, and at
: the same time, the solr-core artifact depends on the solr-solrj artifact.
what you are seeing isn't specific to the maven jars, that's the way it is
in the standard release.
i believe the inclusion of solrj code
On Wed, Oct 29, 2008 at 9:11 PM, Chris Hostetter
[EMAIL PROTECTED]wrote:
i believe the inclusion of solrj code in the core jar is intentional, the
core jar is intended (as i understand it) to encapsulate everything needed
to run Solr (and because of the built in distributed search features,
Depends on your use cases. Having things in one index will generally
make things easier in the long run, and generally shouldn't be a
bottleneck. However, if the two types will be treated very differently
it may make sense to have two cores - say one type is not changed very
often, while the
: On the main lucene web page: http://lucene.apache.org/index.html
: There is a list of news items spanning all the lucene subprojects. Does
FYI: that news section is just a manually maintained list of items as
regular forrest content (forrest is the tool used to generate the site and
build
christophe wrote:
Hi,
I'm doing the following query:
q=text:abc AND type:typeA
And I ask to return highlighting (query.setHighlight(true);). The
search term for field type (typeA) is also highlighted in the text
field.
Any way to avoid this?
Thanks
Christophe
I haven't really used solrj,
: I'm not sure if there's any reason for solr-core to declare a maven
: dependency on solr-solrj.
: When creating the POMs, I had (incorrectly) assumed that the core jar does
: not contain SolrJ classes, hence the dependency.
I consider it a totally justifiable assumption. the current
we are not doing anything non-standard.
GZipInputStream/GZipOutputStream are standards. But asking users to
set up an extra Apache is not fair if we can manage it with, say, 5 lines
of code
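Roughly, the idea is just to wrap both ends of the pipe, something like this sketch (the class and method names here are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPipe {
    // Master side: wrap the outgoing bytes in a GZIPOutputStream.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Slave side: unwrap with a GZIPInputStream while reading.
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) bos.write(buf, 0, n);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "some highly repetitive index data ".repeat(200).getBytes();
        byte[] packed = gzip(original);
        System.out.println("original=" + original.length + " bytes, compressed=" + packed.length + " bytes");
    }
}
```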
On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood
[EMAIL PROTECTED] wrote:
Why invent something when
You propose to do compressed transfers over HTTP ignoring the standard
support for compressed transfers in HTTP. Programming that with a
library doesn't make it standard.
In Ultraseek, we implemented index synchronization over HTTP with
compression. It wasn't that hard.
I doubt that compression
: I want to partition my index based on category information. Also, while
: indexing I want to store particular category data to corresponding index
: partition. In the same way I need to search for category information on
: corresponding partition.. I found some information on wiki link
:
I am getting this error quite frequently on my Solr installation:
SEVERE: org.apache.solr.common.SolrException: Error opening new
searcher. exceeded limit of maxWarmingSearchers=8, try again later.
I've done some googling but the common explanation of it being related
to autocommit doesn't
Have you looked at how long your warm up is taking?
If it's taking longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number is.
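The relevant knobs, sketched against a Solr 1.3 style solrconfig.xml (the sizes here are placeholder values, not recommendations): smaller autowarmCount values shorten warm-up, and maxWarmingSearchers caps how many searchers may warm at once.

```xml
<!-- solrconfig.xml: trade cache warmth for faster searcher turnover -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<maxWarmingSearchers>4</maxWarmingSearchers>
```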
-Original Message-
From: news [mailto:[EMAIL PROTECTED] On
I'm having the same issue.. have you had any progress with this?
I'm doing the following query:
q=text:abc AND type:typeA
And I ask to return highlighting (query.setHighlight(true);). The search
term for field type (typeA) is also highlighted in the text field.
Any way to avoid this?
Use setHighlightRequireFieldMatch(true) on the query object [1].
Lars
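The raw-parameter equivalent (hl.requireFieldMatch=true restricts highlighting to the field a query term actually matched):

```text
q=text:abc AND type:typeA&hl=true&hl.fl=text&hl.requireFieldMatch=true
```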
I was just looking at Mark Miller's Qsol parser for Lucene (
http://www.myhardshadow.com/qsol.php), and my users would really like to
have a similar ability to combine proximity and boolean search in arbitrary,
nested ways. The simplest use case I'm interested in is phrase proximity,
where you say
Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
Fairly simple schema -- no large text fields, standard request
handler. 4 small facet fields.
The index is an event log -- a primary search/retrieval requirement is
date range queries.
A simple query without a date
Do you need to search down to the minutes and seconds level? If searching by
date provides sufficient granularity, for instance, you can normalize all
the time-of-day portions of the timestamps to midnight while indexing. (So
index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.) That
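A sketch of that normalization, done client-side before indexing (the class and method names are made up; assumes UTC timestamps in Solr's format):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DateRounding {
    // Collapse the time-of-day so every event of a given day indexes
    // the same term.
    static String toMidnight(String timestamp) {
        return Instant.parse(timestamp).truncatedTo(ChronoUnit.DAYS).toString();
    }

    // Keeping hours and minutes but dropping seconds still cuts the
    // number of distinct terms by up to 60x.
    static String toMinute(String timestamp) {
        return Instant.parse(timestamp).truncatedTo(ChronoUnit.MINUTES).toString();
    }

    public static void main(String[] args) {
        System.out.println(toMidnight("2008-10-01T14:23:05Z"));  // 2008-10-01T00:00:00Z
        System.out.println(toMinute("2008-10-01T14:23:05Z"));    // 2008-10-01T14:23:00Z
    }
}
```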
Well, no - we don't care so much about the seconds, but hours and
minutes are indeed crucial.
---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]
On Oct 29, 2008, at 4:41 PM, Chris Harris wrote:
Do you need to search down to the minutes and seconds
Feak, Todd wrote:
Have you looked at how long your warm up is taking?
If it's taking longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number is.
Most of them say warmupTime=0. It ranges from 0 to
It strikes me that removing just the seconds could very well reduce
overhead to 1/60 of the original. A 30-second query turns into a 500ms query.
Just a swag though.
-Todd
-Original Message-
From: Alok Dhir [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 29, 2008 1:48 PM
To:
My understanding of Noble's comment (and i could be wrong, i'm reading
between the lines) is that if you specify the new setting he's suggesting
when initializing the replication handler on the slave, then the slave
should start using an Accept-Encoding: gzip header when querying the
master,
Chris Harris wrote:
I was just looking at Mark Miller's Qsol parser for Lucene (
http://www.myhardshadow.com/qsol.php), and my users would really like to
have a similar ability to combine proximity and boolean search in arbitrary,
nested ways. The simplest use case I'm interested in is phrase
: The doc of HashDocSet says it can be a better choice if there are few
: docs in the set. What does 'few' mean in this context?
it's relative to the total size of your index. if you have a million docs,
but you are dealing with DocSets that are only going to contain 10 docs,
then both the
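To put rough numbers on that, here's an illustrative back-of-envelope sketch; the per-entry constants are assumptions, not Solr's exact internals:

```java
public class DocSetSizing {
    // A bitset-backed DocSet always needs about maxDoc/8 bytes,
    // regardless of how many docs are actually in the set.
    static long bitSetBytes(long maxDoc) {
        return maxDoc / 8;
    }

    // A HashDocSet needs roughly 4 bytes per doc id divided by a
    // load factor (assumed 0.75 here for illustration).
    static long hashSetBytes(long numDocs) {
        return (long) (numDocs * 4 / 0.75);
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000;
        for (long n : new long[]{10, 1_000, 100_000}) {
            System.out.println(n + " docs: hash=" + hashSetBytes(n)
                    + "B vs bitset=" + bitSetBytes(maxDoc) + "B");
        }
    }
}
```

With a million-doc index, the hash representation wins only while the set stays small; somewhere in the tens of thousands of docs the flat bitset becomes cheaper.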
: Tomcat is using about 98mb memory, mysql is about 500mb. Tomcat
: completely freezes up - can't do anything other than restart the
: service.
a thread dump from the jvm running tomcat would probably be helpful in
figuring out what's going on
: timing out well before getting to the commit. As
I've also seen the suggestion (more from a pure Lucene perspective) of
breaking apart your dates. Remember that the time/space issues are due to
the number of terms. So it's possible (although I haven't tried it) to
index many fewer distinct terms, e.g. break your dates into some number of
I saw a similar subject posted earlier. This is not a continuation of that
thread, but the problem is similar. I have a large, fast, dedicated machine,
that despite boosting various parameters in solrconfig.xml (attached) and in
the JVM, utilizes at most 10% of the cpu while importing: (from
On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey
[EMAIL PROTECTED] wrote:
Reported import rates start at 70 docs per second, and decrease as more
records are added.
It might just be segment merges (they take more time as segments grow in size).
From the solrconfig.xml I see you have
Hoss,
You are partially right. Instead of the HTTP header, we use a request
parameter (RequestHandlers cannot read HTTP headers). If the param is
present, it wraps the response in a gzip output stream. It is configured
in the slave because not every slave may want compression. Slaves
which are
: You are partially right. Instead of the HTTP header, we use a request
: parameter (RequestHandlers cannot read HTTP headers). If the param is
hmmm, i'm with walter: we shouldn't invent new mechanisms for
clients to request compression over HTTP from servers.
replication is both special
I thought it was turned off already. (Lucene vs Solr?) Where do I make this
change?
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Wednesday, October 29, 2008 11:28 PM
To: solr-user@lucene.apache.org
Subject: Re: where's the
On Thu, Oct 30, 2008 at 2:46 AM, Jon Drukman [EMAIL PROTECTED] wrote:
Most of them say warmupTime=0. It ranges from 0 to 37. I hope that is
msec and not seconds!!
Correct, that is in milliseconds.
--
Regards,
Shalin Shekhar Mangar.