Due to some emergency maintenance I needed to run a delete on a large
number of documents in a 200GB index.
The problem is that it's taking an inordinately long time (2+
hours so far and counting) and is steadily eating up disk space -
presumably up to 2x the index size, which is getting
Since you are using expand=true, every time a matching synonym entry is
found the analyzer expands the term with all of the synonyms set in the index.
This can cause the index to grow in size.
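(For illustration, a sketch of why that happens, using a hypothetical synonyms.txt entry - the actual taxonomy files in this thread were not shown:)

```text
# hypothetical synonyms.txt entry (equivalent synonyms, comma-separated)
tv, television, telly

# with expand="true" at index time, a document containing "tv" is indexed
# with all three terms, which inflates the index;
# with expand="false", all three terms collapse to the first one listed
```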
Solr 1.3+ logs only fresh queries. If you re-run the same
query then it is served from the cache and not printed in the logs (unless
the cache(s) are not warmed or the searcher is reopened).
So, Otis's proposal would definitely help in doing some benchmarks
baselining the search :)
Hi,
Using Solr 3.1 I'm getting errors when trying to sort on fields containing
dashes in the name...
So it's true: stay away from dashes if you can.
Marc.
On Sun, Jun 5, 2011 at 3:46 PM, Erick Erickson erickerick...@gmail.com wrote:
I'd stay away from dashes too. It's too easy for the query
The Apache Software Foundation (ASF)'s Travel Assistance Committee (TAC) is
now accepting applications for ApacheCon North America 2011, 7-11 November
in Vancouver BC, Canada.
The TAC is seeking individuals from the Apache community at-large -- users,
developers, educators, students, Committers,
Is there a way in which I can apply all those files to the same
tag, separated by some delimiter?
like this:
<filter class="solr.SynonymFilterFactory"
        synonyms="BODYTaxonomy.txt, ClinicalObs.txt, MicTaxo.txt, SPTaxo.txt"
        ignoreCase="true"
        expand="true"/>
Yes, you can perfectly feed
You could drop your mergeFactor to 2 and then run expungeDeletes.
This will make the operation take longer but (assuming you have more than 3
segments in your index) should use less transient disk space.
You could also make a custom merge policy, that expunges one segment
at a time (even slower but even
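(For reference, a minimal sketch of how expungeDeletes can be triggered, assuming the default /update handler on localhost; it merges away segments containing deleted documents without forcing a full optimize:)

```xml
<!-- POST this update message to /update -->
<commit expungeDeletes="true"/>
```

Or equivalently from the command line: curl 'http://localhost:8983/solr/update?commit=true&expungeDeletes=true'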
What does "call synonym methods in Java" mean? That is, what are
you trying to accomplish and from where?
Best
Erick
On Sun, Jun 5, 2011 at 9:48 PM, deniz denizdurmu...@gmail.com wrote:
Well, I have changed it into text... but I'm still confused about how to use
synonyms...
and also I want to know
1. About the commit strategy, all the ExtractingRequestHandler (request
handler that uses Tika to extract content from the input file) will do is
extract the content of your file and add it to a SolrInputDocument. The
commit strategy should not change because of this, compared to other
documents
Yep, but note the discussion. It's not at all clear that Solr is the
place to deal with an
unreliable network, and it sounds like that's the root of your issue.
It doesn't look like anyone's hot to change Solr's behavior here, and
it's arguable that
Solr isn't the place to compensate for an
Have you considered query-time expansion rather than index-time expansion?
In general this will lead to more complex queries, but smaller indexes.
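(A sketch of what query-time expansion might look like in schema.xml - the type and file names are placeholders, not from this thread. The SynonymFilterFactory moves into the query analyzer, so the index stays small and expansion happens per query:)

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expansion now happens only when a query is analyzed -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

One caveat: multi-word synonyms are known to behave poorly at query time, since the query parser splits on whitespace before the analyzer sees the text.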
Take a look at the analysis page available from the admin page to see exactly
what happens.
What is the high-level problem you're trying to solve?
Hi folk,
I am using Solr to index around 100 million docs.
Now I am planning to move to a cluster-based Solr setup, so that I can scale the
indexing and searching process.
Since SolrCloud is in the development stage, I am trying to index in a sharded
environment using ZooKeeper.
I followed the steps from
Thanks once again for the helpful suggestions!
Regarding the selection of facet fields, I think publishDate (which is actually
just a year) and callnumber-first (which is actually a very broad, high-level
category) are okay. authorStr is an interesting problem: it's definitely a
useful facet
So I am trying to set up an auto-scaling search system of EC2 Solr slaves
which scales up as the number of requests increases, and vice versa.
Here is what I have
1. A solr master and underlying slaves(scalable). And an elastic load
balancer to distribute the load.
2. The ec2-auto-scaling setup fires nodes
For now I have a collection with:
id (int)
price (double) multivalue
brand_id (int)
filters (string) multivalue
I need to get the available brand_id, filters, and price values and the list of
id's for the current query. For example, now I'm doing queries with
facet.field=brand_id/filters/price:
1) to
Hi,
I have configured my master and slave servers and everything seems to be running
fine; the replication completed the first time it ran. But every time I go to the
replication link in the admin panel after restarting the server or on server
startup, I notice the replication starting from scratch or
On 6/5/2011 3:36 AM, occurred wrote:
Ok, thx for the answer.
My idea now is to store both field values in one field and to prefix and
suffix the values from field2 with something very special.
The synonyms would then also have to carry the special prefixes and suffixes.
What are you actually trying to do?
On 5 June 2011 14:42, Erick Erickson erickerick...@gmail.com wrote:
See: http://wiki.apache.org/solr/SchemaXml
By adding multiValued="true" to the field, you can add
the same field multiple times in a doc, something like
<add>
  <doc>
    <field name="mv">value1</field>
    <field name="mv">value2</field>
Polling interval was in reference to slaves in a multi-machine
master/slave setup. so probably not
a concern just at present.
Warmup time of 0 is not particularly normal; I'm not quite sure what's
going on there, but you may
want to look at the firstSearcher, newSearcher and autowarm parameters in
Dear list,
I've got a question regarding my address search:
I am searching for address data. If one address field is not defined
(in this case the housenumber) for the specific query (e.g. city = a, street
= b, housenumber=14), I am getting no result. For every street there exists
at least
Hi all,
Is it possible to change the query parser operator for a specific field
without having to explicitly type it in the search field?
For example, I'd like to use:
http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax
instead of
Hi Richard, are you setting the value to 0 at index time when the
housenumber is not present? If you are, this would be as simple as modifying
the query at the application layer to city = a, street = b, housenumber=(14
OR 0).
If you are not doing anything at index time with the not present
Hi Tomas,
1. Regarding SolrInputDocument,
We are not using the Java client; rather we are using PHP Solr, wrapping content
in SolrInputDocument. I am not sure how to do this in the PHP client? In this case,
we need the Tika-related jars to get at the metadata such as content... we
certainly don't want to handle
Thanks Martijn. I pulled your patch and it looks like what I was looking
for. The original FacetField class has a getAsFilterQuery method which
returns the criteria to use as an fq parameter, I have logic which does this
in my class which works, any chance of getting something like this added to
Small error: I shouldn't be using this.start but should instead be using
Double.parseDouble(this.getValue());
and
sdf.parse(count.getValue());
respectively.
On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote:
Thanks Martijn. I pulled your patch and it looks like what I was
The HTTP interface (http://wiki.apache.org/solr/SolrReplication#HTTP_API)
can be used to control lots of parts of replication.
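(For example - hostnames here are placeholders - the commands documented on that page can be driven with curl:)

```shell
# report current replication state on a slave
curl 'http://slave:8983/solr/replication?command=details'
# temporarily stop the slave from polling the master
curl 'http://slave:8983/solr/replication?command=disablepoll'
# force an immediate pull from the master
curl 'http://slave:8983/solr/replication?command=fetchindex'
```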
As to warmups, I don't know of a good way to test that. I don't know whether
getting the current status on the slave includes whether warmup is completed
or not. At
All of my cache autowarmCount settings are either 1 or 5.
maxWarmingSearchers is set to 2. I previously shared the contents of my
firstSearcher and newSearcher events -- just a queries array surrounded by a
standard-looking listener tag. The events are definitely firing -- in
Yes, sadly... I too don't have much of a clue about AWS.
The SolrReplication API doesn't give me exactly what I want. For the time
being I have hacked my way into the Amazon image, bootstrapping the
replication check in a shell script (curl + awk, a very dirty way). Once the
check succeeds I enable the
See the Tagging and excluding Filters section:
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
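(A sketch of what that section describes, using the brand_id field from the question and a made-up filter value: the fq is tagged with {!tag=...} and that tag is excluded in facet.field with {!ex=...}, so brand_id facet counts are computed as if the brand filter were absent. The snippet below just builds the query string:)

```python
from urllib.parse import urlencode

# Tag the brand filter, then exclude that tag when faceting on brand_id,
# so all brand counts remain visible even while one brand is selected.
params = urlencode({
    "q": "*:*",
    "fq": "{!tag=brandfq}brand_id:42",
    "facet": "true",
    "facet.field": "{!ex=brandfq}brand_id",
})
print("/solr/select?" + params)
```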
2011/6/6 Denis Kuzmenok forward...@ukr.net:
For now i have a collection with:
id (int)
price (double) multivalue
brand_id (int)
filters (string) multivalue
I need
On Mon, Jun 6, 2011 at 1:47 PM, Naveen Gupta nkgiit...@gmail.com wrote:
Hi Tomas,
1. Regarding SolrInputDocument,
We are not using java client, rather we are using php solr, wrapping
content
in SolrInputDocument, i am not sure how to do in PHP client? In this case,
we need tika related
*Everybody* (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.
Part of whether it's a good idea or not depends on how big the index
gets, and unfortunately the only way to figure that out is to test.
But that's the first approach
If you're seeing results, things must be OK. It's a little strange,
though, I'm seeing
warmup times of 1 on the trivial reload of the example documents.
But I wouldn't worry too much here. Those are pretty high autowarm counts, you
might have room to reduce them but absent long autowarm times
Thanks
On 6 June 2011 19:32, Erick Erickson erickerick...@gmail.com wrote:
*Everybody* (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.
Part of whether it's a good idea or not depends on how big the index
gets, and
I do think that Solr would be better served if there was a *best practices
section* of the site.
Looking at the majority of emails to this list, they revolve around "how do I
do X?".
Seems like tutorials with real world examples would serve Solr no end of
good.
I still do not have an example of the
This is a start, for many common best practices:
http://wiki.apache.org/solr/SolrRelevancyFAQ
Many of the questions in there have an answer that involves
de-normalizing, as an example. Even if your specific
problem isn't in there, I myself found reading through there
Hello,
I've seen that through boosting it's possible to influence the scoring
function, but what I would like is a sort of boolean property. In some way,
it's searching only the documents indexed with that keyword (or their
intersection/union) rather than the whole set.
Is this supported in any way?
I'm continuing to work on tuning my Solr server, and now I'm noticing that my
biggest bottleneck is the SpellCheckComponent. This is eating multiple seconds
on most first-time searches, and still taking around 500ms even on cached
searches. Here is my configuration:
<searchComponent
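(The configuration was cut off above; for comparison, a typical Solr 3.x index-based spellchecker setup looks roughly like this - the field name and paths are placeholders, not the poster's actual settings. Note that buildOnCommit in particular can make first-time searches slow, since the spellcheck index is rebuilt on every commit:)

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuilding on every commit is a common cause of slow searches -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```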
I'm having a hard time understanding what you're driving at, can
you provide some examples? This *looks* like filter queries,
but I think you already know about those...
Best
Erick
On Mon, Jun 6, 2011 at 4:00 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
Hello,
I've seen that through
Hmmm, how are you configuring your spell checker? The first-time slowdown
is probably due to cache warming, but subsequent 500 ms slowdowns
seem odd. How many unique terms are there in your spellcheck index?
It'd probably be best if you showed us your fieldtype and field definition...
Best
Erick
Well, I was trying to say that I have changed the config files for synonyms
and so on, but nothing happens, so I thought I needed to do something in Java
code too... I was trying to ask about that...
-
Zeki ama calismiyor... Calissa yapar... ("Smart, but it doesn't work... If it worked, it would do it...")
Do you mean the replication happens every time you restart the server?
If so, you would need to modify the events on which you want the replication to happen.
Check for the replicateAfter tag and remove the startup option if you
don't need it.
<requestHandler name="/replication"
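(For reference, a sketch of the full shape of that handler on the master, with replicateAfter trimmed to commit only; the confFiles list is a placeholder:)

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- removing "startup" here stops replication from being
         triggered every time the server restarts -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```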
Instead of integrating ZooKeeper, you could create shards over multiple
machines and specify the shards while you are querying Solr.
E.g.: http://localhost:8983/solr/select?shards=Machine:Port/SolrPath,Machine:Port/SolrPath&indent=true&q=query
On Mon, Jun 6, 2011 at 5:59 PM, Mohammad Shariq