Hi,
Not a direct answer to your question, sorry, but since 4.6.0 is relatively
old and there have been a ton of changes around leader election, syncing,
replication, etc., I'd first jump to the latest Solr and then see if this
is still a problem.
Otis
--
Monitoring * Alerting * Anomaly Detection
: An oversight I think. If you create a patch, let me know and we can
: get it committed.
That definitely sounds bad. We should certainly try to fix that before 5.0
comes out, since it does have back-compat implications...
https://issues.apache.org/jira/browse/SOLR-6718
...better to hav
: For sorting DocValues are the best option I think.
yep, definitely a good idea.
: > I have a usecase for using cursorpage and when tried to check this, I got
: > outOfMemory just for sorting by id.
what does the field/fieldType for your uniqueKey field look like?
If you aren't using DocValue
Sounds great - thanks all.
On Fri, Nov 7, 2014 at 2:06 PM, Erick Erickson
wrote:
> bq: I think ADD/DELETE replica APIs are best for within a SolrCloud
>
> I second this, if for no other reason than I'd expect this to get
> more attention than the underlying core admin API.
>
> That said, I belie
Right, that is why we batch.
When a batch of 1000 fails, drop to a batch size of 1 and start the batch over.
Then it can report the exact document with problems.
If you want to continue, go back to the bigger batch size. I usually fail the
whole batch on one error.
wunder
Walter Underwood
wun.
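The drop-to-batch-size-1 fallback described above can be sketched generically; the `send_batch` callable below is a hypothetical stand-in for a real Solr client call, not an actual SolrJ/pysolr API:

```python
def index_with_fallback(docs, send_batch, batch_size=1000):
    """Index docs in batches; on a batch failure, retry one at a time
    so the exact offending document(s) can be reported.
    Returns the list of documents that failed to index."""
    failed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send_batch(batch)
        except Exception:
            # Drop to a batch size of 1 to isolate the bad document(s).
            for doc in batch:
                try:
                    send_batch([doc])
                except Exception:
                    failed.append(doc)
    return failed
```

Whether you then continue with the large batch size or fail the whole run on the first error is a policy choice, as the post notes.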
I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single
thread, so it's certainly worth it.
Thanks,
Peter
On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson
wrote:
> And Walter has also been around for a _long_ time ;)
>
> (sorry, couldn't resist)
>
> Erick
>
> On Fri, Nov
Hi Group,
I am working on implementing synonyms for numbers, like
10,2010
14,2014
so that a 2-digit number also gets documents with the four-digit one. I added the
above lines to the synonym file and everything works. But now I need it to work in
only one direction. I tried 10=>2010 but it still gets the records belonging to 10, if I
And Walter has also been around for a _long_ time ;)
(sorry, couldn't resist)
Erick
On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood wrote:
> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>
> It isn’t too hard if the code is structured for it; retry with a batch size of
How did you create the replica? Does the admin screen show it
attached to the proper shard?
What I'd do is set up my SolrCloud instance with (presumably)
a single node (leader) and ensure my searches were working.
Then (and only then) use the Collection API ADDREPLICA
command. You should see your
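For reference, the Collections API call looks roughly like this (host, collection, and shard names are illustrative):

```shell
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1'
```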
Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
It isn’t too hard if the code is structured for it; retry with a batch size of 1.
wunder
On Nov 7, 2014, at 11:01 AM, Erick Erickson wrote:
> Yeah, this has been an ongoing issue for a _long_ time. Basically,
> you can't. So far,
bq: My question is if I can delete the field definition from the
schema.xml and do an optimize and the fields “magically” disappears
no. schema.xml is really just about regularizing how Lucene indexes
things. Lucene (where this would have to take place) doesn't have any
understanding of schema.xml
Another thing is to put in some autowarming, both
on the caches and firstSearcher and newSearcher.
These will pre-fill the caches before having new
searchers handle queries.
Don't go overboard here, for things like filterCache
try, say, autowarm counts of 16 and work your way up.
firstSearcher is
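A hedged solrconfig.xml sketch of the filterCache advice above; only autowarmCount=16 comes from the post, the other sizes are illustrative:

```xml
<!-- filterCache with a modest autowarm count; work your way up from here -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
```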
bq: I think ADD/DELETE replica APIs are best for within a SolrCloud
I second this, if for no other reason than I'd expect this to get
more attention than the underlying core admin API.
That said, I believe ADD/DELETE replica just makes use of the core
admin API under the covers, in which case you
Ian:
Thanks much for the writeup! It's always good to have real-world documentation!
Best,
Erick
On Fri, Nov 7, 2014 at 8:26 AM, Shawn Heisey wrote:
> On 11/7/2014 7:17 AM, Ian Rose wrote:
>> *tl;dr: *Routing updates to a random Solr node (and then letting it forward
>> the docs to where they n
Yeah, this has been an ongoing issue for a _long_ time. Basically,
you can't. So far, people have essentially written fallback logic to
index the docs of a failing packet one at a time and report it.
I'd really like better reporting back, but we haven't gotten there yet.
Best,
Erick
On Fri, Nov
Each of those data dirs is relative to the instance in question.
So if you're running on different machines, they're physically
separate even though named identically.
If you're running multiple nodes on a single machine a-la the
getting started docs, then each one is in its own directory
(e.g.
Hi all,
I have a Solrcloud setup with a manually created collection with the index
obtained via other means than Solr (data come from Lucene).
I created a replica for the index and expected to see the data being copied
to the replica, which does not happen. In the Admin interface I see
something
On 7 November 2014 06:57, andrey prokopenko wrote:
> Full list of updateprocessors for 4.10 version can be found here:
> http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html
Actually, that's just the top level of the inheritance hiera
The usual solution to that is to have dynamic fields with suffixes
indicating the types. So, your int fields are mapped to *_i, your date
fields to *_d.
Solr has schemaless support, but it is auto-detect for now. Creating
fields of particular types via API I think is in JIRA on the trunk for
5.0.
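A minimal schema.xml sketch of the suffix convention described above (the type names may differ in your schema):

```xml
<dynamicField name="*_i" type="int"  indexed="true" stored="true"/>
<dynamicField name="*_d" type="date" indexed="true" stored="true"/>
```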
You encode that knowledge by using UpdateRequestProcessor. Clone the
field, replace it with true, map it to boolean. That way, you will pay
the price once per document indexed, not (documentCount*) times per
request.
Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resour
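A sketch of such a chain in solrconfig.xml, assuming Solr 4.x's CloneFieldUpdateProcessorFactory and ParseBooleanFieldUpdateProcessorFactory (the field names are made up):

```xml
<updateRequestProcessorChain name="parse-bool">
  <!-- copy the raw string field to a boolean-typed field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">flag_s</str>
    <str name="dest">flag_b</str>
  </processor>
  <!-- parse "true"/"false" strings into booleans once, at index time -->
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory">
    <str name="fieldName">flag_b</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```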
mizayah [miza...@gmail.com] wrote:
> What I see is that after restart JVM usage is really low and rises slowly
> while system CPU usage is high.
> My select queries are really slow during that time.
The first searches tend to be slow while Solr fills internal caches and the OS
file cache is warmed
I think ADD/DELETE replica APIs are best for within a SolrCloud,
however if you need to move data across SolrClouds you will have to
resort to older APIs, which I didn't find good documentation of but
many references to. So I wrote up the instructions to do so here:
https://gist.github.com/ralph-t
On 11/7/2014 7:17 AM, Ian Rose wrote:
> *tl;dr: *Routing updates to a random Solr node (and then letting it forward
> the docs to where they need to go) is very expensive, more than I
> expected. Using a "smart" router that uses the cluster config to route
> documents directly to their shard resul
How are folks handling Solr exceptions that occur during batch indexing?
Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
missing mandatory field), and stops indexing the batch. The bad document is
not identified, so it would be hard for the client to recover by skipping
o
Hi!
I'm fairly new to Solr. Is there a feature which enforces minimum term
matching for MLT Queries? More precisely, that is, a document will match
the MLT query if and only if at least x terms in the query are found in the
document, with x defined by the user. I could not find such a feature i
1. The new replica will not begin serving data until it's all there and
caught up. You can watch the replica status on the Cloud screen to see
it catch up; when it's green, you're done. If you're trying to automate
this, you're going to look for the replica that says "recovering" in
clusterstat
Can somebody help, please? I don't think my use-case is so uncommon.
I read this JIRA comment where the idea of using config sets as
templates was brought up
https://issues.apache.org/jira/browse/SOLR-4478?focusedCommentId=13711098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-ta
One quick improvement can be to add -Xms6144m along with -Xmx6144m.
This causes the JVM to acquire all of the heap up front, so it does not waste
time requesting more memory from the kernel.
On restart, I am not sure, but I guess Solr does some syncing of indexes, so
it might be slow to
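For example, a startup command with matching initial and maximum heap (the jar name depends on how you launch Solr 4.x):

```shell
java -Xms6144m -Xmx6144m -jar start.jar
```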
Howdy -
What is the current best practice for migrating shards to another machine?
I have heard suggestions that it is "add replica on new machine, wait for
it to catch up, delete original replica on old machine". But I wanted to
check to make sure...
And if that is the best method, two follow-u
We've moved from an asterisk-based autosuggest functionality
("searchterm*") to a version using a special field called autosuggest,
filled via copyField directives. The field definition:
positionIncrementGap="100">
class=
Hi again, all -
Since several people were kind enough to jump in to offer advice on this
thread, I wanted to follow up in case anyone finds this useful in the
future.
*tl;dr: *Routing updates to a random Solr node (and then letting it forward
the docs to where they need to go) is very expensive,
Andrey, thank you for the reply. Can you explain what you mean by "faceting query
with prefix"? I'm new to the world of Solr; can you give me an example of
this query?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Term-count-in-multivalue-fields-tp4168138p4168167.html
Sent from
Hi Everyone,
I am quite a bit confused about managing configuration files with Zookeeper
for running Solr in cloud mode.
To be precise, I was able to upload the config files (schema.xml,
solrconfig.xml) into the Zookeeper and run Solrcloud.
What confuses me are properties like "data.dir", or re
For sorting DocValues are the best option I think.
—
/Yago Riveiro
On Fri, Nov 7, 2014 at 12:45 PM, adfel70 wrote:
> hi
> I have 11 machines in my cluster.
> each machine 128GB memory, 2 solr jvm's with 12gb heap each.
> cluster has 7 shard, 3 replicas.
> 1.5 billion docs total.
> most user que
hi
I have 11 machines in my cluster.
Each machine has 128GB memory and 2 Solr JVMs with a 12GB heap each.
The cluster has 7 shards, 3 replicas.
1.5 billion docs total.
Most user queries are pretty simple for now, sorting by date fields and
another field that has around 1000 unique values.
I have a usecase for u
Jack,
I have some data indexed that I don’t need any more. My question is whether I can
delete the field definition from the schema.xml and do an optimize so that the
fields “magically” disappear (and free space from disk).
Re-indexing data to delete fields is too expensive in collections with hundr
Could you clarify exactly what you are trying to do, like with an example? I
mean, how exactly are you determining what fields are "unwanted"? Are you
simply asking whether fields can be deleted from the index (and schema)?
-- Jack Krupansky
-Original Message-
From: yriveiro
Sent: Th
With omitTermFreqAndPositions set to true and a multivalued field, you
have no information about how many times the term "zip" or any other term has
appeared in a particular field. If the number of unique values is low,
you can try a faceting query with a prefix, but it will not give you accurate
results due
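A sketch of the prefix-facet workaround mentioned above (collection and field names are illustrative); facet.prefix restricts the returned facet values to those starting with the given prefix:

```text
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=ext_s&facet.prefix=zip
```

Note that facet counts are per matching document, not per term occurrence, which is part of why the result is not accurate for this use case.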
Take a look over here: https://wiki.apache.org/solr/UpdateRequestProcessor
Full list of updateprocessors for 4.10 version can be found here:
http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html
You may pick the one most suitable for you
I want to sort by multivalued field like boolean values.
Something like that:
*sort exist(multivalued field name) desc*
Is it possible?
P.S. I know that sorting doesn't work for multivalued fields, but it works
for a single boolean field...
--
View this message in context:
http://lucene.472066.n
How can I sort documents by the first value in a multivalued field (without adding
a copyField and without any changes in schema.xml)?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Sort-documents-by-first-value-in-multivalued-field-tp4168140.html
Sent from the Solr - User mailin
Andrey
Can you point me to any tutorial or howto where I can see how to develop a custom
UpdateProcessor class?
—
/Yago Riveiro
On Fri, Nov 7, 2014 at 10:39 AM, andrey prokopenko
wrote:
> With "out of the box" functionality, no. You have to develop custom
> UpdateProcessor and add it to the upda
I have a multivalued field in my schema.xml, and I have indexed the following
documents (values per document):
doc1: bmp, zip, bmp
doc2: zip, zip, bmp
How can I retrieve the count of the term "zip" (in this example it must be
3) across all multivalued field values in this index without add
A SolrCloud cluster heavily depends on data locality and high I/O, thus any
NFS with access to a disk array over the network is many times slower
than direct I/O and must be avoided. A classical JBOD (just a bunch of disks)
config + memory-mapped files ensures high performance.
On Wed, Nov 5, 2014 at
With "out of the box" functionality, no. You have to develop a custom
UpdateProcessor and add it to the update processor chain.
On Thu, Nov 6, 2014 at 3:19 PM, yriveiro wrote:
> Hi,
>
> Is it possible to remove stored data from an index by deleting the unwanted fields
> from schema.xml and after doing an optim
I echo that. Atomic update is merely a decoration over the same
delete/insert pattern, where the Solr processor internally retrieves all the
stored fields of the document, updates the field, then checks the _version_
field prior to update and, if it was correct, deletes and then inserts the new
version of the docu
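For reference, an atomic "set" update with optimistic concurrency looks roughly like this (the URL, field name, and _version_ value are illustrative):

```text
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "doc1", "price_i": {"set": 42}, "_version_": 1234567890}]'
```

Internally this still reads the stored fields, rewrites the whole document, and re-indexes it, as described above.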
Thanks for the reply. I've considered DataStax, but dropped it first due to
the commercial model they're using and second due to the integration model
they have chosen to integrate with Cassandra. In their docs (can be found
here:
http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_se
Hello,
I'm running a few Solr cores on one pretty good server. After some time I
discovered that restarting Solr makes queries last longer.
What I see is that after restart JVM usage is really low and rises slowly
while system CPU usage is high.
My select queries are really slow during that time.
Af