Re: recovery process - node with stale data elected leader

2014-11-07 Thread Otis Gospodnetic
Hi, Not a direct answer to your question, sorry, but since 4.6.0 is relatively old and there have been a ton of changes around leader election, syncing, replication, etc., I'd first jump to the latest Solr and then see if this is still a problem. Otis -- Monitoring * Alerting * Anomaly Detection

Re: solr.xml coreRootDirectory relative to solr home

2014-11-07 Thread Chris Hostetter
: An oversight I think. If you create a patch, let me know and we can : get it committed. That definitely sounds bad; we should certainly try to fix that before 5.0 comes out since it does have back-compat implications... https://issues.apache.org/jira/browse/SOLR-6718 ...better to hav

Re: out of memory when trying to sort by id in a 1.5 billion index

2014-11-07 Thread Chris Hostetter
: For sorting DocValues are the best option I think. yep, definitely a good idea. : > I have a usecase for using cursorpage and when tried to check this, I got : > outOfMemory just for sorting by id. what does the field/fieldType for your uniqueKey field look like? If you aren't using DocValue

Re: Migrating shards

2014-11-07 Thread Ian Rose
Sounds great - thanks all. On Fri, Nov 7, 2014 at 2:06 PM, Erick Erickson wrote: > bq: I think ADD/DELETE replica APIs are best for within a SolrCloud > > I second this, if for no other reason than I'd expect this to get > more attention than the underlying core admin API. > > That said, I belie

Re: Solr exceptions during batch indexing

2014-11-07 Thread Walter Underwood
Right, that is why we batch. When a batch of 1000 fails, drop to a batch size of 1 and start the batch over. Then it can report the exact document with problems. If you want to continue, go back to the bigger batch size. I usually fail the whole batch on one error. wunder Walter Underwood wun.
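
The fallback described above (send a large batch, and when it fails, resend that batch one document at a time so the bad document can be reported) can be sketched as follows. This is a minimal sketch, not code from the thread: `send_batch` is a hypothetical callable standing in for whatever client call posts documents to Solr, and it is assumed to raise an exception when Solr rejects the batch. Resending documents that were already accepted before the error is assumed to be harmless because they overwrite themselves by uniqueKey.

```python
from typing import Callable, List, Sequence, Tuple

Doc = dict


def index_with_fallback(
    docs: Sequence[Doc],
    send_batch: Callable[[Sequence[Doc]], None],  # hypothetical: posts docs, raises on error
    batch_size: int = 1000,
) -> List[Tuple[Doc, Exception]]:
    """Index docs in batches; when a batch fails, retry it with batch size 1.

    Returns the documents that failed on their own, paired with the error.
    """
    failures: List[Tuple[Doc, Exception]] = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send_batch(batch)
        except Exception:
            # Start the failed batch over, one document per request,
            # so the exact problem document can be identified.
            for doc in batch:
                try:
                    send_batch([doc])
                except Exception as exc:
                    failures.append((doc, exc))
        # The next loop iteration goes back to the bigger batch size.
    return failures
```

A caller that prefers to fail the whole batch on one error could simply re-raise instead of collecting `failures`.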

Re: Solr exceptions during batch indexing

2014-11-07 Thread Peter Keegan
I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single thread, so it's certainly worth it. Thanks, Peter On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson wrote: > And Walter has also been around for a _long_ time ;) > > (sorry, couldn't resist) > > Erick > > On Fri, Nov

Synonym for Numbers

2014-11-07 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Group, I am working on implementing synonyms for numbers, like 10,2010 and 14,2014, so that a two-digit number also matches documents with the four-digit form. I added the above lines to the synonyms file and everything works. But now I need the mapping in one direction only. I tried 10=>2010, but it still returns the records that belong to 10, if I

Re: Solr exceptions during batch indexing

2014-11-07 Thread Erick Erickson
And Walter has also been around for a _long_ time ;) (sorry, couldn't resist) Erick On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood wrote: > Yes, I implemented exactly that fallback for Solr 1.2 at Netflix. > > It isn’t too hard if the code is structured for it; retry with a batch size of

Re: Solrcloud replicas do not match

2014-11-07 Thread Erick Erickson
How did you create the replica? Does the admin screen show it attached to the proper shard? What I'd do is set up my SolrCloud instance with (presumably) a single node (leader) and ensure my searches were working. Then (and only then) use the Collection API ADDREPLICA command. You should see your

Re: Solr exceptions during batch indexing

2014-11-07 Thread Walter Underwood
Yes, I implemented exactly that fallback for Solr 1.2 at Netflix. It isn’t too hard if the code is structured for it; retry with a batch size of 1. wunder On Nov 7, 2014, at 11:01 AM, Erick Erickson wrote: > Yeah, this has been an ongoing issue for a _long_ time. Basically, > you can't. So far,

Re: Delete data from stored documents

2014-11-07 Thread Erick Erickson
bq: My question is if I can delete the field definition from the schema.xml and do an optimize and the fields “magically” disappear. No. schema.xml is really just about regularizing how Lucene indexes things. Lucene (where this would have to take place) doesn't have any understanding of schema.xml

Re: High system cpu usage while starting solr

2014-11-07 Thread Erick Erickson
Another thing is to put in some autowarming, both on the caches and firstSearcher and newSearcher. These will pre-fill the caches before having new searchers handle queries. Don't go overboard here, for things like filterCache try, say, autowarm counts of 16 and work your way up. firstSearcher is

Re: Migrating shards

2014-11-07 Thread Erick Erickson
bq: I think ADD/DELETE replica APIs are best for within a SolrCloud I second this, if for no other reason than I'd expect this to get more attention than the underlying core admin API. That said, I believe ADD/DELETE replica just makes use of the core admin API under the covers, in which case you

Re: Ideas for debugging poor SolrCloud scalability

2014-11-07 Thread Erick Erickson
Ian: Thanks much for the writeup! It's always good to have real-world documentation! Best, Erick On Fri, Nov 7, 2014 at 8:26 AM, Shawn Heisey wrote: > On 11/7/2014 7:17 AM, Ian Rose wrote: >> *tl;dr: *Routing updates to a random Solr node (and then letting it forward >> the docs to where they n

Re: Solr exceptions during batch indexing

2014-11-07 Thread Erick Erickson
Yeah, this has been an ongoing issue for a _long_ time. Basically, you can't. So far, people have essentially written fallback logic to index the docs of a failing packet one at a time and report it. I'd really like better reporting back, but we haven't gotten there yet. Best, Erick On Fri, Nov

Re: Solrcloud solrconfig.xml

2014-11-07 Thread Erick Erickson
Each of those data dirs is relative to the instance in question. So if you're running on different machines, they're physically separate even though named identically. If you're running multiple nodes on a single machine a la the getting started docs, then each one is in its own directory (e.g.

Solrcloud replicas do not match

2014-11-07 Thread Michal Krajňanský
Hi all, I have a Solrcloud setup with a manually created collection with the index obtained via other means than Solr (data come from Lucene). I created a replica for the index and expected to see the data being copied to the replica, which does not happen. In the Admin interface I see something

Re: Delete data from stored documents

2014-11-07 Thread Alexandre Rafalovitch
On 7 November 2014 06:57, andrey prokopenko wrote: > Full list of updateprocessors for 4.10 version can be found here: > http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html Actually, that's just the top level of the inheritance hiera

Re: How to dynamically create Solr cores with schema

2014-11-07 Thread Alexandre Rafalovitch
The usual solution to that is to have dynamic fields with suffixes indicating the types. So, your int fields are mapped to *_i, your date fields to *_d. Solr has schemaless support, but it is auto-detect for now. Creating fields of particular types via API I think is in JIRA on the trunk for 5.0.
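
As an illustration of the suffix convention described above, a client could pick the field name from the value's type before sending the document. This is only a sketch: the function names are made up for the example, and the `_i` and `_d` suffixes follow the message above and only work if your schema.xml declares matching dynamicField patterns for those types.

```python
from datetime import date, datetime


def dynamic_field_name(name: str, value: object) -> str:
    """Append a type suffix so the value lands in a matching dynamic field.

    Suffixes are assumptions taken from the message above:
    *_i for ints, *_d for dates. Adjust to what your schema declares.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; no suffix is defined for it here.
        raise TypeError("no dynamic field suffix defined for bool in this sketch")
    if isinstance(value, int):
        return f"{name}_i"
    if isinstance(value, (date, datetime)):
        return f"{name}_d"
    raise TypeError(f"no dynamic field suffix defined for {type(value).__name__}")


def to_dynamic_doc(doc_id: str, record: dict) -> dict:
    """Build a document whose field names carry their type suffix."""
    doc = {"id": doc_id}
    for key, value in record.items():
        doc[dynamic_field_name(key, value)] = value
    return doc
```

With this approach the schema never has to change when a new field appears, as long as its type already has a suffix.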

Re: Sort documents by exist(multivalued field)

2014-11-07 Thread Alexandre Rafalovitch
You encode that knowledge by using UpdateRequestProcessor. Clone the field, replace it with true, map it to boolean. That way, you will pay the price once per document indexed not (documentCount*) times per request. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resour
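
The same idea (compute an "exists" flag once at index time and sort on that single-valued boolean) can be sketched on the client side. Note that this is a swapped-in stand-in, not an UpdateRequestProcessor: it does in the indexing client what the processor chain would do inside Solr. The field names `tags` and `has_tags` in the usage note are hypothetical, and the flag field is assumed to be declared as a single-valued boolean in the schema.

```python
def add_exists_flag(doc: dict, field: str, flag_field: str) -> dict:
    """Return a copy of doc with flag_field set to True when field has values.

    A missing field, None, or an empty list all count as "does not exist".
    """
    flagged = dict(doc)
    flagged[flag_field] = bool(doc.get(field))
    return flagged
```

For example, `add_exists_flag(doc, "tags", "has_tags")` would let a query sort with `has_tags desc`, paying the cost once per indexed document rather than on every request.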

RE: High system cpu usage while starting solr

2014-11-07 Thread Toke Eskildsen
mizayah [miza...@gmail.com] wrote: > What I see is that after restart JVM usage is really low and rises slowly > while system CPU usage is high. > My select queries are really slow during that time. The first searches tend to be slow while Solr fills internal caches and the OS file cache is warmed

Re: Migrating shards

2014-11-07 Thread ralph tice
I think ADD/DELETE replica APIs are best for within a SolrCloud, however if you need to move data across SolrClouds you will have to resort to older APIs, which I didn't find good documentation of but many references to. So I wrote up the instructions to do so here: https://gist.github.com/ralph-t

Re: Ideas for debugging poor SolrCloud scalability

2014-11-07 Thread Shawn Heisey
On 11/7/2014 7:17 AM, Ian Rose wrote: > *tl;dr: *Routing updates to a random Solr node (and then letting it forward > the docs to where they need to go) is very expensive, more than I > expected. Using a "smart" router that uses the cluster config to route > documents directly to their shard resul

Solr exceptions during batch indexing

2014-11-07 Thread Peter Keegan
How are folks handling Solr exceptions that occur during batch indexing? Solr stops parsing the docs stream when an error occurs (e.g. a doc with a missing mandatory field), and stops indexing the batch. The bad document is not identified, so it would be hard for the client to recover by skipping o

Minimum Term Matching in More Like This Queries

2014-11-07 Thread Tim Hearn
Hi! I'm fairly new to Solr. Is there a feature which enforces minimum term matching for MLT Queries? More precisely, that is, a document will match the MLT query if and only if at least x terms in the query are found in the document, with x defined by the user. I could not find such a feature i
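
To pin down the matching rule being asked about, here is the criterion written out as a plain predicate. It only illustrates the definition (at least x query terms found in the document); it is not a Solr feature or API, and the function name is made up for the example.

```python
from typing import Iterable


def matches_min_terms(query_terms: Iterable[str], doc_terms: Iterable[str], min_match: int) -> bool:
    """True if and only if at least min_match distinct query terms occur in the document."""
    found = set(query_terms) & set(doc_terms)
    return len(found) >= min_match
```

Repeated terms are counted once here; whether duplicates should count separately is a design choice the question leaves open.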

Re: Migrating shards

2014-11-07 Thread Michael Della Bitta
1. The new replica will not begin serving data until it's all there and caught up. You can watch the replica status on the Cloud screen to see it catch up; when it's green, you're done. If you're trying to automate this, you're going to look for the replica that says "recovering" in clusterstat
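
For the automation case mentioned above, a script could scan the cluster state for replicas that are not yet active. This is a hedged sketch: it assumes the cluster state has already been fetched and parsed into a dict, and that its layout is collection, then "shards", then "replicas", then a "state" value per replica. That layout is my assumption about clusterstate.json in Solr 4.x and should be checked against your own cluster.

```python
from typing import List, Tuple


def replicas_not_active(clusterstate: dict, collection: str) -> List[Tuple[str, str, str]]:
    """List (shard, replica, state) for every replica whose state is not "active".

    Assumed layout: clusterstate[collection]["shards"][shard]["replicas"][replica]["state"].
    """
    pending = []
    shards = clusterstate[collection]["shards"]
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            state = replica.get("state", "unknown")
            if state != "active":
                pending.append((shard_name, replica_name, state))
    return sorted(pending)
```

An empty result would mean every replica has caught up, which is the point where the original replica could be deleted.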

Re: How to dynamically create Solr cores with schema

2014-11-07 Thread Andreas Hubold
Can somebody help, please? I don't think my use-case is so uncommon. I read this JIRA comment where the idea of using config sets as templates was brought up https://issues.apache.org/jira/browse/SOLR-4478?focusedCommentId=13711098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-ta

Re: High system cpu usage while starting solr

2014-11-07 Thread Vikas Agarwal
One quick improvement can be to add -Xms6144m along with -Xmx6144m. This causes the JVM to acquire all its heap memory beforehand, so it does not waste time requesting more memory from the kernel. On restart, I am not sure but I guess Solr does some syncing of indexes, so it might be slow to

Migrating shards

2014-11-07 Thread Ian Rose
Howdy - What is the current best practice for migrating shards to another machine? I have heard suggestions that it is "add replica on new machine, wait for it to catch up, delete original replica on old machine". But I wanted to check to make sure... And if that is the best method, two follow-u

Autosuggest using EdgeNGrams with strange highlighting

2014-11-07 Thread Thomas Michael Engelke
We've moved from an asterisk based autosuggest functionality ("searchterm*") to a version using a special field called autosuggest, filled via copyField directives. The field definition: positionIncrementGap="100"> class=

Re: Ideas for debugging poor SolrCloud scalability

2014-11-07 Thread Ian Rose
Hi again, all - Since several people were kind enough to jump in to offer advice on this thread, I wanted to follow up in case anyone finds this useful in the future. *tl;dr: *Routing updates to a random Solr node (and then letting it forward the docs to where they need to go) is very expensive,

Re: Term count in multivalue fields

2014-11-07 Thread Nickolay41189
Andrey, thank you for the reply. Can you explain what you mean by "faceting query with prefix"? I'm new to the world of Solr, can you give me an example of this query? -- View this message in context: http://lucene.472066.n3.nabble.com/Term-count-in-multivalue-fields-tp4168138p4168167.html Sent from

Solrcloud solrconfig.xml

2014-11-07 Thread Michal Krajňanský
Hi Everyone, I am quite a bit confused about managing configuration files with Zookeeper for running Solr in cloud mode. To be precise, I was able to upload the config files (schema.xml, solrconfig.xml) into the Zookeeper and run Solrcloud. What confuses me are properties like "data.dir", or re

Re: out of memory when trying to sort by id in a 1.5 billion index

2014-11-07 Thread Yago Riveiro
For sorting DocValues are the best option I think. — /Yago Riveiro On Fri, Nov 7, 2014 at 12:45 PM, adfel70 wrote: > hi > I have 11 machines in my cluster. > each machine 128GB memory, 2 solr jvm's with 12gb heap each. > cluster has 7 shard, 3 replicas. > 1.5 billion docs total. > most user que

out of memory when trying to sort by id in a 1.5 billion index

2014-11-07 Thread adfel70
hi I have 11 machines in my cluster. each machine 128GB memory, 2 solr jvm's with 12gb heap each. cluster has 7 shards, 3 replicas. 1.5 billion docs total. most user queries are pretty simple for now, sorting by date fields and another field that has around 1000 unique values. I have a usecase for u

Re: Delete data from stored documents

2014-11-07 Thread Yago Riveiro
Jack, I have some data indexed that I don’t need any more. My question is whether I can delete the field definition from the schema.xml and do an optimize so that the fields “magically” disappear (and free space on disk). Re-indexing data to delete fields is too expensive in collections with hundr

Re: Delete data from stored documents

2014-11-07 Thread Jack Krupansky
Could you clarify exactly what you are trying to do, like with an example? I mean, how exactly are you determining what fields are "unwanted"? Are you simply asking whether fields can be deleted from the index (and schema)? -- Jack Krupansky -Original Message- From: yriveiro Sent: Th

Re: Term count in multivalue fields

2014-11-07 Thread andrey prokopenko
With omitTermFreqAndPositions set to true and a multivalued field, you have no information about how many times the "zip" term or any other term has appeared in the particular field. If the number of unique values is low, you can try a faceting query with prefix, but it will not give you accurate results due

Re: Delete data from stored documents

2014-11-07 Thread andrey prokopenko
Take a look over here: https://wiki.apache.org/solr/UpdateRequestProcessor Full list of updateprocessors for 4.10 version can be found here: http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html You may pick up the most suitable for you

Sort documents by exist(multivalued field)

2014-11-07 Thread Nickolay41189
I want to sort by a multivalued field as if it were a boolean value. Something like this: *sort exist(multivalued field name) desc* Is it possible? P.S. I know that sorting doesn't work for multivalued fields, but it works for a single boolean field... -- View this message in context: http://lucene.472066.n

Sort documents by first value in multivalued field

2014-11-07 Thread Nickolay41189
How can I sort documents by first value in multivalued field? (without adding copyField and without some changes in schema.xml)? -- View this message in context: http://lucene.472066.n3.nabble.com/Sort-documents-by-first-value-in-multivalued-field-tp4168140.html Sent from the Solr - User mailin

Re: Delete data from stored documents

2014-11-07 Thread Yago Riveiro
Andrey, can you point me to any tutorial or howto where I can see how to develop a custom UpdateProcessor class? -- /Yago Riveiro On Fri, Nov 7, 2014 at 10:39 AM, andrey prokopenko wrote: > With "out of the box" functionality, no. You have to develop custom > UpdateProcessor and add it to the upda

Term count in multivalue fields

2014-11-07 Thread Nickolay41189
I have multivalue field in my schema.xml: I have indexed the following documents: ... bmp zip bmp ... ... zip zip bmp ... How can I retrieve the count of the term "zip" (in this example it must be 3) from all multivalued field in this index without add

Re: A bad idea to store core data directory over NAS?

2014-11-07 Thread andrey prokopenko
A SolrCloud cluster depends heavily on data locality and high I/O, so any NFS setup that reaches the disk array over the network is many times slower than direct I/O and must be avoided. A classical JBOD (just a bunch of disks) config + memory mapped files ensures high performance. On Wed, Nov 5, 2014 at

Re: Delete data from stored documents

2014-11-07 Thread andrey prokopenko
With "out of the box" functionality, no. You have to develop a custom UpdateProcessor and add it to the updateprocessors chain. On Thu, Nov 6, 2014 at 3:19 PM, yriveiro wrote: > Hi, > > Is it possible to remove stored data from an index by deleting the unwanted fields > from schema.xml and afterwards do an optim

Re: Updating an index

2014-11-07 Thread andrey prokopenko
I echo that. Atomic update is merely a decoration over the same delete/insert pattern, where the Solr processor retrieves all the stored fields of the document in place, updates the field, then checks the _version_ field prior to the update and, if it was correct, deletes, then inserts the new version of the docu
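
The sequence described above (read the stored fields, apply the change, check `_version_`, delete, insert) can be modelled with a toy in-memory store. This is a sketch of the pattern only, not Solr's implementation: the dict stands in for the index, `VersionConflict` is a made-up exception, and bumping the version by one is an assumption made for the example.

```python
from typing import Optional


class VersionConflict(Exception):
    """Raised when the caller's expected _version_ does not match the stored one."""


def atomic_update(store: dict, doc_id: str, changes: dict,
                  expected_version: Optional[int] = None) -> dict:
    """Toy model of an atomic update as delete + insert of the whole document."""
    # 1. Retrieve all stored fields of the current document.
    current = store[doc_id]
    # 2. Apply the field changes to a copy.
    updated = {**current, **changes}
    # 3. Check _version_ before touching the store.
    if expected_version is not None and current["_version_"] != expected_version:
        raise VersionConflict(f"expected {expected_version}, found {current['_version_']}")
    updated["_version_"] = current["_version_"] + 1  # assumption: simple counter
    # 4. Delete the old document, then insert the new version.
    del store[doc_id]
    store[doc_id] = updated
    return updated
```

The point of the model is that fields which are not stored cannot survive step 1, so they would be lost in the rewritten document.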

Re: on regards to Solr and NoSQL storages integration

2014-11-07 Thread andrey prokopenko
Thanks for the reply. I've considered DataStax, but dropped it first due to the commercial model they're using and second due to the integration model they have chosen to integrate with Cassandra. In their docs (can be found here: http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_se

High system cpu usage while starting solr

2014-11-07 Thread mizayah
Hello, I'm running a few Solr cores on one pretty good server. After some time I discovered that restarting Solr makes queries last longer. What I see is that after restart JVM usage is really low and rises slowly while system CPU usage is high. My select queries are really slow during that time. Af