Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Shai Erera
While it's hard to answer this question because, as others have said, "it depends", I think it will be good if we can quantify or assess the cost of running a SolrCore. For instance, let's say that a server can handle a load of 10M indexed documents (I omit search load on purpose for now) in a sing

Re: Using G1 with Apache Solr

2015-03-24 Thread Shawn Heisey
On 3/24/2015 9:52 PM, Shawn Heisey wrote: > On 3/24/2015 3:48 PM, Kamran Khawaja wrote: >> I'm running Solr 4.7.2 with Java 7u75 with the following JVM params: I really got my wires crossed. Kamran sent his message to the hotspot-gc-use mailing list, not the solr-user list! Thanks, Shawn

Re: Using G1 with Apache Solr

2015-03-24 Thread Shawn Heisey
On 3/24/2015 3:48 PM, Kamran Khawaja wrote: > I'm running Solr 4.7.2 with Java 7u75 with the following JVM params: > > -verbose:gc > -XX:+PrintGCDateStamps > -XX:+PrintGCDetails > -XX:+PrintAdaptiveSizePolicy > -XX:+PrintReferenceGC > -Xmx3072m > -Xms3072m >

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Damien Kamerman
From my experience on a high-end server (256GB memory, 40 core CPU) testing collection numbers with one shard and two replicas, the maximum that would work is 3,000 cores (1,500 collections). I'd recommend much less (perhaps half of that), depending on your startup-time requirements. (Though I have
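The arithmetic behind those figures can be sketched as follows. This is only an illustration, assuming that every replica of every shard is hosted as one core on the machine; the function name is made up for the example.

```python
def total_cores(collections: int, shards_per_collection: int, replicas_per_shard: int) -> int:
    """Number of cores needed if every replica of every shard is one core."""
    return collections * shards_per_collection * replicas_per_shard


# 1,500 collections, each with one shard and two replicas
print(total_cores(1500, 1, 2))  # 3000
```

Halving the collection count, as suggested in the message, halves the core count in the same way.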

Re: Unable to setup solr cloud with multiple collections.

2015-03-24 Thread sthita
Thanks Erick for your reply. I am trying to create a new core, i.e. dict_cn, which is totally different in terms of index data, configs, etc. from the existing core "abc". The core is created successfully in my master (i.e. mail) and I can run Solr queries on this newly created core. All the config file

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Ian Rose
First off thanks everyone for the very useful replies thus far. Shawn - thanks for the list of items to check. #1 and #2 should be fine for us and I'll check our ulimit for #3. To add a bit of clarification, we are indeed using SolrCloud. Our current setup is to create a new collection for each

Re: Custom TokenFilter

2015-03-24 Thread Erick Erickson
bq: 13 more Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr. This usually means you have jar files from different versions of Solr in your classpath. Best, Erick On Tue, Mar 24, 2015 at 2:38 PM, Test Test wrote: > Hi there, > I'm trying to create my own TokenizerFact

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Erick Erickson
Test Test: From Hossman's apache page: When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "h

Custom TokenFilter

2015-03-24 Thread Test Test
Hi there, I'm trying to create my own TokenizerFactory (from tamingtext's book). After setting up schema.xml and adding the path in solrconfig.xml, I start Solr. I get this error message: Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "text": Plugi

Problem with Terms Query Parser

2015-03-24 Thread Shamik Bandopadhyay
Hi, I'm trying to use Terms Query Parser for one of my use cases where I use an implicit filter on bunch of sources. When I'm trying to run the following query, fq={!terms f=Source}help,documentation,sfdc I'm getting the following error. Unknown query parser 'terms'400 What am I missing her
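For reference, the filter from this message can be assembled and URL-encoded as in the sketch below. It only builds the request parameters and does not explain the "Unknown query parser" error; the helper name and the `q=*:*` main query are assumptions added for illustration.

```python
from urllib.parse import urlencode


def terms_filter(field, values):
    """Build a Terms Query Parser filter such as {!terms f=Source}a,b,c."""
    return "{!terms f=%s}%s" % (field, ",".join(values))


fq = terms_filter("Source", ["help", "documentation", "sfdc"])
# urlencode escapes the braces, the '!' and the commas so the filter survives in a URL
params = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(params)
```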

RE: rough maximum cores (shards) per machine?

2015-03-24 Thread Toke Eskildsen
Jack Krupansky [jack.krupan...@gmail.com] wrote: > I'm sure that I am quite unqualified to describe his hypothetical setup. I > mean, he's the one using the term multi-tenancy, so it's for him to be > clear. It was my understanding that Ian used them interchangeably, but of course Ian is the only

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Test Test
Hi there, I'm trying to create my own TokenizerFactory (from tamingtext's book). After setting up schema.xml and adding the path in solrconfig.xml, I start Solr. I get this error message: Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "text": Plugi

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
I'm sure that I am quite unqualified to describe his hypothetical setup. I mean, he's the one using the term multi-tenancy, so it's for him to be clear. For me, it's a question of who has control over the config and schema and collection creation. Having more than one business entity controlling t

Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-24 Thread Martin Wunderlich
Very interesting. Thanks, Shawn. Here is what the config file looks like in the Solr admin console: https://www.dropbox.com/s/qtfclbvs8oze7lp/Bildschirmfoto%202015-03-24%20um%2021.11.12.png?dl=0 N

Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-24 Thread Shawn Heisey
On 3/24/2015 1:41 PM, Martin Wunderlich wrote: > The file was created in a text editor. I am not sure which quotes you > are referring to. They look fine to me and the XML file validates > alright. Could you perhaps be more specific? This partial screenshot is your email to the list showing your dat

RE: rough maximum cores (shards) per machine?

2015-03-24 Thread Toke Eskildsen
Jack Krupansky [jack.krupan...@gmail.com] wrote: > Don't confuse customers and tenants. Perhaps you could explain what you mean by multi-tenant in the context of Ian's setup? It is not clear to me what the distinction is in this case. - Toke Eskildsen

Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-24 Thread Alexandre Rafalovitch
>type=„FileDataSource /> I am getting both missing closing quote and the opening quote is a funny one ("aligns on the bottom"). But your response email also does that, so maybe you are using some "smart" editor. Try checking this conversation in a web archive if you can't see the unusual quote

Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-24 Thread Martin Wunderlich
Hi Alex, Thanks again for the reply. See my response below inline. > On 22.03.2015 at 20:14, Alexandre Rafalovitch wrote: > > I am not entirely sure your problem is at the XSL level yet? > > *) I see problems with quotes in two places (in datasource, and in > outer entity). Did you paste de

Re: Solr and HDFS configuration

2015-03-24 Thread Michael Della Bitta
The ultimate answer is that you need to test your configuration with your expected workflow. However, the thing that mitigates the remote IO factor (hopefully) is that the Solr HDFS stuff features a blockcache that should (when tuned correctly) cache in RAM the blocks your Solr process needs the m

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Shawn Heisey
On 3/24/2015 11:22 AM, Ian Rose wrote: > Let me give a bit of background. Our Solr cluster is multi-tenant, where > we use one collection for each of our customers. In many cases, these > customers are very tiny, so their collection consists of just a single > shard on a single Solr node. In fac

Solr and HDFS configuration

2015-03-24 Thread Joseph Obernberger
Hi All - does it make sense to run a Solr shard on a node within a Hadoop cluster that is not a data node? In that case all the data that node processes would need to come over the network, but you get the benefit of more CPU for things like faceting. Thank you! -Joe

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
Don't confuse customers and tenants. -- Jack Krupansky On Tue, Mar 24, 2015 at 2:24 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Sorry Jack. That doesn't scale when you have millions of customers. And > these are good problems to have! > > On Tue, Mar 24, 2015 at 10:47 AM, Jack K

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Shalin Shekhar Mangar
Sorry Jack. That doesn't scale when you have millions of customers. And these are good problems to have! On Tue, Mar 24, 2015 at 10:47 AM, Jack Krupansky wrote: > Multi-tenancy is a bad idea for a single solr Cluster. Better to give each > tenant a separate Solr instance that you spin up and spi

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
Multi-tenancy is a bad idea for a single solr Cluster. Better to give each tenant a separate Solr instance that you spin up and spin down based on demand. Think about it: If there are a small number of tenants, just giving each their own machine will be cheaper than the effort spent managing a mul

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Ian Rose
Let me give a bit of background. Our Solr cluster is multi-tenant, where we use one collection for each of our customers. In many cases, these customers are very tiny, so their collection consists of just a single shard on a single Solr node. In fact, a non-trivial number of them are totally emp

Regarding detection of duplication

2015-03-24 Thread Iniyan
Hi, My requirement is to detect duplication in title after removing punctuation marks, stop words, accented characters. I am trying to do exact match. After that I am thinking of applying filters. I have tried solr.KeywordTokenizerFactory. It does exact matching. But when I add Stop filt

Re: How to verify a document is indexed by all replicas

2015-03-24 Thread Shai Erera
> > You can add a min_rf=true parameter to your indexing > Yeah I read about it, but it doesn't help me as in this case, I'm implementing some monitoring component over a SolrCloud instance, so I have no handle to the indexing client. I would like the monitor to check the replicas and report somet

Re: Auto naming replicas via ADDREPLICA

2015-03-24 Thread Anshum Gupta
Either of them works for me. If you want to get your hands dirty, please go ahead. I can review/provide feedback if you need anything there. I'll just create a JIRA to begin with. On Tue, Mar 24, 2015 at 9:15 AM, Shai Erera wrote: > I use vanilla 5.0. I intended to fix it myself, but if you want

One of three cores is missing userData and lastModified fields from /admin/cores

2015-03-24 Thread Aaron Daubman
Hey All, On a Solr server running 4.10.2 with three cores, two return the expected info from /solr/admin/cores?wt=json but the third is missing userData and lastModified. The first (artists) and third (tracks) cores from the linked screenshot are the ones I care about. Unfortunately, the third (t

Re: How to verify a document is indexed by all replicas

2015-03-24 Thread Shalin Shekhar Mangar
Hi Shai, To your original question on how to know if a document has been indexed at all replicas -- You can add a min_rf=true parameter to your indexing request and then Solr will add information to the response about how many replicas gave an 'ack' to the leader. So if the returned number is equal
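A client-side check along these lines might look like the sketch below. It rests on assumptions that are not confirmed by the message: that `min_rf` is passed as an integer (the message writes `min_rf=true`), and that the achieved replication factor comes back as `rf` inside `responseHeader`. The base URL and collection name are placeholders.

```python
from urllib.parse import urlencode


def update_url(base, collection, min_rf):
    """Build an update URL that asks Solr to report the achieved replication factor."""
    query = urlencode({"min_rf": min_rf, "wt": "json"})
    return "%s/%s/update?%s" % (base.rstrip("/"), collection, query)


def fully_replicated(response, expected_rf):
    """True if the parsed JSON response reports at least expected_rf acknowledgements.

    Assumes the value is returned as responseHeader.rf.
    """
    rf = response.get("responseHeader", {}).get("rf")
    return rf is not None and rf >= expected_rf


print(update_url("http://localhost:8983/solr", "collection1", 2))
print(fully_replicated({"responseHeader": {"status": 0, "rf": 1}}, 2))
```

This only helps a client that issues the indexing request itself; a monitoring component with no handle on the indexing client would need a different check.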

Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping

2015-03-24 Thread Shawn Heisey
On 3/12/2015 3:36 PM, Alexandre Rafalovitch wrote: Manual optimize is no longer needed for modern Solr. It does great optimization automatically. The only reason I recommended it here is to make sure that all segments are brought up to the latest version and the deleted documents are purged. That

Re: Auto naming replicas via ADDREPLICA

2015-03-24 Thread Shai Erera
I use vanilla 5.0. I intended to fix it myself, but if you want to go ahead, I'd be happy to review the patch. Shai On Tue, Mar 24, 2015 at 6:11 PM, Anshum Gupta wrote: > It certainly looks like a bug and the name shouldn't be added to the > request automatically. > Can you confirm what versi

Re: maxReplicasPerNode

2015-03-24 Thread Shai Erera
Thanks guys, this makes sense I guess, from Solr's side. Perhaps we can have a new Collections API like REDIRECTREPLICA or something, that will redirect a replica to the new node. This API can simply do ADDREPLICA on the new node, and DELETEREPLICA of the node that doesn't exist anymore. I guess
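The proposed two-step "redirect" can be sketched with the existing Collections API actions. REDIRECTREPLICA itself is only an idea from this message and does not exist; the sketch below just builds the ADDREPLICA and DELETEREPLICA request URLs in the order described. The host, node and replica names are placeholders.

```python
from urllib.parse import urlencode


def collections_api_url(base, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    query["wt"] = "json"
    return base.rstrip("/") + "/admin/collections?" + urlencode(query)


def redirect_replica_urls(base, collection, shard, new_node, old_replica):
    """Return the two requests, in order: add on the new node, then delete the old replica."""
    add = collections_api_url(base, "ADDREPLICA",
                              collection=collection, shard=shard, node=new_node)
    delete = collections_api_url(base, "DELETEREPLICA",
                                 collection=collection, shard=shard, replica=old_replica)
    return [add, delete]


for url in redirect_replica_urls("http://localhost:8983/solr", "collection1",
                                 "shard1", "host2:8983_solr", "core_node2"):
    print(url)
```

Adding before deleting keeps the replica count from dropping in between, at the cost of briefly listing one extra replica in the cluster state.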

Re: Auto naming replicas via ADDREPLICA

2015-03-24 Thread Anshum Gupta
It certainly looks like a bug and the name shouldn't be added to the request automatically. Can you confirm what version of Solr you are using? If it turns out to be a bug in 5x/trunk I'll create a JIRA and fix it to both #1 and #2. On Mon, Mar 23, 2015 at 9:48 AM, Shai Erera wrote: > Shawn,

Re: maxReplicasPerNode

2015-03-24 Thread Anshum Gupta
Yes, it applies to both. Solr wouldn't auto-add replicas in either of those cases (or any other case) to meet the rf specified at create time. On Tue, Mar 24, 2015 at 2:22 AM, Shai Erera wrote: > Thanks Anshum, > > About #3, i line with my answer to the previous question, Solr wouldn't > > auto-

Setting up SOLR 5 from an RPM

2015-03-24 Thread Tom Evans
Hi all We're migrating to SOLR 5 (from 4.8), and our infrastructure guys would prefer we installed SOLR from an RPM rather than extracting the tarball where we need it. They are creating the RPM file themselves, and it installs an init.d script and the equivalent of the tarball to /opt/solr. We'r

Re: How to verify a document is indexed by all replicas

2015-03-24 Thread Shai Erera
Thanks Erick, When a replica is down, no updates are sent to it. When it comes back up, it discovers that it needs to catch-up with the leader. If there are many events it falls back to index replication (slower). During this period of time, is the replica considered ACTIVE or RECOVERING? And, ca

Re: Issues to create new core

2015-03-24 Thread Erick Erickson
Tell us all the steps you went through to do this. Note that you should _not_ be using the core admin in the admin UI if you're working with SolrCloud. For stand-alone Solr, the message above is probably caused by your not having a conf directory set up already. The core admin UI expects that you

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
Shards per collection, or across all collections on the node? It will all depend on: 1. Your ingestion/indexing rate. High, medium or low? 2. Your query access pattern. Note that a typical query fans out to all shards, so having more shards than CPU cores means less parallelism. 3. How many colle

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Erick Erickson
Well, there's a ticket out there for "thousands of collections on a single machine", although this is way out there. I often see 10-20 small cores on a 4-8 core machine if they're reasonably small (a few million docs). I see a single replica strain a 128G 16 core machine if it has 300M docs

Re: Unable to setup solr cloud with multiple collections.

2015-03-24 Thread Erick Erickson
Why are you doing this in the first place? SolrCloud and master/slave are fundamentally different. When running in SolrCloud mode, there is no need whatsoever to configure replication as per the Wiki link you've outlined above, that's for the older style master/slave setups. Just change it back an

Re: Auto naming replicas via ADDREPLICA

2015-03-24 Thread Shawn Heisey
On 3/23/2015 10:48 AM, Shai Erera wrote: > The 'name' param isn't set when I send the URL request (and it's also not > specified in the reference guide), but only when I add the replica using > SolrJ. I then tweaked my code to do the following: > > final CollectionAdminRequest.AddReplica addRepl

Re: Solr replicas going in recovering state during heavy indexing

2015-03-24 Thread Erick Erickson
What do the Solr logs show happens on those servers when they go into recovery? What have you tried to do to diagnose the problem? You might review: http://wiki.apache.org/solr/UsingMailingLists The first thing I'd check, though, is whether you're seeing large GC pauses that exceed the Zookeeper t

Re: How to verify a document is indexed by all replicas

2015-03-24 Thread Erick Erickson
You can always issue a *:* query, but it'd have to be at least your autoSoftCommit interval ago since the soft commit trigger will have slightly different wall clock times. But it shouldn't be necessary to wait I don't think. Since the indexing request doesn't succeed until the docs have been writ

Re: TooManyBasicQueries?

2015-03-24 Thread Ian Rose
Ah yes, right you are. I had thought that `surround` required a different endpoint, but I see now that someone is using a surround query. Many thanks! On Tue, Mar 24, 2015 at 10:02 AM, Erik Hatcher wrote: > Somehow a surround query is being constructed along the way. Search your > logs for “s

Re: document contained more than 100000 characters

2015-03-24 Thread Shawn Heisey
On 3/23/2015 3:08 AM, Srinivas wrote: > Presently in my project we are using Apache Tika for reading the metadata of the > file. So whenever we handle large files (files containing more than 100000 > characters), Tika generates the error that the file contained more than > 100000 characters. So is it possible or

Re: How to remove an Alert

2015-03-24 Thread Shawn Heisey
On 3/23/2015 2:35 PM, jack.met...@hp.com wrote: > I have a problem with [ ... briefly describe your problem here ... ] > > [ ... insert additional info here - keep it short and to the point ... ] > > Below are some SPM graphs showing the state of my system. > Here's the 'Threads' graph: > htt

Re: maxReplicasPerNode

2015-03-24 Thread Shawn Heisey
On 3/24/2015 3:22 AM, Shai Erera wrote: >>> If this is explained somewhere, I'd appreciate if you can give me a >>> pointer. I don't think it's explained anywhere, so that's a lack in the documentation. One problem with automatic replica addition in response to cluster problems is that there is n

Re: TooManyBasicQueries?

2015-03-24 Thread Erik Hatcher
Somehow a surround query is being constructed along the way. Search your logs for “surround” and see if someone is maybe sneaking a q={!surround}… in there. If you’re passing input directly through from your application to Solr’s q parameter without any sanitizing or filtering, it’s possible a
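The log search suggested here can be sketched as a small filter. This is an illustration only: it assumes the logs are plain text lines in which the query parameters may be URL-encoded, and the sample lines are invented.

```python
import re
from urllib.parse import unquote

# Matches the start of a surround local-params prefix, e.g. {!surround} or {!surround df=text}
SURROUND = re.compile(r"\{!surround\b", re.IGNORECASE)


def find_surround_queries(lines):
    """Return the log lines that contain a {!surround ...} query, decoded or URL-encoded."""
    return [line for line in lines if SURROUND.search(unquote(line))]


sample = [
    "path=/select params={q=title:solr} hits=3",
    "path=/select params={q=%7B%21surround%7D3w(foo,bar)} hits=0",
]
for hit in find_surround_queries(sample):
    print(hit)
```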

Issues to create new core

2015-03-24 Thread Alejandro Jesus Mariño Molerio
Dear Solr Community: I just began to work with Solr. I chose Solr 5.0, but when I try to create a new core with the GUI, it shows the following error: " Error CREATEing SolrCore 'datos': Unable to create core [datos] Caused by: Can't find resource 'solrconfig.xml' in classpath or 'C:\solr\server\solr\

rough maximum cores (shards) per machine?

2015-03-24 Thread Ian Rose
Hi all - I'm sure this topic has been covered before but I was unable to find any clear references online or in the mailing list. Are there any rules of thumb for how many cores (aka shards, since I am using SolrCloud) is "too many" for one machine? I realize there is no one answer (depends on s

Solr replicas going in recovering state during heavy indexing

2015-03-24 Thread Gopal Jee
Hi, We have a large SolrCloud cluster. We have observed that during heavy indexing, a large number of replicas go into the recovering or down state. What could be the possible reason and/or fix for the issue? Gopal

Re: TooManyBasicQueries?

2015-03-24 Thread Ian Rose
Hi Erik - Sorry, I totally missed your reply. To the best of my knowledge, we are not using any surround queries (have to admit I had never heard of them until now). We use solr.SearchHandler for all of our queries. Does that answer the question? Cheers, Ian On Fri, Mar 13, 2015 at 10:08 AM,

RE: Read or Capture Solr Logs

2015-03-24 Thread Markus Jelsma
Hello - process the logs means you have to build your own program that reads and processes the logs, and does whatever you need it to. In a custom SearchComponent you can implement e.g. process() [1] and read the query, and do something with it. [1]: http://lucene.apache.org/solr/5_0_0/solr-c
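A minimal sketch of the "process the logs" route follows. It assumes request log lines of the form `... params={q=...&wt=json} hits=N status=0 QTime=N` with URL-encoded parameter values; the actual layout depends on the logging configuration, so the regular expression is an assumption to adapt, and the sample line is invented.

```python
import re
from urllib.parse import parse_qs

# Captures everything between "params={" and the closing "}" that precedes hits= or status=
PARAMS = re.compile(r"params=\{(.*)\} (?:hits|status)=")


def extract_queries(lines):
    """Return the q parameter of every request log line that has one."""
    queries = []
    for line in lines:
        match = PARAMS.search(line)
        if not match:
            continue  # not a request log line
        q_values = parse_qs(match.group(1)).get("q")
        if q_values:
            queries.append(q_values[0])
    return queries


sample = [
    "INFO [collection1] webapp=/solr path=/select params={q=title:solr&wt=json} hits=3 status=0 QTime=2",
    "INFO [collection1] Opening new searcher",
]
print(extract_queries(sample))
```

The SearchComponent route mentioned in the message would instead read the query from the request inside Solr itself, which avoids depending on the log layout.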

Set search query logs into Solr

2015-03-24 Thread Nitin Solanki
Hello, I want to insert searched queries into the Solr log to track the input of users. I googled a lot but didn't find anything. Please help. Your help will be appreciated...

How to verify a document is indexed by all replicas

2015-03-24 Thread Shai Erera
Hi Is there a recommended, preferably fast, way to check that a document is indexed by all replicas? I currently do that by issuing a search request to each replica, but was wondering if there's a faster way. Even better, is there a way to verify all replicas of a shard are "up-to-date", e.g. by
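The per-replica check described here can be sketched as below. The sketch assumes each replica is queried at its own core URL with `distrib=false` so the request is not fanned out to other replicas, and that the unique key field is called `id`; the core URLs and replica names are placeholders, and the HTTP calls themselves are left out.

```python
from urllib.parse import urlencode


def replica_query_url(core_url, doc_id, id_field="id"):
    """Build a query that asks a single replica whether it has the document."""
    params = {
        "q": '%s:"%s"' % (id_field, doc_id),
        "distrib": "false",  # answer from this core only
        "rows": 0,           # only numFound is needed
        "wt": "json",
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def missing_replicas(num_found_by_replica):
    """Given numFound per replica, return the replicas that do not have the document."""
    return sorted(name for name, found in num_found_by_replica.items() if found == 0)


print(replica_query_url("http://host1:8983/solr/collection1_shard1_replica1", "doc-42"))
print(missing_replicas({"replica1": 1, "replica2": 0}))
```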

Re: Read or Capture Solr Logs

2015-03-24 Thread Nitin Solanki
Hi Markus, Can you please help me. How to do that? Using both "Process the logs" or "make a simple SearchComponent implementation that reads SolrQueryRequest" On Tue, Mar 24, 2015 at 4:25 PM, Nitin Solanki wrote: > Hi Markus, > Can you please help me. How to d

Re: Read or Capture Solr Logs

2015-03-24 Thread Nitin Solanki
Hi Markus, Can you please help me. How to do that? Using both "Process the logs" or "make a simple SearchComponent implementation that reads SolrQueryRequest." On Tue, Mar 24, 2015 at 4:17 PM, Markus Jelsma wrote: > Hello, you can either process the logs, or make a simple Searc

RE: Read or Capture Solr Logs

2015-03-24 Thread Markus Jelsma
Hello, you can either process the logs, or make a simple SearchComponent implementation that reads SolrQueryRequest. Markus -Original message- > From:Nitin Solanki > Sent: Tuesday 24th March 2015 11:38 > To: solr-user@lucene.apache.org > Subject: Read or Capture Solr Logs > > Hello

Read or Capture Solr Logs

2015-03-24 Thread Nitin Solanki
Hello, I want to read or capture all the queries which are searched by users. Any help on this?

Re: _text

2015-03-24 Thread Zheng Lin Edwin Yeo
Hi Philippe, That means you're using the physical schema.xml. You can check the file in your collection, under conf folder. For mine I don't have the _text field in my schema.xml. If you don't require it in your setup, you can try removing it and see if it's ok? Else you can use the schema.xml or

Re: _text

2015-03-24 Thread phiroc
Hi Zheng, I copied the SOLR 5 schema.xml file on Github (?), which contains the following line: - Original message - From: "Zheng Lin Edwin Yeo" To: solr-user@lucene.apache.org Sent: Tuesday 24 March 2015 10:59:49 Subject: Re: _text Hi Philippe, Are you using the default schemaFactory, i

Re: _text

2015-03-24 Thread Zheng Lin Edwin Yeo
Hi Philippe, Are you using the default schemaFactory, in which your setting in solrconfig.xml is , or you have used your own defined schema.xml, in which your setting in solrconfig.xml should be ? Regards, Edwin On 24 March 2015 at 17:40, wrote: > > Hello, > > my SOLR 5 Admin Panel displays

_text

2015-03-24 Thread phiroc
Hello, my SOLR 5 Admin Panel displays the following error: 23/03/2015 15:05:05 ERROR SolrCore org.apache.solr.common.SolrException: undefined field: "_text" How should _text be defined in schema.xml? Many thanks. Philippe

Re: maxReplicasPerNode

2015-03-24 Thread Shai Erera
Thanks Anshum, About #3, in line with my answer to the previous question, Solr wouldn't > auto-add a Replica to meet the replication factor when a node goes down. > Just to make sure the answer applies to both these cases: 1. There are two replicas on node1 and node2. Solr won't add a replica

Re: maxReplicasPerNode

2015-03-24 Thread Anshum Gupta
Hi Shai, As of now, all replicas for a collection are created to meet the specified replication factor at the time of collection creation. There's no way to defer that until more nodes are up. Your best bet is to have the nodes already up before you CREATE the collection or create the collection

maxReplicasPerNode

2015-03-24 Thread Shai Erera
Hi, I saw that we can define maxShardsPerNode when creating a collection, but I don't see that I can set something similar for replicas. My scenario is the following: - I set up one Solr node - Create collection with numShards=1 and replicationFactor=2 - Hopefully, one replica is created o