Re: Specify Analyzer per field

2014-08-31 Thread Ankit Jain
Thanks for the response, guys. Let's say I have two fields, X and Y, and the field type of both is *text*. Now I want to use the whitespace analyzer for field X and the standard analyzer for field Y. In Elasticsearch, we can specify different analyzers for fields of the same type. Is this feature is a

Re: external indexer for Solr Cloud

2014-08-31 Thread Lee Chunki
Hi Shawn and Jack, Thank you for your reply. Yes, I want to run the data import handler independently and sync it to Solr Cloud, because my current DIH node does not only the DB fetch & join but also a lot of preprocessing. Thanks, Chunki. On Aug 30, 2014, at 1:34 AM, Jack Krupansky wrote: > My other thou

Re: Scaling to large Number of Collections

2014-08-31 Thread Shalin Shekhar Mangar
Yeah, I second Mark's suggestion on reducing the stack size. The default on modern 64-bit boxes is usually 1024KB which adds up to a lot when you're running 5000 cores (5000 cores * 2 threads * 1MB, roughly 10GB). I think the zk register thread can be pooled together but the search threads can't be because we'd run into d
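The stack-size arithmetic in the message above can be sketched as follows. This is only a rough estimate: it assumes two threads per core (one reading of the factor of 2 in the message), and the 256KB figure is a hypothetical -Xss value chosen for illustration, not a recommendation from the thread. Stack space is reserved address space, so actual resident memory may be lower.

```python
# Rough estimate of memory reserved for thread stacks.
# Assumption (not stated in the thread): each core keeps 2 threads alive.

def stack_memory_gib(cores: int, threads_per_core: int, stack_kb: int) -> float:
    """Return the total reserved thread-stack memory in GiB."""
    total_kb = cores * threads_per_core * stack_kb
    return total_kb / (1024 * 1024)

# Default stack size of 1024KB, as mentioned in the message.
default_gib = stack_memory_gib(cores=5000, threads_per_core=2, stack_kb=1024)

# Hypothetical reduced stack size (e.g. the JVM started with -Xss256k).
reduced_gib = stack_memory_gib(cores=5000, threads_per_core=2, stack_kb=256)

print(f"default: {default_gib:.2f} GiB, reduced: {reduced_gib:.2f} GiB")
```

Under these assumptions, cutting the stack size to a quarter cuts the reserved stack memory by the same factor.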

Re: Business Name spell check

2014-08-31 Thread Benson Margulies
Trying to shoehorn business name resolution or correction purely into Solr tokenization and spell checking is not, in my opinion, a viable approach. It seems to me that you need a query parser that does something very different from pure tokenization, and you might also need a more complex approach

Re: Scaling to large Number of Collections

2014-08-31 Thread Mark Miller
> > so you might still end up with these out of threads issue again. You can also generally drop the stack size (Xss) quite a bit to handle more threads. Beyond that, there are some thread pools you can configure. However, until we fix the distrib deadlock issue, you don't want to drop the co

Re: Business Name spell check

2014-08-31 Thread Vivek Pathak
Can you write your own spell check class and use something like edit distance to get the desired result? > On Aug 27, 2014, at 9:55 AM, Corey Gerhardt > wrote: > > Sorry to keep beating this to death. I could be looking for perfection which > isn't possible. > > I'm tr
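As a minimal sketch of the edit-distance idea suggested above: the code below is plain Python, not a Solr spell check class, and the function names, the sample business names, and the max_distance threshold are all hypothetical choices made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_names(query: str, candidates: list[str], max_distance: int = 2) -> list[str]:
    """Return candidates within max_distance of the query, nearest first."""
    q = query.lower()
    scored = [(edit_distance(q, c.lower()), c) for c in candidates]
    return [c for d, c in sorted(scored) if d <= max_distance]


print(closest_names("Starbuks", ["Starbucks", "Subway"]))
```

Comparing whole names rather than individual tokens is one possible design; whether it suits real business-name data is not something the thread settles.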

Re: Scaling to large Number of Collections

2014-08-31 Thread Jack Krupansky
We should also consider "lightly-sharded" collections. IOW, even if a cluster has dozens or a hundred nodes or more, the goal may not be to shard all collections across all shards, which is fine for the really large collections, but to also support collections which may only need to be sharded

Re: AW: Scaling to large Number of Collections

2014-08-31 Thread Jack Krupansky
You close with two great questions for the community! We have a similar issue over in Apache Cassandra database land (thousands of tables). There is no immediate, easy, great answer. Other than the kinds of "workarounds" being suggested. -- Jack Krupansky -Original Message- From:

Re: Scaling to large Number of Collections

2014-08-31 Thread Erick Erickson
What is your access pattern? By that I mean do all the cores need to be searched at the same time or is it reasonable for them to be loaded on demand? This latter would impose the penalty of the first time a collection was accessed there would be a delay while the core loaded. I suppose I'm asking

Re: Scaling to large Number of Collections

2014-08-31 Thread Ramkumar R. Aiyengar
On 31 Aug 2014 13:24, "Mark Miller" wrote: > > > > On Aug 31, 2014, at 4:04 AM, Christoph Schmidt < christoph.schm...@moresophy.de> wrote: > > > > we see at least two problems when scaling to large number of collections. I would like to ask the community, if they are known and maybe already addres

AW: Scaling to large Number of Collections

2014-08-31 Thread Christoph Schmidt
One collection has 2 replicas, no sharding, the collections are not that big. No, they are unfortunately not independent. There are collections with customer documents (some thousand customers) and product collections. One customer has at least one customer collection and 1 to some hundred produc

Re: Scaling to large Number of Collections

2014-08-31 Thread Shawn Heisey
On 8/31/2014 8:58 AM, Joseph Obernberger wrote: > Could you add another field(s) to your application and use that instead of > creating collections/cores? When you execute a search, instead of picking > a core, just search a single large core but add in a field which contains > some core ID. This

Re: Scaling to large Number of Collections

2014-08-31 Thread Joseph Obernberger
Could you add another field(s) to your application and use that instead of creating collections/cores? When you execute a search, instead of picking a core, just search a single large core but add in a field which contains some core ID. -Joe http://www.lovehorsepower.com On Sun, Aug 31, 2014 at
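The suggestion above, one large core plus a field holding a core ID, might look roughly like this on the query side. It is a sketch under assumptions: the base URL, the collection name "shared", and the field name core_id are hypothetical, and only the standard q, fq, and wt request parameters of the select handler are used.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, user_query: str, core_id: str,
                     id_field: str = "core_id") -> str:
    """Build a select URL that restricts results to one logical 'core'.

    The restriction goes into a filter query (fq) so it stays separate
    from the user's main query (q).
    """
    # Escape backslashes and double quotes so the ID is safe inside a quoted term.
    safe_id = core_id.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", user_query),
        ("fq", f'{id_field}:"{safe_id}"'),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, collection, and ID values.
url = build_select_url("http://localhost:8983/solr/shared", "title:solr", "customer42")
print(url)
```

Each document would need the same field populated at index time for the filter to have any effect.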

Re: 4.10 ?

2014-08-31 Thread Shawn Heisey
On 8/30/2014 11:43 PM, Shawn Heisey wrote: > The release is likely to be finalized tomorrow. Once it's finalized and > uploaded, the Apache mirror system will begin replicating. It usually > takes a couple more days before a release is actually announced. The > announcement will be made when it

Re: Scaling to large Number of Collections

2014-08-31 Thread Mark Miller
> On Aug 31, 2014, at 4:04 AM, Christoph Schmidt > wrote: > > we see at least two problems when scaling to large number of collections. I > would like to ask the community, if they are known and maybe already > addressed in development: > We have a SolrCloud running with the following numbers

Re: Scaling to large Number of Collections

2014-08-31 Thread Jack Krupansky
How are the 5 servers arranged in terms of shards and replicas? 5 shards with 1 replica each, 1 shard with 5 replicas, 2 shards with 2 and 3 replicas, or... what? How big is each collection? The key strength of SolrCloud is scaling large collections via shards, NOT scaling large numbers of col

Scaling to large Number of Collections

2014-08-31 Thread Christoph Schmidt
we see at least two problems when scaling to large number of collections. I would like to ask the community, if they are known and maybe already addressed in development: We have a SolrCloud running with the following numbers: - 5 Servers (each 24 CPUs, 128 RAM) - 13.000 Collec