I should have been clearer, and others have mentioned... the "lots of cores" stuff is really outside Zookeeper/SolrCloud at present. I don't think it's incompatible, but it wasn't part of the design so it'll need some effort to make it play nice with SolrCloud. I'm not sure there's actually a compelling use-case for combining the two.
bq: Also, instead of managing cores is it not possible to manage servers which will be in tens and hundreds?

Well, tens to hundreds of servers will work with SolrCloud. You could theoretically take over routing documents yourself (i.e. custom hashing) and simply use SolrCloud without the "lots of cores" stuff. So the scenario is that you have, say, 250 machines that will hold all your data, and you use custom routing to get the right docs to the right core. The upcoming SolrJ support for sending requests only to the proper shard would certainly help here. But this too is rather unexplored territory.

I don't think Zookeeper would really have a problem here because it's not moving much data back and forth. The 1M limitation for data in ZK is on a per-core basis and really applies only to the conf data, NOT the index.

But the current approach does lend itself to Jack's scenario. Essentially, your ClusterKeeper could send the index to one of the machines and create the core there. The current approach addresses the case where you are doing what Jack outlined semi-manually; that is, you're distributing your cores around your cluster based on historical access patterns. It's pretty easy to move the cores around by copying the dirs and using the auto-discovery stuff to keep things in balance, but it's in no way automatic and probably requires a restart (or at least a core unload/load).

Jack's idea of doing this dynamically should work in that kind of scenario. I can imagine, for instance, some relatively small number of physical machines, with all the users' indexes actually kept on a networked filesystem. The startup process is simply finding a machine with spare capacity, telling it to create the core, and pointing it at the pre-existing index. On the assumption that the indexes fit into memory, you'd pay a small penalty at start-up but wouldn't need to copy indexes around.
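To make the "create the core and point it at the pre-existing index" step concrete, here's a minimal, untested sketch that builds a CoreAdmin CREATE request with the LotsOfCores flags (`transient` and `loadOnStartup`) so the core is loaded lazily and is eligible for eviction from the transient cache. The host, core name, and paths are hypothetical; a ClusterKeeper-style process would issue something like this after picking a machine with spare capacity:

```python
# Sketch only: builds the CoreAdmin CREATE URL for a transient core whose
# dataDir points at an index that already exists on shared storage.
# Host, core name, and directory paths below are made up for illustration.
from urllib.parse import urlencode

def create_core_request(host, core_name, instance_dir, data_dir):
    """Build the CoreAdmin CREATE URL for a transient, lazily loaded core."""
    params = {
        "action": "CREATE",
        "name": core_name,
        "instanceDir": instance_dir,
        "dataDir": data_dir,          # point at the pre-existing index
        "transient": "true",          # core can be evicted from the transient cache
        "loadOnStartup": "false",     # don't open the core until first use
    }
    return "http://%s/solr/admin/cores?%s" % (host, urlencode(params))

url = create_core_request("node07:8983", "user12345",
                          "/mnt/shared/users/user12345",
                          "/mnt/shared/users/user12345/data")
print(url)
```

The request itself would then just be an HTTP GET against that URL; since the index already lives on the networked filesystem, nothing gets copied, which is the whole point of the scheme above.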
You could elaborate on this as necessary, tuning the transient caches such that you "fit" the number/size of users to particular hardware. If the store were an HDFS file system, redundancy/backup/error recovery would come along "for free".

But under any scenario, one of the hurdles will be figuring out how many simultaneous users of whatever size can actually be comfortably handled by a particular piece of hardware. And usually there's some kind of long tail just to make it worse: most of your users will be under X documents, and some users will be at 100X.... And updating would be "interesting".

I should emphasize that anything as elaborate as this dynamic shuffling is theoretical at this point, meaning we haven't actually tested it. It _should_ work, but I'm sure there will be some issues to flush out.

Best,
Erick

On Fri, Jun 7, 2013 at 6:38 AM, Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com> wrote:
> The Wiki page was not built for Cloud Solr.
>
> We have done such a deployment where less than a tenth of the cores were
> active at any given point in time. Though there were tens of millions of
> indices, they were split among a large number of hosts.
>
> If you don't insist on a Cloud deployment it is possible. I'm not sure if
> it is possible with Cloud.
>
> On Fri, Jun 7, 2013 at 12:38 AM, Aleksey <bitterc...@gmail.com> wrote:
>
>> I was looking at this wiki and linked issues:
>> http://wiki.apache.org/solr/LotsOfCores
>>
>> They talk about a limit of 100K cores. Is that per server or per
>> entire fleet, since ZooKeeper needs to manage all of that?
>>
>> I was considering a use case where I have tens of millions of indices
>> but less than a million need to be active at any time, so they need
>> to be loaded on demand and evicted when not used for a while.
>> Also, since the number one requirement is efficient loading, I
>> assume I will store a prebuilt index somewhere so Solr will just
>> download it and strap it in, right?
>>
>> The root issue is marked as "won't fix" but some other important
>> subissues are marked as resolved. What's the overall status of the
>> effort?
>>
>> Thank you in advance,
>>
>> Aleksey
>
> --
> -----------------------------------------------------
> Noble Paul