John, large working sets are when you have 100M+ nodes; blob size is not a factor unless you want full-text indexing. You can control which metadata fields get indexed (see https://wiki.apache.org/jackrabbit/IndexingConfiguration), so you should set this up to the minimum your application needs to work. I know very little about Mesos, so I can't comment.
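For reference, an indexing configuration restricting Lucene indexing to named properties might look like the sketch below, following the format on the wiki page above (the node type and property names are placeholders, not taken from this thread):

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM
    "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- Index only the listed properties for nt:unstructured nodes;
       everything not matched by a rule falls back to default indexing. -->
  <index-rule nodeType="nt:unstructured">
    <property>title</property>
    <!-- indexed for property queries, but excluded from
         node-scope (full-text) searches -->
    <property nodeScopeIndex="false">description</property>
  </index-rule>
</configuration>
```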
--
Galo

On Sat, Jun 24, 2017 at 6:59 PM, Clay Ferguson <[email protected]> wrote:

> Related to the updating of indexes: I'm working on a P2P capability which
> will make a JCR repo behave essentially like a distributed blockchain
> database (i.e. "ledger"), where every node has a full copy of the DB/repo.
> One capability required for that, which I've already completed, is the
> implementation of a Merkle-tree-like capability where I can tell if the
> full content under any given subgraph is identical to that located on some
> separate "peer" (network node), simply by comparing a SHA256 hash at both
> nodes (each node being on a totally independent repository).
>
> The method for maintaining 'identical' copies of the repos (technically a
> subgraph in each) will be to use the Merkle tree to perform a "sync",
> doing "least effort" data transfers from peer to peer to perform the
> updates (syncing). I may end up using an open-source BitTorrent library to
> perform the transmission of data between clients efficiently. So John,
> that kind of technique (the BitTorrent protocol) could theoretically help
> you distribute index files across nodes rather than regenerating index
> files manually every time you spin one up.
>
> I admit I haven't even researched "clusters" (in Jackrabbit), and I don't
> know if those are sharded/federated, or whether they use a full "copy" on
> each node. Interestingly, if you're a fan of blockchain, I will also be
> using a public-key encryption system on this app to be able to
> authenticate who added what content, by having each 'edit' (node property
> modification) get hashed and then encrypted with the user's private key,
> and storing that encrypted hash on the tree. So the entire app I am
> implementing will BE a true blockchain, implemented as a layer built on
> top of the JCR.
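Clay's Merkle-tree subgraph comparison can be sketched roughly as follows. This is a hypothetical illustration, not his actual implementation: nodes are modeled as simple `(name, props, children)` tuples, and each digest covers the node's own content plus its children's digests, so comparing the two root digests verifies the entire subgraph:

```python
import hashlib

def merkle_hash(name, props, children):
    """Digest a node: hash its name and (sorted) properties, then fold in
    each child's digest so the result covers the whole subtree."""
    h = hashlib.sha256()
    h.update(name.encode("utf-8"))
    for key in sorted(props):
        h.update(key.encode("utf-8"))
        h.update(props[key].encode("utf-8"))
    for child in children:  # each child is a (name, props, children) tuple
        h.update(merkle_hash(*child))
    return h.digest()

# Two peers holding structurally identical subgraphs produce the same
# 32-byte root digest, so one comparison checks the whole subtree.
peer_a = ("root", {"title": "x"}, [("child", {"v": "1"}, [])])
peer_b = ("root", {"title": "x"}, [("child", {"v": "1"}, [])])
print(merkle_hash(*peer_a) == merkle_hash(*peer_b))  # True
```

Any difference anywhere below the root (a property value, a child name) changes the root digest, which is what makes the "least effort" sync possible: peers descend only into subtrees whose digests disagree.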
>
> I think of what I'm doing as a "reference implementation" of what could
> eventually become a blockchain specification for the JCR, which will be
> an extension to the JCR API specifically adding a blockchain
> protocol/layer on top of JCR, and hopefully will become an Apache Project
> of its own, and a formal spec for how to use JCR to build out
> blockchains. What I am doing is along the lines of Ethereum, by making
> blockchain a more generic, accessible, reusable technology, but afaik
> Ethereum is not built on JCR, and I believe in building on top of JCR.
> Anyone who understands Merkle trees AND the JCR and is also fully
> cognizant of blockchain would come to this same conclusion, I believe.
>
> So I hope at least a couple of the guys who are well connected in Adobe
> will pass the word up the chain of command regarding this concept. In 10
> years nobody will want to use a content repository that doesn't have the
> level of 'trust' that can only come from a blockchain. I think in 10 to
> 20 years even RDBs will have 'blockchain-verifiable' transactions as
> built-in functions too. But for now, a protocol layer on top of and
> separate from the JCR that specifically does blockchain functionality
> seems like the next step for blockchain technology and also for JCR. Who
> knows, maybe the world is ready for Adobe to start a cryptocurrency of
> their own!? Perhaps that would be the financial incentive to get them
> interested in this? I have $10K for that ICO ready and waiting!!
>
> I've probably violated the terms and conditions of this mailing list and
> I apologize if so. I went slightly beyond a reply to John.
>
> Best regards,
> Clay Ferguson
> https://github.com/Clay-Ferguson/meta64
> [email protected]
>
>
> On Sat, Jun 24, 2017 at 6:52 AM, John Chilton <[email protected]> wrote:
>
> > Thanks Galo, this is useful information.
> >
> > When you say "large" working sets, how large is large — just looking
> > for order of magnitude (Gig, Tera, Peta…)?
> >
> > Also, are you aware of any Mesos frameworks that offer similar
> > capabilities to K8s stateful sets?
> >
> > Thanks again,
> >
> > -John
> >
> > > On Jun 23, 2017, at 6:37 PM, Galo Gimenez <[email protected]>
> > wrote:
> > >
> > > One issue you will find on Jackrabbit is indexing: local storage is
> > ephemeral, so new nodes need to re-index, and on large working sets
> > this can take hours.
> > >
> > > Kubernetes introduced stateful sets; these allow you to have very
> > stable naming and storage inside the cluster, and a consistent ordering
> > when nodes are started:
> > https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
> > >
> > > — Galo
> > >
> > >> On Jun 23, 2017, at 11:03 PM, John Chilton <[email protected]>
> > wrote:
> > >>
> > >> We are running in an orchestration environment — either
> > Mesos/Chronos/Marathon or Kubernetes.
> > >>
> > >> Each docker container needs to join the Jackrabbit cluster for the
> > lifetime of that container and then leave the Jackrabbit cluster when
> > its work is complete.
> > >> When each container joins the Jackrabbit cluster, it is assigned a
> > unique cluster node id (repository.xml). We also have no upper bound on
> > the number of our containers that may join the cluster at any given
> > time.
> > >>
> > >> Will this "dynamic" clustering work, or will we encounter issues? Is
> > this ill-advised? Or are there things we need to do beyond uniquely
> > identifying each cluster node?
> > >> I am trying to get ahead of issues that may arise when exercising
> > this. Any thoughts at all would be appreciated.
> > >>
> > >> Thanks,
> > >>
> > >> -John
> > >>
> > >
> >
>
--
Galo
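On John's dynamic-clustering question: each Jackrabbit cluster node needs a unique node id plus a shared journal visible to all nodes. A sketch of the relevant `repository.xml` fragment is below; the JDBC driver, URL, and prefix are placeholder values, and for containers that come and go it is common to leave the `id` attribute out of the file and supply it per instance via the `org.apache.jackrabbit.core.cluster.node_id` system property (e.g. `-Dorg.apache.jackrabbit.core.cluster.node_id=$HOSTNAME`):

```xml
<!-- Clustering section of repository.xml (values are illustrative).
     Every node must see the same journal; the node id must be unique
     and stable for the lifetime of that cluster member. -->
<Cluster syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="org.postgresql.Driver"/>
    <param name="url" value="jdbc:postgresql://db:5432/jackrabbit"/>
    <param name="schemaObjectPrefix" value="journal_"/>
  </Journal>
</Cluster>
```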
