John,

"Large" working sets means roughly 100M+ nodes; blob size is not a factor
unless you want full-text indexing. You can control which metadata fields
get indexed (see https://wiki.apache.org/jackrabbit/IndexingConfiguration),
so you should set that up to the minimum your application needs. I know
very little about Mesos, so I can't comment on that.
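
For example, an indexing_config.xml along these lines (the node type,
property names, and boost value here are placeholders for your own)
restricts indexing, for that node type, to the listed fields:

  <?xml version="1.0"?>
  <!DOCTYPE configuration SYSTEM
      "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
  <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <!-- Only these properties of nt:unstructured nodes are indexed. -->
    <index-rule nodeType="nt:unstructured">
      <property>title</property>
      <property boost="2.0">description</property>
    </index-rule>
  </configuration>

Then point the indexingConfiguration parameter of your SearchIndex element
(workspace.xml) at that file.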


-- Galo

On Sat, Jun 24, 2017 at 6:59 PM, Clay Ferguson <[email protected]> wrote:

> Related to the updating of indexes: I'm working on a P2P capability which
> will make a JCR repo behave essentially like a distributed blockchain
> database (i.e. a "ledger"), where every node has a full copy of the
> DB/repo. One capability required for that, which I've already completed,
> is a Merkle-tree-like mechanism where I can tell whether the full content
> under any given subgraph is identical to that located on some separate
> "peer" (network node), simply by comparing a SHA256 hash on both peers
> (each peer running a totally independent repository).
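>
> The subtree hash can be computed along these lines (a simplified sketch,
> not the actual meta64 code; canonical ordering, multi-valued properties,
> and binaries are glossed over):
>
>   import java.nio.charset.StandardCharsets;
>   import java.security.MessageDigest;
>   import javax.jcr.Node;
>   import javax.jcr.NodeIterator;
>   import javax.jcr.Property;
>   import javax.jcr.PropertyIterator;
>   import javax.jcr.PropertyType;
>
>   public class SubtreeHash {
>       // Merkle-style digest: a node's hash covers its own properties plus
>       // the hash of every child subtree, so equal digests mean the whole
>       // subgraphs carry equal content.
>       public static byte[] subtreeHash(Node node) throws Exception {
>           MessageDigest md = MessageDigest.getInstance("SHA-256");
>           PropertyIterator props = node.getProperties();
>           while (props.hasNext()) {
>               Property p = props.nextProperty();
>               if (!p.isMultiple() && p.getType() != PropertyType.BINARY) {
>                   md.update(p.getName().getBytes(StandardCharsets.UTF_8));
>                   md.update(p.getString().getBytes(StandardCharsets.UTF_8));
>               }
>           }
>           NodeIterator children = node.getNodes();
>           while (children.hasNext()) {
>               md.update(subtreeHash(children.nextNode()));
>           }
>           return md.digest();
>       }
>   }
>
> Two peers then only have to exchange these digests to know whether their
> copies of a subgraph match.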
>
> The method for maintaining 'identical' copies of the repos (technically a
> subgraph in each) will be to use the Merkle tree to perform a "sync" with
> the least-effort data transfers from peer to peer. I may end up using an
> open source BitTorrent library to transmit the data between clients
> efficiently. So John, that kind of technique (the BitTorrent protocol)
> could theoretically help you distribute index files across nodes rather
> than regenerating the index every time you spin one up.
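>
> Reusing the subtreeHash() sketch above, the sync descends only where the
> digests disagree. Another rough sketch: Peer is a hypothetical transport
> interface, and nodes missing on one side, or interior nodes differing
> only in their own properties, are not handled here:
>
>   import java.util.Arrays;
>   import javax.jcr.Node;
>   import javax.jcr.NodeIterator;
>
>   public class MerkleSync {
>       // Hypothetical remote peer; the real transport (BitTorrent or
>       // otherwise) is out of scope for this sketch.
>       interface Peer {
>           byte[] hashOf(String path) throws Exception;
>           void sendSubtree(String path) throws Exception;
>       }
>
>       // Matching digests cost one comparison and zero data transfer;
>       // only differing subgraphs get descended into and shipped over.
>       static void sync(Node local, Peer peer) throws Exception {
>           if (Arrays.equals(SubtreeHash.subtreeHash(local),
>                             peer.hashOf(local.getPath()))) {
>               return; // this subgraph already matches on both peers
>           }
>           NodeIterator children = local.getNodes();
>           if (!children.hasNext()) {
>               peer.sendSubtree(local.getPath()); // difference is local here
>               return;
>           }
>           while (children.hasNext()) {
>               sync(children.nextNode(), peer);
>           }
>       }
>   }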
>
> I admit I haven't even researched "Clusters" (in Jackrabbit), and I don't
> know if those are sharded/federated, or whether they use a full "copy" on
> each node. Interestingly, if you're a fan of blockchain, I will also be
> using public-key cryptography in this app to authenticate who added what
> content: each 'edit' (node property modification) gets hashed and then
> signed with the user's private key, and that signature is stored on the
> tree. So the entire app I am implementing will BE a true blockchain,
> implemented as a layer built on top of the JCR.
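>
> In standard Java crypto terms, "hash the edit and encrypt the hash with
> the private key" is just a signature. A rough sketch (the ".sig" property
> name and the key handling are illustrative, not the real code):
>
>   import java.nio.charset.StandardCharsets;
>   import java.security.PrivateKey;
>   import java.security.Signature;
>   import java.util.Base64;
>   import javax.jcr.Node;
>
>   public class SignedEdit {
>       // Write a property and store, next to it, a signature over the edit
>       // so anyone holding the user's public key can verify who made it.
>       static void signedSet(Node node, String name, String value,
>                             PrivateKey userKey) throws Exception {
>           node.setProperty(name, value);
>           Signature sig = Signature.getInstance("SHA256withRSA"); // hash+sign
>           sig.initSign(userKey);
>           sig.update((node.getPath() + "/" + name + "=" + value)
>                   .getBytes(StandardCharsets.UTF_8));
>           node.setProperty(name + ".sig",
>                   Base64.getEncoder().encodeToString(sig.sign()));
>           node.getSession().save();
>       }
>   }
>
> Verification is the same Signature API with initVerify(publicKey), fed the
> signature value read back off the tree.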
>
> I think of what I'm doing as a "reference implementation" of what could
> eventually become a blockchain specification for the JCR: an extension to
> the JCR API that adds a blockchain protocol/layer on top of it, which
> hopefully will become an Apache Project of its own, with a formal spec
> for how to use JCR to build out blockchains. What I am doing is along the
> lines of Ethereum, in that it makes blockchain a more generic, accessible,
> reusable technology, but afaik Ethereum is not built on JCR, and I believe
> in building on top of JCR. Anyone who understands Merkle trees AND the
> JCR, and is also fully cognizant of blockchain, would come to this same
> conclusion, I believe.
>
> So I hope at least a couple of the guys who are well connected at Adobe
> will pass the word up the chain of command regarding this concept. In 10
> years nobody will want to use a content repository that doesn't have the
> level of 'trust' that can only come from a blockchain. I think in 10 to
> 20 years even RDBs will have 'blockchain-verifiable' transactions built
> in as well. But for now, a protocol layer on top of, and separate from,
> the JCR that specifically does blockchain functionality seems like the
> next step for blockchain technology and also for JCR. Who knows, maybe
> the world is ready for Adobe to start a cryptocurrency of their own!?
> Perhaps that would be the financial incentive to get them interested in
> this? I have $10K for that ICO ready and waiting!!
>
> I've probably violated the terms and conditions of this mailing list and I
> apologize if so. I went slightly beyond a reply to John.
>
> Best regards,
> Clay Ferguson
> https://github.com/Clay-Ferguson/meta64
> [email protected]
>
>
>
> On Sat, Jun 24, 2017 at 6:52 AM, John Chilton <[email protected]> wrote:
>
> > Thanks Galo, this is useful information.
> >
> > When you say, “large” working sets, how large is large — just looking for
> > order of magnitude (Gig, Tera, Peta….)?
> >
> > Also, are you aware of any Mesos frameworks that offer capabilities
> > similar to K8s stateful sets?
> >
> > Thanks again,
> >
> > -John
> >
> > > On Jun 23, 2017, at 6:37 PM, Galo Gimenez <[email protected]> wrote:
> > >
> > > One issue you will find with Jackrabbit is indexing: local storage is
> > > ephemeral, so new nodes need to re-index, and on large working sets
> > > this can take hours.
> > >
> > > Kubernetes introduced stateful sets, which give you very stable naming
> > > and storage inside the cluster, and a consistent ordering when nodes
> > > are started -
> > > https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/.
> > >
> > > — Galo
> > >
> > >> On Jun 23, 2017, at 11:03 PM, John Chilton <[email protected]> wrote:
> > >>
> > >> We are running in an orchestration environment — either
> > >> Mesos/Chronos/Marathon or Kubernetes.
> > >>
> > >> Each docker container needs to join the Jackrabbit cluster for the
> > >> lifetime of that container and then leave the Jackrabbit cluster when
> > >> its work is complete.
> > >> When each container joins the Jackrabbit cluster it is assigned a
> > >> unique cluster node id (repository.xml). We also have no upper bound
> > >> on the number of our containers that may join the cluster at any
> > >> given time.
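> > >>
> > >> For reference, the relevant repository.xml section is roughly the
> > >> following (the journal settings are placeholders), with the id
> > >> attribute, or the org.apache.jackrabbit.core.cluster.node_id system
> > >> property, set to a unique value per container:
> > >>
> > >>   <Cluster id="CONTAINER_UNIQUE_ID" syncDelay="2000">
> > >>     <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
> > >>       <param name="revision" value="${rep.home}/revision.log"/>
> > >>       <param name="driver" value="org.postgresql.Driver"/>
> > >>       <param name="url" value="jdbc:postgresql://shared-db/journal"/>
> > >>       <param name="databaseType" value="postgresql"/>
> > >>     </Journal>
> > >>   </Cluster>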
> > >>
> > >> Will this “dynamic” clustering work or will we encounter issues? Is
> > >> this ill-advised? Or are there things we need to do beyond uniquely
> > >> identifying each cluster node?
> > >> I am trying to get ahead of issues that may arise when exercising
> > >> this. Any thoughts at all would be appreciated.
> > >>
> > >> Thanks,
> > >>
> > >> -John
> > >>
> > >
> >
> >
>



-- 
-- Galo
