Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-04-01 Thread Adam Kocoloski
I went and collected this discussion into an RFC proposing the “exploded KV” approach: https://github.com/apache/couchdb-documentation/pull/403 Cheers, Adam

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-20 Thread Paul Davis
Strongly agree that we very much don't want to have Erlang-isms being pushed into fdb. Regardless of what we end up with I'd like to see a very strong (de)?serialization layer with some significant test coverage.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Adam Kocoloski
Yes, that sort of versioning has been omitted from the various concrete proposals but we definitely want to have it. We’ve seen the alternative in some of the Erlang records that we serialize to disk today and it ain’t pretty. I can imagine that we’ll want to have the codebase laid out in a way

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
> A simple doc storage version number would likely be enough for future us to do fancier things.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Joan Touzet
Would it be too much work to prototype both and check CRUD timings for each across a small variety of documents? -Joan

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
A simple doc storage version number would likely be enough for future us to do fancier things.
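A minimal sketch of the "doc storage version number" idea, in Python: every stored body carries a leading version byte so that a future reader can dispatch on it. The names `FORMAT_V1`, `encode_body` and `decode_body`, and the choice of compact JSON as version 1, are assumptions for illustration only, not anything agreed in this thread.

```python
import json

# Hypothetical storage-format tag: version 1 means compact UTF-8 JSON.
FORMAT_V1 = 1


def encode_body(body: dict) -> bytes:
    """Serialize a document body, prefixed with a one-byte storage version."""
    payload = json.dumps(body, separators=(",", ":")).encode("utf-8")
    return bytes([FORMAT_V1]) + payload


def decode_body(blob: bytes) -> dict:
    """Dispatch on the leading version byte; refuse formats this code doesn't know."""
    if not blob:
        raise ValueError("empty value")
    version, payload = blob[0], blob[1:]
    if version == FORMAT_V1:
        return json.loads(payload.decode("utf-8"))
    raise ValueError(f"unsupported storage version: {version}")
```

A round trip such as `decode_body(encode_body({"a": 1}))` yields `{"a": 1}`, while a value tagged with an unknown version fails loudly instead of being misread. That failure mode is the kind of behavior a well-tested (de)serialization layer would want to pin down.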

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Benjamin Anderson
> I don’t think adding a layer of abstraction is the right move just yet, I > think we should continue to find consensus on one answer to this question Agree that the theorycrafting stage is not optimal for making abstraction decisions, but I suspect it would be worthwhile somewhere between

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Samuel Newson
Addendum: By “directory aliasing” I meant within a document (either the actual Directory thing or something equivalent of our own making). The directory aliasing for each database is a good way to reduce key size without a significant cost. Though if Redwood lands in time, even this would

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Benjamin Anderson
As is evident by the length of this thread, there's a pretty big design space to cover here, and it seems unlikely we'll have arrived at a "correct" solution even by the time this thing ships. Perhaps it would be worthwhile to treat the in-FDB representation of data as a first-class abstraction

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Newson
Good points on revtree; I agree with you that we should store it intelligently to gain the benefits you mentioned.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Adam Kocoloski
I do not think we should store the revtree as a blob. The design where each edit branch is its own KV should save on network IO and CPU cycles for normal updates. We’ve performed too many heroics to keep couch_key_tree from stalling entire databases when trying to update a single document with
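A rough sketch of the "each edit branch is its own KV" layout, using a plain Python dict to stand in for FoundationDB. The key shape `(docid, "revs", leaf_rev)` and the helper names are hypothetical; the point is that extending one branch rewrites only that branch's KV rather than a whole serialized tree.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (docid, "revs", leaf_rev) -> path from leaf back to root.
Key = Tuple[str, str, str]


def branch_kvs(docid: str, branches: List[List[str]]) -> Dict[Key, List[str]]:
    """Build one KV per edit branch; each path is listed leaf-first."""
    return {(docid, "revs", path[0]): path for path in branches}


def extend_branch(store: Dict[Key, List[str]], docid: str,
                  parent_rev: str, new_rev: str) -> None:
    """Add new_rev on top of the branch whose leaf is parent_rev.

    Only that branch's entry changes (its old leaf key is cleared and a new
    one is written); every other branch's KV is left untouched.
    """
    path = store.pop((docid, "revs", parent_rev))
    store[(docid, "revs", new_rev)] = [new_rev] + path
```

For example, with two conflicting branches `2-b` and `2-c` growing from `1-a`, extending `2-b` with `3-d` leaves the `2-c` entry exactly as it was.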

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Newson
I like the idea that we'd reuse the same pattern (but perhaps not the same _code_) for doc bodies, revtree and attachments. I hope we still get to delete couch_key_tree.erl, though.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Jan Lehnardt
I like the idea from a “trying a simple thing first” perspective, but Nick’s points below are especially convincing to go with this for now. Best Jan — > On 19. Feb 2019, at 17:53, Nick Vatamaniuc wrote: > > Hi, > > Sorry for jumping in so late, I was following from the sidelines mostly. A >

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Nick Vatamaniuc
Hi, Sorry for jumping in so late, I was following from the sidelines mostly. A lot of good discussion happening and I am excited about the possibilities here. I do like the simpler "chunking" approach for a few reasons: * Most document bodies are probably going to be smaller than 100k. So in the

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Paul Davis
> I'm very interested in knowing if anyone else is interested in going this > simple, or considers it a wasted opportunity relative to the 'exploded' path. > Very interested because this is how the Record Layer stores their protobuf messages.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-19 Thread Robert Newson
Hi, An alternative storage model that we should seriously consider is to follow our current approach in couch_file et al. Specifically, that the document _body_ is stored as an uninterpreted binary value. This would be much like the obvious plan for attachment storage; a key prefix that
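A minimal sketch of storing the body as an uninterpreted binary split across a key prefix, again with a dict standing in for FoundationDB. The `(docid, "body", index)` key shape and the function names are assumptions; the chunk size uses FoundationDB's documented 100,000-byte value limit.

```python
from typing import Dict, Tuple

# FoundationDB's documented maximum value size.
MAX_VALUE_BYTES = 100_000

ChunkKey = Tuple[str, str, int]


def chunk_body(docid: str, body: bytes,
               chunk_size: int = MAX_VALUE_BYTES) -> Dict[ChunkKey, bytes]:
    """Split an opaque serialized body into chunks keyed by (docid, 'body', index)."""
    return {
        (docid, "body", i): body[off:off + chunk_size]
        for i, off in enumerate(range(0, len(body), chunk_size))
    }


def read_body(kvs: Dict[ChunkKey, bytes], docid: str) -> bytes:
    """Reassemble by concatenating chunks in index order.

    In FoundationDB a range read over the prefix would return them in key order;
    here we sort the dict keys to get the same effect.
    """
    keys = sorted(k for k in kvs if k[0] == docid and k[1] == "body")
    return b"".join(kvs[k] for k in keys)
```

A 250,000-byte body becomes three chunks under the same prefix, and most bodies (smaller than 100k, per Nick's point above) would fit in a single KV.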

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Robert Newson
I've been remiss here in not posting the data model ideas that IBM worked up while we were thinking about using FoundationDB, so I'm posting them now. This is Adam Kocoloski's original work, I am just transcribing it, and this is the context that the folks from the IBM side came in with, for full

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Ilya Khlopotov
I want to correct my previous mistakes. I made two mistakes in my previous calculations: - I used 1 KB as the base size for calculating the expansion factor (although we don't know the exact size of the original document) - The expansion factor calculation included the number of revisions (it shouldn't) I'll focus on

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Robert Newson
I think we're deep in the weeds on this small aspect of the data model problem, and haven't touched other aspects yet. The numbers used in your example (1k of paths, 100 unique field names, 100 bytes for a value), where are they from? If they are not from some empirical data source, I don't see

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Ilya Khlopotov
At some point I changed the number of unique JSON paths and probably forgot to update the other conditions. The `- each document is around 10Kb` condition is not used in the calculations, so it can be ignored.

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Adam Kocoloski
Ugh! We definitely cannot have a model where a 10K JSON document is exploded into 2MB worth of KV data. I’ve tried several times to follow the math here but I’m failing. I can’t even get past this first bit: > - each document is around 10Kb > - each document consists of 1K of unique JSON paths

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Robert Newson
Hi, The talk of crypto in the key space is extremely premature in my opinion. It is the database's job (foundationdb's in this case) to map meaningful names to whatever it takes to efficiently store, index, and retrieve them. Obscuring every key with an expensive cryptographic operation

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-04 Thread Ilya Khlopotov
Hi Michael, > For example, here's a crazy thought: > Map every distinct occurrence of a key/value instance through a crypto hash > function to get a set of hashes. > > These can be precomputed by Couch without any lookups in FDB. These > will be spread all over kingdom come in FDB and not lend

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-03 Thread Robert Samuel Newson
ld > continue to contribute to the namespace, is that right? > > -Joan

RE: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Reddy B .
Thank you very much. From: Robert Newson Sent: Friday, 1 February 2019 10:29 To: dev@couchdb.apache.org Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON documents Hi, "rebasing is not just a politically correct way of saying that Co

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Robert Newson
voice the more general concern > aforementioned. > > But the specific question is: what's the vision for "schema-less" usage > of CouchDb. > > Thanks > > From: Ilya Khlopotov > Sent: Wednesday, 30 January

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Jan Lehnardt
ntage of this > opportunity to voice the more general concern aforementioned. > > But the specific question is: what's the vision for "schema-less" usage of > CouchDb. > > Thanks > > From: Ilya Khlopotov > Sent

RE: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-02-01 Thread Reddy B .
the vision for "schema-less" usage of CouchDb. Thanks From: Ilya Khlopotov Sent: Wednesday, 30 January 2019 22:08 To: dev@couchdb.apache.org Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON documents > I think I prefer the idea of

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Robert Newson
Thanks! I stress it would be optional; we could add it in a release after the main couchdb-on-fdb in response to pressure from users finding the 10 MB (etc.) limits too restrictive, or we can do it as a neat enhancement in its own right (the validation aspect) that just happens to allow us to

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Adam Kocoloski
I like the idea, both for the efficiencies it enables in the FoundationDB data model and for the ability to cover a lot of validation functionality without shelling out to JS. It’s pretty obviously a big, meaty topic unto itself, one that needs some careful thought and design. Also an awful

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Robert Newson
Hi, An enhancement over the first idea (where we flatten JSON documents into keys and values where the key is the full path to every terminal value, regardless of depth in the JSON) is to allow users to register schemas. For documents that match a registered schema (suggestion, a top level
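A minimal sketch of the flattening step described above: every terminal value is keyed by its full path from the document root, regardless of depth. Object fields become string path elements and array positions become integers; keeping empty objects and arrays as their own terminals is an assumption made here so that they are not lost.

```python
from typing import Any, Dict, Tuple

Path = Tuple[Any, ...]


def flatten(doc: Any, prefix: Path = ()) -> Dict[Path, Any]:
    """Map every terminal (scalar) value to the full path leading to it."""
    if isinstance(doc, dict):
        if not doc:
            return {prefix: {}}  # keep empty objects so they can round-trip
        out: Dict[Path, Any] = {}
        for key, value in doc.items():
            out.update(flatten(value, prefix + (key,)))
        return out
    if isinstance(doc, list):
        if not doc:
            return {prefix: []}  # keep empty arrays for the same reason
        out = {}
        for index, value in enumerate(doc):
            out.update(flatten(value, prefix + (index,)))
        return out
    return {prefix: doc}  # scalar: string, number, bool or None
```

For example, `flatten({"a": {"b": 1}, "c": [True, None]})` produces `{("a", "b"): 1, ("c", 0): True, ("c", 1): None}`. A real layout would also prepend the database and document prefixes to every path, which is where the key-size and expansion concerns raised elsewhere in this thread come from.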

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Adam Kocoloski
> On Jan 31, 2019, at 1:47 AM, ermouth wrote: > >> As I don't see the 10k limitation as having significant merit > > Not sure it’s relevant here, but Mango indexes put selected doc values into > keys. > > ermouth Totally relevant. Not just because of the possibility of putting a large

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-31 Thread Michael Fair
On Wed, Jan 30, 2019 at 10:48 PM ermouth wrote: > > As I don't see the 10k limitation as having significant merit > > Not sure it’s relevant here, but Mango indexes put selected doc values into > keys. > It kind of is, putting the values at the end of the keys is a viable strategy. It means

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread ermouth
> As I don't see the 10k limitation as having significant merit Not sure it’s relevant here, but Mango indexes put selected doc values into keys. ermouth

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
On Wed, Jan 30, 2019, 12:57 PM Adam Kocoloski Hi Michael, > > > The trivial fix is to use DOCID/REVISIONID as DOC_KEY. > > Yes that’s definitely one way to address storage of edit conflicts. I > think there are other, more compact representations that we can explore if > we have this “exploded”

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
> > > Assuming it only does "prefix" and not "segment", then I don't think > this will help because the DOCID for each key in JSON_PATH will be > different, making the "prefix" to each path across different documents > distinct. > > I’m not sure I follow you here, or we have different

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
> I think I prefer the idea of indexing all document's keys using the same > identifier set. In general I think applications have the behavior that > some keys are referenced far more than other keys and giving those keys in > each document the same value I think could eventually prove useful for

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Adam Kocoloski
Hi Michael, > The trivial fix is to use DOCID/REVISIONID as DOC_KEY. Yes that’s definitely one way to address storage of edit conflicts. I think there are other, more compact representations that we can explore if we have this “exploded” data model where each scalar value maps to an individual

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
> The limitation I was calling out was you can't store two different values > in the same Key. > {CouchNS}/{DOCID}/{XXX} Sorry I was not clear. I used DOC_KEY instead of DOCID in the proposal on purpose. The format of DOC_KEY is TBD. One option is to have DOC_KEY = DOCID/REVISIONID. In this

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
> | limitation            | recommended value | recommended max | absolute max |
> |-----------------------|------------------:|:----------------|-------------:|
> | transaction duration  |                   |                 | 5 sec        |
> | transaction data size |

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov wrote: > FoundationDB Records layer uses global schema for JSON documents. They > also have a nice way of creating indexes and schema evolution support. > However this support comes at a cost of extra lookups in different > subspace. With local

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
On Wed, Jan 30, 2019 at 11:54 AM Ilya Khlopotov wrote: > Hi Mike, > > > The trivial fix is to use DOCID/REVISIONID as DOC_KEY. > This doesn't solve the issue with scalar values being over the limits > FoundationDB can support. > > Right, that wasn't the limitation I was calling out. The

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
Hi Mike, > The trivial fix is to use DOCID/REVISIONID as DOC_KEY. This doesn't solve the issue with scalar values being over the limits FoundationDB can support. Best regards, iilyak

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Michael Fair
I know the claim was to avoid "revisions" and "conflicts" discussion in this thread, but isn't that unavoidable? In scheme #1 you have multiple keys with the same DOCID/PART_IDX but different data. In schemes #2 / #3 you have multiple copies of the JSON_PATH but different values. The trivial fix
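A small illustration of the "DOCID/REVISIONID as DOC_KEY" fix: prefixing every flattened path with both the document id and the revision id gives conflicting revisions disjoint keys, so neither write overwrites the other. The function name and the revision strings are made up for the example, and a dict stands in for FoundationDB.

```python
from typing import Any, Dict, Tuple


def revision_keys(docid: str, rev: str,
                  flat: Dict[Tuple, Any]) -> Dict[Tuple, Any]:
    """Prefix each flattened path with (docid, rev) so revisions never share a key."""
    return {(docid, rev) + path: value for path, value in flat.items()}


store: Dict[Tuple, Any] = {}
# Two conflicting edits of the same field in the same document.
store.update(revision_keys("doc1", "2-abc", {("color",): "red"}))
store.update(revision_keys("doc1", "2-def", {("color",): "blue"}))
# Both values survive under distinct keys.
```

As Ilya notes, this addresses colliding paths but not scalar values that exceed FoundationDB's size limits.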

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
FoundationDB Record Layer uses a global schema for JSON documents. They also have a nice way of creating indexes and schema evolution support. However, this support comes at the cost of extra lookups in a different subspace. With a local mapping table we are almost (except for a corner case) certain that the

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
> I was kind of expecting to see some magic value indicating that the > subsequent set of keys with the same prefix are all elements of a “multi-part > object” I missed this aspect. This is easy to solve (as you've mentioned) by using either a special character or reserved value in the mapping

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
Ah sure, if we store the *cough* schema per doc, then it's not that easy. An iteration of this proposal could store paths globally with ids that the k/v store then uses for keys, which would enable what I described, but happy to ignore this for the time being. :) Cheers Jan

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Adam Kocoloski
Jan, I don’t think it does have that "fun property #2", as the mapping is created separately for each document. In this proposal the field name “foo” could map to 2 in one document and 42 in another. Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on field paths is anything
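A sketch of a per-document mapping table, showing why the same field name can get different ids in different documents. Assigning ids in first-seen order is an assumption made here; the proposal does not specify the allocation scheme, and `build_mapping` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_mapping(doc: Any,
                  mapping: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Assign each distinct field name in one document a small integer, first-seen order."""
    if mapping is None:
        mapping = {}
    if isinstance(doc, dict):
        for name, value in doc.items():
            mapping.setdefault(name, len(mapping))  # new names get the next id
            build_mapping(value, mapping)
    elif isinstance(doc, list):
        for item in doc:
            build_mapping(item, mapping)
    return mapping
```

`build_mapping({"foo": 1})` gives `{"foo": 0}` while `build_mapping({"bar": 1, "foo": 2})` gives `{"bar": 0, "foo": 1}`, so ids are only meaningful within one document. A global table, as Jan suggests, would give "foo" the same id everywhere, at the cost of the extra lookups in a separate subspace that Ilya describes for the Record Layer.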

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
+1 > On 30. Jan 2019, at 15:31, Paul Davis wrote: > > Jiffy preserves duplicate keys if its not decoding into a map (in > which case last value for duplicate keys wins). Its significantly > corner case and not at all supported by nearly any other JSON library > so changing that shouldn't be

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Paul Davis
Jiffy preserves duplicate keys if it's not decoding into a map (in which case the last value for duplicate keys wins). It's very much a corner case and not at all supported by nearly any other JSON library, so changing that shouldn't be considered a breaking change in my opinion.
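For comparison, Python's standard `json` module behaves the way Paul describes for Jiffy: decoding into a dict keeps the last value, while decoding into a list of pairs preserves duplicates. This is an analogy in Python, not Jiffy's API; `reject_duplicates` is a hypothetical hook showing how a decoder could refuse such input outright.

```python
import json

raw = '{"foo": 1, "foo": 2}'

# Decoding into a dict: the last value for a duplicate key wins.
as_dict = json.loads(raw)

# Decoding into a list of (key, value) pairs keeps both entries, in order.
as_pairs = json.loads(raw, object_pairs_hook=list)


def reject_duplicates(pairs):
    """object_pairs_hook that refuses JSON objects with repeated keys."""
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)
```

Here `as_dict` is `{"foo": 2}` and `as_pairs` is `[("foo", 1), ("foo", 2)]`, while `json.loads(raw, object_pairs_hook=reject_duplicates)` raises `ValueError`.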

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Mike Rhodes
From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1] thing where you have multiple JSON keys with the same name, i.e., { "foo": 1, "foo": 2 }. Are the proposals on the table able to continue this support (or am I wrong about Jiffy)? [1]

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
> On 30. Jan 2019, at 14:22, Jan Lehnardt wrote: > > Thanks Ilya for getting this started! > > Two quick notes on this one: > > 1. note that JSON does not guarantee object key order and that CouchDB has > never guaranteed it either, and with say emit(doc.foo, doc.bar), if either > emit()

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Jan Lehnardt
Thanks Ilya for getting this started! Two quick notes on this one: 1. note that JSON does not guarantee object key order and that CouchDB has never guaranteed it either, and with say emit(doc.foo, doc.bar), if either emit() parameter was an object, the undefined-sort-order of SpiderMonkey

Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
# First proposal In order to overcome FoundationDB limitations on key size (10 kB) and value size (100 kB) we could use the following approach. Below, the paths use slashes for illustration purposes only. We can use nested subspaces, tuples, directories or something else. - Store

[DISCUSS] : things we need to solve/decide : storing JSON documents

2019-01-30 Thread Ilya Khlopotov
As you might already know, FoundationDB has a number of limitations that influence the way we might store JSON documents. The limitations are: | limitation |recommended value|recommended max|absolute max|
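A rough pre-flight check against the limits discussed in this thread, assuming FoundationDB's documented maximums of 10,000 bytes per key, 100,000 bytes per value and 10,000,000 bytes per transaction. Summing key and value lengths only approximates FoundationDB's own transaction-size accounting, and `check_kvs` is a hypothetical helper, not part of any FoundationDB binding.

```python
from typing import Iterable, List, Tuple

MAX_KEY_BYTES = 10_000        # documented FoundationDB key size limit
MAX_VALUE_BYTES = 100_000     # documented value size limit
MAX_TXN_BYTES = 10_000_000    # documented transaction size limit


def check_kvs(kvs: Iterable[Tuple[bytes, bytes]]) -> List[str]:
    """Return human-readable violations for a batch of KVs meant for one transaction.

    The batch total is a simple sum of key and value lengths, which only
    approximates how FoundationDB itself measures transaction size.
    """
    problems: List[str] = []
    total = 0
    for key, value in kvs:
        total += len(key) + len(value)
        if len(key) > MAX_KEY_BYTES:
            problems.append(f"key of {len(key)} bytes exceeds {MAX_KEY_BYTES}")
        if len(value) > MAX_VALUE_BYTES:
            problems.append(f"value of {len(value)} bytes exceeds {MAX_VALUE_BYTES}")
    if total > MAX_TXN_BYTES:
        problems.append(f"batch of {total} bytes exceeds {MAX_TXN_BYTES}")
    return problems
```

Running the KVs produced by any of the proposed layouts through a check like this makes it easy to see which documents would need chunking or a different key scheme before they reach FoundationDB.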