Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Robert Samuel Newson Sun, 03 Feb 2019 00:43:45 -0800

Hi,


Yes. The value (in the FoundationDB sense) will always be the terminal value 
(string, number, boolean, null), so the key has to be the path to that value, 
including special delimiters for object and array boundaries (what fdb calls 
‘the tuple layer’).
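
The path-to-scalar mapping described here can be sketched in Python (an illustration only, not CouchDB code; `flatten` is a hypothetical helper). String path elements stand for object keys and integer elements for array indices. In FoundationDB such tuples would be packed with the tuple layer, which encodes each element's type and boundaries in the key bytes.

```python
def flatten(node, path=()):
    """Yield (path, scalar) pairs for a decoded JSON value.

    str path elements are object keys, int path elements are array indices.
    """
    if isinstance(node, dict):
        for name, child in node.items():
            yield from flatten(child, path + (name,))
    elif isinstance(node, list):
        for idx, child in enumerate(node):
            yield from flatten(child, path + (idx,))
    else:
        # Terminal value: string, number, boolean or null (None).
        yield path, node


print(list(flatten({"a": {"b": 1}, "c": [True, None]})))
```

Empty objects and arrays yield no pairs, so a complete encoding would need an explicit marker for them.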

I’d also like to see more effort put into simplifying the way we map JSON 
documents/MVCC into fdb as far as possible (but not further). We can add 
embellishments over time, but removing complexity is much harder.

B.

> On 3 Feb 2019, at 01:27, Joan Touzet <woh...@apache.org> wrote:
> 
> Hi Ilya,
> 
> I'm not seeing it in your proposal explicitly, or maybe I'm not
> reading it very well, so: can you confirm that arrays of large
> objects would continue to be deconstructed down to the base JSON
> types of string, number, boolean, and null? Any intermediate 
> objects (or further nesting of arrays and objects beyond that) would
> continue to contribute to the namespace, is that right?
> 
> -Joan
> 
> ----- Original Message -----
>> From: "Ilya Khlopotov" <iil...@apache.org>
>> To: dev@couchdb.apache.org
>> Sent: Wednesday, 30 January, 2019 8:05:05 AM
>> Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON 
>> documents
>> 
>> # First proposal
>> 
>> In order to overcome FoundationDB limitations on key size (10 kB) and
>> value size (100 kB), we could use the following approach.
>> 
>> Below, the paths use slashes for illustration purposes only. We
>> can use nested subspaces, tuples, directories or something else.
>> 
>> - Store documents in a subspace or directory (to keep the key prefix
>> short)
>> - When we store the document, we would enumerate all field names (0
>> and 1 are reserved) and store the mapping table under a key that
>> looks like:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / 0
>> ```
>> - Flatten the JSON document (convert it into key value pairs where
>> the key is `JSON_PATH` and value is `SCALAR_VALUE`)
>> - Replace elements of JSON_PATH with integers from the mapping table we
>> constructed earlier
>> - For array elements, use `1 / {array_idx}`
>> - Store scalar values under keys that look like the following (with
>> `JSON_PATH` expressed as integers):
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>> ```
>> - If a scalar value exceeds 100 kB, we would split it and store every
>> part under a key constructed as:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>> ```
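
A minimal Python sketch of the encoding steps above, under assumptions that are not part of the proposal: keys are plain Python tuples standing in for tuple-layer keys, field ids are assigned in first-encounter order starting at 2, scalars and the mapping table are stored as JSON-encoded bytes, and `encode_doc` is a hypothetical name.

```python
import json

MAX_VALUE_SIZE = 100_000  # FoundationDB's absolute value-size limit, in bytes
MAPPING = 0               # reserved id: the mapping table lives at {DOC_KEY} / 0
ARRAY = 1                 # reserved marker: followed by {array_idx}


def encode_doc(ns, doc_key, doc, max_value=MAX_VALUE_SIZE):
    """Return {key_tuple: bytes} rows for one document (hypothetical layout)."""
    names = {}  # field name -> integer id (ids 0 and 1 are reserved)
    rows = {}

    def field_id(name):
        if name not in names:
            names[name] = len(names) + 2
        return names[name]

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + (field_id(name),))
        elif isinstance(node, list):
            for idx, child in enumerate(node):
                walk(child, path + (ARRAY, idx))
        else:
            value = json.dumps(node).encode("utf-8")
            if len(value) <= max_value:
                rows[(ns, doc_key) + path] = value
            else:
                # Oversized scalar: store it as numbered parts.
                for part_idx, start in enumerate(range(0, len(value), max_value)):
                    rows[(ns, doc_key) + path + (part_idx,)] = value[start:start + max_value]

    walk(doc, ())
    # Written last so it contains every field name seen in the document.
    rows[(ns, doc_key, MAPPING)] = json.dumps(names).encode("utf-8")
    return rows


for key, value in sorted(encode_doc("docs", "doc1", {"a": [10, {"b": "x"}]}).items()):
    print(key, value)
```

With ids `a = 2` and `b = 3`, the array element `10` lands at `(2, 1, 0)` and the nested `"x"` at `(2, 1, 1, 3)`. Empty objects and arrays produce no rows in this sketch.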
>> 
>> Since all parts of the document are stored under a common
>> `{DB_DOCS_NS} / {DOC_KEY}` prefix, they will be stored on the same
>> server most of the time. The document can be retrieved using a range
>> query (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} /
>> {DOC_KEY} / 0xFF")`). We can reconstruct the document since the
>> mapping is returned as well.
>> 
>> The downside of this approach is that we wouldn't be able to preserve
>> the order of keys in the JSON object. Currently, the `jiffy` JSON
>> encoder respects the order of keys:
>> ```
>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>> <<"{\"bbb\":1,\"aaa\":12}">>
>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>> <<"{\"aaa\":12,\"bbb\":1}">>
>> ```
>> 
>> Best regards,
>> iilyak
>> 
>> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote:
>>> As you might already know, FoundationDB has a number of
>>> limitations that influence the way we might store JSON
>>> documents. The limitations are:
>>> 
>>> | limitation            | recommended value | recommended max | absolute max |
>>> |-----------------------|------------------:|----------------:|-------------:|
>>> | transaction duration  |                   |                 | 5 sec        |
>>> | transaction data size |                   |                 | 10 MB        |
>>> | key size              | 32 bytes          | 1 kB            | 10 kB        |
>>> | value size            |                   | 10 kB           | 100 kB       |
>>> 
>>> To fit a JSON document into the 100 kB value limit, we would have to
>>> partition it in some way. There are three ways of partitioning the
>>> document:
>>> 1. store multiple binary blobs (parts) in different keys
>>> 2. flatten the JSON structure and store every path leading to a scalar
>>> value under its own key
>>> 3. measure the size of different branches of the tree representing
>>> the JSON document (while we parse) and use another key for a
>>> branch when we are about to exceed the limit
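
The first approach is small enough to sketch directly. This is an illustration in Python, assuming the document is already serialized to bytes and that parts are keyed by a hypothetical `(doc_key, part_idx)` tuple; `split_blob` and `join_blob` are not existing CouchDB functions.

```python
PART_SIZE = 100_000  # FoundationDB's absolute value-size limit, in bytes


def split_blob(doc_key, blob, part_size=PART_SIZE):
    """Split a serialized document into ((doc_key, part_idx), part) rows."""
    return [
        ((doc_key, part_idx), blob[start:start + part_size])
        for part_idx, start in enumerate(range(0, len(blob), part_size))
    ]


def join_blob(rows):
    """Reassemble the blob from its rows, ordering parts by key."""
    return b"".join(part for _, part in sorted(rows))


rows = split_blob("doc1", b"x" * 250_000)
print([(key, len(part)) for key, part in rows])
```

Reading any single field still means fetching and decoding every part, which is the limitation of this approach.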
>>> 
>>> - The first approach is the simplest, but it wouldn't allow us to
>>> access parts of the document.
>>> - The downsides of the second approach are:
>>>  - the flattened JSON structure would have long paths, which means
>>>  longer keys
>>>  - a scalar value cannot be more than 100 kB (unless we split it
>>>  as well)
>>> - The third approach falls short when the structure of the
>>> document doesn't allow branches to be cut off cleanly:
>>>  - it requires complex rules to handle all corner cases
>>> 
>>> The goals of this thread are:
>>> - to collect ideas on how to encode and store the JSON document
>>> - to comment on the collected ideas
>>> 
>>> Non-goals:
>>> - the storage of metadata for the document will be discussed
>>> elsewhere
>>>  - tombstones
>>>  - edit conflicts
>>>  - revisions
>>> 
>>> Best regards,
>>> iilyak
>>> 
>>