It would be nice to see real benchmarks before we decide that moving away from JSON is necessary. The DB is intended to be size-limited (~20MB per db per user). I have a dataset of 50k docs that totals 33MB in JSON and ~60MB in sqlite, and I can deserialize all of the docs in Python in under 1s (I'll try to get a more accurate timing).
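A rough way to reproduce this kind of timing in Python 3 is sketched below. It does not use the dataset mentioned above: the documents are synthetic, and their fields (id, title, tags, body) and the 600-character body are assumptions chosen only to land in roughly the same size range as 50k docs / 33MB of JSON. It times json.loads alone, not reading the rows out of sqlite.

```python
import json
import time


def make_synthetic_docs(count, body_size=600):
    """Build `count` JSON strings as a stand-in for a real dataset.

    Field names and body size are assumptions, picked so that 50k docs
    come out at a few tens of MB of JSON in total.
    """
    docs = []
    for i in range(count):
        doc = {
            "id": "doc-%d" % i,
            "title": "Document number %d" % i,
            "tags": ["tag%d" % (i % 10), "tag%d" % (i % 7)],
            "body": "x" * body_size,
        }
        docs.append(json.dumps(doc))
    return docs


def time_deserialize(json_docs):
    """Parse every JSON string; return (elapsed_seconds, parsed_docs)."""
    start = time.perf_counter()
    parsed = [json.loads(s) for s in json_docs]
    elapsed = time.perf_counter() - start
    return elapsed, parsed


if __name__ == "__main__":
    docs = make_synthetic_docs(50_000)
    total_mb = sum(len(s) for s in docs) / 1e6
    seconds, parsed = time_deserialize(docs)
    print("parsed %d docs (%.1f MB of JSON) in %.3f s"
          % (len(parsed), total_mb, seconds))
```

Real documents with deeper nesting or more varied fields will parse at a different rate, so numbers from this sketch are only a rough lower bound on what to expect.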
Under a second for the whole dataset doesn't seem particularly slow, and if we are exposing some data over HTTP, JSON seems like a nice format for it.

John
=:->

On Nov 11, 2011 9:51 PM, "Mikkel Kamstrup Erlandsen" <[email protected]> wrote:
> On 11/11/2011 04:32 PM, Rodney Dawes wrote:
>
>> On Fri, 2011-11-11 at 08:52 +0000, Stuart Langridge wrote:
>>
>>> I think you might be under a bit of a misapprehension here. The thing
>>> that you pass to the Python functions as a "doc" is a JSON string. It's
>>> not a Python dictionary or some other complex type. Our basic "document"
>>> is a string containing a JSON serialisation of the document; it's not an
>>> object.
>>>
>> That is exactly what my complaint is. That 'doc' and every bit of data
>> associated with it (id, revision, etc…) must be maintained and passed
>> around as separate things. I am suggesting we should have a Document
>> class, which contains all of these things in one place. A simple class,
>> with properties for all these bits of data, which can be set/get in
>> accordance with the conventions of the language for each implementation.
>>
>
> (sorry if I lack context here, I just joined the list and this thread is
> not in the archives on https://lists.launchpad.net/u1db-discuss/ for
> some reason)
>
> I second Rodney's opinion here. Passing around JSON is *very* inefficient
> (not to mention inconvenient). And this is something that I am not making
> up, I've spent lots of time profiling apps and libs with exactly this
> problem.
>
> I haven't actually looked at the Python code yet, but I saw the same
> behaviour in the C version.
>
> Having a Document class also separates the wire format from the
> programmatic representation, which I think is a big plus as well,
> architecture-wise.
>
> And while I am ranting - any chance we can use a binary format instead of
> JSON, maybe BSON or GVariant, for example? Parsing JSON is super slow
> compared to these[1]. Also in C. Again from bitter experience :-)
>
> Other than that, let me just use my first mail here to make clear that I
> am super hyped about the idea of u1db! Let's make this rock! :-D
>
> Cheers,
> Mikkel
>
> [1] (ok, you got me, I haven't profiled BSON vs JSON, but I have done it
> for GVariant)
>
> --
> Mailing list: https://launchpad.net/~u1db-discuss
> Post to     : [email protected]
> Unsubscribe : https://launchpad.net/~u1db-discuss
> More help   : https://help.launchpad.net/ListHelp
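Rodney's proposal, and Mikkel's point about separating the wire format from the programmatic representation, can be sketched in Python roughly as follows. This is a hypothetical illustration, not part of u1db: the class name Document, the fields doc_id, rev and content, and the helper bump_title are all assumptions made for the example. JSON appears only inside to_json and from_json; everything else works on the already-parsed content.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Hypothetical Document: keeps id, revision and content in one object.

    Field names are illustrative assumptions, not an existing u1db API.
    """
    doc_id: str
    rev: Optional[str] = None
    content: Dict[str, Any] = field(default_factory=dict)

    # The wire format is confined to these two methods, so callers never
    # handle raw JSON strings themselves.
    def to_json(self) -> str:
        """Serialize only the content; id and rev travel alongside it."""
        return json.dumps(self.content, sort_keys=True)

    @classmethod
    def from_json(cls, doc_id: str, rev: Optional[str], payload: str) -> "Document":
        """Parse the payload once and wrap it with its id and revision."""
        return cls(doc_id=doc_id, rev=rev, content=json.loads(payload))


def bump_title(doc: Document, new_title: str) -> Document:
    """Example caller: edits parsed content, never a JSON string."""
    doc.content["title"] = new_title
    return doc
```

For example, Document.from_json("doc-1", "rev-1", '{"title": "hello"}') parses the payload once; after bump_title(doc, "world"), doc.to_json() gives back '{"title": "world"}'. With this shape, switching the encoding to something like BSON or GVariant would in principle only touch the two serialization methods, though whether that switch pays off is exactly the benchmarking question in this thread.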

