On 16.02.11 16:39, "Günther Schmidt" <gue.schm...@web.de> wrote:
>how efficiently is versioning implemented?
>Is it similar to copy-on-write, ie. a new version of a document only
>consists of deltas to the previous version? Or is every version of a
>document full-sized? I presume in RDBMS backends it would always be
>full-sized versions, but what about file-based repositories?

In all cases the entire binary is written. The persistence manager, where you can choose between an RDBMS or another backend, doesn't know about versioning; at this level everything is just simple node bundles.

However, for binaries there is the (generally recommended) DataStore [0], which stores large binaries separately, directly as files. Binaries are identified by a hash of their contents, so a binary is only stored once even if there are multiple copies of it in the repository. Thus if you create a new version of a node with a binary property but only change other properties, not the binary, the binary will not be stored twice. But if you change the binary, the full binary will be stored again. There is no diffing for versions.

(To be exact, you actually write to the normal repository location ("HEAD") first, then save, and only then create the version, which means creating a version is an internal copy.)

Regarding efficiency, it depends on which kind of efficiency you mean: read/write performance or space usage? The current implementation is optimized for read (and partly write) performance, at the cost of requiring more disk space. Reading binaries, even from older versions, is simply a direct I/O stream from the disk, without any conversion or diff calculation. Writes are similar, although they carry a small overhead for the hash calculation compared to reads.

[0] http://wiki.apache.org/jackrabbit/DataStore

Regards,
Alex

--
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel
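
For illustration, the store-once behaviour described above can be sketched as a tiny content-addressed file store. This is only a sketch of the general technique, not Jackrabbit's DataStore code: the class name `ContentAddressedStore`, its methods, the flat directory layout and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentAddressedStore:
    """Toy file store that keys each binary by a hash of its contents.

    Hypothetical illustration only; not the Jackrabbit DataStore API.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # The hash calculation is the small extra cost on the write path.
        # SHA-1 is an assumption chosen for this sketch.
        key = hashlib.sha1(data).hexdigest()
        path = self.root / key
        if not path.exists():
            # Only previously unseen content is written; always in full,
            # never as a delta against an earlier binary.
            path.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        # Reads are a plain file read: no diff reconstruction needed.
        return (self.root / key).read_bytes()

    def file_count(self) -> int:
        return sum(1 for p in self.root.iterdir() if p.is_file())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = ContentAddressedStore(tmp)
        v1 = store.put(b"original binary")
        v2 = store.put(b"original binary")          # new version, binary unchanged
        v3 = store.put(b"original binary, edited")  # binary changed
        return v1 == v2, v1 != v3, store.file_count(), store.get(v1)
```

Calling `demo()` stores the unchanged binary once and the edited binary as a second full file, which mirrors the trade-off above: fast, direct reads in exchange for more disk space.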