Hello again.
> My suggestion, however, is not to change things but to see if they need
> to be changed. I experimented with PlantUML for this and ended up with
> the class diagram attached to this email (with its PlantUML sources).
> It's more detailed than Philipp's, as I tried to include attributes that
> I considered strongly attached to each entity (each entity would be a
> table in a relational db). It is also quite a bit simpler because it
> removes all the caches and redundant parts.

That is indeed the minimal (logical) diagram, and much cleaner --
eliminating the explicit associations to DatabaseVersions and omitting the
DatabaseVersionHeader and all the caching ...

One minor adjustment: the association between MultiChunkEntry and
ChunkEntry is n:m (that's also wrong in my diagram), because it can happen
that two clients index the same file at the same time, thereby adding the
same chunks to different multichunks.

> (...) but I like being Captain Obvious.

Well, Captain Obvious, ... It's only obvious after someone points out the
elephant in the room. Related :-) :
http://www.youtube.com/watch?v=Ahg6qcgoay4

> (...) and of one major issue: there is no simple way to query
> this data model in order to get the current state of the repository.
> Indeed, you need to reconstruct the winning series of commits (i.e., of
> DatabaseVersion) and to walk said series to determine the current status
> of all files. This is mainly caused by DatabaseVersion being delta and
> not complete commits. In the current code this is handled by a full
> database cache which induces more or less a full duplication of the
> database.

Like most times, this is EXACTLY what's wrong with the current database
representation -- and one of the reasons why I thought a SQL backend would
be a lot simpler to handle. In my naive mind, I thought we could store the
clean data model (as per your diagram) in the SQL database and query the
current state when needed.
Something in the sense of:

    select fv.*
    from databaseversions dbv
    join filehistories fh on ...
    join fileversions fv on ...
    where dbv.date < '2013-12-12 18:10:00'
      and fv.version = (select max(version)
                        from fileversions fv2
                        where fv.id = fv2.id)

In fact, that's what I was trying to do when I started the "Data model"
wiki page: trying to identify which database views we need and derive the
SELECT statements ...

> I'm not sure how this should be represented in a persistent state but
> based on what is done in most of the version control systems I
> know, I think we need a CurrentDatabase entity which aggregates one
> FileVersion (the current one) for each path of repository.

Do you want to persist the CurrentDatabase, or just "create" it on the
fly? If it's the former, I'm not sure about this. I'm not saying the other
version control systems are wrong, but having a single CurrentDatabase
(representing the last state) is not sufficient, because we need to be
able to go back in time for the restore operation (anywhere else?).

> From this on-disk/in-relational-db data model, each operation can derive
> what it needs in memory, based on some specific DAO if needed.

Can you elaborate on that? Does that mean having a RestoreDAO that
implements specific queries (such as the one above)?

> What do you think of all that?

Good stuff Fabrice, as always!! This is so incredibly helpful!

As a quick recap, here is what I understand is wrong with the current data
model (or better: its representation in code):

- The relationship between DatabaseVersion and the lists of ChunkEntry,
  MultiChunkEntry, etc. should not be explicit
- No easy view on the current/latest database
- Minor: non-optimal (not normalized?) representation of the file version
  attributes (I don't see an issue here)
- Anything else?

Best
Philipp
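
To make the "query the state when needed" idea a bit more concrete, here is a minimal, runnable sketch using Python's built-in sqlite3 module. Everything about the schema is an assumption made for illustration only: the table and column names (filehistory_id, databaseversion_id, status, path), the status values, and the simplification that history is linear (no reconciliation of the "winning" series of DatabaseVersions). It is not the actual Syncany schema. One difference from the query above: the inner max(version) is also bounded by the cutoff date, since otherwise a file changed after the cutoff would drop out of the restored state.

```python
import sqlite3

# Hypothetical, simplified schema: one row per DatabaseVersion (a delta
# commit) and one row per FileVersion, linked to the commit that added it.
SCHEMA = """
CREATE TABLE databaseversions (
    id   INTEGER PRIMARY KEY,
    date TEXT NOT NULL   -- 'YYYY-MM-DD HH:MM:SS', so text comparison sorts by time
);
CREATE TABLE fileversions (
    filehistory_id     TEXT    NOT NULL,
    version            INTEGER NOT NULL,
    path               TEXT    NOT NULL,
    status             TEXT    NOT NULL,  -- assumed values: NEW, CHANGED, DELETED
    databaseversion_id INTEGER NOT NULL REFERENCES databaseversions(id),
    PRIMARY KEY (filehistory_id, version)
);
"""

# For each file history, pick the newest version committed before the
# cutoff, then drop histories whose newest version is a deletion.
STATE_AT = """
SELECT fv.path, fv.version
FROM fileversions fv
JOIN databaseversions dbv ON dbv.id = fv.databaseversion_id
WHERE dbv.date < :cutoff
  AND fv.version = (
        SELECT MAX(fv2.version)
        FROM fileversions fv2
        JOIN databaseversions dbv2 ON dbv2.id = fv2.databaseversion_id
        WHERE fv2.filehistory_id = fv.filehistory_id
          AND dbv2.date < :cutoff
      )
  AND fv.status <> 'DELETED'
ORDER BY fv.path
"""


def build_demo_db():
    """Create an in-memory database with three commits of made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO databaseversions VALUES (?, ?)", [
        (1, "2013-12-10 09:00:00"),
        (2, "2013-12-11 09:00:00"),
        (3, "2013-12-13 09:00:00"),
    ])
    conn.executemany("INSERT INTO fileversions VALUES (?, ?, ?, ?, ?)", [
        ("h1", 1, "a.txt", "NEW", 1),
        ("h2", 1, "b.txt", "NEW", 1),
        ("h1", 2, "a.txt", "CHANGED", 2),
        ("h2", 2, "b.txt", "DELETED", 3),
        ("h3", 1, "c.txt", "NEW", 3),
    ])
    return conn


def state_at(conn, cutoff):
    """Return (path, version) pairs visible just before the cutoff timestamp."""
    return conn.execute(STATE_AT, {"cutoff": cutoff}).fetchall()
```

Calling state_at(build_demo_db(), "2013-12-12 18:10:00") returns the state before the third commit, while a later cutoff reflects the deletion of b.txt and the addition of c.txt. A "current database" view would then just be the same query with a cutoff in the future, which is one argument for deriving it on the fly rather than persisting it.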

--
Mailing list: https://launchpad.net/~syncany-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~syncany-team
More help   : https://help.launchpad.net/ListHelp

