On 9/12/07, Byron Clark <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 12, 2007 at 06:32:11PM -0600, Michael Torrie wrote:
> > While we're on the subject of SCM in general, does anyone have a
> > solution for storing OpenDocument files in SCM? For example, odt, odc,
> > etc? Normally these are just treated as binary files, which is kind of
> > silly since they are just xml files in a zip file. If the SCM could
> > somehow open them, then we could do all kinds of cool diff and patchset
> > stuff with the various xml files. Currently since they are treated as
> > binary files, SCMs like Subversion commit new copies of the file each
> > time, rather than track changes.
> >
> > Can Git deal with these files? What about a plugin for SVN?
>
> Here are some tools that can be used with mercurial or git to handle
> diffing OpenDocument files:
>
> http://www-verimag.imag.fr/~moy/opendocument/

Simply converting the documents to plain text for diffing doesn't seem like the right solution at all (that's what I gathered they were doing on the web page referenced above). It seems like they are just storing the binary file (compressed xml) in the SCM. With most SCMs that wastes a lot of disk space, since they store the whole file for each version rather than only the deltas. (Actually, from what I understand that's what git does for every file, but it does have an option to "pack" your repository, which then stores only deltas.)

What you really want, as Michael said, is to have the xml (plain text, basically) stored in the revision control system. I suppose you could have an option to have OpenOffice not compress the files, so that the SCM tool would not need any special configuration: they would just be text files like any other source code it can handle. Of course, if you edit a spreadsheet and then want to see a diff of changes between versions, seeing what changed in the xml probably isn't going to be very helpful. You'd need some external diffing tool that could be fed the xml diff and portray it in some useful manner.

The other option would be to have the SCM tool recognize OpenDocument formats and do the unzipping itself in order to be able to store deltas, do diffs, etc. Here, though, you'd have the same diff problem mentioned above. I doubt any SCM can handle that right now, but I admit I hadn't thought about it.

Shouldn't you just be using LaTeX and Emacs instead of OpenOffice? :-) Just kidding!!! Hmmm, there is a spreadsheet mode for Emacs.... No really, I'm kidding. Even I don't go that far.

So it seems like you'd need a plugin that would, upon a checkin of an OpenDocument file, unzip it and then feed it to the SCM. It would also intercept a diff and feed the text diff output into some sort of nice GUI OpenDocument diff tool.
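
To make the unzip-then-diff idea a bit more concrete, here is a rough Python sketch. It is not a plugin for any particular SCM, just the conversion step such a plugin might call. It assumes the document text lives in content.xml inside the zip (which is how OpenDocument packages are laid out); the function names and the tiny stand-in "document" built by make_package are made up for illustration.

```python
import difflib
import io
import zipfile
from xml.dom import minidom


def extract_content_xml(odf_bytes):
    """Unzip an OpenDocument package and return content.xml, pretty-printed.

    Pretty-printing spreads the XML over many lines so that a
    line-based diff has something to work with.
    """
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        raw = package.read("content.xml")
    return minidom.parseString(raw).toprettyxml(indent="  ")


def diff_documents(old_bytes, new_bytes):
    """Return a unified diff (list of lines) between two package versions."""
    old_lines = extract_content_xml(old_bytes).splitlines()
    new_lines = extract_content_xml(new_bytes).splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="old/content.xml",
            tofile="new/content.xml",
            lineterm="",
        )
    )


def make_package(paragraph):
    """Hypothetical helper: build a minimal zip that stands in for an .odt.

    A real OpenDocument file holds more members (styles.xml, meta.xml,
    a manifest, ...); only content.xml matters for this sketch.
    """
    xml = '<?xml version="1.0"?><doc><p>%s</p></doc>' % paragraph
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    old = make_package("first draft")
    new = make_package("second draft")
    for line in diff_documents(old, new):
        print(line)
```

This only shows the raw xml diff, so it still has the readability problem described above; the nice presentation layer would have to sit on top of it.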

Most tools also have an annotate command (AKA blame, praise, etc.), which would also need some nice presentation to the user. It'd be nice to have an SCM-agnostic layer that just does the conversions and the nice presentation of data to the user, and then you could have version control tool specific plugins/patches that all talk to that.

Sounds like a good Summer of Code project for Google, except that summer just ended...

Bryan

--------------------
BYU Unix Users Group
http://uug.byu.edu/

The opinions expressed in this message are the responsibility of their
author. They are not endorsed by BYU, the BYU CS Department or BYU-UUG.
___________________________________________________________________
List Info: http://uug.byu.edu/cgi-bin/mailman/listinfo/uug-list
