On 9/12/07, Byron Clark <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 12, 2007 at 06:32:11PM -0600, Michael Torrie wrote:
> > While we're on the subject of SCM in general, does anyone have a
> > solution for storing OpenDocument files in SCM?  For example, odt, odc,
> > etc?  Normally these are just treated as binary files, which is kind of
> > silly since they are just xml files in a zip file.  If the SCM could
> > somehow open them, then we could do all kinds of cool diff and patchset
> > stuff with the various xml files.  Currently since they are treated as
> > binary files, SCMs like Subversion commit new copies of the file each
> > time, rather than track changes.
> >
> > Can Git deal with these files?  What about a plugin for SVN?
>
> Here are some tools that can be used with mercurial or git to handle
> diffing OpenDocument files:
>
> http://www-verimag.imag.fr/~moy/opendocument/

Simply converting the documents to plain text for diffing doesn't seem
like the right solution at all (that's what I gathered they were doing
on the web page referenced above).  It seems like they are just storing
the binary file (compressed xml) in the SCM, which, for most SCMs,
wastes a lot of disk space since they store the whole file for each
version rather than only the deltas (actually, from what I understand,
that's what git does for every file, but it does have an option to
"pack" your repository, which then stores only deltas).

What you really want, as Michael said, is to have the xml (plain text,
basically) stored in the revision control system.  I suppose you could
have an option to have OpenOffice not compress the files so that the
SCM tool would not need any special configuration because they would
just be text files like any other source code that they can handle.
Of course, if you edit a spreadsheet and then want to see a diff of
changes between versions, seeing what changed in the xml probably
isn't going to be very helpful.  You'd need some external diffing tool
that could be fed the xml diff and portray that in some useful manner.
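
To make the "don't compress the files" idea a bit more concrete, here
is a rough Python sketch that rewrites an OpenDocument archive with
every member stored uncompressed.  The function name repack_stored is
made up for illustration; this is not something OpenOffice itself
provides, just the standard zipfile module applied to the file after
it is saved.

```python
import io
import zipfile


def repack_stored(src_bytes: bytes) -> bytes:
    """Return a copy of a zip/ODF archive with all members uncompressed.

    Hypothetical helper for illustration only.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src:
        with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
            # Keep the original member order: ODF expects the
            # "mimetype" entry to come first in the archive.
            for info in src.infolist():
                dst.writestr(
                    info.filename,
                    src.read(info.filename),
                    compress_type=zipfile.ZIP_STORED,
                )
    return out.getvalue()
```

Note that the result is still a zip container rather than a true text
file, so a line-based diff would not work on it directly.  The xml
does appear verbatim inside it, though, which should give an SCM's
binary delta storage something to match between versions.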

The other option would be to have the SCM tool recognize OpenDocument
formats and do the unzipping itself in order to be able to store
deltas, do diffs, etc.  Here, though, you'd have the same diff problem
mentioned above.  I doubt any SCM can handle that right now, but I
admit I hadn't thought about it before.  Shouldn't you just be using LaTeX
and Emacs instead of OpenOffice?  :-)  Just kidding!!!  Hmmm, there is
a spreadsheet mode for Emacs....  No really, I'm kidding.  Even I
don't go that far.
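
For what the unzip-then-diff step might look like, here is a minimal
Python sketch.  It assumes the document body lives in the archive
member content.xml (which is where OpenDocument puts it); the function
names content_lines and odf_diff are invented for this example and do
not come from any existing SCM.

```python
import difflib
import io
import zipfile
from typing import List
from xml.dom import minidom


def content_lines(odf_bytes: bytes) -> List[str]:
    """Unzip content.xml and pretty-print it, one element per line."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as z:
        xml = z.read("content.xml")
    pretty = minidom.parseString(xml).toprettyxml(indent="  ")
    # Drop blank lines so whitespace noise does not show up in the diff.
    return [line for line in pretty.splitlines() if line.strip()]


def odf_diff(old: bytes, new: bytes) -> List[str]:
    """Return a unified diff of the xml inside two ODF files."""
    return list(
        difflib.unified_diff(
            content_lines(old),
            content_lines(new),
            "old/content.xml",
            "new/content.xml",
            lineterm="",
        )
    )
```

Pretty-printing first matters because the xml is typically written as
one very long line, which would make any line-based diff useless.  The
output is still raw xml, so the presentation problem remains.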

So it seems like you'd need a plugin that would, upon a checkin of an
OpenDocument file, unzip it and then feed it to the SCM.  It would
also intercept a diff and feed the text diff output into some sort of
nice GUI OpenDocument diff tool.  Most tools also have an annotate
command (AKA blame, praise, etc.), which would also need some nice
presentation to the user.  It'd be nice to have a layer that is SCM
agnostic that just does the conversions and nice presentation to the
user of data, and then you could have version control tool specific
plugins/patches that all talk to that.
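
As a very rough idea of what the SCM-agnostic layer could expose, here
is a Python sketch.  Everything in it (the names odf_to_text, to_text
and CONVERTERS) is hypothetical.  It only pulls paragraph text out of
content.xml, using the OpenDocument text namespace, so it is the
"conversion" half only; the nice presentation is left out entirely.

```python
import io
import os
import zipfile
import xml.etree.ElementTree as ET

# Namespace of text elements such as <text:p> in OpenDocument xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_to_text(data: bytes) -> str:
    """Return the paragraphs of an ODF file, one per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read("content.xml"))
    paragraphs = (
        "".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)
    )
    return "\n".join(paragraphs) + "\n"


# Hypothetical registry: SCM-specific plugins would only call to_text().
CONVERTERS = {".odt": odf_to_text, ".ods": odf_to_text}


def to_text(filename: str, data: bytes) -> str:
    """Convert a file to diffable text, falling back to plain decoding."""
    ext = os.path.splitext(filename)[1].lower()
    converter = CONVERTERS.get(ext)
    if converter is None:
        return data.decode("utf-8", errors="replace")
    return converter(data)
```

A plugin for a particular tool would then just hand each version of a
file to to_text() and diff or annotate the results, without knowing
anything about zip or xml itself.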

Sounds like a good Summer of Code project for Google, except that
summer just ended...

Bryan
--------------------
BYU Unix Users Group 
http://uug.byu.edu/ 

The opinions expressed in this message are the responsibility of their
author.  They are not endorsed by BYU, the BYU CS Department or BYU-UUG. 
___________________________________________________________________
List Info: http://uug.byu.edu/cgi-bin/mailman/listinfo/uug-list
