"proper encoding and lack of available metadata" am i correct in assuming that you mean the correct value for the encoding attribute? what metadata are you referring to? do you mean things like a DOCTYPE element or a schema? or do you mean something more like "last modified time"?
dave

-----Original Message-----
From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 09, 2002 5:11 AM
To: [EMAIL PROTECTED]
Subject: Re: utf-8 working code... caution with existing data files.

James Bates wrote:
>
> Boys (and girls?),
>
> I have a patch for the current Xindice CVS that supports reading and writing UTF-8 files containing any Unicode characters you want into and out of Xindice. This means Greek, Hebrew, Korean, Chinese, Arabic, Russian, etc. To allow for this, I have had to modify the internal data format of Xindice files, meaning that existing Xindice databases will appear corrupt to Xindice patched with this new code...
>
> It is however necessary in my opinion, as discussed in earlier posts, to migrate toward this.
>
> In reality, this will affect ONLY databases that contain XML documents with NON-ASCII characters. ASCII characters are: English letters, digits, punctuation marks, whitespace, as well as some control characters like delete and backspace. There are 128 ASCII characters in all. So as long as your databases use documents with only these characters, the patch won't affect your data files.
>
> Typical non-ASCII characters, which will cause incompatibilities between old and new database files, include: French, Spanish, etc. accented characters; currency symbols (EUR, for example); non-breakable spaces (&nbsp; in HTML); fancy quotes; the copyright sign (©); etc.
>
> Because of these possible incompatibilities, I'd like to WARN people and try to co-ordinate applying them so as to cause as little disruption as possible. You can check them out already at
> http://lambiek.amplexor.be/downloads/xindice/new-utf8-patch.
>
> I don't believe you NEED to use the XML-RPC client for just reading/writing documents, though I haven't really tested the CORBA client anymore... Using XPaths and XUpdates with non-ASCII characters will definitely not work in CORBA, but should now work with the XML-RPC interface. (Need to test some more myself though).
>
> Anyway, let me know how and when I can commit this patch...
>
> James

+1 for committing as early as possible.

The trick would be to write a client that serializes the entire database into one big XML file, and another one in the new version that imports from this XML dump file (which can use namespaces to indicate Xindice-specific data along the tree).

What do you think?

[NOTE: Xindice is totally useless to me today exactly because of proper encoding and lack of available metadata... and I've met tons of people who believe the exact same, so I'd suggest patching these two things and then doing a 1.1 release ASAP... this is very likely the reason why this community is stagnating, so this might be a good thing to patch]

I volunteer to work on the metadata since I will badly need it in the future. I just don't know how to do it, and I think the XML:DB API is slowing us down rather than helping us in any way.

Comments?

--
Stefano Mazzocchi <[EMAIL PROTECTED]>
One must still have chaos in oneself to be able to give birth to a dancing star.
Friedrich Nietzsche
--------------------------------------------------------------------
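James's point that only documents containing non-ASCII characters are affected suggests a quick check before applying the patch. The sketch below is only an illustration, not part of the patch or of Xindice: the function names are made up here, and it assumes you run it over the XML documents you stored (or exported copies of them), not over Xindice's internal data files, whose format is not described in this thread. It also shows that ASCII text encodes to the same bytes in UTF-8, which is consistent with ASCII-only databases being unaffected.

```python
from pathlib import Path


def non_ascii_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, character) for every character outside 7-bit ASCII."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def is_pure_ascii(data: bytes) -> bool:
    """True when every byte is below 0x80 (bytes.isascii needs Python 3.7+)."""
    return data.isascii()


def scan_file(path: str) -> bool:
    """Hypothetical helper: check one XML document stored on disk."""
    return is_pure_ascii(Path(path).read_bytes())


if __name__ == "__main__":
    sample = "caf\u00e9 \u00a9 2002"  # contains an accented e and a copyright sign
    print(non_ascii_chars(sample))    # [(3, 'é'), (5, '©')]
    # ASCII is a byte-for-byte subset of UTF-8:
    print("abc".encode("utf-8") == "abc".encode("ascii"))  # True
```

Any document for which `scan_file` returns False contains at least one of the characters James lists, so a database holding it would probably need the dump-and-import route Stefano proposes.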
