"proper encoding and lack of available metadata"
Am I correct in assuming that by "proper encoding" you mean the correct
value for the encoding attribute in the XML declaration?
What metadata are you referring to? Do you mean things like a DOCTYPE
declaration or a schema?
Or do you mean something more like "last modified time"?

dave


-----Original Message-----
From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 09, 2002 5:11 AM
To: [EMAIL PROTECTED]
Subject: Re: utf-8 working code... caution with existing data files.


James Bates wrote:
>
> Boys (and girls?),
>
> I have a patch for the current Xindice CVS that supports reading/writing
files in UTF-8 containing any Unicode characters you want into and out of
Xindice. This means Greek, Hebrew, Korean, Chinese, Arabic, Russian, etc...
To allow for this, I have had to modify the internal data format of Xindice
files, meaning that existing Xindice databases will appear corrupt to
Xindice patched with this new code...
>
> It is however necessary in my opinion, as discussed in earlier posts, to
migrate toward this.
>
> In reality, this will affect ONLY databases that contain XML documents
with NON-ASCII characters. ASCII characters are: English letters, digits,
punctuation marks, whitespace, as well as some control characters like
delete and backspace. There are 128 ASCII characters in all. So as long as
you have databases using documents with only these characters, the patch
won't affect your datafiles.
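
As a rough illustration of the ASCII test described above, the following Python sketch checks whether a document's UTF-8 bytes stay within the 128 ASCII values and lists any characters that fall outside them. It is not part of Xindice or of the patch: the function names `is_pure_ascii` and `non_ascii_chars` and the sample documents are made up for this example, and it looks at document text only, not at Xindice's internal data files.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0-127)."""
    return all(b < 0x80 for b in data)


def non_ascii_chars(text: str) -> list:
    """Return the distinct non-ASCII characters in text, in first-seen order."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return seen


# Hypothetical sample documents: one plain ASCII, one with an accented
# letter (U+00E9) and a copyright sign (U+00A9).
plain = "<doc>Hello, world!</doc>"
accented = "<doc>caf\u00e9 \u00a9 2002</doc>"

for doc in (plain, accented):
    encoded = doc.encode("utf-8")
    print(is_pure_ascii(encoded), non_ascii_chars(doc), len(doc), len(encoded))
```

For the ASCII-only document the character count and the UTF-8 byte count are the same. For the second document the byte count is larger, because each non-ASCII character takes more than one byte in UTF-8.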
>
> Typical non-ASCII characters, which will cause incompatibilities between
old and new database files, include: French, Spanish, etc. accented
characters; currency signs such as EUR; non-breaking spaces (&nbsp; in
HTML); curly quotes; the copyright sign (©); etc.
>
> Because of these possible incompatibilities, I'd like to WARN people and
try to co-ordinate applying the patch so as to cause as little disruption as
possible. You can check it out already at
> http://lambiek.amplexor.be/downloads/xindice/new-utf8-patch.
>
> I don't believe you NEED to use the XML-RPC client for just
reading/writing documents, though I haven't really tested the CORBA client
anymore... Using XPaths and XUpdates with non-ASCII characters will
definitely not work in CORBA, but should now work with the XML-RPC interface.
(Need to test some more myself though).
>
> Anyway, let me know how and when I can commit this patch...
>
> James

+1 for committing as early as possible.

The trick would be to write a client that serializes the entire
database into one big XML file, and another one, in the new version, that
imports from this XML dump file (which can use namespaces to
mark xindice-specific data along the tree).
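
A minimal Python sketch of that dump/import idea, under stated assumptions: the database is stood in for by a plain dictionary mapping collection name to {document key: XML text}, and the namespace URI `urn:example:xindice-dump` and the element names `database`, `collection` and `document` are invented for illustration. No real Xindice or XML:DB API is used, so reading from the old version and writing to the new one are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace that marks dump-specific wrapper elements.
DUMP_NS = "urn:example:xindice-dump"
ET.register_namespace("dump", DUMP_NS)


def export_dump(collections: dict) -> bytes:
    """Serialize {collection: {key: xml_text}} into one UTF-8 XML dump."""
    root = ET.Element(f"{{{DUMP_NS}}}database")
    for coll_name, docs in collections.items():
        coll = ET.SubElement(root, f"{{{DUMP_NS}}}collection", {"name": coll_name})
        for key, xml_text in docs.items():
            entry = ET.SubElement(coll, f"{{{DUMP_NS}}}document", {"key": key})
            # Embed the stored document as a child of its wrapper element.
            entry.append(ET.fromstring(xml_text))
    return ET.tostring(root, encoding="utf-8")


def import_dump(data: bytes) -> dict:
    """Rebuild the {collection: {key: xml_text}} mapping from a dump."""
    collections = {}
    root = ET.fromstring(data)
    for coll in root.findall(f"{{{DUMP_NS}}}collection"):
        docs = {}
        for entry in coll.findall(f"{{{DUMP_NS}}}document"):
            payload = entry[0]  # the single wrapped document root
            docs[entry.get("key")] = ET.tostring(payload, encoding="unicode")
        collections[coll.get("name")] = docs
    return collections


# Hypothetical content with non-ASCII characters (U+00E9, U+00A9).
old_db = {"notes": {"n1": "<note>caf\u00e9 \u00a9</note>"}}
dump = export_dump(old_db)
new_db = import_dump(dump)
print(ET.fromstring(new_db["notes"]["n1"]).text == "caf\u00e9 \u00a9")
```

Because the wrapper elements live in their own namespace, they cannot collide with element names used inside the stored documents, and the dump carries its characters as UTF-8 regardless of how either database version stores them internally.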

What do you think?

[NOTE: Xindice is totally useless to me today exactly because of proper
encoding and lack of available metadata... and I've met tons of people
that believe the exact same, so I'd suggest to patch these two things
then do a 1.1 release ASAP... this is very likely the reason why this
community is stagnating, so this might be a good thing to patch]

I volunteer to work on the metadata since I will badly need it in the future.
I just don't know how to do it, and I think the XML:DB API is slowing us
down rather than helping us in any way.

Comments?

--
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------


