Re: Long-term archiving of electronic text documents

2013-01-28 Thread Stephan Stiller
One detail to add: Some archive file formats also add redundancy for error correction. I agree that error correction is in principle best left to storage media and transmission protocols (for a clean separation of functionality), but the idea to have error correction tailored to (i.e., optimized

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Mark E. Shoulson
On 01/28/2013 07:30 AM, William_J_G Overington wrote: A document saved as UTF-64 may well take four times as many bytes as such a Unicode Text Document, yet there would be the error checking and correction facilities at a character level. It seems to me that the character-encoding level is the

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Jim Breen
William_J_G Overington wrote: > The idea is that there would be an additional UTF format, perhaps UTF-64, > so that each character would be expressed in UTF-64 notation using 64 bits, > thus providing error checking and correction facilities at a character level. Error detection and correction a

RE: Long-term archiving of electronic text documents

2013-01-28 Thread Shawn Steele
> UTF-256 allows each hex digit of UTF-32 to be expressed as an ASCII hex digit > (characters 0-9 and A-F encoded as bytes 0x30-0x39 and 0x41-0x46). In my experience, I lose an entire block of a disk, or track, or drive, so redundancy at the character level isn’t likely to be very helpful, you’d

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Clive Hohberger
Using UTF-64 with 48 bits of Reed-Solomon error correction (RSEC) on a single UTF-16 data codeword would allow you to recover 24 data or EC bits. Remember that the EC bits, being in the same codeword, are just as likely to be damaged as the data. Otto's comment is more practical. You have 11 unused

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Otto Stolz
Hello, on 28.01.2013 William_J_G Overington wrote: The idea is that there would be an additional UTF format, perhaps UTF-64, so that each character would be expressed in UTF-64 notation using 64 bits, thus providing error checking and correction facilities at a character level. We have alrea

Re: Long-term archiving of electronic text documents

2013-01-28 Thread James Cloos
> "WJGO" == William J G Overington writes: WJGO> I was thinking about the problems of the long-term archiving of WJGO> electronic text documents and thought of an idea. I wonder if I WJGO> may please mention the idea here in the hope of there being a WJGO> discussion so that an assessment of

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Asmus Freytag
On 1/28/2013 4:30 AM, William_J_G Overington wrote: The idea is that there would be an additional UTF format, perhaps UTF-64, so that each character would be expressed in UTF-64 notation using 64 bits, thus providing error checking and correction facilities at a character level. I think this

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Alka Irani
I would love to have such a facility because it is too much hassle to write bilingual/trilingual documents, which is often the case at least in the Indian environment. On Jan 28, 2013 6:17 PM, "William_J_G Overington" wrote: > I was thinking about the problems of the long-term archiving of electro

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Asmus Freytag
On 1/28/2013 5:12 AM, Martinho Fernandes wrote: Similarly, there could be a type of pdf document where the text within the pdf document were stored in UTF-64 format. >> FWIW, there is already a PDF variant designed for long-term archiving known as PDF/A. You may want to look into that. Goo

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Asmus Freytag
On 1/28/2013 5:12 AM, Martinho Fernandes wrote: Similarly, there could be a type of pdf document where the text within the pdf document were stored in UTF-64 format. FWIW, there is already a PDF variant designed for long-term archiving known as PDF/A. You may want to look into that. Good po

COMBINING ABBREVIATION MARK SUPERSCRIPT UR TILDE FORM in MUFI 3.0

2013-01-28 Thread Andrew Miller
The MUFI 3.0 specification states that codepoint U+F1C3 COMBINING ABBREVIATION MARK SUPERSCRIPT UR TILDE FORM has been assigned to U+1DD1 COMBINING UR ABOVE, and the box has been shaded yellow which indicates that the codepoint has been decommissioned. However the glyph for U+1DD1 in the Unicode C

Re: Long-term archiving of electronic text documents

2013-01-28 Thread Martinho Fernandes
> Similarly, there could be a type of pdf document where the text within the > pdf document were stored in UTF-64 format. FWIW, there is already a PDF variant designed for long-term archiving known as PDF/A. You may want to look into that. Kind regards, Martinho

Long-term archiving of electronic text documents

2013-01-28 Thread William_J_G Overington
I was thinking about the problems of the long-term archiving of electronic text documents and thought of an idea. I wonder if I may please mention the idea here in the hope of there being a discussion so that an assessment of whether the idea is worth developing can be made. The idea is that t