Firstly, thanks to the few of you who have already responded to me on
the "UTF-8 or UCS-2 ?" topic. I fear that I did not give sufficient
details.
Well my problem is this:
Currently, I have a thing called a "raw message". It is a stream of
data that may contain any number of EDI documents. A program bursts
this bundle of EDI documents and distributes each one to its recipient.
It's pretty easy to programmatically determine where in the stream of
data one EDI message starts and ends.
Now I am told we want to allow this bundle to also contain XML
documents. So, the immediate question in my mind is: how can I determine
when I am at the beginning of an XML document? The problem is
compounded by the fact that the XML document might be encoded in UCS-2.
I don't think I can just look for the string "<?xml" for two reasons.
First, the XML document might start with comments, and second, the XML
document might start with ".<.?.x.m.l" (where . = null) if using UCS-2
encoding. Furthermore, I am told that the REAL first character might be
a byte-order mark (0xFEFF). I suppose that given a different
byte order I might actually be looking at "<.?.x.m.l." as my first
characters.
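
To make the cases above concrete, here is a rough sketch of sniffing the
first few bytes at a candidate document boundary. It follows the general
idea of the encoding autodetection appendix in the XML 1.0 Recommendation,
but the function name and the signature table are only illustrative
assumptions. It ignores UCS-4 and EBCDIC, and a bare "<" could of course
also turn up inside non-XML data, so treat it as a starting point rather
than a finished detector.

```python
from typing import Optional

# Byte signatures checked in order (hypothetical table, for illustration).
# BOMs come first; the two-byte "<" patterns must precede the one-byte one.
_SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8-sig"),  # UTF-8 with a byte-order mark
    (b"\xfe\xff", "utf-16"),         # BOM, big-endian (codec reads the BOM)
    (b"\xff\xfe", "utf-16"),         # BOM, little-endian
    (b"\x00<", "utf-16-be"),         # no BOM, ".<" pattern
    (b"<\x00", "utf-16-le"),         # no BOM, "<." pattern
    (b"<", "utf-8"),                 # ASCII-compatible; UTF-8 is the XML default
]


def guess_xml_encoding(data: bytes) -> Optional[str]:
    """Return a Python codec name if data looks like the start of XML."""
    for signature, codec in _SIGNATURES:
        if data.startswith(signature):
            return codec
    return None


# Usage: the same declaration in three byte layouts, plus non-XML data.
decl = "<?xml version='1.0'?>"
samples = {
    "ucs2_be": decl.encode("utf-16-be"),
    "ucs2_le_bom": b"\xff\xfe" + decl.encode("utf-16-le"),
    "comment_first": b"<!-- no declaration --><Doc/>",
    "edi": b"UNB+UNOA:1+SENDER+RECEIVER",
}
results = {name: guess_xml_encoding(raw) for name, raw in samples.items()}
```

Once a codec name comes back, decoding a short prefix with it gives
ordinary text, so the check for a declaration, comment, or root element
can be written once instead of once per byte layout.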
My program needs to run on a Tandem machine (ASCII), and the documents
are delivered to this machine via 12 different protocols: OFTP, FTP,
STREAM X.25, STREAM BSC, SMTP, POP3, STREAM IP, SNA, SNA IP, ZMODEM,
and HTTP.
Once I find an XML document I need to extract the Sender and Receiver
and some other data.
If I have to write some code to deal with all the contingencies, I can
do that, but my question to the group is: Am I missing something? Are
any of my assumptions wrong? Is there a better way to do this?
A fellow programmer wants to convert the XML to ASCII with some sort of
shift-in/shift-out escape byte to indicate when multibyte characters
are in the stream. (The object being that all downstream processes can
then process the file as though it were ASCII and not be concerned with
UCS-2 formatting.) But I don't like this idea because I can't believe
that there is not some standard way to handle this sort of thing. I
hate to write some homegrown solution when there might be an industry
standard that will do the job.
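
One existing standard that seems to serve the same purpose as the
shift-in/shift-out idea is UTF-8: ASCII characters stay as single ASCII
bytes, and everything else becomes a sequence of bytes that all have the
high bit set. The sketch below is only an illustration of that property
using Python's built-in codecs. The helper name and the sample element are
made up, and a real conversion would also have to update any encoding
named in the XML declaration.

```python
def ucs2_to_utf8(data: bytes, codec: str) -> bytes:
    """Re-encode UCS-2/UTF-16 bytes as UTF-8 (hypothetical helper)."""
    return data.decode(codec).encode("utf-8")


# Sample document with one non-ASCII character (e with diaeresis, U+00EB).
source_text = "<?xml version='1.0'?><Name>Zo\u00eb</Name>"
ucs2_bytes = source_text.encode("utf-16-be")
utf8_bytes = ucs2_to_utf8(ucs2_bytes, "utf-16-be")

# Bytes that an ASCII-only downstream process would need to pass through
# untouched: every byte of a multibyte UTF-8 character is 0x80 or above.
non_ascii_bytes = [b for b in utf8_bytes if b >= 0x80]
```

In this form the markup is findable with a plain byte search for "<",
and there are no null bytes in the stream.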
Also, does anybody know of a web page that addresses Unicode support in
the various protocols I have listed? For example: What does FTP do to an
NT Unicode file when it is transported to an 8-bit machine? Or what
happens when transporting Unicode from a big-endian to a little-endian
machine?
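
As a small illustration of what is at stake in those two questions, the
sketch below shows the two byte orders side by side and what happens if a
text-mode transfer rewrites line endings inside 16-bit data, as FTP's
ASCII type can. It says nothing about how any particular protocol on the
list actually behaves; the variable names are only for illustration.

```python
text = "<?"

# The same two characters in each byte order.
big_endian = text.encode("utf-16-be")      # 00 3C 00 3F
little_endian = text.encode("utf-16-le")   # 3C 00 3F 00

# A byte-for-byte (binary) transfer does not swap anything, so the
# receiver has to go by the byte-order mark, not by its own native order.
with_bom = b"\xfe\xff" + big_endian
decoded = with_bom.decode("utf-16")        # codec honours the BOM

# A line-ending rewrite (LF -> CR LF) applied to 16-bit data inserts a
# single stray byte and throws every following character out of step.
original = "a\nb".encode("utf-16-be")      # 00 61 00 0A 00 62
mangled = original.replace(b"\n", b"\r\n") # 00 61 00 0D 0A 00 62

try:
    mangled.decode("utf-16-be")
    survived = True
except UnicodeDecodeError:
    survived = False
```

This is why a 16-bit document would need to travel in a binary or image
mode wherever the protocol offers one.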
Comments and advice are solicited.
Thanks,
Mike