Re: Re: The problem of using Cyber Neko HTML Parser parse HTML files

2005-02-17 Thread Jingkang Zhang
&gt; …ue 0x00A0)
&gt;
&gt; ----- Original Message -----
&gt; From: "Jingkang Zhang" &lt;[EMAIL PROTECTED]&gt;
&gt; Sent: Friday, February 18, 2005 5:12 PM
&gt; Subject: The problem of using Cyber Neko HTML Parser parse HTML files
&gt;
&gt; &gt; When I was usin…

Re: The problem of using Cyber Neko HTML Parser parse HTML files

2005-02-17 Thread Jason Polites
This is not an unknown character; it is a non-breaking space (Unicode value 0x00A0).

----- Original Message -----
From: "Jingkang Zhang" &lt;[EMAIL PROTECTED]&gt;
Sent: Friday, February 18, 2005 5:12 PM
Subject: The problem of using Cyber Neko HTML Parser parse HTML files
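A minimal Java sketch of the point above, assuming the parser hands back the entity already decoded, so the node value holds the single character U+00A0 rather than the text "&amp;nbsp;". The class NbspDemo and its helper normalizeNbsp are hypothetical (not part of NekoHTML), and the input string is simulated rather than produced by an actual parse.

```java
// Sketch: the "unknown" character in the node value is U+00A0 (non-breaking space).
class NbspDemo {

    // Hypothetical helper, not a NekoHTML API: map every non-breaking
    // space to an ordinary space so downstream code sees plain ASCII spaces.
    static String normalizeNbsp(String nodeValue) {
        return nodeValue.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Simulated node value, as a parser might return it for "a&nbsp;b".
        String nodeValue = "a\u00A0b";

        // Show the code point of the middle character.
        System.out.printf("U+%04X%n", (int) nodeValue.charAt(1)); // prints U+00A0

        // Normalized text with a regular space.
        System.out.println(normalizeNbsp(nodeValue)); // prints "a b"
    }
}
```

Replacing rather than deleting the character keeps word boundaries intact; whether a plain space is the right substitute depends on what the text is used for.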

The problem of using Cyber Neko HTML Parser parse HTML files

2005-02-17 Thread Jingkang Zhang
When I was using the CyberNeko HTML Parser to parse HTML files (created by Microsoft Word), if a file contains HTML built-in entity references (for example, &amp;nbsp;), the node value may contain an unknown character. Like this:

source html:
-rw-r--r--    1 root root   50 Jan 21 16:12 _1e.f6

after p…
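To see which character is actually showing up as "unknown", one option is to list the non-ASCII code points in the node value. This sketch assumes the text has already been obtained as a String (for example from the standard DOM method Node.getNodeValue()); the class CodePointDump, its helper nonAsciiCodePoints, and the sample value are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: report every code point above 0x7F so "unknown" characters can be identified.
class CodePointDump {

    // Hypothetical diagnostic helper: returns e.g. ["U+00A0"] for a string
    // containing one non-breaking space and otherwise only ASCII.
    static List<String> nonAsciiCodePoints(String s) {
        return s.codePoints()
                .filter(cp -> cp > 0x7F)                      // keep non-ASCII only
                .mapToObj(cp -> String.format("U+%04X", cp))  // format as U+XXXX
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Simulated text node value from a Word-generated listing.
        String nodeValue = "root\u00A0root";
        System.out.println(nonAsciiCodePoints(nodeValue)); // prints [U+00A0]
    }
}
```

If the output shows U+00A0, the character is the decoded &amp;nbsp; entity rather than corrupted data.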