Re: Parsing HTML entities

2007-09-17 Thread Tobia Conforto
Andrew Stevens wrote: > Tobia Conforto writes: > > I cannot change this data source component, therefore I need a > > transformer to examine every text node in the stream, split it at the > > fake "" tags, substitute them with elements, and > > replace every escaped HTML entity with the relevant U
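
For readers following the thread, the transformation described above (split each text node at the fake tags, emit an element in their place, and decode the escaped entities in the remaining text) might be sketched roughly as follows. This is only an illustration in Python, not the Cocoon transformer under discussion: the function name `split_text_node`, the event tuples, and the tag name `br` used in the usage line are all assumptions, since the actual tag name is not preserved in the archived message.

```python
import html
import re


def split_text_node(text, tag):
    """Return a list of ("text", str) and ("element", tag) events for one text node.

    `tag` is the name of the fake tag to look for; it is supplied by the
    caller because the real tag name is an assumption here.
    """
    # Match the fake tag as literal characters, e.g. <br>, <br/> or <br />.
    pattern = re.compile(r"<" + re.escape(tag) + r"\s*/?>")
    events = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            # Decode entities such as &mdash; in the text before the tag.
            events.append(("text", html.unescape(text[pos:match.start()])))
        events.append(("element", tag))
        pos = match.end()
    if pos < len(text):
        events.append(("text", html.unescape(text[pos:])))
    return events


# Hypothetical input resembling the example quoted later in the thread.
events = split_text_node("Lorem ipsum &mdash; dolor sit amet.<br/>Consectetuer", "br")
```

In a SAX pipeline each `("text", ...)` event would presumably become a `characters()` call and each `("element", ...)` event a `startElement()`/`endElement()` pair, but that mapping is left out of the sketch.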

Re: Parsing HTML entities

2007-08-31 Thread Tobia Conforto
Never mind, I solved it "by hand": I wrote a Python script that takes a list of HTML entities and generates a huge tree of switch() { case: switch () { case: switch () { case: ... The generated Java code goes through a char[] in a single pass, and when it recognizes an entity it pushes the associat
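
The nested switch tree described above is in effect a trie over the entity names, unrolled into code. A minimal Python sketch of the same idea, using a dictionary-based trie at run time instead of generated Java, might look like this. It is not the author's script: the names `ENTITIES`, `build_trie` and `decode` are hypothetical, and the entity table is deliberately abbreviated.

```python
# Hypothetical, abbreviated entity table; a real one would list every HTML entity.
ENTITIES = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "mdash": "\u2014",
    "nbsp": "\u00a0",
}


def build_trie(entities):
    """Build a character trie; each level plays the role of one nested switch."""
    root = {}
    for name, char in entities.items():
        node = root
        for c in name + ";":
            node = node.setdefault(c, {})
        # "value" is longer than one character, so it cannot clash with the
        # single-character keys used for the branches.
        node["value"] = char
    return root


TRIE = build_trie(ENTITIES)


def decode(text, trie=TRIE):
    """Scan text left to right, replacing recognized entities with their character."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] == "&":
            node = trie
            j = i + 1
            # Walk the trie for as long as the input keeps matching.
            while j < n and text[j] in node:
                node = node[text[j]]
                j += 1
            if "value" in node:
                out.append(node["value"])
                i = j
                continue
        # Not an entity (or an unknown one): copy the character unchanged.
        out.append(text[i])
        i += 1
    return "".join(out)


decoded = decode("Lorem ipsum &mdash; dolor &amp; sit amet")
```

Unknown or unterminated references such as `&unknown;` or `&lt` are copied through unchanged in this sketch; whether the generated Java code behaves the same way is not stated in the message.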

RE: Parsing HTML entities

2007-08-31 Thread Andrew Stevens
Oh, for crying out loud. Even after switching to plain text Hotmail still strips out my included XML :-( Let's try again - replace the square brackets below with the appropriate less-than and greater-than symbols. > From: [EMAIL PROTECTED] > Date: Fri, 31 Aug 2007 14:06:59 + > > Tobia Conf

RE: Parsing HTML entities

2007-08-31 Thread Andrew Stevens
> From: [EMAIL PROTECTED] > Date: Fri, 31 Aug 2007 14:06:59 + > > Tobia Conforto linux.it> writes: > >> I have a data source from which I get SAX text nodes into my pipeline >> that contain escaped HTML entities and tags. In Java syntax: >> >> "Lorem ipsum — dolor sit amet. Consectetuer"

Re: Parsing HTML entities

2007-08-31 Thread Joerg Heinicke
Tobia Conforto linux.it> writes: > I have a data source from which I get SAX text nodes into my pipeline > that contain escaped HTML entities and tags. In Java syntax: > > "Lorem ipsum — dolor sit amet. Consectetuer" > > or, in XML syntax: > > Lorem ipsum — dolor sit amet.
Consec