Removing non-xhtml tags from a string ???

2004-04-28 Thread Marcin Okraszewski
Hi, I'm building a webapp that allows users to enter XHTML via forms. The problem is that the entered XHTML *must* be valid XML. I use JTidy to correct any errors that may occur. So far so good. But it turns out that if JTidy gets a tag that it doesn't know, it simply returns an empty string :-(
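To illustrate the idea in the subject line, here is a naive, JDK-only sketch that drops any tag whose name is not on a whitelist before the string is handed to JTidy, so that unknown tags such as Word's `<o:p>` never reach it. This is a swapped-in technique, not something JTidy does itself. The class name, method name and the `ALLOWED` set are hypothetical, it assumes Java 9 or later, and a regex pass like this does not sanitize attributes or handle comments and CDATA, so treat it only as a pre-filter.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagWhitelist {
    // Hypothetical whitelist: extend with the XHTML elements the form should accept.
    static final Set<String> ALLOWED =
            Set.of("p", "br", "b", "i", "em", "strong", "a", "ul", "ol", "li");

    // Matches an opening, closing or self-closing tag and captures its name
    // (colons are allowed so that prefixed names like "o:p" are caught).
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9:-]*)[^>]*>");

    /** Removes tags whose name is not in ALLOWED, keeping the text between them. */
    static String stripUnknownTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1).toLowerCase();
            String replacement = ALLOWED.contains(name) ? m.group() : "";
            // quoteReplacement stops '$' or '\' in the kept tag from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripUnknownTags("<p>Hello <o:p>Word</o:p> <b>world</b></p>"));
        // prints: <p>Hello Word <b>world</b></p>
    }
}
```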

Re: Removing non-xhtml tags from a string ???

2004-04-28 Thread Joerg Heinicke
On 28.04.2004 17:20, Marcin Okraszewski wrote: [...]

Re: Removing non-xhtml tags from a string ???

2004-04-29 Thread Ugo Cei
On 28 Apr 2004, at 17:20, Marcin Okraszewski wrote: [...]

Re: Removing non-xhtml tags from a string ???

2004-04-29 Thread Marcin Okraszewski
The empty string probably points to a thrown exception, doesn't it? Maybe you should first look at the JTidy options for getting the error message, for ignoring errors, or even for removing unknown tags. At first sight, at http://www.w3.org/People/Raggett/tidy/ I found an option "word-2000: bool" fo
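Along the lines of "getting the error message" instead of a silent empty string, a minimal JDK-only sketch (not JTidy's API) can at least check whether the cleaned fragment is well-formed XML and report the parser's message when it is not. The class and method names are hypothetical. The fragment is wrapped in an assumed `<root>` element so that several top-level elements still parse, and because no DTD is loaded, XHTML named entities such as `&nbsp;` will be reported as undefined.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns null if the fragment is well-formed XML, otherwise the parser's error message. */
    static String checkWellFormed(String fragment) {
        // Wrap in a single root element so that a multi-element fragment is a valid document.
        String doc = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but, unlike the built-in handler,
            // does not print "[Fatal Error]" lines to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(doc)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(checkWellFormed("<p>ok<br/></p>"));   // prints: null
        System.out.println(checkWellFormed("<p>unclosed"));      // prints the parser's message
    }
}
```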

Re: Removing non-xhtml tags from a string ???

2004-04-29 Thread Marcin Okraszewski
I don't know if it would really help, but you might try using CyberNeko [1] instead of JTidy. I've found it gives better results on average, particularly when dealing with [so-called] HTML pasted from Word. Ugo [1] http://www.apache.org/~andyc/neko/doc/html/ I must admit that CyberNeko loo

Re: Removing non-xhtml tags from a string ???

2004-04-29 Thread Peter Velychko
I have been using NekoHTML instead of JTidy for about six months to parse HTML and build a DOM from HTML input. It lets you set up a chain of filters that are applied to the document when it is parsed. One of these filters, "ElementRemover", removes the elements you specify from the document, or keeps only the ones you specify.
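The accept/remove idea behind a filter like ElementRemover can be sketched on an already-parsed DOM using only the JDK. This is not NekoHTML's API and it assumes the input is already well-formed XML (NekoHTML's parser is what would get you there from arbitrary HTML). The class name and the `ACCEPT`/`REMOVE` sets are hypothetical: elements in `REMOVE` are dropped with their content, elements in `ACCEPT` are kept, and any other element is unwrapped so its text survives. Attributes are left untouched here, and Java 9 or later is assumed for `Set.of`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitelistFilter {
    // Hypothetical lists: kept elements, and elements removed together with their content.
    static final Set<String> ACCEPT = Set.of("p", "br", "b", "i", "em", "strong", "a");
    static final Set<String> REMOVE = Set.of("script", "style");

    static void filter(Node parent) {
        // Snapshot the children first, because the loop modifies the tree.
        List<Node> children = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) children.add(c);
        for (Node child : children) {
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            String name = child.getNodeName().toLowerCase();
            if (REMOVE.contains(name)) {
                parent.removeChild(child);               // drop element and everything inside
            } else {
                filter(child);                           // clean the descendants first
                if (!ACCEPT.contains(name)) {            // unknown element: unwrap it
                    while (child.getFirstChild() != null)
                        parent.insertBefore(child.getFirstChild(), child);
                    parent.removeChild(child);
                }
            }
        }
    }

    /** Parses a well-formed fragment, filters it and serializes it back to a string. */
    static String clean(String fragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            Node root = doc.getDocumentElement();
            filter(root);
            if (!root.hasChildNodes()) return "";        // avoid slicing "<root/>"
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(root), new StreamResult(w));
            String s = w.toString();                     // "<root>...</root>"
            return s.substring("<root>".length(), s.length() - "</root>".length());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not clean fragment", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(clean("<p>Hi <font>there</font><script>x()</script></p>"));
        // prints: <p>Hi there</p>
    }
}
```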