Hi,
I've got a lot of files which I need to process in order to make them
indexable by Sphinx.
The files contain the data of a website with a custom Perl-based CMS.
Unfortunately they sometimes contain XML/HTML tags like <i>
And since most of the texts are in Dutch and some are in French they also
At Tue, 5 Aug 2008 23:21:43 +0200,
Pieter Laeremans wrote:
And is there some Haskell function which converts special tokens like & ->
&amp; and é -> &egu; ?
By default, XML has only five predefined entities: quot, amp, apos, lt,
and gt. Any additional ones are defined in the DTD.
But you can *always*
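
A minimal sketch of such a conversion function, offered as an illustration
rather than as an existing library call. The names escapeChar and escapeXml
are made up for this example. It maps the five predefined entities by name
and writes every non-ASCII character as a numeric character reference, which
any XML parser accepts without a DTD.

```haskell
import Data.Char (ord)

-- | Escape a single character for use in XML text or attribute values.
-- (Hypothetical helper, not taken from any library.)
escapeChar :: Char -> String
escapeChar '&'  = "&amp;"
escapeChar '<'  = "&lt;"
escapeChar '>'  = "&gt;"
escapeChar '"'  = "&quot;"
escapeChar '\'' = "&apos;"
escapeChar c
  | ord c > 127 = "&#" ++ show (ord c) ++ ";"  -- numeric character reference
  | otherwise   = [c]

-- | Escape a whole string. Markup characters are escaped too, so this is
-- only suitable for text that should not be interpreted as tags.
escapeXml :: String -> String
escapeXml = concatMap escapeChar
```

For example, escapeXml "café & <i>" evaluates to "caf&#233; &amp; &lt;i&gt;".
The sketch does not filter control characters that XML 1.0 forbids, and it
would also escape any tags you want to keep, so it has to be applied to the
text content only.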
Hi Pieter,
2008/8/5 Pieter Laeremans [EMAIL PROTECTED]:
But the Sphinx indexer complains that the XML isn't valid. When I look at
the errors, this seems to be due to some documents containing HTML that is
not well-formed.
If you need to cope with non-well-formed HTML, try HTML Tidy:
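
One possible way to call it from Haskell, sketched under a few assumptions:
the tidy executable is installed and on the PATH, the input is UTF-8, and
the locale is UTF-8 as well. The names tidyArgs and tidyHtml are made up for
this example. The process is run with readProcessWithExitCode from
System.Process.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Flags passed to the tidy command-line tool.
tidyArgs :: [String]
tidyArgs =
  [ "-quiet"    -- suppress non-essential output
  , "-asxml"    -- convert the HTML to well-formed XHTML
  , "-numeric"  -- write numeric rather than named entities
  , "-utf8"     -- read and write UTF-8
  ]

-- | Feed HTML to tidy on stdin and return the cleaned document, or tidy's
-- diagnostics if it reports errors.
tidyHtml :: String -> IO (Either String String)
tidyHtml html = do
  (code, out, err) <- readProcessWithExitCode "tidy" tidyArgs html
  return $ case code of
    ExitSuccess   -> Right out
    ExitFailure 1 -> Right out  -- exit status 1: warnings only
    ExitFailure _ -> Left err   -- exit status 2: errors
```

With -numeric the output uses numeric references instead of named entities,
so it should not depend on entities declared in a DTD.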