Re: [fw-general] Interesting Zend_Search additions

2007-01-27 Thread Simon Mundy
Hi Alexander, I'm a wee bit lazy, so I just run all my HTML text through Tidy (added as a PHP extension), and it's a consistent base to start from. I realise this isn't going to be possible in all environments, but it may be a good idea to check if it exists and 'sanitise' the HTML input with
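The approach Simon describes (detect whether the Tidy extension is available and, if so, clean the HTML before indexing) could be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the thread or from Zend_Search: the helper name `sanitiseHtml()` and the `output-xhtml` option are hypothetical choices, while `extension_loaded()` and `tidy::repairString()` are standard PHP.

```php
<?php
// Hypothetical helper (not part of Zend_Search): normalise arbitrary HTML with
// the Tidy extension, falling back to the raw input when Tidy is not installed.
function sanitiseHtml(string $html): string
{
    // Tidy is an optional extension, so detect it at runtime.
    if (!extension_loaded('tidy')) {
        return $html;
    }

    // Assumed configuration: ask Tidy to emit well-formed XHTML.
    $config = ['output-xhtml' => true];

    $tidy = new tidy();
    $clean = $tidy->repairString($html, $config, 'utf8');

    // repairString() may return false on failure; keep the original input then.
    return $clean === false ? $html : $clean;
}
```

Calling `sanitiseHtml('<p>Hello <b>world')` returns repaired markup with the unclosed tags closed when Tidy is present, and the unchanged string otherwise, so an indexer can call it unconditionally.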

Re: [fw-general] Interesting Zend_Search additions

2007-01-26 Thread Alexander Veremyev
Hi Simon, There was no HTML document parsing/indexing capability in Zend_Search up to now, but it's the most common format on the Internet :) It's experimental for now, so it isn't documented and I haven't made any announcement :) I'm considering what should be used for this. 1) Pure PHP parser gives possi

[fw-general] Interesting Zend_Search additions

2007-01-25 Thread Simon Mundy
Hi Alexander, just noticed a new HTML document component in Zend_Search. Is this the start of the killer ZF-powered spider? :) I'd be very keen to know how you intend to use it, as I've implemented a spider of sorts that can parse HTML and PDF files but is probably a little limited in i