On 7/1/11 6:08 PM, Victor Villa wrote:
> I'm in the market for an indexer that can index doc, docx, xls, xlsx,
> pdf, html and mysql fields.
>
> the file formats are important, if i have to search in mysql and
> aggregate the result set, that's fine too.

You can check out the Lucene component in Zend Framework:

http://framework.zend.com/manual/en/zend.search.lucene.html

Lucene itself is a Java library, but Zend_Search_Lucene is written entirely in PHP, so you don't need Java on the server. As far as I know it ships with document classes for HTML, docx and xlsx; for doc, xls and pdf you will have to extract the text yourself before handing it to the indexer.

With both Lucene and Sphinx you need to get the text and metadata out of the files and into the application in order to do the indexing. You can use Sphinx too; however, for some of the MS Office document types you will have to provide your own way of extracting that data for Sphinx. It is not impossible, but it might take some work on your part, as that functionality is not provided out of the box.

Let me know how it goes or if you have any questions. I built a help system doing exactly the same type of indexing that you are doing and it turned out really well. I even added pictures and videos to it.

-- 
thebigdog
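
P.S. For the Sphinx route, here is a rough sketch of what "getting the data into the indexer" can look like. Sphinx has an xmlpipe2 source type that reads an XML stream from whatever command you point it at, so the extraction script can be in any language; this one is in Python only to keep it short, and PHP would do just as well. The function name build_docset and the field names title and body are made up for illustration, and the sketch assumes you have already pulled the plain text out of each file with some other tool.

```python
from xml.sax.saxutils import escape


def build_docset(docs):
    """Build an xmlpipe2 stream for Sphinx.

    docs is an iterable of (doc_id, title, body) tuples, where doc_id is a
    unique positive integer and the text has already been extracted from
    the original file (that step is not shown here).
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Hypothetical full-text fields; name them however suits your data.
        '<sphinx:field name="title"/>',
        '<sphinx:field name="body"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, body in docs:
        parts.append('<sphinx:document id="%d">' % doc_id)
        # Escape &, < and > so the extracted text cannot break the XML.
        parts.append("<title>%s</title>" % escape(title))
        parts.append("<body>%s</body>" % escape(body))
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder data standing in for text pulled from real files.
    sample = [(1, "Q&A notes", "text pulled from a .doc file")]
    print(build_docset(sample))
```

You would then set type = xmlpipe2 on the source in sphinx.conf and point xmlpipe_command at the script, so the indexer reads whatever it prints to stdout.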
