I'd attached a file to the previous mail. Is there a filter for PDF files, or some other reason it was dropped?

On Tue, Jan 5, 2010 at 12:49 PM, Zacarias <zacar...@linebee.com> wrote:
> Here is my proposal
>
> Regards
>
> On Tue, Jan 5, 2010 at 12:48 PM, Zacarias <zacar...@linebee.com> wrote:
>
>> Hi, I'm developing a directory monitor to add to a Solr implementation.
>> Tell me if it would be interesting to you; we would be glad to share it
>> with the community. I would also like your opinion on the proposal: whether
>> it looks OK to you, and any changes or questions are very welcome.
>>
>> Regards
>> Zacarias
>> www.linebee.com
>>
>> 2009/12/8 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>>
>>> I was referring to SOLR-1358. Anyway, SolrCell as an update processor
>>> is a good idea.
>>>
>>> On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>> >
>>> > On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>> >
>>> >> Integrating Extraction w/ DIH is a better option. DIH makes it easier
>>> >> to do the mapping of fields etc.
>>> >
>>> > Which comment is this directed at? I'm lacking context here.
>>> >
>>> >> On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>> >>>
>>> >>> On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
>>> >>>
>>> >>>> As someone with very little knowledge of Solr Cell and/or Tika, I
>>> >>>> find myself wondering if ExtractingRequestHandler would make more
>>> >>>> sense as an extractingUpdateProcessor -- where it could be configured
>>> >>>> to take either binary fields (or string fields containing URLs) out
>>> >>>> of the Documents, parse them with Tika, and add the various XPath
>>> >>>> matching hunks of text back into the document as new fields.
>>> >>>>
>>> >>>> Then ExtractingRequestHandler just becomes a handler that slurps up
>>> >>>> its ContentStreams and adds them as binary data fields and adds the
>>> >>>> other literal params as fields.
>>> >>>>
>>> >>>> Wouldn't that make things like SOLR-1358, and using Tika with
>>> >>>> URLs/filepaths in XML and CSV based updates, fairly trivial?
>>> >>>
>>> >>> It probably could, but I am not sure how it works in a processor chain.
>>> >>> However, I'm not sure I understand how they work all that much either.
>>> >>> I also plan on adding, BTW, a SolrJ client for Tika that does the
>>> >>> extraction on the client. In many cases, the ExtrReqHandler is really
>>> >>> only designed for lighter-weight extraction cases, as one would simply
>>> >>> not want to send that much rich content over the wire.
>>> >>
>>> >> --
>>> >> -----------------------------------------------------
>>> >> Noble Paul | Systems Architect| AOL | http://aol.com
>>> >
>>> > --------------------------
>>> > Grant Ingersoll
>>> > http://www.lucidimagination.com/
>>> >
>>> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> > using Solr/Lucene:
>>> > http://www.lucidimagination.com/search
>>>
>>> --
>>> -----------------------------------------------------
>>> Noble Paul | Systems Architect| AOL | http://aol.com