+1, agreed. This would be a welcome addition.
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
Date: Sunday, May 31, 2015 at 11:49 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Subject: Re: about language extraction for zip documents

>Hi,
>
>On Sun, May 31, 2015 at 12:30 AM, <user-digest-h...@nutch.apache.org>
>wrote:
>
>>
>> Hi community.
>> I'm using Nutch 1.9 and Solr 4.10.
>> I use Nutch to parse zip documents, but the language field is empty in
>> Solr for all of these documents, and this is a problem for me.
>> The parse-zip plugin uses Tika to detect the MIME type and to extract the
>> content of the files, but the language is missing.
>> I was thinking that if the package has 3 documents, the language could
>> be a multivalued field and contain the languages of all the documents
>> inside.
>> What do you think about this topic?
>>
>
>Please open a Jira issue and, if possible, attach a patch for the
>functionality. I think it would be a nice addition to the parse-zip
>plugin, and to me it makes good sense.
>Thanks
>Lewis
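
As a starting point for the Jira issue, the multivalued-language idea quoted above could be sketched roughly as follows. This is only an illustration in Python, not the parse-zip plugin's code (a real patch would be Java and would presumably call Tika's language detection). The names `languages_in_zip` and `toy_detect` are hypothetical, and the keyword-based detector is a stand-in for a real one.

```python
import io
import zipfile
from typing import Callable, List


def languages_in_zip(data: bytes, detect: Callable[[str], str]) -> List[str]:
    """Return the distinct languages of the entries in a zip archive.

    `detect` is a caller-supplied function (an assumption of this sketch)
    that maps text to a language code, or to "" when it cannot decide.
    Languages are returned in first-seen order, so the result can be
    stored directly in a multivalued field.
    """
    seen: List[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no text
            # Assumes plain-text entries; undecodable bytes are replaced.
            text = archive.read(info).decode("utf-8", errors="replace")
            lang = detect(text)
            if lang and lang not in seen:
                seen.append(lang)
    return seen


def toy_detect(text: str) -> str:
    """Hypothetical keyword-based detector, for demonstration only."""
    words = set(text.lower().split())
    if words & {"el", "la", "es", "un"}:
        return "es"
    if words & {"the", "is", "a"}:
        return "en"
    return ""
```

With an archive holding two English files and one Spanish file, `languages_in_zip(data, toy_detect)` would give one value per distinct language rather than one per file, which is what a multivalued `language` field in Solr would need.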