Re: about language extraction for zip documents

Mattmann, Chris A (3980) Sun, 31 May 2015 11:51:52 -0700

+1, agreed.

This would be a welcome addition.


Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
Date: Sunday, May 31, 2015 at 11:49 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Subject: Re: about language extraction for zip documents

>Hi,
>
>On Sun, May 31, 2015 at 12:30 AM, <user-digest-h...@nutch.apache.org>
>wrote:
>
>>
>>
>> Hi community.
>> I'm using Nutch 1.9 and Solr 4.10.
>> I use Nutch to parse zip documents, but the language field is empty in
>> Solr for all of these documents, and this is a problem for me.
>> The ParseZip plugin uses Tika to detect the MIME type and to extract the
>> content of the files, but the language is missing.
>> I was thinking that if the package has 3 documents, the language could
>> be a multivalued field containing all the languages of the documents
>> inside.
>> What do you think about this topic?
>>
>
>Please open a Jira issue and, if possible, attach a patch for the
>functionality. I think it would be a nice addition to the parse-zip
>plugin, and to me it makes good sense.
>Thanks
>Lewis
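
For anyone picking this up, the multivalued idea from the quoted message could be sketched roughly as below. This is only an illustration under assumptions, not the parse-zip plugin's code: the class name `ZipLanguageSketch`, the method `detectLanguages`, and the `detector` function are hypothetical, and `detector` stands in for whatever language identification the plugin would actually call. It also assumes every entry is UTF-8 plain text, which real archives will not guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipLanguageSketch {

    /**
     * Collects the distinct languages detected across the file entries of a
     * zip stream, in first-seen order. The detector is a hypothetical
     * stand-in: it maps extracted text to a language code, or to null or an
     * empty string when it cannot decide.
     */
    static Set<String> detectLanguages(InputStream zipStream,
                                       Function<String, String> detector)
            throws IOException {
        Set<String> languages = new LinkedHashSet<>();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no text
                }
                // Assumption: the entry is UTF-8 plain text. readAllBytes()
                // stops at the end of the current entry, not the archive.
                String text = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                String lang = detector.apply(text);
                if (lang != null && !lang.isEmpty()) {
                    languages.add(lang);
                }
            }
        }
        return languages;
    }
}
```

The resulting set holds one value per distinct language found, which is the shape a multivalued language field would need. Entries the detector cannot classify are skipped rather than recorded as empty values.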
