--- On Tue, 13/7/10, AJ Chen <ajc...@web2express.org> wrote:


From: AJ Chen <ajc...@web2express.org>
Subject: Re: parse step hangs
To: "nutch-user" <nutch-u...@lucene.apache.org>
Date: Tuesday, 13 July, 2010, 4:27 AM


I set mime.type.magic=false and parsed the segment again. The parser hung
at the same place. Maybe Tika gets trapped in an endless loop after seeing
mime-type application/x-sh. Is there a way to configure Tika to skip
mime-type application/x-sh?
thanks,
-aj
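
For reference, Nutch maps MIME types to parse plugins in conf/parse-plugins.xml, which is the file the ParserFactory message quoted below refers to. The fragment is a minimal sketch of such a mapping, not a confirmed fix. Sending application/x-sh to parse-text is an assumption made for illustration, and it is untested whether this avoids the hang or whether parse-text will accept that content type without further warnings.

```xml
<!-- conf/parse-plugins.xml (fragment; illustrative sketch, not a tested fix) -->
<parse-plugins>

  <!-- Map text/plain explicitly, which is what the ParserFactory
       INFO message complains is missing. -->
  <mimeType name="text/plain">
    <plugin id="parse-text" />
  </mimeType>

  <!-- Assumption: hand shell scripts to the plain-text parser so that
       parse-tika is not consulted for this type. -->
  <mimeType name="application/x-sh">
    <plugin id="parse-text" />
  </mimeType>

  <aliases>
    <!-- Extension id as it appears in the log output. -->
    <alias name="parse-text"
           extension-id="org.apache.nutch.parse.text.TextParser" />
  </aliases>

</parse-plugins>
```

Any existing entries in the file, including the catch-all mapping if one is present, would stay as they are. Only the two mimeType elements are new.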

On Mon, Jul 12, 2010 at 3:36 PM, AJ Chen <ajc...@web2express.org> wrote:

> There is another thread reporting a hang during Tika parsing. I'm seeing a
> similar problem now. I'm not sure whether the cause is the same, but I want
> to show the message at the point of hanging.
> 2010-07-12 14:36:33,645 ERROR tika.TikaParser - Can't retrieve Tika parser
> for mime-type application/x-sh
> 2010-07-12 14:36:33,645 WARN  parse.Parser - Error parsing:
> http://rsb.info.nih.gov/ij/download/linux/unix-script.txt: failed(2,0):
> Can't retrieve Tika parser for mime-type application/x-sh
> 2010-07-12 14:36:33,650 INFO  parse.ParserFactory - The parsing plugins:
> [org.apache.nutch.parse.tika.Parser -
> org.apache.nutch.parse.text.TextParser] are enabled via the plugin.includes
> system property, and all claim to support the content type text/plain, but
> they are not mapped to it  in the parse-plugins.xml file
>
> my setting:
> mime.type.magic=true
> plugin.includes=...parse-(text|html|js|tika)...
>
> any idea?
> thanks,
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org
> http://web2express.org
> twitter @web2express
> Palo Alto, CA, USA
>



-- 
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA