For those of you wishing to use Nutch 1.9 and its Tika plugin to parse JSON:

You need to change TikaParser.java, as the current Tika implementation 
won't return a TXTParser for the 'application/json' MIME type. 

So either change the mimeType in TikaParser to 'text/plain' (the type 
TXTParser is registered for) before calling:
Parser parser = tikaConfig.getParser(MediaType.parse(mimeType));

Or, for JSON only, add a special case after that call:

// needs: import org.apache.tika.parser.txt.TXTParser;
if (mimeType.equalsIgnoreCase("application/json")) {
    parser = new TXTParser();
}

Rebuild your plugin and you're done.
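
As a rough sketch of the first option (remapping the MIME type before the 
parser lookup), something like the helper below could work. The class name 
MimeTypeFallback and the method effectiveMimeType are made up for 
illustration; they are not part of Nutch or Tika. It also assumes the 
incoming string may carry parameters such as "; charset=utf-8".

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch or Tika.
final class MimeTypeFallback {

    // Returns "text/plain" for JSON so that a plain-text parser can be
    // looked up; every other MIME type is returned unchanged.
    static String effectiveMimeType(String mimeType) {
        if (mimeType == null) {
            return null;
        }
        // Drop any parameters (e.g. "; charset=utf-8") before comparing.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return base.equals("application/json") ? "text/plain" : mimeType;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMimeType("application/json"));
        System.out.println(effectiveMimeType("text/html"));
    }
}
```

In TikaParser you would then pass effectiveMimeType(mimeType) to 
MediaType.parse(...) instead of the raw mimeType.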

Iqbal Shaikh

________________________________________
Sent: 03 September 2014 16:36
To: [email protected]
Subject: Parsing Json

Hi All,

I am using Nutch 1.9 (the latest) and trying to parse a JSON input source, but I am getting:

Error parsing: http://www.jsonip.com: failed(2,0): Can't retrieve Tika parser for mime-type application/json

Now my plugin property is:

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|indexer-elastic|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to include.</description>
</property>

And my regex filter:
# accept anything else
+.

I thought simply specifying Tika as one of the parsers would do the job, as 
Tika recognises the JSON MIME type.

Thanks in advance.

Iqbal Shaikh
