Could this perhaps be done with processGzippedXML() from Crawler-Commons? Is that possible?
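For context, this is roughly the kind of workaround I have in mind. It is a minimal sketch that uses only the JDK (java.util.zip.GZIPInputStream and the built-in DOM parser), not Nutch or Crawler-Commons. It decompresses a fetched .xml.gz body and then collects the <loc> entries. The class and method names are hypothetical, and none of this is wired into a Nutch parse plugin.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class: a sketch, not part of Nutch or Crawler-Commons.
public class GzipSitemapSketch {

    // Decompress a gzip-encoded byte array (e.g. the raw body of a fetched .xml.gz sitemap).
    static byte[] gunzip(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Collect the text of every <loc> element. The factory is namespace-unaware
    // (the default), so "loc" matches inside the sitemaps.org default namespace too.
    static List<String> extractLocs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations to avoid XXE when parsing untrusted sitemaps.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        NodeList nodes = doc.getElementsByTagName("loc");
        List<String> locs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            locs.add(nodes.item(i).getTextContent().trim());
        }
        return locs;
    }

    // Demo helper: gzip a byte array in memory (stands in for a downloaded .xml.gz file).
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        }
        // Closing the GZIPOutputStream writes the gzip trailer before we read the bytes.
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String sitemap = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://example.com/a</loc></url>"
                + "<url><loc>https://example.com/b</loc></url>"
                + "</urlset>";
        byte[] gz = gzip(sitemap.getBytes(StandardCharsets.UTF_8));
        System.out.println(extractLocs(gunzip(gz)));
    }
}
```

Running main should print both example URLs. Inside Nutch, the decompression step would presumably have to happen before the content reaches the XML/sitemap parser, which is why I am asking whether processGzippedXML() could be reused for that.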

Thanks,

Michael


On 08/01/2017 05:21 PM, Michael Chen wrote:
Dear all,

I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't build the parse-zip plugin. The parse-ext, parse-swf and feed plugins also failed to build. This seems to be a known issue (NUTCH-874), which is marked for version 2.5.

Is there a workaround for parsing gzipped files? Is the porting of these plugins under active development?

Thank you!

Michael

