Hi.

I am trying to import some Common Crawl dataset files into Nutch.
Those files are in the ARC file format.
I tried using the ArcSegmentCreator tool, but that didn't work well.
It used up all the heap space, and increasing the heap space limit didn't help.

Does anyone have any thoughts on this?
Is there a better way to import Common Crawl files?
Why does ArcSegmentCreator have issues?

Thanks,
Nenad
