Re: Unable to fetch data from segment folder

Lewis John McGibbney Tue, 11 Jan 2022 21:29:09 -0800

I created https://issues.apache.org/jira/browse/NUTCH-2931 to track all of
this work.
If you are interested in working on any of this it would be great to 
collaborate.
There is much more we can do over and above the few tickets I created.
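
In the meantime, one possible workaround for the fetch step is to look up the
newest timestamp-named directory under the segments folder and pass that exact
path as the "segment" argument instead of the parent folder. The following is
only a minimal sketch, not something taken from the Nutch code base. It assumes
that segment directories are named by date and time, so the lexicographically
greatest name is the newest, and that the FETCH job accepts a full segment path
in "segment" as in the request quoted below. The helper names latest_segment
and fetch_request are hypothetical.

```python
import json
from pathlib import Path


def latest_segment(segments_dir):
    """Return the newest segment directory under segments_dir.

    Assumption: segment directories carry timestamp-style names, so the
    greatest name in string order is the most recent one.
    """
    candidates = [p for p in Path(segments_dir).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no segment directories under {segments_dir}")
    return max(candidates, key=lambda p: p.name)


def fetch_request(segments_dir, crawl_id="crawl01", conf_id="default"):
    """Build a FETCH request body that points at one concrete segment."""
    segment = latest_segment(segments_dir)
    return {
        "type": "FETCH",
        "confId": conf_id,
        "crawlId": crawl_id,
        # Assumption: "segment" takes the full path of a single segment.
        "args": {"segment": str(segment)},
    }
```

Calling json.dumps(fetch_request("/tmp/crawl/segments")) after the generate
step would produce a body to paste into Postman in place of the hand-written
FETCH request.
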
lewismc


On 2021/12/24 10:07:20 sw.l...@quandatics.com wrote:
> Hi,
>
> We are currently facing a problem with the Nutch REST API. We run the Nutch
> API through Postman, and it works perfectly fine if we don't define the
> segment path. These are the requests we run in Postman.
>
> Inject
>
> {
>     "type": "INJECT",
>     "confId": "default",
>     "crawlId": "crawl01",
>     "args": {
>         "url_dir": "/opt/apache-nutch-1.18/runtime/local/urls/seed.txt",
>         "crawldb": "/tmp/crawl/crawldb"
>     }
> }
>
> Generate
>
> {
>     "type": "GENERATE",
>     "confId": "default",
>     "crawlId": "crawl01",
>     "args": {
>         "crawldb": "/tmp/crawl/crawldb",
>         "segment_dir": "/tmp/crawl/segments"
>     }
> }
>
> Fetch
>
> {
>     "type": "FETCH",
>     "confId": "default",
>     "crawlId": "crawl01",
>     "args": {"segment": "/tmp/crawl/segments"}
> }
>
> We want to store the crawled data in a specific directory. However, in the
> fetch step, Nutch cannot retrieve data from the specific folder (its name is
> generated from the current date and time) under the segments folder. We
> tried /tmp/crawl/segments/* and the fetch successfully retrieves the data,
> but it also creates a new folder literally named *.
>
> Therefore, may we know whether there is a way to define the folder name
> inside the segments folder, or another way to change the output directory?
>
> Attached is our log for your reference. Kindly advise. Thanks in advance.
>
> Best Regards,
> Shi Wei
