Hi Shi Wei,
I missed this thread over the holidays!
Which version of Nutch are you using?
The REST API needs quite a bit of attention. It is not a particularly mature
aspect of the Nutch codebase, and there is a catalog of issues that need to
be addressed.
If you are interested in learning about these issues, we can create an EPIC
issue in JIRA and begin fleshing out everything that is wrong.
lewismc

On 2021/12/24 10:07:20 sw.l...@quandatics.com wrote:
> Hi,
>
> We are currently facing a problem when using the Nutch REST API. We run
> the Nutch API through Postman, and it works perfectly fine if we don't define
> the segment path. These are the requests we send in Postman:
>
> Inject
>
> {
>     "type": "INJECT",
>     "confId": "default",
>     "crawlId": "crawl01",
>     "args": {
>         "url_dir": "/opt/apache-nutch-1.18/runtime/local/urls/seed.txt",
>         "crawldb": "/tmp/crawl/crawldb"
>     }
> }
>
> Generate
>
> {
>     "type": "GENERATE",
>     "confId": "default",
>     "crawlId": "crawl01",
>     "args": {
>         "crawldb": "/tmp/crawl/crawldb",
>         "segment_dir": "/tmp/crawl/segments"
>     }
> }
>
> Fetch
>
> {
>     "type": "FETCH",
>     "confId": "default",
>     "crawlId": "crawl01",
>     "args": {"segment": "/tmp/crawl/segments"}
> }
>
> We are trying to store the crawled data in a specific directory. However,
> when it comes to the fetch step, it cannot retrieve data from the specific
> folder (whose name is generated from the current date and time) under the
> segments folder. We have tried /tmp/crawl/segments/* and it does retrieve
> the data successfully, but it also creates a new folder called "*".
>
> Is there any way to define the folder name inside the segments folder, or
> is there another way to change the output directory?
>
> Attached is our log for your reference. Kindly advise. Thanks in advance.
>
> Best regards,
> Shi Wei
> 
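One possible workaround for the fetch step, sketched in Python: instead of passing the parent `segments` directory (or a `*` glob), look up the newest timestamped folder that GENERATE created and pass that concrete path as the FETCH job's `segment` argument. This is a minimal sketch under several assumptions: the segments live on a local filesystem the script can see, segment folders are named with a 14-digit `yyyyMMddHHmmss` timestamp, and `segment` expects the path of a single segment. `latest_segment` and `fetch_job_body` are hypothetical helpers, not part of Nutch.

```python
import os
import re

# Assumption: each generated segment folder is named with a 14-digit
# timestamp (yyyyMMddHHmmss), e.g. 20211224100720.
SEGMENT_NAME = re.compile(r"\d{14}")


def latest_segment(segments_dir: str) -> str:
    """Return the path of the newest timestamped segment under segments_dir.

    Hypothetical helper: entries that are not 14-digit directory names
    (such as a stray "*" folder) are ignored.
    """
    candidates = [
        name
        for name in os.listdir(segments_dir)
        if SEGMENT_NAME.fullmatch(name)
        and os.path.isdir(os.path.join(segments_dir, name))
    ]
    if not candidates:
        raise FileNotFoundError(f"no segment folders found in {segments_dir}")
    # Fixed-width timestamps sort chronologically as plain strings.
    return os.path.join(segments_dir, max(candidates))


def fetch_job_body(segments_dir: str, conf_id: str = "default",
                   crawl_id: str = "crawl01") -> dict:
    """Build a FETCH request body that points at one concrete segment.

    Hypothetical helper mirroring the Postman FETCH body in the thread.
    """
    return {
        "type": "FETCH",
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": {"segment": latest_segment(segments_dir)},
    }
```

For example, `json.dumps(fetch_job_body("/tmp/crawl/segments"), indent=4)` produces a body that can be sent the same way as the Postman requests in the thread. Because the lookup lists a local directory, it only applies when the script shares a filesystem with the Nutch job (e.g. `runtime/local`); segments stored on HDFS would need a different listing mechanism.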
