Sebastian Nagel created NUTCH-2143:
--------------------------------------

             Summary: GeneratorJob ignores batch id passed as argument
                 Key: NUTCH-2143
                 URL: https://issues.apache.org/jira/browse/NUTCH-2143
             Project: Nutch
          Issue Type: Bug
          Components: generator
    Affects Versions: 2.3.1
            Reporter: Sebastian Nagel
            Priority: Blocker
             Fix For: 2.3.1


The batch id passed to GeneratorJob via the option -batchId <id> is ignored, 
and a newly generated batch id is used to mark the current batch instead. Log 
snippets from a run of bin/crawl:
{noformat}
bin/nutch generate ... -batchId 1444941073-14208
...
GeneratorJob: generated batch id: 1444941074-858443668 containing 1 URLs

Fetching : 
bin/nutch fetch ... 1444941073-14208 ...
...
QueueFeeder finished: total 0 records. Hit by time limit :0
{noformat}
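
The expected behaviour can be illustrated with a minimal sketch. It is hypothetical and is not GeneratorJob's actual code: the class BatchIdSketch and the method resolveBatchId are illustrative names. It assumes the generator should keep a caller-supplied id and generate one only when -batchId is absent. The "<seconds>-<random int>" format of the fallback id is also an assumption, modelled on the generated id in the log above (1444941074-858443668).

```java
import java.util.Random;

/**
 * Hypothetical sketch (not Nutch's GeneratorJob code): choose the batch id
 * used to mark generated URLs, honouring an id supplied by the caller.
 */
public class BatchIdSketch {

    /**
     * @param requested value of -batchId, or null if the option was absent
     * @param nowMillis current time in milliseconds
     * @param random    source of the random suffix
     * @return the caller's id if given, otherwise a freshly generated one
     */
    static String resolveBatchId(String requested, long nowMillis, Random random) {
        if (requested != null && !requested.isEmpty()) {
            // The reported bug behaves as if this branch were skipped:
            // a fresh id is always generated and the caller's id is lost.
            return requested;
        }
        // Assumed fallback format "<seconds>-<non-negative int>".
        // Masking with Integer.MAX_VALUE avoids the negative result that
        // Math.abs(Integer.MIN_VALUE) would give.
        return (nowMillis / 1000) + "-" + (random.nextInt() & Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        long now = System.currentTimeMillis();
        // With -batchId supplied, the same id comes back unchanged.
        System.out.println(resolveBatchId("1444941073-14208", now, rnd));
        // Without it, a new id is generated.
        System.out.println(resolveBatchId(null, now, rnd));
    }
}
```

With this behaviour, bin/crawl can pass the same id to both generate and fetch, and the two steps agree on which batch they are working on.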

The generated URLs are marked with the wrong batch id:
{noformat}
hbase(main):010:0> scan 'test_webpage'
ROW                            COLUMN+CELL
 org.apache.nutch:http/        column=f:bid, timestamp=1444941077080, value=1444941074-858443668
 ...
 org.apache.nutch:http/        column=mk:_gnmrk_, timestamp=1444941077080, value=1444941074-858443668
{noformat}
As a result, the fetcher will not fetch anything. This problem was reported by 
Sherban Drulea [[1|https://www.mail-archive.com/user@nutch.apache.org/msg13894.html],[2|https://www.mail-archive.com/user@nutch.apache.org/msg13912.html]].
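
The following simplified model shows why a mismatched mark leaves the fetcher with nothing to do. It is an assumption consistent with the log ("total 0 records"), not FetcherJob's actual code. It supposes that the fetcher selects only rows whose generate mark (mk:_gnmrk_) equals the batch id it was given. FetchFilterSketch and selectForFetch are hypothetical names, and the row key and mark values are taken from the HBase scan above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical model of batch selection in the fetch step (not Nutch code). */
public class FetchFilterSketch {

    /** Returns the row keys whose generate mark equals the requested batch id. */
    static List<String> selectForFetch(Map<String, String> generateMarks, String batchId) {
        return generateMarks.entrySet().stream()
                .filter(e -> batchId.equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Row key -> value of mk:_gnmrk_, as in the scan of 'test_webpage'.
        Map<String, String> marks = new LinkedHashMap<>();
        marks.put("org.apache.nutch:http/", "1444941074-858443668");

        // bin/crawl passes the id it requested, not the one actually written.
        List<String> toFetch = selectForFetch(marks, "1444941073-14208");
        System.out.println("records to fetch: " + toFetch.size()); // prints "records to fetch: 0"
    }
}
```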



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
