Fetch status is not changed

harsh Wed, 24 Feb 2016 03:58:06 -0800

Hi All

In Nutch 2.3.1 (with Gora and MongoDB), some URLs are not downloaded during crawling because of an exception.

Here is a sample document from MongoDB:
db.business_webpage.findOne({"markers.dist":"1",status:1})
{
    "_id" : "br.com.valor.www:http/financas/4286772/divida-ficara-dentro-da-meta-afirma-tesouro",
    "status" : 1,
    "fetchTime" : NumberLong("1456378905007"),
    "fetchInterval" : 2592000,
    "retriesSinceFetch" : 2,
    "score" : 0,
    "inlinks" : {
    },
    "markers" : {
        "_ftcmrk_" : null,
        "dist" : "1",
        "_gnmrk_" : null
    },
    "metadata" : {
        "_csh_" : BinData(0,"AAAAAA==")
    },
    "batchId" : "1456292454-1525750489",
    "prevFetchTime" : NumberLong("1456291771776"),
    "protocolStatus" : {
        "code" : 16,
        "args" : [
            "java.net.SocketTimeoutException: Read timed out"
        ],
        "lastModified" : NumberLong(0)
    }
}

The problem is that the fetch status is still 1, although it should have been changed.

I have observed that URLs of this type are neither selected nor fetched in the Nutch workflow.
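
To see how many pages are in this state, a filter along the following lines might help. This is only a sketch built from the field names in the sample document above; `stuckFilter`, `isStuck` and the in-memory `sample` object are hypothetical names used for illustration, and whether code 16 always corresponds to an exception is an assumption based on this one document.

```javascript
// Hypothetical filter based on the sample document: pages still at status 1
// that have already been retried and whose protocolStatus carries code 16
// (the code seen next to the SocketTimeoutException above).
var stuckFilter = {
  status: 1,
  retriesSinceFetch: { $gt: 0 },
  "protocolStatus.code": 16
};

// In the mongo shell the filter could be used like this:
//   db.business_webpage.find(stuckFilter).count()

// Plain-JavaScript equivalent of the filter, e.g. for checking exported documents.
function isStuck(doc) {
  return doc.status === 1 &&
    doc.retriesSinceFetch > 0 &&
    doc.protocolStatus !== undefined &&
    doc.protocolStatus !== null &&
    doc.protocolStatus.code === 16;
}

// Reduced copy of the sample document, keeping only the fields that are checked.
var sample = {
  status: 1,
  retriesSinceFetch: 2,
  protocolStatus: {
    code: 16,
    args: ["java.net.SocketTimeoutException: Read timed out"]
  }
};

console.log(isStuck(sample));
```

Comparing this count before and after a generate/fetch cycle would show whether these pages are ever picked up again.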

Is this a bug?

Thanks
