Hi All

In Nutch 2.3.1 with Gora and MongoDB, some URLs are not downloaded during crawling because of an exception.
Here is a sample document from MongoDB:
db.business_webpage.findOne({"markers.dist":"1",status:1})
{
    "_id" : "br.com.valor.www:http/financas/4286772/divida-ficara-dentro-da-meta-afirma-tesouro",
    "status" : 1,
    "fetchTime" : NumberLong("1456378905007"),
    "fetchInterval" : 2592000,
    "retriesSinceFetch" : 2,
    "score" : 0,
    "inlinks" : {

    },
    "markers" : {
        "_ftcmrk_" : null,
        "dist" : "1",
        "_gnmrk_" : null
    },
    "metadata" : {
        "_csh_" : BinData(0,"AAAAAA==")
    },
    "batchId" : "1456292454-1525750489",
    "prevFetchTime" : NumberLong("1456291771776"),
    "protocolStatus" : {
        "code" : 16,
        "args" : [
            "java.net.SocketTimeoutException: Read timed out"
        ],
        "lastModified" : NumberLong(0)
    }
}
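To make the row easier to read, its numeric fields can be decoded with a small helper. This is only a sketch: `describeRow` and the lookup tables are hypothetical (not part of Nutch or Gora), and the code values are my assumption of a subset of the Nutch 2.x `CrawlStatus` and `ProtocolStatusCodes` constants (1 = unfetched, 16 = EXCEPTION). The `NumberLong` values are written as plain numbers, which is safe here because they are far below 2^53.

```javascript
// Sketch: decode the numeric fields of a Nutch 2.x webpage row stored in MongoDB.
// ASSUMPTION: subset of org.apache.nutch.crawl.CrawlStatus constants.
const CRAWL_STATUS = {
  0x01: "unfetched",
  0x02: "fetched",
  0x03: "gone",
  0x22: "retry",
};

// ASSUMPTION: subset of org.apache.nutch.protocol.ProtocolStatusCodes constants.
const PROTOCOL_STATUS = {
  1: "SUCCESS",
  2: "FAILED",
  16: "EXCEPTION",
};

// Hypothetical helper, not part of Nutch or Gora.
function describeRow(doc) {
  const code = doc.protocolStatus.code;
  return {
    status: CRAWL_STATUS[doc.status] || `unknown (${doc.status})`,
    protocolStatus: PROTOCOL_STATUS[code] || `unknown (${code})`,
    error: doc.protocolStatus.args[0],
    retriesSinceFetch: doc.retriesSinceFetch,
    // fetchTime is epoch milliseconds, which is what Date expects.
    nextFetch: new Date(doc.fetchTime).toISOString(),
    // Gap between the previous fetch and the next scheduled fetch, in hours.
    hoursBetweenPrevAndNextFetch: (doc.fetchTime - doc.prevFetchTime) / 3600000,
  };
}

// Fields copied from the document above (NumberLong values as plain numbers).
const sampleRow = {
  status: 1,
  fetchTime: 1456378905007,
  prevFetchTime: 1456291771776,
  retriesSinceFetch: 2,
  protocolStatus: {
    code: 16,
    args: ["java.net.SocketTimeoutException: Read timed out"],
  },
};

console.log(describeRow(sampleRow));
```

For this row the helper reports status "unfetched", protocol status "EXCEPTION", a next fetch time of 2016-02-25T05:41:45.007Z, and roughly 24.2 hours between the previous and the next scheduled fetch.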

The problem is that the fetch status stays at 1, although it should have been changed.
I have observed that URLs of this type are neither selected nor fetched in the Nutch workflow.
Is this a bug?
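Before calling it a bug, it may be worth checking whether such a row is actually due at the moment the generator runs. The sketch below is a simplified model, not Nutch's generator code. It assumes that the default fetch schedule selects a row only once `fetchTime <= now`, and that a row still carrying a generate mark (`_gnmrk_`) is skipped; URL filters, normalizers, `generate.max.distance` and `topN` are ignored. The function name `isDueForGenerate` is hypothetical.

```javascript
// Simplified model of the generator's "is this row due?" check.
// ASSUMPTIONS: a row is selected only when fetchTime <= now, and rows that
// still carry a generate mark (_gnmrk_) are skipped. URL filters, normalizers,
// generate.max.distance and topN are deliberately ignored.
function isDueForGenerate(doc, nowMs) {
  const alreadyGenerated = doc.markers != null && doc.markers._gnmrk_ != null;
  if (alreadyGenerated) {
    return { due: false, reason: "already has a generate mark" };
  }
  if (doc.fetchTime > nowMs) {
    const waitHours = (doc.fetchTime - nowMs) / 3600000;
    return { due: false, reason: `fetchTime is ${waitHours.toFixed(1)} h in the future` };
  }
  return { due: true, reason: "fetchTime has passed" };
}

// Fields from the sample row (fetchTime as epoch milliseconds).
const row = {
  fetchTime: 1456378905007,
  markers: { _ftcmrk_: null, dist: "1", _gnmrk_: null },
};

// A generate run at the previous fetch time would not pick the row up yet...
console.log(isDueForGenerate(row, 1456291771776));
// ...while a run after fetchTime would.
console.log(isDueForGenerate(row, Date.now()));
```

If rows like this one keep getting pushed into the future after each failed attempt, they will look "stuck" at status 1 even though the generator is only waiting for `fetchTime` to pass.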

Thanks

