Hi all,
I am using Nutch 2.3.1 with Gora and MongoDB. During crawling, some URLs
are not downloaded because of an exception.
Here is a sample document from MongoDB:
db.business_webpage.findOne({"markers.dist":"1",status:1})
{
    "_id" : "br.com.valor.www:http/financas/4286772/divida-ficara-dentro-da-meta-afirma-tesouro",
    "status" : 1,
    "fetchTime" : NumberLong("1456378905007"),
    "fetchInterval" : 2592000,
    "retriesSinceFetch" : 2,
    "score" : 0,
    "inlinks" : {
    },
    "markers" : {
        "_ftcmrk_" : null,
        "dist" : "1",
        "_gnmrk_" : null
    },
    "metadata" : {
        "_csh_" : BinData(0,"AAAAAA==")
    },
    "batchId" : "1456292454-1525750489",
    "prevFetchTime" : NumberLong("1456291771776"),
    "protocolStatus" : {
        "code" : 16,
        "args" : [
            "java.net.SocketTimeoutException: Read timed out"
        ],
        "lastModified" : NumberLong(0)
    }
}
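In case it helps with diagnosis: fetchTime and prevFetchTime look like epoch milliseconds (an assumption based on their magnitude), so they can be decoded to see when the page was last tried and when it is next due. A minimal sketch in plain JavaScript for Node.js; the helper name toUtc is made up for illustration and is not part of Nutch or MongoDB:

```javascript
// Values copied from the sample document above.
// Assumption: both fields are epoch milliseconds.
const fetchTime = 1456378905007;
const prevFetchTime = 1456291771776;

// Hypothetical helper: convert epoch milliseconds to an ISO-8601 UTC string.
function toUtc(ms) {
  return new Date(ms).toISOString();
}

// Gap between the two timestamps, in hours.
const gapHours = (fetchTime - prevFetchTime) / (1000 * 60 * 60);

console.log("prevFetchTime:", toUtc(prevFetchTime)); // 2016-02-24T05:29:31.776Z
console.log("fetchTime:    ", toUtc(fetchTime));     // 2016-02-25T05:41:45.007Z
console.log("gap (hours):  ", gapHours.toFixed(1));  // 24.2
```

For this document, fetchTime is roughly one day after prevFetchTime. Whether that has anything to do with the URL not being selected depends on when the generate step ran, which the document alone does not show.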
The problem is that the fetch status is still 1, which I think should have
been changed.
I have observed that URLs of this type are neither selected nor fetched in
the Nutch workflow.
Is this a bug?
Thanks