Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-21 Thread Canan GİRGİN
You can find detailed information about the Nutch 2.x HBase WebPage table columns on this page: http://nlp.solutions.asia/?p=232 On Thu, Mar 21, 2013 at 1:43 AM, kamaci furkankam...@gmail.com wrote: Is there any command for that? When I use describe 'webpage' there is no column

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kamaci
Where does Nutch store that information? 2013/3/21 Markus Jelsma-2 [via Lucene] ml-node+s472066n4049568...@n3.nabble.com Nutch selects records that are eligible for fetch. A record is eligible either because of a transient failure or because its fetch interval has expired. This means that failed fetches due to
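
The selection rule quoted above can be pictured with a minimal sketch. This is not Nutch's code: the PageRecord class, its field names, and is_eligible_for_fetch are hypothetical names made up for illustration, and the rule is reduced to the two conditions mentioned in the message (a transient failure, or an expired fetch interval).

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical stand-in for one stored page record (not a Nutch class)."""
    url: str
    last_fetch: float          # epoch seconds of the last fetch attempt
    fetch_interval: float      # seconds to wait before fetching again
    transient_failure: bool = False  # last attempt failed in a retryable way


def is_eligible_for_fetch(record: PageRecord, now: float) -> bool:
    """Return True if the record should be selected for fetching.

    Assumption: a record is due either because its last attempt failed
    transiently or because its fetch interval has elapsed.
    """
    if record.transient_failure:
        return True
    return now - record.last_fetch >= record.fetch_interval
```

Under these assumptions, a page fetched successfully a day ago with a 30-day interval is skipped, while the same page is selected once the interval has passed or if its last attempt failed transiently.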

RE: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread Markus Jelsma
To: user@nutch.apache.org Subject: Re: Does Nutch Checks Whether A Page crawled before or not Where does Nutch store that information? 2013/3/21 Markus Jelsma-2 [via Lucene] ml-node+s472066n4049568...@n3.nabble.com Nutch selects records that are eligible for fetch. It's either due

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kamaci
the readdb command to inspect a specific URL. -Original message- From: kamaci [hidden email] Sent: Wed 20-Mar-2013 23:52 To: [hidden email] Subject: Re: Does Nutch Checks Whether A Page

RE: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kamaci
I use Nutch 2.1 and don't use that crawldb command. I have an HBase database. Can I still see that kind of data? I think readdb doesn't work in my situation?

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kamaci
OK, that works for me: ./bin/nutch readdb -url http://www.generalist.org.uk/blog/ 2013/3/21 kamaci [via Lucene] ml-node+s472066n4049582...@n3.nabble.com I use Nutch 2.1 and don't use that crawldb command. I have an HBase database. Can I still see that kind of data? I think readdb doesn't work

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread Tejas Patil
readdb works for both versions of Nutch. In 2.x, it's implemented by the WebTableReader [0] class. See the usage to get more details of the command. [0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/WebTableReader.java?view=markup On Wed, Mar 20, 2013 at 4:18 PM,

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kamaci
I use HBase, so where is that crawldb? Is it stored in my HBase or in some other special folder in Nutch? 2013/3/21 Tejas Patil [via Lucene] ml-node+s472066n4049588...@n3.nabble.com readdb works for both versions of Nutch. In 2.x, it's implemented by the WebTableReader [0] class. See the usage to get

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread Tejas Patil
Yes. If you have configured it to use HBase, then the info will be stored in HBase. On Wed, Mar 20, 2013 at 4:27 PM, kamaci furkankam...@gmail.com wrote: I use HBase, so where is that crawldb? Is it stored in my HBase or in some other special folder in Nutch? 2013/3/21 Tejas Patil [via Lucene]

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kamaci
Is there any command for that? When I use describe 'webpage' there is no column like fetchtime. How can I see it from HBase? 2013/3/21 Tejas Patil [via Lucene] ml-node+s472066n4049596...@n3.nabble.com Yes. If you have configured it to use HBase, then the info will be stored in
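
A note on the question above: in HBase, describe lists only column families, not the qualifiers inside them, which is likely why no fetch-time column shows up there. The sketch below rests on assumptions not stated in this thread: that Nutch 2.x keys rows by a reversed URL (host labels reversed, then the scheme, an optional port, and the path), and that numeric fields such as the fetch time are stored as 8-byte big-endian longs holding epoch milliseconds. Both helper names are hypothetical; check your own mapping file before relying on either.

```python
import struct
from datetime import datetime, timezone
from urllib.parse import urlsplit


def reverse_url(url: str) -> str:
    """Build a reversed-URL row key (assumed key layout, see note above)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    key = ".".join(reversed(host.split("."))) + ":" + parts.scheme
    if parts.port is not None:
        key += f":{parts.port}"
    path = parts.path
    if parts.query:
        path += "?" + parts.query
    return key + path


def decode_long_millis(raw: bytes) -> datetime:
    """Decode an 8-byte big-endian long of epoch milliseconds (assumed encoding)."""
    (millis,) = struct.unpack(">q", raw)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

With these assumptions, the URL from the readdb example earlier in the thread would map to the row key uk.org.generalist.www:http/blog/, which could then be looked up with the HBase shell's get command on the 'webpage' table.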

Re: Does Nutch Checks Whether A Page crawled before or not

2013-03-20 Thread kiran chitturi
In addition to Lewis's suggestions, please take a look at [0] to understand how Nutch 2.x stores data in the database. Please report to us if there are any inconsistencies. [0] - http://wiki.apache.org/nutch/NutchConfigurationFiles-2.x#preview On Wed, Mar 20, 2013 at 7:43 PM,