You can find detailed information about the Nutch 2.x HBase WebPage table
columns on this page: http://nlp.solutions.asia/?p=232
On Thu, Mar 21, 2013 at 1:43 AM, kamaci furkankam...@gmail.com wrote:
Is there any command for that? When I use
describe 'webpage'
there is no column
Where does Nutch store that information?
2013/3/21 Markus Jelsma-2 [via Lucene]
ml-node+s472066n4049568...@n3.nabble.com
Nutch selects records that are eligible for fetch. A record is eligible either
because of a transient failure or because its fetch interval has expired. This means
that failed fetches due to
To: user@nutch.apache.org
Subject: Re: Does Nutch Checks Whether A Page crawled before or not
the readdb command to inspect a specific URL.
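For anyone following the thread, the eligibility rule Markus describes above could be sketched roughly as follows. This is only an illustration in Python, not Nutch's actual generator code: `PageRecord`, `is_eligible`, `select_for_fetch` and all field names are hypothetical, made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageRecord:
    """Hypothetical stand-in for one stored page record (not a Nutch class)."""
    url: str
    last_fetch_ms: int        # when the page was last fetched, epoch milliseconds
    fetch_interval_ms: int    # how long to wait before fetching again
    transient_failure: bool   # True if the last attempt failed and should be retried


def is_eligible(record: PageRecord, now_ms: int) -> bool:
    """A record is eligible after a transient failure or once its interval has expired."""
    interval_expired = record.last_fetch_ms + record.fetch_interval_ms <= now_ms
    return record.transient_failure or interval_expired


def select_for_fetch(records: List[PageRecord], now_ms: int) -> List[str]:
    """Return the URLs of all records that are currently eligible, in input order."""
    return [r.url for r in records if is_eligible(r, now_ms)]
```

With this model, a page that was fetched recently and did not fail is skipped, which is the "was it crawled before" check the subject line asks about.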
-Original message-
From: kamaci [hidden email]
Sent: Wed 20-Mar-2013 23:52
To: [hidden email]
Subject: Re: Does Nutch Checks Whether A Page
I use Nutch 2.1 and don't use that crawldb command. I have an HBase database.
Can I still see that kind of data? I assume readdb doesn't work in my
situation?
OK, that works for me:
./bin/nutch readdb -url http://www.generalist.org.uk/blog/
readdb works for both versions of Nutch. In 2.x, it is implemented by the
WebTableReader [0] class. See its usage message for more details on the command.
[0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/WebTableReader.java?view=markup
On Wed, Mar 20, 2013 at 4:18 PM,
I use HBase, so where is that crawldb? Is it stored in my HBase or in some
other special folder in Nutch?
Yes. If you have configured Nutch to use HBase, then the info will be stored
in HBase.
Is there any command for that? When I use
describe 'webpage'
there is no column like fetchtime. How can I see it in HBase?
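A note on why nothing like fetchtime shows up: HBase's describe only lists column families, because column qualifiers are not part of the table schema. Which family and qualifier hold the fetch time depends on the gora-hbase-mapping.xml in your Nutch conf directory. As an assumption to check against your own mapping file, the default mapping is believed to put fetchTime under family f with qualifier ts, so something like scan 'webpage', {COLUMNS => ['f:ts'], LIMIT => 5} in the HBase shell should show it as raw bytes. A small Python sketch for turning such a value into a readable date is below. It assumes the value is an 8-byte big-endian long holding epoch milliseconds; the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone


def decode_hbase_long(raw: bytes) -> int:
    """Decode an 8-byte big-endian signed long (the layout HBase uses for a Java long)."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack(">q", raw)[0]


def fetch_time_to_utc(raw: bytes) -> datetime:
    """Interpret the decoded long as epoch milliseconds (an assumption) and return UTC time."""
    millis = decode_hbase_long(raw)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=millis)


if __name__ == "__main__":
    # Example bytes built locally for illustration, not copied from a real table.
    sample = struct.pack(">q", 1363824000000)
    print(fetch_time_to_utc(sample).isoformat())
```

The readdb -url command mentioned elsewhere in this thread remains the simpler route if you only want to look at a single page.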
In addition to Lewis's suggestions, please take a look at [0] to understand
how Nutch 2.x stores files in the database.
Please let us know if there are any inconsistencies.
[0] - http://wiki.apache.org/nutch/NutchConfigurationFiles-2.x#preview
On Wed, Mar 20, 2013 at 7:43 PM,