Hi,
I am using Pig 2.0 and Nutch 1.0, but they don't have a common Hadoop version.
What is a common Hadoop version for both Pig and Nutch?
Please give the Pig version, Nutch version, and Hadoop version that work together.
Can anyone please help with this?
thanks
ramanaiah
Did another test and got this error:
2009-06-25 21:19:44,663 ERROR mapred.EagerTaskInitializationListener - Job
initialization failed:
java.lang.IllegalArgumentException: Pathname
/d:/Bii/nutch/logs/history/user/_logs/history/localhost_1245956549829_job_200906252102_0001_pc-xxx%xxx_inject+urls
fro
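A likely culprit is the embedded Windows drive letter: Hadoop's path handling treats a colon inside a path component as a scheme separator, so a pathname beginning with /d:/ is rejected. A quick shell check for that shape (a sketch; the path is a prefix of the one in the log above):

```shell
# Pathname taken from the error above; the "/d:" prefix is the suspect part.
p='/d:/Bii/nutch/logs/history'

# A colon right after the leading slash means a Windows drive letter leaked
# into what should be a plain POSIX/HDFS path:
case "$p" in
  /?:/*) echo "windows drive path" ;;
  *)     echo "plain path" ;;
esac
```

If that pattern matches, some component (Nutch or Hadoop scripts under Cygwin) converted a path to Windows form before handing it to HDFS.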
MilleBii wrote:
What I've also discovered:
+ the hadoop script works with Unix-like paths and works fine on Windows
+ the nutch script works with Windows paths
bin/nutch works with Windows paths? I think this could happen only by
accident - both scripts work with Cygwin paths. On the other hand
Could it be that there is some incompatibility because one works with
Unix-like paths and the other doesn't?
I actually tried it and it fails, but this is what I found:
bin/hadoop-config.sh does the conversion from a relative to an absolute path:
this="$0"
while [ -h "$this" ]; do
  ls=`ls -ld "$this"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '.*/.*' > /dev/null; then
    this="$link"
  else
    this=`dirname "$this"`/"$link"
  fi
done
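That symlink-resolution loop can be exercised without Hadoop at all; here is a self-contained sketch that runs the same idiom against a throwaway symlink (file names are made up for the demo):

```shell
# Create a scratch dir with a relative symlink, then resolve it the same
# way hadoop-config.sh does:
tmp=$(mktemp -d)
echo 'echo hello' > "$tmp/real.sh"
ln -s real.sh "$tmp/link.sh"

this="$tmp/link.sh"
while [ -h "$this" ]; do
  ls=`ls -ld "$this"`
  link=`expr "$ls" : '.*-> \(.*\)$'`     # extract the "-> target" from ls -ld
  if expr "$link" : '.*/.*' > /dev/null; then
    this="$link"                          # target contains a slash: use as-is
  else
    this=`dirname "$this"`/"$link"        # bare name: resolve against link's dir
  fi
done
echo "$this"                              # the resolved path to real.sh
```

With GNU coreutils, `readlink -f` does the same job in one call; the loop above is simply the portable form the Hadoop scripts use.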
Yes, I'm using both relative paths & Cygwin under Windows, so the /d: is not
introduced by me, but by either Nutch or Hadoop.
Regarding the Cygwin path you are right... that's actually where I lost quite
some time.
OK, I will try absolute paths and let you know.
-MilleBii-
HLPPP !!!
Stuck for three days, not able to start any Nutch job.
HDFS works fine, i.e. I can put & look at files.
When I start a Nutch crawl, I get the following error:
Job initialization failed:
java.lang.IllegalArgumentException: Pathname
/d:/Bii/nutch/logs/history/user/_logs/history/localhos
Looks like I just needed to transfer from the local filesystem to HDFS.
Is it safe to transfer a crawl directory (and subdirectories) from the local
filesystem to HDFS and start crawling again?
1. hadoop fs -put crawl crawl
2. nutch generate crawl/crawldb crawl/segments -topN 500 (where now it
should use
I have newly installed Hadoop in a distributed single-node configuration.
When I run Nutch commands, it looks for files in my user home directory
and not in the Nutch directory.
How can I change this?
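That is expected behaviour rather than a bug: HDFS resolves a relative path against /user/<username>. A minimal sketch of the mapping (user name and paths here are examples, not from this message):

```shell
# HDFS home-directory resolution is effectively this concatenation:
user="nutch"                 # placeholder OS/HDFS user
rel="crawl/segments"         # relative path given to bin/hadoop or bin/nutch
abs="/user/$user/$rel"
echo "$abs"                  # /user/nutch/crawl/segments
```

So `bin/hadoop fs -put urls urls` lands under /user/<you>/; passing absolute HDFS paths to the Nutch commands avoids the home-directory resolution.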
--
-MilleBii-
Hello,
The problem is partially solved, but I'll still write it up :)
Using the bin/nutch commands (inject, generate, fetch, etc.) works.
Only bin/nutch crawl does not
--
I have successfully set up a Hadoop cluster on 6 nodes (1
namenode
https://issues.apache.org/jira/browse/NUTCH-637
Hi,
I have the same problems. Because there are some bugs in hadoop-0.12.2, I
want to change to hadoop-0.17.0, but the API changed, so we can't use it
directly. If you find a way to solve this problem, let me know.
Regards,
gong zhao
Hi,
I am using nutch-0.9, and it has hadoop-0.12.2 by default.
Now that hadoop-0.17.0 is the latest version, I want my Nutch to run using
this Hadoop. So I have replaced the hadoop-0.12.2.jar file in lib with the
hadoop-0.17.0.jar file. And as usual, errors creep up with some
functions/methods be
View this message in context:
http://www.nabble.com/Nutch-and-Hadoop-tp15136744p15436273.html
Sent from the Nutch - User mailing list archive at Nabble.com.
I did this:
[EMAIL PROTECTED] search]$ ./bin/hadoop dfs -put urls urls
put: Connection refused
thanks
./bin/hadoop dfs -put urls urls
put: Connection refused
What is the problem?
thanks
Hi,
You need to add your public key to .ssh/authorized_keys on the master as
well as on the slave. Also, make sure that this file is not writable by anyone
else but you.
Regards,
Barry
On Thursday 07 February 2008, payo wrote:
> I created my SSH keys and I can log in over SSH without being pro
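The permission rule Barry mentions can be sketched on a scratch directory (a real fix targets ~/.ssh on the master and each slave; the key line is a placeholder):

```shell
# sshd ignores authorized_keys if the file or ~/.ssh is writable by others,
# so tighten both; demonstrated here on a temp dir:
d=$(mktemp -d)
mkdir -p "$d/.ssh"
printf 'ssh-rsa AAAA...placeholder... user@host\n' >> "$d/.ssh/authorized_keys"
chmod 700 "$d/.ssh"
chmod 600 "$d/.ssh/authorized_keys"
ls -l "$d/.ssh/authorized_keys" | cut -c1-10   # -rw-------
```

If sshd still prompts for a password after this, running `ssh -v master` usually shows which key and file it tried.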
-emcvaalkm01.estafeta.com.out
[EMAIL PROTECTED]'s password:
What is the problem?
I resolved the problem!!
I made a change in conf/context.xsl.
Is this correct?
I read this:
http://www.openrdf.org/doc/sesame/users/ch09.html#d0e3707
I am trying to configure Nutch and Hadoop on two PCs, but I have questions:
1. Do I have to install Nutch on both PCs or only on the master node?
2. Does Hadoop help me reduce the time of my crawl?
3. Do I only need to create keys for communication between my PCs?
thanks
> I am working with nutch-0.8.1 and I am trying to configure Hadoop, but my
> questions are:
>
> - in the bin directory there are the files:
>
> hadoop, hadoop-daemon, hadoop-daemons, nutch, rcc, slaves, start-all,
> start-dfs, start-mapred, stop-all, stop-dfs, stop-mapred
>
> these files are necessary f
with Hadoop?
My base is
http://wiki.apache.org/nutch/NutchHadoopTutorial
thanks
Or do I have to download Hadoop and install it?
Hello friends,
I am gathering information on supported hardware and OS for Nutch and Hadoop.
I did not find any conclusive information by going through the Nutch wiki.
If I want to build a cluster of nodes using Nutch/Hadoop for crawling, then
what are my options for hardware and OS?
> I can only conclude that the way to succeed as a search startup is to
> CRAWL DIFFERENTLY. Focus on websites in specific regions, specific
> topics, specific data types. Crawl into the corners of websites that
> contain interesting nuggets of data (listings, calendars, etc) that
> won't ever have
Andrzej, before I dive into your specific questions... I want to step
back to the original topic: what applications are possible with Nutch?
The specialization that I focused on was a _listings_ crawler. There
are any number of listings types that one could potentially crawl:
- events (what
Matt Kangas wrote:
Hi Andrzej (and everyone else),
A few weeks ago, I intended to chime in on your "Scoring API issues"
thread, but this new thread is perhaps an even better place to speak up.
Time to stop lurking and contribute. :)
Thanks a lot for sharing your thoughts. Your post touches a
Hi Andrzej (and everyone else),
A few weeks ago, I intended to chime in on your "Scoring API issues"
thread, but this new thread is perhaps an even better place to speak
up. Time to stop lurking and contribute. :)
First, I want to echo Stefan Groschupf's comment several months ago that
the N
Yeah, you are right. You have to have a constrained set of domains to
search, and to be honest, that works pretty well. The only thing is, I
still get a lot of junk links. I would say that 30% are valid or
interesting links while the rest is kind of worthless. I guess it is
a matter of studying spa
Hi,
> My question: have you built a general site to crawl the internet, and
> how did you find links that people would be interested in, as opposed
> to capturing a lot of the junk out there?
Interesting question. Are you planning to build a new Google?
If you are planning to crawl without any limi
I really like the concept of Nutch and Hadoop, but I haven't been able
to build an application with them. Most of the apps I like building
are targeted at the public, anyone on the internet. I built a
crawler of top sites like the NYTimes and Slate, but I couldn't filter
out the sites
wasn't hard at all. Though I needed to replace
hadoop-12.whatever.jar with the latest within the Nutch build. It
seems to be working. Yay.
Thanks.
I am currently trying to figure out how to deploy Nutch and Hadoop
separately. I want to configure Hadoop outside of Nutch and have
Nutch use that service, rather than configuring hadoop within nutch.
I would think all that Nutch should need to know is the urls to
connect to Hadoop, but can
02 17:27:58,922 INFO conf.Configuration - parsing
>> file:/home/nutch/search/conf/hadoop-site.xml
>> 2007-05-02 17:27:58,989 WARN mapred.JobTracker - Starting tracker
>> java.net.ConnectException: Connection refused
>> at java.net.PlainSocketImpl.socketConnect(Native Method)
Is your hadoop jar in the lib directory named
"hadoop-0.4.0-patched.jar!" with the exclamation point? If it is, that
may be causing the error. Also let me know if you can ping the namenode
from any of the data nodes.
Dennis Kubes
cybercouf wrote:
I tried both with "localhost" or "myhostnam
What errors are you seeing in your hadoop-namenode and datanode logs?
Dennis Kubes
cybercouf wrote:
Yes it is.
Here are more details:
$ cat /etc/hosts
127.0.0.1 localhost
84.x.x.x myhostname.mydomain.com myhostname
# ping myhostname
PING myhostname.mydomain.com (84.x.x.x) 56(84) bytes o
looks like the namenode is running (when I stop it I get the
>> message "stopping namenode"), but why can't I access it? (is this IP from
>> the log correct? 0.0.0.0:50070)
>> It is all on the same machine, and my conf file looks OK:
>> fs.default.
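For reference, the relevant hadoop-site.xml properties of that era looked roughly like this (host name and ports below are placeholders, not the poster's values):

```xml
<configuration>
  <!-- where the namenode listens; clients must be able to resolve this host -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://myhostname:9000</value>
  </property>
  <!-- where the jobtracker listens -->
  <property>
    <name>mapred.job.tracker</name>
    <value>myhostname:9001</value>
  </property>
</configuration>
```

If the configured host resolves to 127.0.0.1 on remote nodes, daemons and clients see exactly this kind of "Connection refused" failure.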
Make sure the hosts file on your namenode is set up correctly:
127.0.0.1 localhost.localdomain localhost
10.x.x.x myhostname.mydomain.com myhostname
As opposed to:
127.0.0.1 localhost.localdomain localhost myhostname.mydomain.com myhostname
The prob
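A quick way to verify what a hostname actually resolves to on each box (a sketch; myhostname is a placeholder):

```shell
# Check what "localhost" resolves to; on a correct setup this prints the
# loopback address. Run the same command with your real hostname (e.g.
# "getent hosts myhostname") on every node and confirm it prints the LAN
# address, not 127.0.0.1:
getent hosts localhost
```

If the machine's own name resolves to the loopback address, remote datanodes will never reach the namenode even though it looks "up" locally.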
mapred.local.dir = /home/nutch/filesystem/mapreduce/local
dfs.replication = 1
It is under NutchHadoopTutorial under Nutch Administration.
Dennis
-Original Message-
From: Chris Mattmann [mailto:[EMAIL PROTECTED]
Sent: Monday, March 20, 2006 5:21 PM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch and Hadoop Tutorial Finished
Hi Dennis,
Thanks for your hard
-Original Message-
> From: Vanderdray, Jacob [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 20, 2006 12:49 PM
> To: nutch-user@lucene.apache.org
> Subject: RE: Nutch and Hadoop Tutorial Finished
>
> Sorry. Go to http://wiki.apache.org/nutch/ and click on the
> "login&qu
The NutchHadoop tutorial is now up on the wiki.
Dennis
I will add in your changes and then put it up on the wiki.
Dennis
Dennis Kubes wrote:
Here it is for the list, I will try to put it on the wiki as well.
Thanks for writing this!
I've added a few comments below.
Some things are assumed for this tutorial. First, you will need root level
access to all of the boxes you are deploying to.
Root access should n
Dennis,
Thank you very very much for tutorial.
Michael
- Original Message -
From: "Dennis Kubes" <[EMAIL PROTECTED]>
To:
Sent: Monday, March 20, 2006 10:46 AM
Subject: RE: Nutch and Hadoop Tutorial Finished
Here it is for the list, I will try to put it on the wiki
Here it is for the list, I will try to put it on the wiki as well.
Dennis
How to Setup Nutch and Hadoop
After searching the web and mailing lists, it seems that there is very
little information on how to setup
If you have any trouble, just shout.
Jake.
Not to act dumb, but how do I add it to the wiki?
Dennis
Could you send me one? Thank you.
Dennis,
How 'bout the wiki.
Jake.
All,
I have finished a lengthy tutorial on how to setup a distributed
implementation of nutch and hadoop. Should I post it on this list or is
there a better place for it?
Dennis