I also use another solution for the namespace incompatibility, which is to run:
rm -Rf /tmp/hadoop-*
then format the namenode. Hope that helps,
Maha
On Jan 9, 2011, at 9:08 PM, Adarsh Sharma wrote:
> Shuja Rehman wrote:
>> hi
>>
>> I have formatted the namenode and now when I restart the
Shuja Rehman wrote:
Hi,
I have formatted the namenode, and now when I restart the cluster I am getting
a strange error. Kindly let me know how to fix it.
Thanks
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = hadoop.zoniversal.com/
Esteban Gutierrez Moguel wrote:
Adarsh,
Do you have the hostnames for the masters and slaves in /etc/hosts?
Yes, I know this issue. But do you think the error occurs while reading
the output of map?
I want to know the proper reason for the lines below:
org.apache.hadoop.util.DiskChecker$DiskErr
Hi Arvind,
thanks very much for that. Very good to know. Sounds like Sqoop is just what
I'm looking for.
cheers,
Brian
On Sun, Jan 9, 2011 at 9:37 PM, arv...@cloudera.com wrote:
> Hi Brian,
>
> Sqoop supports incremental imports that can be run against a live database
> system on a daily basis
Thanks Jeff,
Great info and I really appreciate it.
cheers,
Brian
On Mon, Jan 10, 2011 at 12:00 AM, Jeff Hammerbacher wrote:
> Hey Brian,
>
> One final point about Sqoop: it's a part of Cloudera's Distribution for
> Hadoop, so it's Apache 2.0 licensed and tightly integrated with the other
> pla
Hi Ted,
I agree about reducing the quadratic cost and hopefully my reply to Michael
will show what my idea has been in this regard.
I really appreciate the pointers on LSH and Mahout, and I'll read up on them
and see if they help out.
thanks very much for your help.
cheers,
Brian
On Sun, Jan 9, 20
Hi Michael,
Firstly, thanks for the reply. Secondly, I have to give you credit for the
first person who has ever asked me if I want to open up my kimono a little
and also the first person on a tech list who has ever made me laugh out
loud. :)
Ok, I hear you, and you raise some very valid issues s
Hey Brian,
One final point about Sqoop: it's a part of Cloudera's Distribution for
Hadoop, so it's Apache 2.0 licensed and tightly integrated with the other
platform components. This means, for example, that we have added a Sqoop
action to Oozie, which makes integrating data import and export into
btw, when I said "user's information", it is just my system's data. It can be
any kind of data, like word count; e.g., I want to process the word count of a
specific word immediately.
Thanks.
From: Savannah Beckett
To: common-user@hadoop.apache.org
Sent: Sun
Hi,
I know that Hadoop is more suitable for batch processing. But is there an
on-demand feature in Hadoop? I want Hadoop to process a specific user's
information when the user demands it in the web interface. I am thinking maybe
setting a priority for this specific user's information, so hadoo
Hi Brian,
Sqoop supports incremental imports that can be run against a live database
system on a daily basis for importing the new data. Unless your data is
large and cannot be split into comparable slices for parallel imports, I do
not see any concerns regarding performance.
Regarding the databa
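Arvind's point about splitting large tables into comparable slices for parallel import can be sketched roughly like this. This is only an illustration of the idea, not Sqoop's actual API; the `split_ranges` helper and the slice count are made up here.

```python
# Hedged sketch: cut an id-keyed table into roughly equal half-open ranges,
# one per parallel import worker, similar in spirit to Sqoop's split-by column.

def split_ranges(min_id, max_id, num_slices):
    """Return (lo, hi) half-open id ranges covering [min_id, max_id]."""
    span = max_id - min_id + 1
    step = -(-span // num_slices)  # ceiling division
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step, max_id + 1)
        ranges.append((lo, hi))
        lo = hi
    return ranges

# Each range becomes one worker's query:
#   SELECT ... WHERE id >= lo AND id < hi
slices = split_ranges(1, 100, 4)  # four roughly equal slices
```

If the data is heavily skewed by id, the slices are comparable in id-range but not in row count, which is where the performance concerns Arvind mentions would come in.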
You still have to knock down the quadratic cost.
Any equality checks you have in your problem can be used to limit the
problem to growing quadratically in the number of records equal by that
comparison. That may be enough to fix things (for now). Unfortunately
heavily skewed data are very common
All you're doing is delaying the inevitable by going to hadoop. There's no
magic to hadoop. It doesn't run as fast as individual processes. There's just
the ability to split jobs across a cluster which works for some problems. You
won't even get a linear improvement in speed.
At least I as
Hi Michael,
yeah, sorry, I shouldn't have said a compare, as that is a simplified view of
the problem. For each pair of rows I have to calculate a score based on multiplying
some of the column values together, running some functions against each
other etc. I could do this as the rows are entered into the db,
Thanks Ted,
You're right but I suppose I was too brief in my initial statement. I should
have said that I have to run an operation on all rows with respect to each
other. It's not a case of just comparing them and thus sorting them so
unfortunately I don't think this will help much. Some of the va
Thanks Konstantin,
I had seen Sqoop. I wonder, is it normally used as a one-off process, or can
it also be effectively used on a live database system on a daily basis to
batch export? Are there performance issues with this approach? Or how would
it compare to some of the other classes that I have s
thanks Sonal,
I'll check it out
On Sun, Jan 9, 2011 at 2:57 AM, Sonal Goyal wrote:
> Hi Brian,
>
> You can check HIHO at https://github.com/sonalgoyal/hiho which can help
> you
> load data from any JDBC database to the Hadoop file system. If your table
> has a date or id field, or any indicator
What kind of compare do you have to do?
You should be able to compute a checksum or such for each row when you insert
them and only have to look at the subset that matches if you're doing some sort
of substring or such.
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop
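Michael's checksum idea can be sketched as follows: hash each row once on the way in, then only score pairs whose hashes collide, rather than all n*(n-1)/2 pairs. The rows and the choice of MD5 here are illustrative.

```python
# Hedged sketch: bucket rows by checksum so the expensive pairwise work
# only runs within buckets of candidate matches.
import hashlib
from collections import defaultdict

rows = ["alpha,1", "beta,2", "alpha,1", "gamma,3"]

buckets = defaultdict(list)
for i, row in enumerate(rows):
    digest = hashlib.md5(row.encode()).hexdigest()
    buckets[digest].append(i)

# Candidate pairs: only indices that landed in the same bucket.
candidates = [(a, b) for ids in buckets.values()
              for a in ids for b in ids if a < b]
```

Note this only helps directly for exact-match style comparisons; for the scoring Brian describes, the equality key would have to be some attribute the score depends on.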
Hello everyone,
I do not have much practical experience using Hadoop and there is a
thing I'd like to know but can't test it at the moment and I'm not
sure about it theoretically ;)
On a Reducer, do all calls to reduce() share the main memory the
Reducer has, or are all those calls sequenti
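For what the question is driving at: within a single reduce task, the framework makes sequential reduce() calls on one Reducer instance, so instance state set in one call is visible to the next. A minimal simulation of that behavior (this is not Hadoop's real API, just an illustration of the instance-sharing semantics):

```python
# Hedged simulation: sequential reduce() calls on one Reducer instance
# share the instance's memory across keys.

class Reducer:
    def __init__(self):
        self.calls = 0          # shared across every reduce() in this task
        self.grand_total = 0

    def reduce(self, key, values):
        self.calls += 1
        total = sum(values)
        self.grand_total += total
        return key, total

reducer = Reducer()
out = [reducer.reduce(k, v) for k, v in [("a", [1, 2]), ("b", [3])]]
# After both calls, reducer.calls is 2 and reducer.grand_total is 6,
# showing the two calls shared the same instance state.
```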
It is, of course, only quadratic, even if you compare all rows to all other
rows. You can reduce this cost to O(n log n) by ordinary sorting, and you
can further reduce the cost to O(n) using radix sort on hashes.
Practically speaking, in either the parallel or non-parallel setting, try
sort
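Ted's sort-based approach can be sketched like this: sort rows by a hash key so equal rows become adjacent, turning all-pairs comparison into one linear scan after an O(n log n) sort. The data is illustrative.

```python
# Hedged sketch: decorate rows with a hash, sort, then scan neighbours.
rows = ["x,1", "y,2", "x,1", "z,3", "y,2"]

decorated = sorted((hash(r), i, r) for i, r in enumerate(rows))
equal_pairs = []
for (h1, _, r1), (h2, _, r2) in zip(decorated, decorated[1:]):
    if h1 == h2 and r1 == r2:  # hash match confirmed by a full comparison
        equal_pairs.append((r1, r2))
```

The full comparison inside the scan guards against hash collisions; replacing the comparison sort with a radix sort on fixed-width hashes is what gets the further reduction toward O(n) that Ted mentions.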