Hello everybody,
I attempted to use the Eclipse IDE for Hadoop development, following
the instructions given here:
http://wiki.apache.org/hadoop/EclipseEnvironment
Everything goes well until I start importing projects into Eclipse,
particularly HDFS. When I follow the
Hi,
I wonder whether Hadoop can effectively solve the following problem:
==
input file: a.txt, b.txt
result: c.txt
a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...
b.txt:
id1,address1,...
id2,address2,...
Hive?
Sure, assuming you mean that the id is a foreign key common to both tables...
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 29, 2012, at 5:29 AM, liuzhg liu...@cernet.com wrote:
Hi,
I wonder whether Hadoop can effectively solve the following problem:
Hive is one approach (similar to conventional databases, but not exactly the same).
If you are looking at a MapReduce program, then use MultipleInputs:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
On Tue, May 29, 2012 at 4:02 PM,
Which release? Version?
I believe there are variables in the *-site.xml that allow LDAP integration ...
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 26, 2012, at 7:40 AM, samir das mohapatra samir.help...@gmail.com
wrote:
Hi All,
Did anyone work on Hadoop
Hi Gump,
MapReduce fits well for solving these types of problems (joins).
I hope this will help you solve the described problem:
1. Map output key and value classes: write a map output key class (Text.class)
and a value class (CombinedValue.class). Here the value class should be able to
hold the values from both input files, along with a tag identifying the source.
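A rough sketch of such a value class (illustration only; everything beyond the
CombinedValue name is my own naming):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Holds one CSV record plus a tag saying which file it came from,
// so the reducer can join records that share the same id.
public class CombinedValue implements Writable {
    private Text tag = new Text();    // e.g. "a" or "b", set by the mapper
    private Text record = new Text(); // the original CSV line

    public void setTag(String t) { tag.set(t); }
    public void setRecord(String r) { record.set(r); }
    public Text getTag() { return tag; }
    public Text getRecord() { return record; }

    public void write(DataOutput out) throws IOException {
        tag.write(out);
        record.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        tag.readFields(in);
        record.readFields(in);
    }
}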
Hi,
You can also try the Hadoop reduce-side join functionality.
Look into contrib/datajoin/hadoop-datajoin-*.jar for the base Map and
Reduce classes to do the same.
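For illustration, a mapper built on those base classes looks roughly like this
(written from memory of the contrib/datajoin API, so treat the class and method
names as approximate; TaggedWritable would be your own TaggedMapOutput subclass
that wraps a Text record):

import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

public class JoinMapper extends DataJoinMapperBase {
    // Tag each record with the file it came from (a.txt or b.txt).
    protected Text generateInputTag(String inputFile) {
        return new Text(inputFile);
    }

    // Group records by the first CSV column, the id.
    protected Text generateGroupKey(TaggedMapOutput aRecord) {
        String line = ((Text) aRecord.getData()).toString();
        return new Text(line.split(",")[0]);
    }

    // Wrap the raw line together with this input's tag.
    protected TaggedMapOutput generateTaggedMapOutput(Object value) {
        TaggedWritable ret = new TaggedWritable((Text) value); // your own subclass
        ret.setTag(this.inputTag);
        return ret;
    }
}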
Regards,
Soumya.
On Tue, May 29, 2012 at 4:10 PM, Devaraj k devara...@huawei.com wrote:
Hi Gump,
MapReduce fits
It is the Cloudera version of 0.20.
On Tue, May 29, 2012 at 4:14 PM, Michel Segel michael_se...@hotmail.comwrote:
Which release? Version?
I believe there are variables in the *-site.xml that allow LDAP
integration ...
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 26,
Yes, it is possible by using MultipleInputs to route each input to its own mapper
(basically two different mappers).
Step 1:
MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class,
    Mapper1.class);
MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class,
    Mapper2.class);
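Step 2: wire the rest of the job together as usual. A rough driver sketch
(Mapper1, Mapper2, and JoinReducer stand in for your own classes, and I am
assuming the joined output goes to args[2]):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(JoinDriver.class);
        conf.setJobName("join a.txt and b.txt on id");

        // Each input file gets its own mapper; both emit (id, tagged record).
        MultipleInputs.addInputPath(conf, new Path(args[0]),
                TextInputFormat.class, Mapper1.class);
        MultipleInputs.addInputPath(conf, new Path(args[1]),
                TextInputFormat.class, Mapper2.class);

        conf.setReducerClass(JoinReducer.class); // merges records sharing an id
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));

        JobClient.runJob(conf);
    }
}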
I'm trying to use the DistributedCache but having an issue resolving the
symlinks to my files.
My Driver class writes some hashmaps to files in the DC like this:
Path tPath = new Path("/data/cache/fd", UUID.randomUUID().toString());
os = new ObjectOutputStream(fs.create(tPath));
So my question is: do Hadoop 0.20 and 1.0.3 differ in their support for
writing or reading SequenceFiles? The same code works fine with Hadoop 0.20,
but the problem occurs when it is run under Hadoop 1.0.3.
On Sun, May 27, 2012 at 6:15 PM, waqas latif waqas...@gmail.com wrote:
But the thing is, it works
I believe that their CDH3u3 or later has this... parameter.
(Possibly even earlier.)
On May 29, 2012, at 6:12 AM, samir das mohapatra wrote:
It is the Cloudera version of 0.20.
On Tue, May 29, 2012 at 4:14 PM, Michel Segel
michael_se...@hotmail.comwrote:
Which release? Version?
I believe there
Found the problem.
Shifting the VMs from VirtualBox to KVM worked for me; all other VM
configurations were kept the same.
So the checksum errors were indeed pointing to a hardware problem, though
virtual hardware in this case.
-Akshay
From: Akshay Singh
Hi,
That's not a backup strategy.
You could still have Joe Luser take out a key file or directory. What do you do
then?
On May 29, 2012, at 11:19 AM, Darrell Taylor wrote:
Hi,
We are about to build a 10-machine cluster with 40TB of storage; obviously,
as this gets full, actually trying to
Yes you will have redundancy, so no single point of hardware failure can wipe
out your data, short of a major catastrophe. But you can still have an errant
or malicious hadoop fs -rm -rf shut you down. If you still have the original
source of your data somewhere else you may be able to
Hello Hadoop community,
I have been trying to set up a two-node Hadoop cluster (following
the instructions in -
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/)
and am very close to running it apart from one small glitch - when I
start the dfs (using
I have another question.
I want to use Hadoop's classes and XML messages to get information about the
NameNode, DataNodes, jobs, etc., and monitor it all from my application. So I
want to deploy a web application (Struts 2.0) in Hadoop's webapps directory.
I'm reading through Hadoop's source, but I couldn't find a good function
Hi,
We are frequently observing the exception
java.io.IOException: DFSClient_attempt_201205232329_28133_r_02_0 could not
complete file
/output/tmp/test/_temporary/_attempt_201205232329_28133_r_02_0/part-r-2.
Giving up.
on our cluster. The exception occurs while writing a file.
Can you see the logs for the NN and DN?
Sent from my iPhone
On May 27, 2012, at 1:21 PM, Rohit Pandey rohitpandey...@gmail.com wrote:
Hello Hadoop community,
I have been trying to set up a two-node Hadoop cluster (following
the instructions in -
Rohit,
The SNN may start and run indefinitely without doing any work. The NN
and DN have probably not started because the NN has an issue (perhaps the NN
name directory isn't formatted) and the DN can't find the NN (or has
data directory issues as well).
So this isn't a glitch but a real issue you'll
Hi,
I'd like to upgrade my Hadoop cluster from version 0.20.2-CDH3B4 to
1.0.3. I'm running a pretty small cluster of just 4 nodes, and it's not
really being used by too many people at the moment, so I'm OK if things
get dirty or it goes offline for a bit. I was looking at the tutorial at
Should be ./q_map .
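The symlink only appears in the task working directory if the cache file was
registered with a #name fragment and symlinking was enabled. Roughly, on the
driver side (a sketch assuming the old DistributedCache API, with tPath from
your code and q_map as the link name):

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;

DistributedCache.createSymlink(conf); // enable ./name symlinks in task dirs
// the "#q_map" fragment is the name the symlink will get
DistributedCache.addCacheFile(new URI(tPath.toString() + "#q_map"), conf);

// and in the task, open it via the relative path:
ObjectInputStream in = new ObjectInputStream(new FileInputStream("./q_map"));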
Koji
On 5/29/12 7:38 AM, Alan Miller alan.mil...@synopsys.com wrote:
I'm trying to use the DistributedCache but having an issue resolving the
symlinks to my files.
My Driver class writes some hashmaps to files in the DC like this:
Path tPath = new
Yes, you can do it. In Pig you would write something like:
A = load 'a.txt' as (id, name, age, ...);
B = load 'b.txt' as (id, address, ...);
C = JOIN A BY id, B BY id;
STORE C into 'c.txt';
Hive can do it similarly too. Or you could write your own directly in
map/reduce or using the data_join jar.
Hi Mark
public void map(LongWritable offset, Text val,
                OutputCollector<FloatWritable, Text> output, Reporter reporter)
        throws IOException {
    output.collect(new FloatWritable(1.0f), val); // change 1 to 1.0f
                                                  // and then it will work
}
let me know the status after the change
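Also make sure the driver declares the output types to match; if they are left
at the default, the SequenceFile writer expects LongWritable keys and you get a
"wrong key class" error. A sketch, assuming the old mapred API and the
(FloatWritable, Text) map output above:

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
conf.setOutputKeyClass(FloatWritable.class);  // must match what map() emits
conf.setOutputValueClass(Text.class);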
On Wed, May
Thanks for the reply, but I already tried this option, and this is the error:
java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is
not class org.apache.hadoop.io.FloatWritable
at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:998)
at
Hi Mark
See the output for that same application.
I am not getting any error.
On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote:
Hi guys, this is a very simple program trying to use TextInputFormat and
SequenceFileOutputFormat. It should be easy, but I get the
Hi Samir, can you email me your main class? Or if you can check mine, it
is as follows:
public class SortByNorm1 extends Configured implements Tool {
    @Override public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: bin/hadoop
Hi,
Mike, Nitin, Devaraj, Soumya, samir, Robert
Thank you all for your suggestions.
Actually, I want to know whether Hadoop has any performance advantage over a
conventional database for solving this kind of problem (joining data).
Best Regards,
Gump
On Tue, May 29, 2012 at 6:53 PM, Soumya
Hi,
I added 5 new datanodes and I want to rebalance, so I started the balancer on
the namenode, and it gave me the notice
starting balancer, logging to /hadoop/logs/hadoop-hdfs-balancer-hadoop220.out
and today I checked the log file, and the detail is that
Another balancer is
If you have a huge dataset (huge meaning around terabytes, or at the very
least a few GBs), then yes, Hadoop has the advantage of a distributed system
and is much faster,
but on a smaller set of records it is not as good as an RDBMS.
On Wed, May 30, 2012 at 6:53 AM, liuzhg liu...@cernet.com wrote:
Hi,
1) I am not sure whether I should start the rebalance on the namenode or
on each new datanode.
You can run the balancer on any node. It is not suggested to run it on the
namenode; it would be better to run it on a node which has less load.
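For example, from any lightly loaded node (the -threshold value is the target
imbalance percentage; 10 is the default, shown here just for illustration):
bin/hadoop balancer -threshold 10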
2) should I set the bandwidth on each datanode or just only