On Tue, Jun 3, 2008 at 5:04 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> Plus to make it even more painful, you cannot easily run it with one simple
> SOCKS server, because you need to defer DNS resolution to inside the
> cluster, because VM names do resolve to external IPs, while the webs
> It seems that setrep won't force replication change to the specified number
> immediately; it changes really slowly. Just wondering if this is the expected
> behavior? What's the rationale for this behavior? Is there a way to speed it up?
Yes, it won't force replication to be instant. Once you inc
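For reference, setrep only asks the namenode for a new target replication factor; the extra block copies are scheduled in the background, which is why the change appears slowly. A rough equivalent in code (the path is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetRepSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);               // placeholder path
    // Records the new target factor on the namenode and returns quickly;
    // the additional replicas are created asynchronously.
    fs.setReplication(file, (short) 5);
    // The reported factor changes immediately; the actual copies lag behind.
    System.out.println("target replication: "
        + fs.getFileStatus(file).getReplication());
  }
}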
It seems that setrep won't force replication change to the specified number
immediately; it changes really slowly. Just wondering if this is the expected
behavior? What's the rationale for this behavior? Is there a way to speed it up?
Thanks
Haijun
Well, the basic "trouble" with EC2 is that clusters usually are not networks
in the TCP/IP sense.
This makes it painful to decide which URLs should be resolved where.
Plus to make it even more painful, you cannot easily run it with one simple
SOCKS server, because you need to defer DNS resoluti
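In case it helps anyone following along, Hadoop's client RPC can be routed through a SOCKS proxy (for example one opened with ssh -D). A minimal sketch, assuming the SocksSocketFactory property names from that feature; as Andreas notes, this alone does not settle where hostnames get resolved:

import org.apache.hadoop.conf.Configuration;

public class SocksClientConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Route client RPC through a local SOCKS proxy, e.g. "ssh -D 1080 <master>".
    conf.set("hadoop.rpc.socket.factory.class.default",
             "org.apache.hadoop.net.SocksSocketFactory");
    conf.set("hadoop.socks.server", "localhost:1080");
    // Caveat from the thread: hostname lookups still happen on the client,
    // so cluster names that resolve to external IPs remain a problem.
    System.out.println(conf.get("hadoop.socks.server"));
  }
}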
Ok, I've tried it out, the example sort bombs exactly like streaming =>
http://heaven.kostyrka.org/test.log
Any recommendations?
Thanks,
Andreas
On Tuesday 03 June 2008 22:16:05 Andreas Kostyrka wrote:
> On Tuesday 03 June 2008 21:00:49 Runping Qi wrote:
> > ${hadoop} jar hadoop-0.17-examples.jar sort -m \
> >
> > > -r 88 \
> > > -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
> > > -outFormat org.apache.hadoop.mapred
On Tuesday 03 June 2008 21:00:49 Runping Qi wrote:
> ${hadoop} jar hadoop-0.17-examples.jar sort -m \
>
> > -r 88 \
> > -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
> > -outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
> > -outKey org.apache.hadoop.io.Text \
>
Ah; you're right, of course. Sorry about that. -C
On Jun 3, 2008, at 12:00 PM, Runping Qi wrote:
Chris,
Your version will use LongWritable as the map output key type, which
changes the job nature completely. You should use
${hadoop} jar hadoop-0.17-examples.jar sort -m \
-r 88 \
-inFo
Chris,
Your version will use LongWritable as the map output key type, which
changes the job nature completely. You should use
${hadoop} jar hadoop-0.17-examples.jar sort -m \
> -r 88 \
> -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
> -outFormat org.apache.hadoop.mapred.
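For anyone wanting the same thing from their own driver rather than through the examples jar, here is a rough sketch of the JobConf settings those flags correspond to. This is not the actual Sort driver: IdentityMapper/IdentityReducer stand in for the map/reduce side, and the input-path helper has moved between JobConf and FileInputFormat across releases, so adjust for yours.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class TextSortSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TextSortSketch.class);
    conf.setJobName("sort-sketch");
    // Text keys and values, as in Runping's command line; falling back to the
    // TextInputFormat default (LongWritable keys) changes the job completely.
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.setOutputFormat(NullOutputFormat.class);     // discard output, as above
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setNumReduceTasks(88);                       // the -r 88 above
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    JobClient.runJob(conf);
  }
}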
On Tuesday 03 June 2008 20:35:03 Chris Douglas wrote:
> >> By "not exactly small, do you mean each line is long or that there
> >> are many records?
> >
> > Well, not small in the meaning, that even I could get my boss to
> > allow me to
> > give you the data, transfering it might be painful. (E.g.
You can also play with aggressive rebalancing. If you decommission the node
before adding the disk, then the namenode will make sure that you don't have
any data on that machine. Then when you restore the machine, it will fill
the volumes more sanely than if you start with a full partition.
In m
By "not exactly small, do you mean each line is long or that there
are many records?
Well, not small in the meaning, that even I could get my boss to
allow me to
give you the data, transfering it might be painful. (E.g. the job that
aborted had about 12M lines with with ~2.6GB data => the lin
Ok, a new dead job ;(
This time after 2.4GB/11.3M lines ;(
Any idea what I could do to debug this?
(No idea how to go about debugging a Java process that is distributed and
processes GBs of data. How does one stabilize that kind of stuff to generate a
reproducible situation?)
Andreas
If your keys and values have meaningful toString methods, hadoop fs -text
will print the contents to stdout. -C
On Jun 3, 2008, at 3:17 AM, Lin Guo wrote:
I am wondering whether it is possible to deserialize the keys and
values in a hadoop output file where the output format is
SequenceFi
This is an old problem.
We use a round-robin algorithm to determine which local volume (disk/partition)
a block should be placed on. This does not work well in some cases, including
the one where a new volume is added.
This was discussed in particular in
http://issues.apache.org/jira/browse/HADOOP-
On Jun 3, 2008, at 4:56 AM, smallufo wrote:
What if my data comes from a DB or memory?
I should implement a DatabaseInputFormat that implements InputFormat<rowIndex,
MyData value>, right?
Yes.
But how do I implement getSplits() and getRecordReader()?
I looked into the sample source code fo
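Since the question keeps coming up, here is a rough, self-contained sketch of a non-HDFS InputFormat against the old org.apache.hadoop.mapred API. All names are made up, the "database" is faked with generated strings, and the exact interface varies a little between 0.1x releases (hence the no-op validateInput):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class MemoryInputFormat implements InputFormat<LongWritable, Text> {

  // A split is just a [start, end) row range; it must be Writable so the
  // jobtracker can ship it to the task that reads it.
  public static class RangeSplit implements InputSplit {
    long start, end;
    public RangeSplit() { }                       // needed for deserialization
    RangeSplit(long start, long end) { this.start = start; this.end = end; }
    public long getLength() { return end - start; }
    public String[] getLocations() { return new String[0]; }  // no locality hints
    public void write(DataOutput out) throws IOException {
      out.writeLong(start); out.writeLong(end);
    }
    public void readFields(DataInput in) throws IOException {
      start = in.readLong(); end = in.readLong();
    }
  }

  // Some releases still declare a (deprecated) validateInput on the interface.
  public void validateInput(JobConf job) { }

  public InputSplit[] getSplits(JobConf job, int numSplits) {
    long totalRows = 1000;                        // e.g. from a SELECT COUNT(*)
    long perSplit = (totalRows + numSplits - 1) / numSplits;
    InputSplit[] splits = new InputSplit[numSplits];
    for (int i = 0; i < numSplits; i++) {
      long start = i * perSplit;
      splits[i] = new RangeSplit(start, Math.min(start + perSplit, totalRows));
    }
    return splits;
  }

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) {
    final RangeSplit range = (RangeSplit) split;
    return new RecordReader<LongWritable, Text>() {
      long row = range.start;
      public LongWritable createKey() { return new LongWritable(); }
      public Text createValue() { return new Text(); }
      public boolean next(LongWritable key, Text value) {
        if (row >= range.end) return false;
        key.set(row);
        value.set("row-" + row);                  // replace with a real DB fetch
        row++;
        return true;
      }
      public long getPos() { return row - range.start; }
      public float getProgress() {
        return (row - range.start) / (float) (range.end - range.start);
      }
      public void close() { }
    };
  }
}

The split boundaries are where the parallelism comes from, so pick a range scheme your data source can serve efficiently.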
Maybe you can check org.apache.hadoop.mapred.jobcontrol.*
I did not try it myself, but it looks like this is what you need.
Cheers,
Christophe
On Tue, Jun 3, 2008 at 5:55 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> No.
>
> At least you need to call runJob twice. Typically, it is safer to create
I have had problems with multiple volumes while using ancient versions of
Hadoop. If I put the smaller partition first, I would get an overfull
partition because Hadoop was allocating by machine rather than by partition.
If you feel energetic, go ahead and try putting the smaller partition first
in
Hi,
Are there any known issues with Hadoop 0.16.3 and Java 1.6? We have
some hanging jobs.
Thanks in advance.
Bye,
martin
No.
At least you need to call runJob twice. Typically, it is safer to create
two job configurations so you don't forget to change something from the
first job.
It isn't a big deal. Just do it!
On Tue, Jun 3, 2008 at 8:31 AM, hong <[EMAIL PROTECTED]> wrote:
> Hi all
>
> A job must be done in
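To make that concrete, here is a rough skeleton of the two-pass chain. IdentityMapper/IdentityReducer stand in for Map1/Reduce1 and Map2/Reduce2, the paths are placeholders, and the path-setting helpers have moved between JobConf and FileInputFormat/FileOutputFormat across releases, so adjust for yours.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoPassDriver {
  public static void main(String[] args) throws Exception {
    Path in = new Path("in"), tmp = new Path("tmp"), out = new Path("out");

    // Pass 1: map1 ==> reduce1, writing to an intermediate directory.
    JobConf job1 = new JobConf(TwoPassDriver.class);
    job1.setJobName("pass-1");
    job1.setMapperClass(IdentityMapper.class);    // stand-in for Map1
    job1.setReducerClass(IdentityReducer.class);  // stand-in for Reduce1
    FileInputFormat.setInputPaths(job1, in);
    FileOutputFormat.setOutputPath(job1, tmp);
    JobClient.runJob(job1);                       // blocks until pass 1 is done

    // Pass 2: a second, separate JobConf whose input is pass 1's output.
    JobConf job2 = new JobConf(TwoPassDriver.class);
    job2.setJobName("pass-2");
    job2.setMapperClass(IdentityMapper.class);    // stand-in for Map2
    job2.setReducerClass(IdentityReducer.class);  // stand-in for Reduce2
    FileInputFormat.setInputPaths(job2, tmp);
    FileOutputFormat.setOutputPath(job2, out);
    JobClient.runJob(job2);
  }
}

If you would rather declare the dependency and let something else drive the jobs, that is what the org.apache.hadoop.mapred.jobcontrol classes mentioned earlier in the thread are for.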
Hi,
I'm about to add a new disk (under a new partition) to some existing DataNodes
that are nearly full. I see FAQ #15:
15. HDFS. How do I set up a hadoop node to use multiple volumes?
Data-nodes can store blocks in multiple directories typically allocated on
different local disk drives. In o
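For completeness, the property the FAQ is describing is dfs.data.dir, a comma-separated list of directories that the datanode spreads blocks across. It normally lives in conf/hadoop-site.xml; the snippet below only shows the value's shape, with made-up paths:

import org.apache.hadoop.conf.Configuration;

public class MultiVolumeConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // One directory per disk/partition; the datanode round-robins new blocks
    // across them (normally set in conf/hadoop-site.xml, not in code).
    conf.set("dfs.data.dir", "/mnt/disk1/dfs/data,/mnt/disk2/dfs/data");
    System.out.println(conf.get("dfs.data.dir"));
  }
}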
Hi all
A job must be done in two map/reduce passes. That is, map1 ==>
reduce1 ==> map2 ==> reduce2, where "==>" means the output of the left is
the input of the right.
To do that job, can I create just one JobConf instance and
invoke JobClient.runJob(conf) once?
Is there any similar examp
On Tue, Jun 3, 2008 at 6:17 AM, Lin Guo <[EMAIL PROTECTED]> wrote:
> I am wondering whether it is possible to deserialize the keys and values in a
> hadoop output file where the output format is SequenceFileOutputFormat.
I wrote some code to do this, samples attached.
-Stuart
/* SeqKeyList.java -
On Tuesday 03 June 2008 08:35:10 Chris Douglas wrote:
> > I have no Java implementation of my job, sorry.
>
> Since it's all in the map side, IdentityMapper/IdentityReducer is
> fine, as long as both the splits and the number of reduce tasks are
> the same.
>
> > The data is a representation for lo
Do you mean reading an output file created by SequenceFile.createWriter? If so,
maybe the code below will be useful. It reads a long integer out of a
sequence file.
SequenceFile.Reader reader = new SequenceFile.Reader(fileSys, inFile,
jobConf);
LongWritable numInside = new
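The snippet above is cut off in the archive, so here is a fuller, self-contained sketch of the same idea. It pulls the key and value classes out of the file header rather than hard-coding LongWritable, which also answers the original deserialization question:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileDump {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader =
        new SequenceFile.Reader(fs, new Path(args[0]), conf);
    try {
      // The file header records the key/value classes, so instantiate them
      // instead of hard-coding LongWritable/Text/etc.
      Writable key = (Writable)
          ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable)
          ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {           // deserializes the next record
        System.out.println(key + "\t" + value);   // relies on toString(), like fs -text
      }
    } finally {
      reader.close();
    }
  }
}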
Hi
I have a question: what if my data is not originally located in HDFS?
What if my data comes from a DB or memory?
I should implement a DatabaseInputFormat that implements InputFormat<rowIndex, MyData value>, right?
But how do I implement getSplits() and getRecordReader()?
I looked into the sample source code for a l
I am wondering whether it is possible to deserialize the keys and values in a
hadoop output file where the output format is SequenceFileOutputFormat.
Many thanks!