Hi,
I am new to Hadoop, so sorry for the novice question.
I ran into a problem while trying to use task side-effect files.
Since there is no code example in the wiki, I tried this way:
I overrode the configure method in the reducer to create a side file:
public void configure(JobConf conf){
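In case it helps, here is a minimal sketch of the usual pattern (old `org.apache.hadoop.mapred` API; the class name and file name below are just placeholders, and `getWorkOutputPath` may not exist on very old releases). Side-effect files should go under the task's *work* output path, so the framework only promotes them to the real output directory when the task attempt commits:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class SideFileReducer extends MapReduceBase {
  private FSDataOutputStream side;

  public void configure(JobConf conf) {
    try {
      // Create the side file under the task's work output directory, not the
      // final output directory, so speculative or failed attempts don't
      // collide with each other.
      Path workDir = FileOutputFormat.getWorkOutputPath(conf);
      FileSystem fs = workDir.getFileSystem(conf);
      side = fs.create(new Path(workDir, "side-" + conf.get("mapred.task.id")));
    } catch (IOException e) {
      throw new RuntimeException("could not create side file", e);
    }
  }
}
```

Remember to close the stream in `close()`, otherwise the file may be empty when the task commits.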
I have seen EC2 be slower than a comparable system in development, but
not by the factors that you're experiencing. One thing about EC2 that
has concerned me: you are not guaranteed that your "/mnt" disk is an
uncontested spindle. Early on it was, but Amazon makes no promises.
A
You could use hadoop daemonlog to get and set log levels.
To set FSNamesystem to DEBUG you would do something like this:
>hadoop daemonlog -setlevel namenode:50070 org.apache.hadoop.dfs.FSNamesystem DEBUG
Thanks,
Lohit
- Original Message
From: Cagdas Gerede <[EMAIL PROTECTED]>
To: core-
On Thu, 10 Apr 2008, Ted Dunning wrote:
Are you trying to read from mySQL?
No, we're outputting to MySQL. I've also verified that the MySQL server is
hardly seeing any load, isn't waiting on slow queries, etc.
If so, it isn't very surprising that you could get lower performance
with more readers.
I am faced with the exact same problem described here, does anybody know how
to resolve this?
You can write your own output format (extending TextOutputFormat) and simply
add the header when you create the output file, and the footer right before
you close it.
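A rough sketch of that idea against the old `org.apache.hadoop.mapred` API (untested; the class name and the `<documents>` header/footer strings are made-up examples for XML output):

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

public class HeaderFooterOutputFormat extends TextOutputFormat<Text, Text> {
  public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored,
      JobConf job, String name, Progressable progress) throws IOException {
    Path file = FileOutputFormat.getTaskOutputPath(job, name);
    FileSystem fs = file.getFileSystem(job);
    final FSDataOutputStream out = fs.create(file, progress);
    out.writeBytes("<documents>\n");                 // header, written once
    final RecordWriter<Text, Text> lines =
        new LineRecordWriter<Text, Text>(out);       // normal text records
    return new RecordWriter<Text, Text>() {
      public void write(Text key, Text value) throws IOException {
        lines.write(key, value);
      }
      public void close(Reporter reporter) throws IOException {
        out.writeBytes("</documents>\n");            // footer, just before close
        lines.close(reporter);                       // also closes the stream
      }
    };
  }
}
```

`LineRecordWriter` is a protected inner class of `TextOutputFormat`, so it is visible from the subclass.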
RB
On Tue, Apr 8, 2008 at 12:42 PM, ncardoso <[EMAIL PROTECTED]> wrote:
>
> Hello.
>
> I'm using Hadoop to process several XML fi
I tried both ways and I am still getting some errors:
--- import org.apache.tools.ant.BuildException; (error: cannot be resolved..)
--- public Socket createSocket() throws IOException {
--- s = socketFactory.createSocket(); (error: incorrect parameters)
Earlier it failed to resolve this package
I should mention that the generally available mogile is not suitable for
large installs.
We had to make significant changes to get it to work correctly. We are
figuring out how to contribute these back, but may have to fork the project
to do it.
On 4/10/08 12:21 PM, "Todd Troxell" <[EMAIL PROT
Ted Dunning wrote:
I haven't done a detailed comparison, but I have seen some effects:
A) raid doesn't usually work really well on low-end machines compared to
independent drives. This would make me distrust raid.
B) hadoop doesn't do very well, historically speaking, with more than one
partition
On Thu, Apr 10, 2008 at 09:18:02AM -0700, Ted Dunning wrote:
> Hadoop also does much better with spindles spread across many machines.
> Putting 16 TB on each of two nodes is distinctly sub-optimal on many fronts.
> Much better to put 0.5-2TB on 16-64 machines. With 2x1TB SATA drives, your
> cost
I haven't done a detailed comparison, but I have seen some effects:
A) raid doesn't usually work really well on low-end machines compared to
independent drives. This would make me distrust raid.
B) hadoop doesn't do very well, historically speaking, with more than one
partition if the partition
We're building a cluster of 40 machines with 5 drives each, and I'm
curious what people's experiences have been for using RAID-0 for HDFS
vs. configuring separate partitions (JBOD) and having the datanode
balance between them.
I took a look at the datanode code, and datanodes appear to write b
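For reference, the JBOD setup is just a comma-separated dfs.data.dir in hadoop-site.xml (the paths below are examples); the datanode rotates new blocks across the listed directories:

```xml
<property>
  <name>dfs.data.dir</name>
  <!-- one directory per physical disk; the datanode round-robins across them -->
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data,/disk5/dfs/data</value>
</property>
```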
Hadoop also does much better with spindles spread across many machines.
Putting 16 TB on each of two nodes is distinctly sub-optimal on many fronts.
Much better to put 0.5-2TB on 16-64 machines. With 2x1TB SATA drives, your
cost and performance are likely to both be better than two machines with
Are you trying to read from mySQL?
If so, it isn't very surprising that you could get lower performance with
more readers.
On 4/9/08 7:07 PM, "Nate Carlson" <[EMAIL PROTECTED]> wrote:
> Hey all,
>
> We've got a job that we're running in both a development environment, and
> out on EC2. I've
Hello list readers,
I'm still looking for an explanation.
Maybe I should put it this way: I run a job, and at the end I need to know
how many instances of a specific type were written into an output file.
I wanted to rely on the counters, but maybe this is not a good idea
anyway. Or what is the o
This has been my experience as well. This should be mentioned in the
Getting Started pages until resolved.
-colin
On Thu, Apr 10, 2008 at 10:54 AM, Michaela Buergle <
[EMAIL PROTECTED]> wrote:
> Hi all,
> on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says:
> "Do not format a running Hadoop filesystem, this will cause all your
> data to be erased."
Hi all,
on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says:
"Do not format a running Hadoop filesystem, this will cause all your
data to be erased."
It seems to me, however, that currently you had better not format a Hadoop
filesystem at all (after the first time, that is), running or not.
On Apr 9, 2008, at 7:30 AM, Aayush Garg wrote:
But the problem is that I need to sort according to freq, which is part
of my value field...
Any inputs?? Could you provide a small piece of code of your thought
The second job would use the InverseMapper from
org.apache.hadoop.mapred.lib.
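For the archives, the second-job setup looks roughly like this (old `org.apache.hadoop.mapred` API; the class name is a placeholder, and it assumes the first job emitted `(word, freq)` pairs with a LongWritable frequency):

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.InverseMapper;

public class FreqSort {
  public static JobConf sortJob() {
    JobConf job = new JobConf(FreqSort.class);
    // InverseMapper swaps key and value, turning (word, freq) into
    // (freq, word), so the framework's sort on keys orders by frequency.
    job.setMapperClass(InverseMapper.class);
    job.setNumReduceTasks(1);  // a single reducer gives one globally sorted file
    // Sort descending so the most frequent words come first.
    job.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class);
    return job;
  }
}
```

You would also need to set the key/value classes and the input/output paths to match the first job's output.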
It's at the root of the source tree and it's called build.xml.
Jean-Daniel
2008/4/9, Khalil Honsali <[EMAIL PROTECTED]>:
>
> Mr. Jean-Daniel,
>
> where is the ant script please?
>
>
> On 10/04/2008, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> > The ANT script works well also.
> >
> > Jean-Daniel
> >
On Thu, Apr 10, 2008 at 09:47:59AM +0200, Mads Toftum wrote:
> On Wed, Apr 09, 2008 at 09:42:36PM -0500, Todd Troxell wrote:
> > I was unable to access the archives for this list as
> > http://hadoop.apache.org/mail/core-user/ returns 403.
> >
> You're probably looking for
> http://mail-archives.a
On 4/10/08 4:42 AM, "Todd Troxell" <[EMAIL PROTECTED]> wrote:
> Hello list,
Howdy.
> I am interested in using HDFS for storage, and for map/reduce only
> tangentially. I see clusters mentioned in the docs with many many nodes and
> 9TB of disk.
>
> Is HDFS expected to scale to > 100TB?
On Wed, Apr 09, 2008 at 09:42:36PM -0500, Todd Troxell wrote:
> I was unable to access the archives for this list as
> http://hadoop.apache.org/mail/core-user/ returns 403.
>
You're probably looking for
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/
vh
Mads Toftum
--
http://soulfood