Some people at Sun have done some recent work on this -- see a blog post at
http://blogs.sun.com/jgebis/entry/hadoop_resource_utilization_and_performance,
and a subsequent post with more detail at
http://blogs.sun.com/jgebis/entry/hadoop_resource_utilization_monitoring_scripts.
Kevin
On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh wrote:
> I want to share an object (a Lucene IndexWriter instance) between mappers
> running on the same node within a single job (not across multiple jobs).
> Please correct me if I am wrong -
>
> If I set the property mapred.job.reuse.jvm.num.tasks to -1, the
I am trying to figure out the best way to split output into different
directories. My goal is to have a directory structure allowing me to add the
content from each batch into the right bucket, like this:
...
/content/200904/batch_20090429
/content/200904/batch_20090430
/content/200904/batch_20090
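The layout above boils down to a rule mapping a batch date to a directory. A minimal sketch in plain Java (the class and helper names here are made up for illustration); in the old mapred API this rule could be returned from a MultipleTextOutputFormat subclass's generateFileNameForKeyValue override:

```java
public class BatchPaths {
    // Map a batch date like "20090429" to its bucket directory:
    // /content/<yyyyMM>/batch_<yyyyMMdd>
    static String bucketPath(String batchDate) {
        String month = batchDate.substring(0, 6); // e.g. "200904"
        return "/content/" + month + "/batch_" + batchDate;
    }

    public static void main(String[] args) {
        System.out.println(bucketPath("20090429")); // /content/200904/batch_20090429
    }
}
```

With MultipleTextOutputFormat the returned path is interpreted relative to the job's output directory, so the job output path would be set to /content and the helper would return only the month/batch suffix.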
On Tue, May 26, 2009 at 7:50 PM, Malcolm Matalka <
mmata...@millennialmedia.com> wrote:
> I'm using EBS volumes to have a persistent HDFS on EC2. Do I need to keep
> the master updated on how to map the internal IPs, which change as I
> understand, to a known set of host names so it knows where t
you could
scale back on datanode work entirely by setting the maximum number of
mappers or reducers to 1 per node during the day (also in
conf/hadoop-site.xml).
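For reference, a sketch of the two properties involved (names as in the 0.18-era configuration; the values here are the daytime caps suggested above):

```xml
<!-- conf/hadoop-site.xml: cap concurrent tasks per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```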
Kevin
On Tue, May 19, 2009 at 7:23 AM, Steve Loughran wrote:
> John Clarke wrote:
>
>> Hi,
>>
>> I am workin
Currently, we are running our cluster in EC2 with HDFS stored on the local
(i.e. transient) disk. We don't want to deal with EBS, because it
complicates being able to spin up additional slaves as needed. We're looking
at moving to a combination of s3 (block) or s3n for data that we care about,
and
On Sat, Apr 18, 2009 at 5:18 AM, hari939 wrote:
>
> My project of parsing through material for a semantic search engine
> requires me to use the Stanford NLP parser
> (http://nlp.stanford.edu/software/lex-parser.shtml) on a hadoop cluster.
>
> To use the Stanford NLP parser, one must create a lex
On Tue, Apr 14, 2009 at 2:35 AM, tim robertson wrote:
>
> I am considering (for better throughput as maps generate huge request
> volumes) pregenerating all my tiles (PNG) and storing them in S3 with
> cloudfront. There will be billions of PNGs produced each at 1-3KB
> each.
>
Storing billions o
Unfortunately not. I don't have much leeway to experiment with this cluster.
-kevin
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel
Cryans
Sent: Wednesday, April 08, 2009 8:30 AM
To: core-user@hadoop.apache.org
Subject: Re: Hadoop
k' from IRC for the help.
-kevin
-Original Message-
From: Kevin Eppinger [mailto:keppin...@adknowledge.com]
Sent: Tuesday, April 07, 2009 1:05 PM
To: core-user@hadoop.apache.org
Subject: Hadoop data nodes failing to start
Hello everyone-
So I have a 5 node cluster that I've bee
java:997)
at java.lang.Thread.run(Thread.java:619)
After this the data node shuts down. This same message is appearing on all the
failed nodes. Help!
-kevin
So if I understand correctly, this is an automated system to bring up a
hadoop cluster on EC2, import some data from S3, run a job flow, write the
data back to S3, and bring down the cluster?
This seems like a pretty good deal. At the pricing they are offering, unless
I'm able to keep a cluster at
On Fri, Mar 27, 2009 at 4:39 PM, Sid123 wrote:
> But I was thinking of grouping the values and generating a key using a
> random number generator in the collector of the mapper. The values will now
> be uniformly distributed over a few keys. Say the number of keys will be
> 0.1% of the # of value
On Thu, Mar 26, 2009 at 4:38 PM, Sid123 wrote:
>
> I am working on implementing some machine learning algorithms using
> MapReduce. I want to know: if I have data that takes 5-6 hours to train on
> a normal machine, will putting in 2-3 more nodes have an effect? I read in
> the yahoo
> hadoop
There may be a separate issue with windows, but the error related to:
[javac] import
org.eclipse.jdt.internal.debug.ui.launcher.JavaApplicationLaunchShortcut;
is the eclipse 3.4 issue that is addressed by the patch in
https://issues.apache.org/jira/browse/HADOOP-3744
We're using JSON serialization for all our data, but we can't seem to find a
good library. We just discovered that the root cause of our out-of-memory
errors is a leak in the net.sf.json library. Can anyone out there recommend a
Java JSON library that they have actually used successfully within Hadoop?
On Wed, Feb 18, 2009 at 1:06 AM, sandhiya wrote:
> Thanks a million!!! It worked, but it's a little weird though. I have to put
> the library with the JDBC jars in BOTH the executable jar file AND the lib
> folder in $HADOOP_HOME. Do all of you do the same thing or is it just my
> computer acting
On Tue, Feb 3, 2009 at 5:49 PM, Amandeep Khurana wrote:
> In the setInput(...) function in DBInputFormat, there are two sets of
> arguments that one can use.
>
> 1. public static void *setInput*(JobConf
>
> a) In this, do we necessarily have to give all the fieldNames (which are
> the
> column na
On Mon, Jan 26, 2009 at 5:40 PM, Vadim Zaliva wrote:
> Is it possible to obtain auto-generated IDs when writing data using
> DBOutputFormat?
>
> For example, is it possible to write Mapper which stores records in DB
> and returns auto-generated
> IDs of these records?
...
> which I would like t
I'm trying to import Hadoop Core into our local repository using piston
( http://piston.rubyforge.org/index.html ).
I can't seem to access svn.apache.org though. I've also tried the EU
mirror. No errors, nothing but eventual timeout. Traceroute fails at
corv-car1-gw.nero.net. I got the same errors
uration so
that Hadoop has everything it needs to function. For example, I somehow have
to copy my seed urls file to the S3 bucket in a way that Hadoop can find it.
Can anyone point me in the right direction on how to do this?
2008-09-30 13:31:49,926 WARN httpclient.RestS3Service - Response
'/%
by its own, instead of sticking to the default one?
-Kevin
On Fri, Sep 5, 2008 at 8:45 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> Kevin,
>
> Did you try changing the
> dfs.datanode.dns.interface/dfs.datanode.dns.nameserver/mapred.tasktra
when hadoop runs. Does anyone have an idea how I could
possibly make it work? Thank you!
-Kevin
Hi,
Does anyone happen to know how to specify the replication factor of a
file when I upload it with the "hadoop dfs -put" command? Thank you!
Best,
-Kevin
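Two approaches that should work here (a hedged sketch; the paths are examples): pass dfs.replication as a generic option at upload time, or change it after the fact with -setrep:

```shell
# Set the replication factor just for this upload via a generic -D option
bin/hadoop dfs -D dfs.replication=2 -put localfile /user/kevin/file

# Or change the replication factor of a file already in HDFS
bin/hadoop dfs -setrep -w 2 /user/kevin/file
```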
It turns out that I should not set hadoop.tmp.dir to multiple
directories. Instead, I should override dfs.data.dir and
dfs.name.dir.
-Kevin
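For reference, a sketch of what that looks like in conf/hadoop-site.xml (the paths are examples); both properties take a comma-separated list of directories:

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/disk1/dfs/name,/disk2/dfs/name</value>
</property>
```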
On Mon, Aug 18, 2008 at 3:03 PM, Kevin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I guess it is not a rare use case to have hadoop d
I did not try this, but maybe the "-libjars" option of the hadoop command could help.
-Kevin
On Mon, Aug 25, 2008 at 4:06 PM, Elia Mazzawi
<[EMAIL PROTECTED]> wrote:
> ended up putting the bdb library with the hadoop library, works fine now.
>
> cp /usr/local/BerkeleyDB.4.5/lib/libdb_
To correct my previous reply: the -D option should come after the classname.
-Kevin
On Mon, Aug 25, 2008 at 3:26 PM, Kevin <[EMAIL PROTECTED]> wrote:
> Thank you. I see where I was wrong. The -D should come after "jar" AND
> before application-specific parameters.
>
> Best,
> -Kevin
Thank you. I see where I was wrong. The -D should come after "jar" AND
before application-specific parameters.
Best,
-Kevin
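To make the ordering concrete (the jar, class, and job names below are made up): generic options like -D go after the main class name and before the application-specific arguments:

```shell
# generic options (-D, -libjars, ...) between the class name and the app arguments
bin/hadoop jar myjob.jar com.example.MyJob -D mapred.job.name=my-job input/ output/
```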
On Mon, Aug 25, 2008 at 2:18 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> bin/hadoop fs -D key=value -ls
>
> works for me. Options to the Ge
Could anyone help verify this? It does not seem to work here.
-Kevin
For the same key, reducer is called only once.
-Kevin
On Fri, Aug 22, 2008 at 4:06 PM, Alex Holmes <[EMAIL PROTECTED]> wrote:
> If this is the case, can the same reducer be invoked multiple times
> with the same key? And if so, would this imply that the key could
> appear on mu
IIRC, the same key will always be sent to the same reducer.
-Kevin
On Fri, Aug 22, 2008 at 4:00 PM, Alex Holmes <[EMAIL PROTECTED]> wrote:
> Hi,
>
> For a given input key, K, in a reduce task, does Hadoop guarantee that
> all mapper-emitted values for key K are available in
Is 0.18.0 supposed to be the current stable release?
-Kevin
On Fri, Aug 22, 2008 at 1:44 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:
> Release 0.18.0 contains many improvements, new features, bug fixes and
> optimizations.
>
> For release details and downloads, visit:
>
> http:
Why is -jobconf not recognized, and why is -D overwritten by the program code?
Best,
-Kevin
On Fri, Aug 22, 2008 at 2:05 PM, Kevin <[EMAIL PROTECTED]> wrote:
> Thank you!
> -Kevin
>
>
>
> On Fri, Aug 22, 2008 at 1:53 PM, Miles Osborne <[EMAIL PROTECTED]>
Thank you!
-Kevin
On Fri, Aug 22, 2008 at 1:53 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> yes:
>
> -jobconf mapred.job.name
>
> is your friend
>
> Miles
>
> 2008/8/22 Kevin <[EMAIL PROTECTED]>
>
>> Hi group,
>>
>> Is it p
Hi group,
Is it possible to customize the job name when using "bin/hadoop jar ..."?
Best,
-Kevin
Override "configure(JobConf job)" in your mapper class. Get the
"map.input.start" and "map.input.length" from the JobConf.
-Kevin
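A sketch of that (old mapred API; assumes a file-based input split, since that is when the framework populates these properties):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SplitAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private long splitStart;
  private long splitLength;

  @Override
  public void configure(JobConf job) {
    // Set by the framework for file-based splits before map() is called.
    splitStart = job.getLong("map.input.start", -1);
    splitLength = job.getLong("map.input.length", -1);
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // splitStart and splitLength are now available to the map logic.
  }
}
```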
On Thu, Aug 21, 2008 at 2:14 PM, Qin Gao <[EMAIL PROTECTED]> wrote:
> Hi mailing,
>
> I want to get information of cu
upload (put) a file. But everything works fine when I use
only one directory at each node.
Does anyone know about this issue? Thank you!
Best,
-Kevin
Yes, I agree with you that it should be negotiated. That is, the "namenode
provides an ordered list and the client can choose some based on its
own measurements." But I am afraid 0.17.1 does not provide an easy
interface for this.
-Kevin
On Thu, Aug 7, 2008 at 3:40 AM, Steve Loughr
Yes, I have looked at the block files and it matches what you said. I
am just wondering if there is some property or flag that would turn
this feature on, if it exists.
-Kevin
On Wed, Aug 6, 2008 at 8:01 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> I guess a quick way to find an answer
I suppose you meant to sort the result globally across files. AFAIK,
this is not currently supported unless you have only one reducer. It
is said that version 0.19 will introduce such a capability.
-Kevin
On Wed, Aug 6, 2008 at 6:01 PM, Xing <[EMAIL PROTECTED]> wrote:
> If I use one
Hi,
I guess this thread is old, but I eventually need to raise the
question again as I am more into dfs now. Would a line be broken
between adjacent blocks in dfs? Can lines be preserved at the block level?
-Kevin
On Wed, Jul 16, 2008 at 4:57 PM, Chris Douglas <[EMAIL PROTECTED]>
Thank you for the idea of submitting a request. However, I guess I could
not wait until it is served. In the worst case, I would probably
hack my copy of hadoop and rebuild it.
-Kevin
On Wed, Aug 6, 2008 at 11:31 AM, lohit <[EMAIL PROTECTED]> wrote:
>>I need this because I do
out which datanode is nearest.
-Kevin
On Wed, Aug 6, 2008 at 2:31 AM, Samuel Guo <[EMAIL PROTECTED]> wrote:
> Kevin wrote:
>>
>> Hi,
>>
>> This is about dfs only, not to consider mapreduce. It may sound like a
>> strange need, but sometimes I want to read a b
overriding it seems infeasible. Neither
are the callers of chooseDataNode public or protected.
I need this because I do not want to trust the namenode's ordering. For
applications where network congestion is rare, we should let the
client decide which data node to load from.
-Kevin
On Tue,
from?
Best,
-Kevin
OK, I guess I found out how. Override the "configure" method of the
user-defined Map class so that you can take note of the filename.
-Kevin
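Concretely, a sketch of that configure() override (old mapred API; map.input.file is the property the framework sets to the current split's file path):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class FileNameAwareMapper extends MapReduceBase /* implements Mapper<...> */ {
  private String inputFile;

  @Override
  public void configure(JobConf job) {
    // The framework sets map.input.file to the path of the file being mapped.
    inputFile = job.get("map.input.file");
  }
}
```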
On Mon, Aug 4, 2008 at 3:53 PM, Kevin <[EMAIL PROTECTED]> wrote:
> Is it possible to get this information in user defined map function?
Is it possible to get this information in user defined map function?
i.e., how do we get the JobConf object in map() function?
Another way is to subclass RecordReader to embed the filename in the
data, which does not look simple.
-Kevin
On Sun, Aug 3, 2008 at 10:17 PM, Amareshwari Sriramadasu
Thank you! The java code is exactly what I want.
Following your code, I ran into a user-permission issue when trying
to write to a file. I wonder if the user id could be manipulated in
the protocol.
-Kevin
On Mon, Aug 4, 2008 at 2:27 PM, Michael Bieniosek <[EMAIL PROTECTED]> wrote:
Hi there,
I am trying to use the DFS of hadoop in other applications. It is not
clear to me how that could be done easily. Could anyone give me a
direction to go in, or examples? Thank you.
-Kevin
Hi,
Besides knowing the "data-local" and "rack-local" map task numbers, I am
interested in the size of data transferred over the network, e.g.
the size of intermediate map output transferred (not handled locally). I
wonder if there is such a counter. Thank you.
Best,
-Kevin
block and apply to every replica. But in hadoop, how is
this achieved? If multiple clients write to the same block, what will
happen? Moreover, is this scenario possible in the current implementation?
Thanks and regards,
-Kevin
I tried a bit and it looks like lines are preserved so far. However,
is this behavior supported for sure, or what should I do to keep it
working this way? Thank you.
-Kevin
On Tue, Jul 15, 2008 at 5:07 PM, Kevin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I was trying to parse text
InputFormat may not preserve lines. If this is
the case, is it possible to restore the lines for mapper input, or do I
have to drop broken lines? Thank you.
Best,
-Kevin
-Kevin
reducer only needs to do merge sort when it gets all the
intermediate files from different mappers).
Best,
-Kevin
Thank you, Chris. This solves my questions.
-Kevin
On Mon, Jul 14, 2008 at 11:17 AM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> "Yielding equal partitions" means that each input source will offer n
> partitions and for any given partition 0 <= i < n, the records in th
ng equal
partitions" mean?
Thank you.
-Kevin
Hi,
I searched a bit but could not find the answer. What is the right way
to add (and remove) slave nodes at run time? Thank you.
-Kevin