I seem to be one of the mapside join champions. For jobs that fit onto that
pattern there is usually a 100x speed improvement, compared to doing reduce-side
joins, for real (large) datasets.
On Wed, Jul 15, 2009 at 12:05 PM, bonito perdo
bonito.pe...@googlemail.com wrote:
Thank you for your
that let you log what is going on in the field comparator or field
partitioner.
On Thu, Jul 16, 2009 at 11:05 PM, jason hadoop jason.had...@gmail.com wrote:
In the example code for Pro Hadoop there are some shims for the
field comparator classes that let you log what is going
Particularly for highly compressible data such as web log files, the loss in
potential data locality is more than made up for by the increase in network
transfer speed. The other somewhat unexpected side benefit is that there are
fewer map tasks with less task startup overhead. If your data is not
The namenode is pretty much driven by the number of blocks and the number of
files in your HDFS, and to a lesser extent, the rate of
create/open/write/close of files.
If you have any instability in your datanodes, there is a great increase in
namenode loading.
On Tue, Jul 14, 2009 at 4:16 AM,
Thanks Tom.
The single reducer is greatly limiting in local mode.
On Tue, Jul 14, 2009 at 3:15 AM, Tom White t...@cloudera.com wrote:
There's a Jira to fix this here:
https://issues.apache.org/jira/browse/MAPREDUCE-434
Tom
On Mon, Jul 13, 2009 at 12:34 AM, jason
If the jobtracker is set to local, there is no way to have more than 1
reducer.
On Sun, Jul 12, 2009 at 12:21 PM, Rares Vernica rvern...@gmail.com wrote:
Hello,
Is it possible to have more than one reducer in standalone mode? I am
currently using 0.17.2.1 and I do:
There is already support for tar.gz, but it is buried.
FileUtil provides a static unTar method.
This is only used currently by the DistributedCache for unpacking archives.
On Fri, Jul 10, 2009 at 2:58 AM, Andraz Tori and...@zemanta.com wrote:
Has anyone written a TarGzipCodec decompressor for
Here are the set of configuration parameters for compression from 0.19
You can enable mapred.compress.map.output, and mapred.output.compress
as well as set mapred.output.compression.type to BLOCK for a good set of
defaults.
The compression codecs vary substantially by release, so I won't go
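As a rough sketch of those defaults in driver code (property names as used in the 0.18/0.19 releases; the helper class here is just for illustration):

import org.apache.hadoop.mapred.JobConf;

public class CompressionDefaults {
  public static void apply(JobConf conf) {
    // Compress intermediate map output (reduces shuffle traffic).
    conf.setBoolean("mapred.compress.map.output", true);
    // Compress the final job output as well.
    conf.setBoolean("mapred.output.compress", true);
    // BLOCK compression only applies when the output is a SequenceFile.
    conf.set("mapred.output.compression.type", "BLOCK");
  }
}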
The simplest way is to swap the key and value in your mapper's output, then
swap them back afterward.
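A minimal sketch of the swap in the old (org.apache.hadoop.mapred) API; the tag/count types are assumptions based on the tag-counting example:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Second pass over (tag, count) records: emit (count, tag) so the framework
// sorts on the count; the reducer can then swap the pair back.
// For descending order you would also set a decreasing key comparator.
public class SwapMapper extends MapReduceBase
    implements Mapper<Text, LongWritable, LongWritable, Text> {
  public void map(Text tag, LongWritable count,
                  OutputCollector<LongWritable, Text> output, Reporter reporter)
      throws IOException {
    output.collect(count, tag);
  }
}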
On Thu, Jul 9, 2009 at 7:52 AM, Marcus Herou marcus.he...@tailsweep.com wrote:
Hi many times I want to sort by value instead of key.
For instance when counting the top used tags in blog posts
To clarify all of the writers.
Store the values you wish to share with your map tasks, in the JobConf
object.
In the configure method of your mapper class, unpack the variables and store
them in class fields of the mapper class.
Then use them as needed in the map method of your mapper class.
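A small sketch, assuming the driver called conf.set("my.shared.value", ...) before submitting the job; the property name is made up for illustration:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class ParamMapperBase extends MapReduceBase {
  protected int sharedValue;

  @Override
  public void configure(JobConf job) {
    // Unpack driver-side settings into class fields once per task,
    // then read the fields from the map method as needed.
    sharedValue = job.getInt("my.shared.value", 0);
  }
}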
On
Just out of curiosity, what happens when you run your script by hand?
On Wed, Jul 8, 2009 at 8:09 AM, Rares Vernica rvern...@gmail.com wrote:
On Tue, Jul 7, 2009 at 10:26 PM, jason hadoop jason.had...@gmail.com
wrote:
The mapper has no control at the point where your mymapper.sh script
you.
On Wed, Jul 1, 2009 at 5:13 PM, jason hadoop jason.had...@gmail.com
wrote:
The parameter mapred.local.dir controls the directory used by the task
tracker for map/reduce jobs' local files.
The dfs.data.dir parameter is for the datanode.
On Wed, Jul 1, 2009 at 8:56 AM, bonito
try the cloudera distributions, they have one based on 18.3, and soon
(perhaps already) on 20.0
www.cloudera.com
On Wed, Jul 1, 2009 at 9:45 PM, akhil1988 akhilan...@gmail.com wrote:
Hi All,
Has anyone written Hadoop auto-installation script for a cluster? If yes,
please let me know.
The ChainMapper class introduced in Hadoop 19 will provide you with the
ability to have an arbitrary number of map tasks to run one after the other,
in the context of a single job.
The one issue to be aware of is that the chain of mappers only sees the
output of the previous map in the chain.
There
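A rough sketch of wiring two mappers together with ChainMapper (the two mapper classes here are invented for illustration; each link only sees the previous link's output):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.ChainMapper;

public class ChainExample {

  // First link: lower-case each input line.
  public static class LowerCaseMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text line,
                    OutputCollector<Text, Text> out, Reporter r) throws IOException {
      out.collect(new Text(line.toString().toLowerCase()), new Text(""));
    }
  }

  // Second link: only ever sees the first link's output.
  public static class TrimMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text key, Text value,
                    OutputCollector<Text, Text> out, Reporter r) throws IOException {
      out.collect(new Text(key.toString().trim()), value);
    }
  }

  public static void configureChain(JobConf job) {
    ChainMapper.addMapper(job, LowerCaseMapper.class,
        LongWritable.class, Text.class, Text.class, Text.class,
        false, new JobConf(false));
    ChainMapper.addMapper(job, TrimMapper.class,
        Text.class, Text.class, Text.class, Text.class,
        false, new JobConf(false));
  }
}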
How about multi-threaded mappers?
Multi-threaded mappers are ideal for map tasks that are I/O bound on non-local
resources with many distinct endpoints.
You can also control the thread count on a per job basis.
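Something like the following (old-API property name; verify against your release):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

public class ThreadedMapSetup {
  public static void apply(JobConf conf) {
    // Run the map() calls of each map task on a pool of threads; useful when
    // the mapper spends most of its time waiting on remote I/O.
    conf.setMapRunnerClass(MultithreadedMapRunner.class);
    // Thread count per map task.
    conf.setInt("mapred.map.multithreadedrunner.threads", 10);
  }
}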
On Sat, Jun 27, 2009 at 8:26 AM, Marcus Herou marcus.he...@tailsweep.com wrote:
The argument
I believe the cloudera 18.3 supports bzip2
On Wed, Jun 24, 2009 at 3:45 AM, Usman Waheed usm...@opera.com wrote:
Hi All,
Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3?
I tried but interestingly the output was not what I expected versus what I
got when my data was in
The join package does a streaming merge sort between each part-X in your
input directories,
part-0000 will be handled by a single task,
part-0001 will be handled by a single task,
and so on.
These jobs are essentially I/O bound, and hard to beat for performance.
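For reference, a rough driver sketch using the join package; the "inner" expression and the /data/left and /data/right paths are hypothetical, and both inputs must be sorted and identically partitioned:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinSetup {
  public static void apply(JobConf conf) {
    conf.setInputFormat(CompositeInputFormat.class);
    // Inner join of two sorted, identically partitioned directories;
    // the map key is the join key and the value is a TupleWritable.
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", SequenceFileInputFormat.class,
        new Path("/data/left"), new Path("/data/right")));
  }
}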
On Wed, Jun 24, 2009 at 2:09 PM, pmg
with 64m block size get 16 blocks
mapped to different map tasks?
jason hadoop wrote:
The join package does a streaming merge sort between each part-X in your
input directories,
part-0000 will be handled by a single task,
part-0001 will be handled by a single task
and so on
These jobs
The namenode is constantly receiving reports about what datanode has what
blocks, and performing replication when a block becomes under replicated.
On Tue, Jun 23, 2009 at 6:18 PM, Stuart White stuart.whi...@gmail.com wrote:
In my Hadoop cluster, I've had several drives fail lately (and they've
I happened to have a copy of 18.1 lying about, and the JobConf is added to
the per process runtime environment in 18.1.
The entire configuration from the JobConf object is added to the
environment, with the jobconf key names being transformed slightly. Any
character in the key name that is not
The directory specified by the configuration parameter mapred.system.dir,
defaulting to /tmp/hadoop/mapred/system, doesn't exist.
Most likely your tmp cleaner task has removed it, and I am guessing it is
only created at cluster start time.
On Mon, Jun 22, 2009 at 6:19 PM, akhil1988
configure and close are run for each task, mapper and reducer. The configure
and close are NOT run on the combiner class.
On Mon, Jun 22, 2009 at 9:23 AM, Saptarshi Guha saptarshi.g...@gmail.com wrote:
Hello,
In a mapreduce job, a given map JVM will run N map tasks. Are the
configure and close
Check the process environment for your streaming tasks, generally the
configuration variables are exported into the process environment.
The Mapper input file is normally stored as some variant of
mapred.input.file. The reducer's input is the mapper output for that reduce,
so the input file is
HDFS/DFS client uses quite a few file descriptors for each open file.
Many application developers (but not the hadoop core) rely on the JVM
finalizer methods to close open files.
This combination, especially when many HDFS files are open, can result in
very large demands for file descriptors for
will get called, if ever.
-brian
-Original Message-
From: ext jason hadoop [mailto:jason.had...@gmail.com]
Sent: Sunday, June 21, 2009 11:19 AM
To: core-user@hadoop.apache.org
Subject: Re: Too many open files error, which gets resolved after some
time
HDFS/DFS client uses quite a few
and every file
handle that I receive from HDFS?
Regards.
2009/6/21 jason hadoop jason.had...@gmail.com
Just to be clear, I second Brian's opinion. Relying on finalizers is a very
good way to run out of file descriptors.
On Sun, Jun 21, 2009 at 9:32 AM, brian.lev...@nokia.com wrote:
IMHO
the output of a mapper-only job so that we don't
get a large number of smaller files. Sometimes you just don't want to run
reducers and unnecessarily transfer a whole lot of data across the network.
Thanks,
Tarandeep
On Wed, Jun 17, 2009 at 7:57 PM, jason hadoop jason.had...@gmail.com
wrote
binaryRead.
Please let me know if I am going wrong anywhere.
Thanks,
Akhil
jason hadoop wrote:
I have only ever used the distributed cache to add files, including
binary
files such as shared libraries.
It looks like you are adding a directory.
The DistributedCache
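A minimal sketch of adding an individual file (the HDFS path is hypothetical):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheFileSetup {
  public static void apply(JobConf conf) throws Exception {
    // Add an individual file (not a directory) that already lives in HDFS.
    DistributedCache.addCacheFile(new URI("/user/me/lookup.dat"), conf);
    // Inside a task, configure() can then locate the local copy via:
    //   Path[] local = DistributedCache.getLocalCacheFiles(conf);
  }
}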
:59 PM, jason hadoop jason.had...@gmail.com
wrote:
Job control is coming with the Hadoop WorkFlow manager; in the meantime
there is Cascading by Chris Wensel. I do not have any personal experience
with
either. I do not know how pipes interacts with either.
On Wed, Jun 17, 2009 at 12:43
The task id is readily available, if you override the configure method.
The MapReduceBase class in the Pro Hadoop Book examples does this and makes
the taskId available as a class field.
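Roughly like this (the property name mapred.task.id is what the older releases use; this is a sketch, not the book's code):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class TaskAwareBase extends MapReduceBase {
  protected String taskId;

  @Override
  public void configure(JobConf job) {
    // The framework places the task attempt id into the job configuration.
    taskId = job.get("mapred.task.id");
  }
}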
On Thu, Jun 18, 2009 at 7:33 AM, Mark Desnoyer mdesno...@gmail.com wrote:
Thanks! I'll try that.
-Mark
In general if the values become very large, it becomes simpler to store them
out of line in HDFS, and just pass the HDFS path for the item as the value in
the map reduce task.
This greatly reduces the amount of IO done, and doesn't blow up the sort
space on the reducer.
You lose the magic of data
to this?
Thanks,
Akhil
jason hadoop wrote:
Something is happening inside of your (Parameters.
readConfigAndLoadExternalData(Config/allLayer1.config);)
code, and the framework is killing the job for not heartbeating for 600
seconds
On Tue, Jun 16, 2009 at 8:32 PM, akhil1988
www.prohadoopbook.com ?
2009/6/17 zjffdu zjf...@gmail.com
HI Jason,
Where can I download your book's Alpha Chapters, I am very interested in
your book about hadoop.
And I cannot visit the link www.prohadoopbook.com
-Original Message-
From: jason hadoop [mailto:jason.had...@gmail.com
You can open your sequence file in the mapper configure method, write to it
in your map, and close it in the mapper close method.
Then you end up with 1 sequence file per map. I am making an assumption that
each key,value pair to your map somehow represents a single XML file/item.
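A minimal sketch of that pattern, assuming Text keys and values and a made-up /tmp/side-* output path:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SideFileMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {
  private SequenceFile.Writer writer;

  @Override
  public void configure(JobConf job) {
    try {
      FileSystem fs = FileSystem.get(job);
      // One side file per map task; the task id keeps the names unique.
      Path out = new Path("/tmp/side-" + job.get("mapred.task.id") + ".seq");
      writer = SequenceFile.createWriter(fs, job, out, Text.class, Text.class);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(Text key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    writer.append(key, value);
  }

  @Override
  public void close() throws IOException {
    writer.close();
  }
}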
On Wed, Jun 17,
Job control is coming with the Hadoop WorkFlow manager; in the meantime
there is Cascading by Chris Wensel. I do not have any personal experience with
either. I do not know how pipes interacts with either.
On Wed, Jun 17, 2009 at 12:43 PM, Roshan James
roshan.james.subscript...@gmail.com wrote:
Is there a requirement for hadoop 0.20 for HBase 0.20?
On Wed, Jun 17, 2009 at 1:44 AM, Andrew Purtell apurt...@apache.org wrote:
Minor correction/addition: Stargate is undergoing shared development in
two github trees:
http://github.com/macdiesel/stargate/tree/master
In the examples for my book there is an example of JVM reuse with static data
shared between JVMs.
On Tue, Jun 16, 2009 at 1:08 AM, Hello World snowlo...@gmail.com wrote:
Thanks for your reply. Can you do me a favor to make a check?
I modified mapred-default.xml as follows:
540 property
541
Pankil
On Fri, May 15, 2009 at 2:25 AM, jason hadoop jason.had...@gmail.com
wrote:
There should be a few more lines at the end.
We only want the part from the last STARTUP_MSG to the end.
On one of mine a successful start looks like this:
STARTUP_MSG: Starting DataNode
STARTUP_MSG
When you are running in local mode you have 2 basic choices if you want to
interact with a debugger.
You can launch from within eclipse or other IDE, or you can setup a java
debugger transport as part of the mapred.child.java.opts variable, and
attach to the running jvm.
By far the simplest is
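For the second choice, something along these lines (standard JPDA options; port 8000 is arbitrary, and suspend=y only makes sense with a single local task):

import org.apache.hadoop.mapred.JobConf;

public class DebugOpts {
  public static void apply(JobConf conf) {
    // Each child JVM pauses on startup until a remote debugger attaches
    // on port 8000.
    conf.set("mapred.child.java.opts",
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000");
  }
}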
Is it possible that your map class is an inner class and not static?
On Tue, Jun 16, 2009 at 10:51 AM, akhil1988 akhilan...@gmail.com wrote:
Hi All,
I am running my mapred program in local mode by setting
mapred.job.tracker to local so that I can debug my code.
The mapred program
, etc.)
but it gets stuck there (while loading some classifier) and never reaches
HI3.
This program runs fine when executed normally (without mapreduce).
Thanks, Akhil
jason hadoop wrote:
Is it possible that your map class is an inner class and not static?
On Tue, Jun 16
Your class is not in your jar, or your jar is not available in the hadoop
class path.
On Mon, Jun 15, 2009 at 2:39 AM, bharath vissapragada
bhara...@students.iiit.ac.in wrote:
Hi all ,
When i try to run my own progam (jar file) i get the following error.
java.lang.ClassNotFoundException :
It would be nice if there were an interface-compliant way. Perhaps it becomes
available in the 0.20 and beyond APIs.
On Sat, Jun 13, 2009 at 3:40 PM, Rares Vernica rvern...@gmail.com wrote:
Hello,
In Reduce, can I get the number of values for the current key without
iterating over them? Does
.
Thanks.
Schubert
On Thu, Jun 11, 2009 at 11:26 AM, jason hadoop jason.had...@gmail.com
wrote:
I had a great time, schmoozing with people, and enjoyed a couple of the
talks
I would love to see more from Pria Narasimhan, hope their toolset for
automated fault detection in hadoop clusters
You can always write something simple to hand call the HashPartitioner.
Jython works for quick tests.
But the code in the hash partitioner is essentially ((int) key.hashCode()) %
numReduces.
Since nothing else is in play, I suspect there is an incorrect assumption
somewhere.
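A quick hand-check along those lines (Jython would do the same thing; the keys here are arbitrary):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.HashPartitioner;

public class PartitionCheck {
  public static void main(String[] args) {
    HashPartitioner<Text, Text> p = new HashPartitioner<Text, Text>();
    int numReduces = 4; // match the job's reduce count
    for (String k : new String[] {"alpha", "beta", "gamma"}) {
      System.out.println(k + " -> partition "
          + p.getPartition(new Text(k), new Text(""), numReduces));
    }
  }
}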
On Fri, Jun 12, 2009
Also check the I/O wait time on your datanodes; if the I/O wait time is high,
you can't win.
On Fri, Jun 12, 2009 at 11:24 AM, Brian Bockelman bbock...@cse.unl.edu wrote:
What's your replication factor? What aggregate I/O rates do you see in
Ganglia? Is the I/O spikey, or has it plateaued?
We
The reduce output may spill to disk during the sort, and if it is expected to
be larger than the partition free space, unless the machine/jvm has a huge
allowed memory space, the data will spill to disk during the sort.
If I did my math correctly, you are trying to push ~2TB through the single
My book has a small section on setting up under windows.
The key piece is that you must have a cygwin installation on the machine,
and include the cygwin installation's bin directory in your windows system
PATH environment variable. (Control Panel|System|Advanced|Environment
Variables|System
The hadoop scripts must be run from the cygwin bash shell also.
It is MUCH simpler to just switch to linux :)
On Thu, Jun 11, 2009 at 6:54 AM, jason hadoop jason.had...@gmail.com wrote:
My book has a small section on setting up under windows.
The key piece is that you must have a cygwin
more hbase
that hadoop does. Is hbase well suited for every large application like an
auction website or a very large community forum?
thx
2009/6/11 Alexandre Jaquet alexjaq...@gmail.com
Thanks, I'm going to buy your ebook!
2009/6/11 jason hadoop jason.had...@gmail.com
My book has a small
the email I provided.
One more question, does hbase provide a ConnectionFactory or SessionFactory
that can be integrated within Spring ?
Thanks
2009/6/11 jason hadoop jason.had...@gmail.com
I don't know the password for that, you will need to contact apress
support.
On Thu, Jun 11, 2009
There is always NLineInputFormat. You specify the number of lines per split.
The key is the position of the line start in the file, value is the line
itself.
The parameter mapred.line.input.format.linespermap controls the number of
lines per split
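Sketch of the driver settings (1000 lines per split is an arbitrary choice):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class NLineSetup {
  public static void apply(JobConf conf) {
    conf.setInputFormat(NLineInputFormat.class);
    // Each map task receives this many lines of input.
    conf.setInt("mapred.line.input.format.linespermap", 1000);
  }
}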
On Wed, Jun 10, 2009 at 5:27 AM, Harish
I had a great time, schmoozing with people, and enjoyed a couple of the talks
I would love to see more from Pria Narasimhan, hope their toolset for
automated fault detection in hadoop clusters becomes generally available.
Zookeeper rocks on!
Hbase is starting to look really good, in 0.20 the
but
it didn't work: ERROR: The promotional code 'LUCKYYOU' does not exist.
Burt
On Tuesday 09 June 2009 10:15:24 pm jason hadoop wrote:
In honor of the Hadoop Summit on June 10th(tomorrow), Apress has agreed
to
provide some conference swag, in the form of a 50% off coupon
Purchase
CORRECTED CODE: LUCKYOU
I misread the flyer.
On Tue, Jun 9, 2009 at 8:45 PM, jason hadoop jason.had...@gmail.com wrote:
I just sent a note to the publisher, hopefully they will fix it, especially
since I just printed up 100 flyers to give out at the hadoop summit!
On Tue, Jun 9, 2009
A Writable basically needs to implement two methods:
/**
 * Serialize the fields of this object to <code>out</code>.
 *
 * @param out <code>DataOutput</code> to serialize this object into.
 * @throws IOException
 */
void write(DataOutput out) throws IOException;
/**
 * Deserialize the
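For example, a trivial Writable with two int fields might look like this (a made-up example, not from the Hadoop source):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PointWritable implements Writable {
  private int x;
  private int y;

  public void write(DataOutput out) throws IOException {
    out.writeInt(x);
    out.writeInt(y);
  }

  public void readFields(DataInput in) throws IOException {
    // Read the fields back in exactly the order they were written.
    x = in.readInt();
    y = in.readInt();
  }
}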
A very common one is processing large quantities of log files and producing
summary data.
Another use is simply as a way of distributing large jobs across multiple
computers.
In a previous job, we used Map/Reduce for distributed bulk web crawling, and
for distributed media file processing.
On
The chapters are available for download now.
On Sat, Jun 6, 2009 at 3:33 AM, zhang jianfeng zjf...@gmail.com wrote:
Is there any resource on internet that I can get as soon as possible ?
On Fri, Jun 5, 2009 at 6:43 PM, jason hadoop jason.had...@gmail.com
wrote:
chapter 7 of my book goes
chapter 7 of my book goes into details of how to debug with eclipse
On Fri, Jun 5, 2009 at 3:40 AM, zhang jianfeng zjf...@gmail.com wrote:
Hi all,
Some jobs I submit to hadoop failed, but I cannot see what the problem is.
So is there any way to debug the hadoop job in eclipse, such as the
Are your tasks failing or completing successfully? Failed tasks have the
output directory wiped, only successfully completed tasks have the files
moved up.
I don't recall if the FileOutputCommitter class appeared in 0.18
On Wed, Jun 3, 2009 at 6:43 PM, Ian Soboroff ian.sobor...@nist.gov wrote:
You can always dump the entire property space and work it out that way.
I haven't used the 0.20 APIs yet so I can't speak to them.
On Tue, Jun 2, 2009 at 10:52 AM, Rares Vernica rvern...@gmail.com wrote:
On 6/2/09, randy...@comcast.net randy...@comcast.net wrote:
Your Map class needs to
At the minimal level, enable map output compression, it may make some
difference, mapred.compress.map.output.
Sorting is very expensive when there are many keys and the values are large.
Are you quite certain your keys are unique?
Also, do you need them sorted by document id?
On Thu, May 28,
Use the mapside join stuff, if I understand your problem it provides a good
solution but requires getting over the learning hurdle.
Well described in chapter 8 of my book :)
On Thu, May 28, 2009 at 8:29 AM, Chris K Wensel ch...@wensel.net wrote:
I believe PIG, and I know Cascading use a kind
Random ordering helps; per-thread delays based on domain recency also help.
On Wed, May 27, 2009 at 6:47 AM, Ken Krugler kkrugler_li...@transpac.com wrote:
My current project is to gather stats from a lot of different documents.
We're not indexing, just getting quite specific stats for
if you launch your jobs via bin/hadoop jar jar_file [main class] [options]
you can simply specify -fs hdfs://host:port before the jar_file
On Sun, May 24, 2009 at 3:02 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
I'm looking to move the Hadoop NameNode URL outside the hadoop-site.xml
jason hadoop jason.had...@gmail.com
if you launch your jobs via bin/hadoop jar jar_file [main class]
[options]
you can simply specify -fs hdfs://host:port before the jar_file
On Sun, May 24, 2009 at 3:02 PM, Stas Oskin stas.os...@gmail.com
wrote:
Hi.
I'm looking to move
Can you give your machines multiple IP addresses, and bind the grid server
to a different IP than the datanode?
With Solaris you could put it in a different zone.
On Sat, May 23, 2009 at 10:13 AM, Brian Bockelman bbock...@math.unl.edu wrote:
Hey all,
Had a problem I wanted to ask advice on.
It does not appear that any datanodes have connected to your namenode.
On the datanode machines, look in the hadoop logs directory at the datanode
log files.
There should be some information there that helps you diagnose the problem.
chapter 4 of my book provides some detail on working with this
setInputPaths will take an array, or variable arguments.
or you can simply provide the directory that the individual files reside in,
and the individual files will be added.
If there are other files in the directory, you may need to specify a custom
input path filter via
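A rough sketch of both options in the old API (the paths and the .log filter are invented for illustration):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputSetup {
  // Only pick up files ending in ".log" from the input directory.
  public static class LogFilter implements PathFilter {
    public boolean accept(Path path) {
      return path.getName().endsWith(".log");
    }
  }

  public static void apply(JobConf conf) {
    // Either list the files/directories explicitly (varargs)...
    FileInputFormat.setInputPaths(conf,
        new Path("/logs/2009/06"), new Path("/logs/2009/07"));
    // ...or point at a directory and filter out the files you don't want.
    FileInputFormat.setInputPathFilter(conf, LogFilter.class);
  }
}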
The last time I had to do something like this, in the map phase, I made the
key a random value, md5 of the key, and
built a new value that had the real key embedded.
Then in the reduce phase I received the records in random order and could do
what I wanted.
By using a stable but differently
I always disable atime and its ilk
The deadline scheduler helps with the (non xfs hanging) du datanode timeout
issues, but not much.
Ultimately that is a caching failure in the kernel, due to the hadoop io
patterns.
Anshu, any luck getting off the PAE kernels? Is this the xfs lockup, or just
When you open a file you have the option of specifying the blockSize:
/**
* Opens an FSDataOutputStream at the indicated Path with write-progress
* reporting.
* @param f the file name to open
* @param permission
* @param overwrite if a file with this name already exists, then if true,
* the
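A quick usage sketch of one of the create overloads that takes a blockSize (the path and sizes here are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Overwrite, 4 KB buffer, replication 3, 128 MB blocks for this one file.
    FSDataOutputStream out = fs.create(
        new Path("/tmp/bigfile"), true, 4096, (short) 3, 128L * 1024 * 1024);
    out.writeBytes("hello\n");
    out.close();
  }
}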
= Slave1/127.0.1.1
On Thu, May 14, 2009 at 11:43 PM, jason hadoop jason.had...@gmail.com
wrote:
The data node logs are on the datanode machines in the log directory.
You may wish to buy my book and read chapter 4 on hdfs management.
On Thu, May 14, 2009 at 9:39 PM, Pankil Doshi forpan
not
impact any locality properties.
Piotr
2009/5/15 jason hadoop jason.had...@gmail.com
A downside of this approach is that you will not likely have data
locality
for the data on shared file systems, compared with data coming from an
input
split.
That being said,
from your
master in the master file we have master and secondary
node, *both* processes getting started on the two servers listed. Can't we
have the master and secondary node started separately on two machines?
On Fri, May 15, 2009 at 9:39 AM, jason hadoop jason.had...@gmail.com
wrote:
I agree with billy
Ultimately it depends on how you write the Mapper.map method.
The framework supports a MultithreadedMapRunner which lets you set the
number of threads running your map method simultaneously.
Chapter 5 of my book covers this.
On Wed, May 13, 2009 at 11:10 PM, Shengkai Zhu geniusj...@gmail.com
The customary practice is to have your Reducer.reduce method handle the
filtering if you are reducing your output.
or the Mapper.map method if you are not.
On Wed, May 13, 2009 at 1:57 PM, Asim linka...@gmail.com wrote:
Hi,
I wish to output only selective records to the output files based on
any machine put in the conf/masters file becomes a secondary namenode.
At some point there was confusion on the safety of more than one machine,
which I believe was settled, as many are safe.
The secondary namenode takes a snapshot at 5 minute (configurable)
intervals, rebuilds the fsimage and
Sort order is preserved if your Mapper doesn't change the key ordering in
output. Partition name is not preserved.
What I have done is to manually work out what the partition number of the
output file should be for each map task, by calling the partitioner on an
input key, and then renaming the
You can decommission the datanode, and then un-decommission it.
On Thu, May 14, 2009 at 7:44 AM, Alexandra Alecu
alexandra.al...@gmail.com wrote:
Hi,
I want to test how Hadoop and HBase are performing. I have a cluster with 1
namenode and 4 datanodes. I use Hadoop 0.19.1 and HBase 0.19.2.
In the mapside join, the input file name is not visible, as the input is
actually a composite of a large number of files.
I have started answering in www.prohadoopbook.com
On Thu, May 14, 2009 at 1:19 PM, Stuart White stuart.whi...@gmail.com wrote:
On Thu, May 14, 2009 at 10:25 AM, jason hadoop
You can have separate configuration files for the different datanodes.
If you are willing to deal with the complexity you can manually start them
with altered properties from the command line.
rsync or other means of sharing identical configs is simple and common.
Raghu, your technique will
A downside of this approach is that you will not likely have data locality
for the data on shared file systems, compared with data coming from an input
split.
That being said,
from your script, *hadoop dfs -get FILE -* will write the file to standard
out.
On Thu, May 14, 2009 at 10:01 AM, Piotr
You have to examine the datanode log files
the namenode does not start the datanodes, the start script does.
The name node passively waits for the datanodes to connect to it.
On Thu, May 14, 2009 at 6:43 PM, Pankil Doshi forpan...@gmail.com wrote:
Hello Everyone,
Actually I had a cluster
-hadoopmaster.out
hadoop-hadoop-secondarynamenode-hadoopmaster.out.1
history
Thanks
Pankil
On Thu, May 14, 2009 at 11:27 PM, jason hadoop jason.had...@gmail.com
wrote:
You have to examine the datanode log files
the namenode does not start the datanodes, the start script does.
The name node
I agree with billy. conf/masters is misleading as the place for secondary
namenodes.
On Thu, May 14, 2009 at 8:38 PM, Billy Pearson
sa...@pearsonwholesale.comwrote:
I think the secondary namenode is set in the masters file in the conf
folder
misleading
Billy
Rakhi Khatwani
You may wish to set the separator to the string comma space ', ' for your
example.
chapter 7 of my book goes into this in some detail, and I posted a graphic
that visually depicts the process and the values
about a month ago.
The original post was titled 'Changing key/value separator in hadoop
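The setting itself is a one-liner in the driver (property name as used by the old-API TextOutputFormat; verify against your release):

import org.apache.hadoop.mapred.JobConf;

public class SeparatorSetup {
  public static void apply(JobConf conf) {
    // TextOutputFormat writes key<separator>value; the default is a tab.
    conf.set("mapred.textoutputformat.separator", ", ");
  }
}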
Close the file after you write one block, the close is synchronous.
On Tue, May 12, 2009 at 11:50 PM, Xie, Tao xietao1...@gmail.com wrote:
DFSOutputStream.writeChunk() enqueues packets into data queue and after
that
it returns. So write is asynchronous.
I want to know the total actual time
Thanks chuck, I didn't read the post and focused on the commas
On Wed, May 13, 2009 at 2:38 PM, Chuck Lam chuck@gmail.com wrote:
The behavior you saw in Streaming (list of key,value instead of key,
list
of values) is indeed intentional, and it's part of the design differences
between
/triedtested solution?
thanks again
On Mon, May 11, 2009 at 12:41 AM, jason hadoop jason.had...@gmail.com
wrote:
You can cache the block in your task, in a pinned static variable, when
you
are reusing the jvms.
On Sun, May 10, 2009 at 2:30 PM, Matt Bowyer mattbowy...@googlemail.com
wrote
Now that I think about it, the reverse lookups in my clusters work.
On Mon, May 11, 2009 at 3:07 AM, Steve Loughran ste...@apache.org wrote:
jason hadoop wrote:
You should be able to relocate the cluster's IP space by stopping the
cluster, modifying the configuration files, resetting the dns
You can cache the block in your task, in a pinned static variable, when you
are reusing the jvms.
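A bare-bones sketch, assuming JVM reuse is enabled (mapred.job.reuse.jvm.num.tasks > 1 or -1); loadBlock is a placeholder for whatever is expensive to build:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CachingMapperBase extends MapReduceBase {
  // Loaded once per JVM; later tasks running in the same reused JVM
  // see the already-populated static field instead of reloading.
  private static volatile byte[] cachedBlock;

  @Override
  public void configure(JobConf job) {
    synchronized (CachingMapperBase.class) {
      if (cachedBlock == null) {
        cachedBlock = loadBlock(job);
      }
    }
  }

  private byte[] loadBlock(JobConf job) {
    return new byte[0]; // placeholder for the expensive load
  }
}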
On Sun, May 10, 2009 at 2:30 PM, Matt Bowyer mattbowy...@googlemail.com wrote:
Hi,
I am trying to do 'on demand map reduce' - something which will return in
reasonable time (a few seconds).
My
AM, Sasha Dolgy sdo...@gmail.com wrote:
yes, that is the problem. two or hundreds...data streams in very quickly.
On Fri, May 8, 2009 at 8:42 PM, jason hadoop jason.had...@gmail.com
wrote:
Is it possible that two tasks are trying to write to the same file path?
On Fri, May 8, 2009
for the reply, but do I need to include every supporting jar
file to the application path? What is the -rel-?
George
jason hadoop wrote:
1) when running under windows, include the cygwin bin directory in your
windows path environment variable
2) eclipse is not so good at submitting
looks like you have different versions of the jars, or perhaps someone has
run ant in one of your installation directories.
On Fri, May 8, 2009 at 7:54 PM, nguyenhuynh.mr nguyenhuynh...@gmail.com wrote:
Hi all!
I cannot start hdfs successfully. I checked log file and found following
message:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_etal_TR2009.pdf
.
Regards,
Jeff
On Fri, May 8, 2009 at 2:49 PM, jason hadoop jason.had...@gmail.com
wrote:
Most of the people with this need are using some variant of memcached
You should be able to relocate the cluster's IP space by stopping the
cluster, modifying the configuration files, resetting the dns and starting
the cluster.
It is best to verify connectivity with the new IP addresses before starting the
cluster.
to the best of my knowledge the namenode doesn't care
Is it possible that two tasks are trying to write to the same file path?
On Fri, May 8, 2009 at 11:46 AM, Sasha Dolgy sdo...@gmail.com wrote:
Hi Tom (or anyone else),
Will SequenceFile allow me to avoid problems with concurrent writes to the
file? I still continue to get the following
You can set mapred.child.java.opts on a per-job basis,
either via -D mapred.child.java.opts="java options" or via
conf.set("mapred.child.java.opts", "java options").
Note: the conf.set must be done before the job is submitted.
On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger phi...@cloudera.com wrote:
Most of the people with this need are using some variant of memcached, or
other distributed hash table.
On Fri, May 8, 2009 at 10:07 AM, Joe joe_...@yahoo.com wrote:
Hi,
As a newcomer to Hadoop, I wonder if there is any efficient way to support
shared content among all mappers. For example, to implement