Hi Ninad,
You can always use Java object serialization to store custom objects as files
in the Hadoop distributed cache before the map/reduce tasks start running.
The rule-of-thumb steps for such usage are:
a. Create the object while configuring your job, serialize it to a file, and put
the file in the distributed cache
b.
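A minimal sketch of step (a), assuming a hypothetical Serializable class
LookupTable (the class name and HDFS path below are illustrative, not from the
original mail):

    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    class CacheSetup {
        // Step (a): serialize the custom object into HDFS, then register the
        // file in the distributed cache while configuring the job.
        static void cacheObject(JobConf conf, LookupTable table) throws IOException {
            Path cachePath = new Path("/tmp/lookup.ser");   // illustrative path
            FileSystem fs = FileSystem.get(conf);
            ObjectOutputStream out = new ObjectOutputStream(fs.create(cachePath));
            out.writeObject(table);                         // plain Java serialization
            out.close();
            DistributedCache.addCacheFile(cachePath.toUri(), conf);
        }
    }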
Hi,
I just moved from pseudo-distributed Hadoop to a four-machine fully
distributed Hadoop setup.
But after I start the dfs, there are no live nodes showing up. If I make the
master a slave too, then the datanode on the master machine will show up.
I looked through all the logs and found no errors. The only thing
Can you post your namenode's log? It seems that your datanode cannot
connect to the namenode.
On Wed, Mar 17, 2010 at 2:43 PM, William Kang weliam.cl...@gmail.com wrote:
Hi,
I just moved from pseudo distributed hadoop to a four machine full
distributed hadoop setup.
But, after I start
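As a quick check (a sketch using the standard HDFS admin command, run on the
master), you can see which datanodes have actually registered with the
namenode:

    hadoop dfsadmin -report    # lists live/dead datanodes and their capacity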
Hi Jeff,
Here is the log from my namenode:
2010-03-17 03:09:59,750 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
Hi Jeff,
I think I partly found the reason for this problem. The /etc/hosts file had
the master's hostname on the 127.0.0.1 line, so the namenode took 127.0.0.1
as its own IP address. I fixed it and have already found two nodes.
There is one still missing. I will let you guys know what
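For reference, a sketch of the /etc/hosts fix described above (hostnames and
addresses are illustrative): the master's hostname must resolve to its real
network address, not to the loopback line.

    # broken: the namenode binds to the loopback address
    # 127.0.0.1   localhost master
    # fixed:
    127.0.0.1     localhost
    192.168.1.10  master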
Preparing a Hadoop presentation here. For demonstration I start up a 5
machine m1.large cluster in EC2 via the Cloudera scripts ($hadoop-ec2
launch-cluster my-hadoop-cluster 5). Then I send a 500 MB XML file over
into HDFS. The Mapper will receive an XML block as the key, select an
email address
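(The copy into HDFS would be something like the following; the file name and
target path are illustrative:)

    hadoop fs -put big.xml /user/hadoop/input/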
Just FWD.
---------- Forwarded message ----------
From: Edward J. Yoon edwardy...@apache.org
Date: Wed, Mar 17, 2010 at 5:47 PM
Subject: Google Research: MapReduce: The programming model and practice
To: hama-...@incubator.apache.org
FYI, http://research.google.com/pubs/pub36249.html
--
Best
Dear All,
I'm trying to run tests using MySQL as a kind of data source, so I
thought Cloudera's Sqoop would be a nice project to have in production.
However, I'm not using the Cloudera Hadoop distribution right now, and
actually I'm not thinking of switching from a main project to a
At least for MRUnit, I was not able to find it outside of the Cloudera
distribution (CDH). What I did was install CDH locally using apt
(Ubuntu), search for and copy the MRUnit library into my local Maven
repository, and remove CDH afterwards. I guess the same is somehow possible
for Sqoop.
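A sketch of the Maven step (the jar path, coordinates, and version below are
illustrative; check the actual artifact shipped with CDH):

    mvn install:install-file -Dfile=/usr/lib/hadoop/lib/mrunit.jar \
        -DgroupId=org.apache.hadoop -DartifactId=mrunit \
        -Dversion=0.20 -Dpackaging=jar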
Hi everybody,
as part of my project work at school I'm running some Hadoop jobs on a
cluster. I'd like to measure exactly how long each phase of the process
takes: mapping, shuffling (ideally divided in copying and sorting) and
reducing. The tasktracker logs do not seem to supply the start/end
These are good inputs, Sanjay. Thanks for the help.
On Wed, Mar 17, 2010 at 11:33 AM, Sanjay Sharma sanjay.sha...@impetus.co.in
wrote:
Hi Ninad,
You can always use Java object serialization to store custom objects as
files in the Hadoop distributed cache before the map/reduce tasks start running.
The
Hi,
you can control the number of reducers via JobConf.setNumReduceTasks(n). The
number of mappers is determined by (file size) / (split size). By default the
split size is 64 MB. Since your dataset is not very large, there should be no big
difference if you change these.
if you are only interested
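A minimal sketch of that setting (old mapred API; the job class name is a
placeholder):

    JobConf conf = new JobConf(MyJob.class);  // MyJob is illustrative
    conf.setNumReduceTasks(4);                // explicit reducer count
    // The mapper count is derived from the input splits,
    // roughly totalInputSize / splitSize (64 MB by default).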
Very good input not to send the original XML over to the reducers. For
JobConf.setNumReduceTasks(n), isn't that just a hint, and the real
number will be determined based on the Partitioner I use, which will be
the default HashPartitioner? One other thought I had: what will happen if
the values
Hi Reik,
the number of reducers is not a hint (the mapper # is a hint). The default hash
partitioner hashes each record's key and sends the record to a reducer based on
the key's hash modulo the reducers #. If the values list is too large to fit into heap
memory, then you will get an exception and the job will fail
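For reference, the default partitioning amounts to the following (essentially
what org.apache.hadoop.mapred.lib.HashPartitioner does; the class name here is
illustrative):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Mask the key's hash to keep it non-negative, then take it modulo the
    // reducer count to pick the target reducer.
    public class MyHashPartitioner<K2, V2> implements Partitioner<K2, V2> {
        public void configure(JobConf job) { }   // no configuration needed
        public int getPartition(K2 key, V2 value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }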
Folks,
Does anyone know if this earlier post ever reached a resolution? I am trying
to work through the same tutorial, and I have encountered the same issue. Of
the candidate problems Jason suggested, none of them seem to pan out in my case
(details below). I'm looking for suggestions as to
Thanks Gang, I will do some testing tomorrow - skip sending the whole XML,
maybe add some Reducers - and see where I end up.
Gang Luo wrote:
Hi Reik,
the number of reducers is not a hint (the mapper # is a hint). The default hash
partitioner hashes each record's key and sends the record to a reducer based on
Thanks
On Mon, Mar 15, 2010 at 10:25 AM, Rekha Joshi rekha...@yahoo-inc.com wrote:
..dfs -rmr -skipTrash /user/hadoop/.Trash removes .Trash, which is
recreated on a subsequent rmr... -skipTrash can be used generally if
you don't want a backup of deletes; here it is only to illustrate..
On 3/15/10 2:43 PM, Marcus
At the default log level, Hadoop job logs (the ones you also get in the
job's output directory under _logs/history) contain entries like the
following:
ReduceAttempt TASK_TYPE=REDUCE TASKID=tip_200809020551_0008_r_02
TASK_ATTEMPT_ID=task_200809020551_0008_r_02_0
START_TIME=1220331166789
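A minimal sketch of pulling phase durations out of such lines (attribute names
follow the sample above; pairing each START_TIME with a FINISH_TIME on the
matching completion entry is an assumption about the log format):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class HistoryTiming {
        // Extract a millisecond timestamp attribute (e.g. START_TIME or
        // FINISH_TIME) from a job-history line; returns -1 if absent.
        static long attr(String line, String name) {
            Matcher m = Pattern.compile(name + "=(\\d+)").matcher(line);
            return m.find() ? Long.parseLong(m.group(1)) : -1;
        }
        // phase duration in ms:
        // attr(finishLine, "FINISH_TIME") - attr(startLine, "START_TIME")
    }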
On Mar 17, 2010, at 4:47 AM, Antonio D'Ettole wrote:
Hi everybody,
as part of my project work at school I'm running some Hadoop jobs on a
cluster. I'd like to measure exactly how long each phase of the
process
takes: mapping, shuffling (ideally divided in copying and sorting) and
reducing.
I'd like to be able to clear completed jobs from the jobtracker webpage.
Is there an easy way to do this without restarting the cluster?
Hi Folks
The Austin HUG is meeting tomorrow night. I hope to see you there. We have
speakers from Rackspace (Stu Hood on Cassandra) and IBM (Gino Bustelo on
BigSheets).
Detailed Information is available at http://austinhug.blogspot.com/
Kind regards
Steve Watt
Hi all,
I wonder when Hadoop distributes the cache files. Is it the moment we call
DistributedCache.addCacheFile()? Will the time to distribute the caches be counted
as part of the MapReduce job time?
Thanks,
-Gang
Hi,
Please let me know if you will publish any kind of document, presentation,
video, or anything else.
Thanks in advance
Alexandre Jaquet
2010/3/17 Stephen Watt sw...@us.ibm.com
Hi Folks
The Austin HUG is meeting tomorrow night. I hope to see you there. We have
speakers from Rackspace (Stu Hood on
[cross posting to hive-user]
Oded - how did you create the table in Hive? Did you specify any row format
SerDe for the table? If not, then that may be the cause of this problem,
since the default LazySimpleSerDe is unable to deserialize the custom
Writable key-value pairs that you have used in
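For comparison, specifying a SerDe when creating the table looks roughly like
this (the SerDe class and table definition are illustrative placeholders):

    CREATE TABLE events (key STRING, value STRING)
    ROW FORMAT SERDE 'com.example.MyWritableSerDe'
    STORED AS SEQUENCEFILE;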
Thanks Ravi.
Here are some observations. I ran job1 to generate some data used by the
following job2, without replication. The total size of the job1 output is 25 MB,
spread across 50 files. I use the distributed cache to send all the files to the nodes
running job2 tasks. When job2 starts, it stays at map
No, I didn't specify any SerDe. I'll read up on that and see if it works.
Thanks.
-----Original Message-----
From: Arvind Prabhakar [mailto:arv...@cloudera.com]
Sent: Wednesday, March 17, 2010 10:40 PM
To: common-user@hadoop.apache.org; hive-u...@hadoop.apache.org
Subject: Re: WritableName
At the default log level, Hadoop job logs (the ones you also get in the
job's output directory under _logs/history)
Thanks Simone, that's exactly what I was looking for.
Look at the job history logs. They break down the times for each task
I understand you guys are talking about the same
Not sure if what you're asking is possible or not, but you could experiment
with these params to see if you could achieve a similar effect.
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>0</value>
  <description>The maximum size of user-logs of each task in KB.
  0 disables the cap.</description>
</property>
Moving to mapreduce-user@
On Mar 15, 2010, at 5:54 PM, abhishek sharma wrote:
Hi all,
Hadoop creates a directory (and some files) for each map and reduce
task attempts in logs/userlogs on each tasktracker.
Is there a way to configure Hadoop not to create these attempt logs?
Not really,
Hi Utku,
Apache Hadoop 0.20 cannot support Sqoop as-is. Sqoop makes use of
DataDrivenDBInputFormat (among other APIs) which is not shipped with
Apache's 0.20 release. In order to get Sqoop working on 0.20, you'd need to
apply a lengthy list of patches from the project source repository to your
Alex Kozlov wrote:
Hi Brian,
Is your namenode running? Try 'hadoop fs -ls /'.
Alex
On Mar 12, 2010, at 5:20 PM, Brian Wolf brw...@gmail.com wrote:
Hi Alex,
I am back on this problem. It seems it works, but I have this issue
with connecting to the server.
I can connect via 'ssh localhost' OK.