Hi Manoj
From my limited knowledge of file appends in HDFS, I have seen more
recommendations to use sync() in the latest releases than to use append().
Let us wait for some committer to authoritatively comment on 'the production
readiness of append()'. :)
Regards
Bejoy KS
On Mon, Sep 10, 2012
Hi Manoj
You can load daily logs into individual directories in HDFS and process them
daily. Keep those results in HDFS, HBase, databases, etc. Every day do the
processing, get the results and aggregate them with the previously
aggregated results to date.
Regards
Bejoy KS
Sent from
That is a good pointer, Harsh.
Thanks a lot.
But if IdentityMapper is being used, shouldn't the job.xml reflect that?
Job.xml always shows the mapper as our CustomMapper.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Harsh J ha...@cloudera.com
Date
Ok, got it now. That is a good piece of information.
Thank You :)
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Harsh J ha...@cloudera.com
Date: Fri, 3 Aug 2012 16:28:27
To: mapreduce-user@hadoop.apache.org; bejoy.had...@gmail.com
Cc: Mohammad
on that as well.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Mohammad Tariq donta...@gmail.com
Date: Thu, 2 Aug 2012 15:48:42
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Reading fields from a Text line
!..
Regards
Bejoy KS
reduce
tasks there is no guarantee that one task will be scheduled on each node.
It can be like 2 on one node and 1 on another.
Regards
Bejoy KS
Hi Nathan
Alternatively you can have a look at Sqoop, which offers efficient data
transfers between RDBMS and HDFS.
Regards
Bejoy KS
the framework triggers IdentityMapper instead of the custom
mapper provided with the configuration.
This seems like a bug to me. Filed a JIRA to track this issue:
https://issues.apache.org/jira/browse/MAPREDUCE-4507
Regards
Bejoy KS
that runs
mapreduce jobs; for a non-security-enabled cluster it is mapred.
You need to increase this to a large value using
mapred soft nproc 1
mapred hard nproc 1
If you are running on a security-enabled cluster, this value should be
raised for the user who submits the job.
Regards
Bejoy KS
Hi Tariq
KeyValueTextInputFormat is available from the hadoop 1.0.1 version
onwards for the new mapreduce API:
http://hadoop.apache.org/common/docs/r1.0.1/api/org/apache/hadoop/mapreduce/lib/input/KeyValueTextInputFormat.html
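A minimal sketch of plugging it in, assuming 'job' is an org.apache.hadoop.mapreduce.Job you have already created:

import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// each input line is split at the first tab: the part before the tab
// becomes the key, the rest becomes the value (both as Text)
job.setInputFormatClass(KeyValueTextInputFormat.class);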
Regards
Bejoy KS
On Wed, Jul 25, 2012 at 8:07 PM, Mohammad Tariq donta
.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Robert Dyer psyb...@gmail.com
Date: Thu, 12 Jul 2012 23:03:02
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Jobs randomly not starting
I'm using Hadoop 1.0.3
Hi Pedro
In simple terms, the Streaming API is used in hadoop when your mapper or
reducer is in any language other than Java, say Ruby or Python.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Pedro Costa psdc1...@gmail.com
Date: Sat, 16 Jun
? Is it required for the Map Reduce to
execute on the machines which have the data stored (DFS)?
Bejoy: The MR framework takes care of this. Map tasks consider data locality.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Girish Ravi giri...@srmtech.com
Date: Tue
Hi Girish
You can achieve this using reduce-side joins. Use MultipleInputs for
parsing two different sets of log files.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Girish Ravi giri...@srmtech.com
Date: Tue, 12 Jun 2012 12:59:32
To add on, have a look at Hive and Pig. Those are a perfect fit for similar
use cases.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Bejoy KS bejoy.had...@gmail.com
Date: Tue, 12 Jun 2012 13:04:33
To: mapreduce-user@hadoop.apache.org
Reply
Hi Subbu,
The file/split processed by a mapper can be obtained from the
web UI as soon as the job is executed. However, this detail can't be
obtained once the job is moved to JT history.
Regards
Bejoy
On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam
kasisubbu...@gmail.com wrote:
Hi,
IdentityReducer is being
triggered.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: kasi subrahmanyam kasisubbu...@gmail.com
Date: Tue, 17 Apr 2012 19:10:33
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re
(theClass)
// set final/reduce output key value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
If both map output and reduce output key/value types are the same, you
just need to specify the final output types.
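If they differ, a minimal sketch of setting both on the same job object (the concrete Writable types below are illustrative, not from the original thread):

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// intermediate (map output) key/value types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// final (reduce output) key/value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);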
Regards
Bejoy KS
On Tue, Apr 17
Hi Bryan
Can you post the error stack trace?
Regards
Bejoy KS
On Tue, Apr 17, 2012 at 8:41 AM, Bryan Yeung brye...@gmail.com wrote:
Hello Bejoy,
Thanks for your reply.
Isn't that exactly what I've done with my modifications to
WordCount.java? Could you have a look at the diff I
final, so that it can't be overridden at the per-job level. Overriding at the
job level can lead to out-of-memory issues if not handled wisely.
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
  <final>true</final>
</property>
Regards
Bejoy KS
On Thu, Apr 5, 2012 at 10:12 PM, kasi
are adding the jar in HADOOP_HOME/lib, you need to add it on
all nodes.
Regards
Bejoy KS
On Wed, Apr 4, 2012 at 12:55 PM, Utkarsh Gupta utkarsh_gu...@infosys.comwrote:
Hi Devaraj,
I have already copied the required jar file in $HADOOP_HOME/lib folder.
Can you tell me where to add generic
</name>
<value>4</value>
</property>
Regards
Bejoy KS
On Tue, Apr 3, 2012 at 1:45 PM, Fang Xin nusfang...@gmail.com wrote:
Hi all,
of course it's sensible that the number of nodes in the cluster will
influence map/reduce task capacity, but what determines the average tasks
per node?
Can the number
* physical cores (it is an approximate number)
Hope it helps!...
Regards
Bejoy KS
On Tue, Apr 3, 2012 at 1:58 PM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi Xin
Yes, the number of worker nodes does determine the map and reduce
capacity of the cluster. The map and reduce task capacity/slots
Hi Xin
In a very simple way, just include a few lines of code in your Driver
class to check whether the output dir exists in HDFS and, if it exists, delete it.
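A minimal sketch of that check, assuming 'conf' is your job's Configuration and 'outputDir' is the output path string:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(conf);
Path out = new Path(outputDir);
if (fs.exists(out)) {
  fs.delete(out, true); // true = delete recursively
}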
Regards
Bejoy KS
On Tue, Apr 3, 2012 at 4:09 PM, Christoph Schmitz
christoph.schm...@1und1.de wrote:
Hi Xin,
you can derive your own
Hi Radim
You are correct. If there is no reduce process then there won't be
any sort and shuffle phase. The output from the mapper is written directly to
HDFS itself.
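A one-line sketch of setting that up on a new-API Job:

// zero reducers makes it a map-only job, so each map task's output
// is written straight to the job output directory in HDFS
job.setNumReduceTasks(0);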
Regards
Bejoy KS
2012/3/24 Radim Kolar h...@filez.com
i have mappers only job - number of reducers set to 0. Its hadoop 0.22
Vamshi
If you have set the number of reduce slots in a node to 5 and if
you have 4 nodes, then your cluster can run a max of 5*4 = 20 reduce tasks
at a time. If more reduce tasks are present, those have to wait till
reduce slots become available.
For reducers, data locality is not
Hi Juwei
What is the value of mapred.output.compression.codec? It'd be
better to determine whether the output files are compressed by checking
their codec rather than just the size of the files.
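One way to check, a sketch using CompressionCodecFactory, which matches codecs by file name suffix (the path below is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

Path outFile = new Path("/user/juwei/output/part-00000"); // placeholder path
CompressionCodec codec =
    new CompressionCodecFactory(new Configuration()).getCodec(outFile);
System.out.println(codec == null
    ? "no codec matched by suffix" : codec.getClass().getName());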
Regards
Bejoy.K.S
On Fri, Feb 17, 2012 at 12:07 PM, Juwei Shi shiju...@gmail.com wrote:
Hi Tamizh
MultiFileInputFormat / CombineFileInputFormat is typically used
where the input files are relatively small (typically less than a block
size). When you use these, there is some loss in data locality, as all the
splits a mapper processes won't be on the same node.
Hi Experts
A quick question. I have quite a few map reduce jobs running on my
cluster. One job's input itself has a large number of files, I'd like to
know which split was processed by each map task without doing any custom
logging (for successful, failed and killed tasks). I tried digging
Hi Mark
Have a look at CompositeInputFormat. I guess it is what you are
looking for to achieve map-side joins. If you are fine with a reduce-side
join, go with MultipleInputs. I have tried the same sort of joins
using MultipleInputs and have scribbled something on the same.
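A minimal sketch of wiring up a reduce-side join with the old mapred API's MultipleInputs (the paths and the two tagging mappers, LogAMapper and LogBMapper, are hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

// each mapper tags its records with their source and emits the join key;
// the reducer then sees both sides of the join grouped together
JobConf conf = new JobConf();
MultipleInputs.addInputPath(conf, new Path("/data/setA"),
    TextInputFormat.class, LogAMapper.class);
MultipleInputs.addInputPath(conf, new Path("/data/setB"),
    TextInputFormat.class, LogBMapper.class);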
Hi
AFAIK, from a map reduce application developer's perspective there won't be
many changes. The APIs that you use are gonna be the same. This ensures
that your existing map reduce applications can be deployed on a YARN-based
cluster without any code change.
Regards
Bejoy
On Fri, Jan 13, 2012 at
0.20.203.0
--
*From:* Satish Setty (HCL Financial Services)
*Sent:* Tuesday, January 10, 2012 8:57 AM
*To:* Bejoy Ks
*Cc:* mapreduce-user@hadoop.apache.org
*Subject:* RE: hadoop
Hi Bejoy,
Thanks for the help. Changed values
mapred.min.split.size=0
--
*From:* Satish Setty (HCL Financial Services)
*Sent:* Monday, January 09, 2012 1:21 PM
*To:* Bejoy Ks
*Cc:* mapreduce-user@hadoop.apache.org
*Subject:* RE: hadoop
Hi Bejoy,
In hdfs I have set the block size to 40 bytes. The input data set is as below
data1 (5*8=40 bytes)
data2
..
data10
Eyal
Hope this is the one you are looking for:
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Regards
Bejoy.K.S
On Sat, Jan 7, 2012 at 12:25 PM, Eyal Golan egola...@gmail.com wrote:
hi,
can you please point out the link to Cloudera's article?
application [jobtracker web UI]: does this require deployment, or does the
application server container come inbuilt with hadoop?
Regards
--
*From:* Bejoy Ks [bejoy.had...@gmail.com]
*Sent:* Friday, January 06, 2012 12:54 AM
*To:* mapreduce-user@hadoop.apache.org
Hi Satish
Please find some pointers inline.
(a) How do we know the number of map tasks spawned? Can this be controlled?
We notice only 4 JVMs running on a single node: namenode, datanode,
jobtracker, tasktracker. As we understand, depending on the number of splits,
that many map tasks are
Ann
Adding on to the responses, the map outputs are transferred to the
corresponding reducer over HTTP and not through TCP.
The available hardware definitely decides the max number of
tasks that a node can handle; it depends on the number of cores, available
physical memory, etc. But that
Hi Tan
Adding on to Harsh's response.
*Map Reduce Slots*
It is the maximum number of map and reduce tasks that can run
concurrently on your cluster/nodes. Say you have a 10-node cluster (10
data nodes); each node would be assigned a specific number of map and
reduce tasks it can
'final').
Arun
On Dec 8, 2011, at 10:44 PM, Bejoy Ks wrote:
Hi experts
I have a query about the job.xml file in map reduce. I set some
value in mapred-site.xml and *marked it as final*, say
mapred.num.reduce.tasks=15. When I submitted my job I explicitly specified the
number of reducers
Hi experts
I have a query about the job.xml file in map reduce. I set some
value in mapred-site.xml and *marked it as final*, say
mapred.num.reduce.tasks=15. When I submitted my job I explicitly specified the
number of reducers as -D mapred.num.reduce.tasks=4. Now as expected my
job should
Correction: not mapred.num.reduce.tasks but mapred.reduce.tasks. :)
On Fri, Dec 9, 2011 at 12:14 PM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi experts
I have a query about the job.xml file in map reduce. I set some
value in mapred-site.xml and *marked it as final*, say
Burak
If you have a continuous inflow of data, you can choose Flume to
aggregate the files into larger sequence files or so if they are small, and
when you have a substantial chunk of data (equal to the HDFS block size) you
can push that data on to HDFS. Based on your SLAs you need to schedule
Hi Mat
I'm not sure of an implicit mechanism in hadoop that logs the input
splits (file names) each mapper is processing. To analyze that you may have
to do some custom logging: just log the input file name at the start of the map
method. The full file path in HDFS can be obtained from the
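A minimal sketch of that custom logging with the new mapreduce API, assuming a plain FileInputFormat-based job (the cast fails for combined splits):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class LoggingMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) {
    // runs once per map task; the message lands in the task's stderr log
    FileSplit split = (FileSplit) context.getInputSplit();
    System.err.println("map task input split: " + split.getPath());
  }
  // map() as usual
}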
Hi Experts
I'm currently working out a performance test plan
for a series of hadoop jobs. My entire application consists of map reduce,
hive and flume jobs chained one after another, and I need to do some
rigorous performance testing to ensure that it would never break under
with
https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf
for an overview and then start looking at various docs under
http://people.apache.org/~acmurthy/yarn/.
-- Hitesh
On Oct 13, 2011, at 7:23 AM, Bejoy KS wrote:
Hi Experts
I'm really
Hi Experts
I'm really interested in understanding the end-to-end
flow, functionality, components and protocols in MRv2. Currently I don't know
anything about MRv2, so I require some document that would lead me from
scratch. Could anyone help me by pointing to some good documents on the
same?
Hi Sadak
You really don't need to fire a map reduce job to copy files from
a local file system to HDFS. You can do it in two easy ways.
*Using the Linux CLI* - if you are going with a shell script, the most
convenient and handy option:
hadoop fs -copyFromLocal <file/dir in lfs> <destination>
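For a copy done from Java code instead, a minimal sketch with the FileSystem API (both paths below are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// copies a local file or directory into HDFS
FileSystem fs = FileSystem.get(new Configuration());
fs.copyFromLocalFile(new Path("/local/dir/file.txt"), new Path("/user/dest/"));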
John,
Did you try out map join with Hive? It uses the DistributedCache and
hash maps to achieve the goal.
set hive.auto.convert.join = true;
I have tried the same over joins involving huge tables and a few smaller
tables. My smaller tables were less than 25MB (configuration tables) and it
/SUPPORT/Cloudera%27s+Hadoop+Demo+VM#Cloudera%27sHadoopDemoVM-DemoVMWareImage)
for VMware and VMware Player.
The VM is 64 bit but my OS is 32 bit.
What can be the solution?
Regards,
Shreya
*From:* Bejoy KS [mailto:bejoy.had...@gmail.com]
*Sent
Shashi
Here you'd definitely need a map reduce job to do the
aggregation of values on the reducer. Now for sorting the output, in very
simple terms, use another map reduce job where the map output key would be
the value of the first map reduce job's output and the map output value
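A minimal sketch of that second job's mapper, assuming the first job wrote lines like "key<TAB>count" (the class name and field layout are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// emits (count, key) so the shuffle sorts the records by count
public class SwapMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] parts = line.toString().split("\t");
    context.write(new IntWritable(Integer.parseInt(parts[1])), new Text(parts[0]));
  }
}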
dumbo (but I don't think you are) the above solution
may not work but I can send you a pointer.
J
On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS bejoy.had...@gmail.com wrote:
Thanks Jeremy. I tried your first suggestion and the mappers ran to
completion. But then the reducers failed
Hi
I wanted to try out hadoop streaming and got the sample Python code for the
mapper and reducer. I copied both into my LFS and tried running the streaming
job as mentioned in the documentation.
Here is the command I used to run the job:
hadoop jar
Hi
I was using the Cloudera training VM to test out my map reduce code,
which was working really well. Now I have some requirements to run
Hive, HBase and Sqoop as well on this VM for testing purposes. For Hive and
HBase I'm able to log in to the CLI client, but none of the commands are
Hi
I'm having a query here. Is it possible to have no mappers but
reducers alone? AFAIK, if we need to avoid triggering the reducers we can
set numReduceTasks to zero, but no such setting works for mappers. So how
can it be achieved, if possible?
Thank You
Regards
Bejoy.K.S
Hi
Is it possible to have my mapper in Perl and my reducer in Java? In my
existing legacy system some larger processes are handled by Perl, and their
business logic is really complex. It is a herculean task to
convert all the Perl to Java. But the reducer business logic, which is
records=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce input records=0
/me takes off troll mask.
On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS bejoy.had...@gmail.com wrote:
Thanks Sonal. I was just thinking of some weird design and wanted to make
sure whether there is a possibility like
Hi Experts
I was working with the Hadoop mapreduce 0.18 API for some time. Now I
just tried to migrate an existing application to the hadoop mapreduce 0.20
API. But after the migration, it seems like the reduce logic is not working.
Map output records and reduce output records show the same