Re: Reg: parsing all files file append

2012-09-10 Thread Bejoy Ks
Hi Manoj From my limited knowledge on file appends in hdfs, I have seen more recommendations to use sync() in the latest releases than using append(). Let us wait for some committer to authoritatively comment on 'the production readiness of append()'. :) Regards Bejoy KS On Mon, Sep 10, 2012

Re: Reg: parsing all files file append

2012-09-09 Thread Bejoy KS
Hi Manoj You can load daily logs into individual directories in hdfs and process them daily. Keep those results in hdfs or hbase or dbs etc. Every day do the processing, get the results and aggregate the same with the previously aggregated results till date. Regards Bejoy KS Sent from

Re: Reading fields from a Text line

2012-08-03 Thread Bejoy KS
That is a good pointer Harsh. Thanks a lot. But if IdentityMapper is being used, shouldn't the job.xml reflect that? However, job.xml always shows the mapper as our CustomMapper. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Harsh J ha...@cloudera.com Date

Re: Reading fields from a Text line

2012-08-03 Thread Bejoy KS
Ok Got it now. That is a good piece of information. Thank You :) Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Harsh J ha...@cloudera.com Date: Fri, 3 Aug 2012 16:28:27 To: mapreduce-user@hadoop.apache.org; bejoy.had...@gmail.com Cc: Mohammad

Re: Reading fields from a Text line

2012-08-02 Thread Bejoy KS
on that as well. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Mohammad Tariq donta...@gmail.com Date: Thu, 2 Aug 2012 15:48:42 To: mapreduce-user@hadoop.apache.org Reply-To: mapreduce-user@hadoop.apache.org Subject: Re: Reading fields from a Text line

Re: Reading fields from a Text line

2012-08-02 Thread Bejoy Ks
!.. Regards Bejoy KS

Re: All reducers are not being utilized

2012-08-02 Thread Bejoy Ks
reduce tasks there is no guarantee that one task will be scheduled on each node. It can be like 2 on one node and 1 on another. Regards Bejoy KS

Re: DBOutputWriter timing out writing to database

2012-08-02 Thread Bejoy Ks
Hi Nathan Alternatively you can have a look at Sqoop , which offers efficient data transfers between rdbms and hdfs. Regards Bejoy KS

Re: Reading fields from a Text line

2012-08-02 Thread Bejoy Ks
the framework triggers Identity Mapper instead of the custom mapper provided with the configuration. This seems like a bug to me. I filed a jira to track this issue: https://issues.apache.org/jira/browse/MAPREDUCE-4507 Regards Bejoy KS

Re: Error reading task output

2012-07-27 Thread Bejoy Ks
that runs mapreduce jobs, for a non security enabled cluster it is mapred. You need to increase this to a large value using: mapred soft nproc 1 mapred hard nproc 1 If you are running on a security enabled cluster, this value should be raised for the user who submits the job. Regards Bejoy KS

Re: KeyValueTextInputFormat absent in hadoop-0.20.205

2012-07-25 Thread Bejoy Ks
Hi Tariq KeyValueTextInputFormat is available from hadoop 1.0.1 version onwards for the new mapreduce API http://hadoop.apache.org/common/docs/r1.0.1/api/org/apache/hadoop/mapreduce/lib/input/KeyValueTextInputFormat.html Regards Bejoy KS On Wed, Jul 25, 2012 at 8:07 PM, Mohammad Tariq donta
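A minimal new-API driver sketch wiring this format in (the job name and input path below are illustrative, not from the thread):

```java
// Sketch, inside a driver's run()/main(); assumes Hadoop 1.0.1+ jars on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

Configuration conf = new Configuration();
Job job = new Job(conf, "kv-input-demo");           // illustrative job name
// Each line is split at the first tab: text before it becomes the key,
// text after it the value. The separator is configurable, but the exact
// property name differs across versions.
job.setInputFormatClass(KeyValueTextInputFormat.class);
FileInputFormat.addInputPath(job, new Path("/user/demo/input")); // illustrative path
```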

Re: Jobs randomly not starting

2012-07-12 Thread Bejoy KS
. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Robert Dyer psyb...@gmail.com Date: Thu, 12 Jul 2012 23:03:02 To: mapreduce-user@hadoop.apache.org Reply-To: mapreduce-user@hadoop.apache.org Subject: Jobs randomly not starting I'm using Hadoop 1.0.3

Re: Streaming in mapreduce

2012-06-16 Thread Bejoy KS
Hi Pedro In simple terms, the Streaming API is used in hadoop if your mapper or reducer is in any language other than Java, say Ruby or Python. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Pedro Costa psdc1...@gmail.com Date: Sat, 16 Jun

Re: Map/Reduce | Multiple node configuration

2012-06-12 Thread Bejoy KS
? Is it required for the Map Reduce to execute on the machines which has the data stored (DFS)? Bejoy: MR framework takes care of this. Map tasks consider data locality. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Girish Ravi giri...@srmtech.com Date: Tue

Re: Need logical help

2012-06-12 Thread Bejoy KS
Hi Girish You can achieve this using reduce side joins. Use MultipleInputFormat for parsing two different sets of log files. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Girish Ravi giri...@srmtech.com Date: Tue, 12 Jun 2012 12:59:32
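For reference, the class that attaches a different mapper to each input path in the old (mapred) API is MultipleInputs; a hedged sketch of a reduce-side join setup, with all mapper/reducer class names and paths hypothetical:

```java
// Sketch of a reduce-side join driver fragment (old mapred API).
// AppLogMapper, WebLogMapper, JoinReducer and the paths are hypothetical.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

JobConf conf = new JobConf();
// Each log set gets its own mapper that emits a common join key;
// the framework then groups both streams on that key at the reducer.
MultipleInputs.addInputPath(conf, new Path("/logs/app"), TextInputFormat.class, AppLogMapper.class);
MultipleInputs.addInputPath(conf, new Path("/logs/web"), TextInputFormat.class, WebLogMapper.class);
conf.setReducerClass(JoinReducer.class);
```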

Re: Need logical help

2012-06-12 Thread Bejoy KS
To add on, have a look at hive and pig. Those are perfect fit for similar use cases. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Bejoy KS bejoy.had...@gmail.com Date: Tue, 12 Jun 2012 13:04:33 To: mapreduce-user@hadoop.apache.org Reply

Re: Getting filename in case of MultipleInputs

2012-05-03 Thread Bejoy Ks
Hi Subbu, The file/split processed by a mapper could be obtained from WebUI as soon as the job is executed. However this detail can't be obtained once the job is moved to JT history. Regards Bejoy On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote: Hi,

Re: Reducer not firing

2012-04-17 Thread Bejoy KS
IdentityReducer is being triggered. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: kasi subrahmanyam kasisubbu...@gmail.com Date: Tue, 17 Apr 2012 19:10:33 To: mapreduce-user@hadoop.apache.org Reply-To: mapreduce-user@hadoop.apache.org Subject: Re

Re: map and reduce with different value classes

2012-04-16 Thread Bejoy Ks
(theClass) // set final/reduce output key value types job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class) If both map output and reduce output key value types are the same you just need to specify the final output types. Regards Bejoy KS On Tue, Apr 17
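Pieced together, the driver calls the snippet refers to look roughly like this (a sketch; the concrete Writable types are only an example):

```java
// Sketch: map output types differ from the final (reduce) output types.
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// intermediate (map output) key/value types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// final (reduce output) key/value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
// If map and reduce output types match, setting only the
// final output types is enough.
```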

Re: map and reduce with different value classes

2012-04-16 Thread Bejoy Ks
Hi Bryan Can you post in the error stack trace? Regards Bejoy KS On Tue, Apr 17, 2012 at 8:41 AM, Bryan Yeung brye...@gmail.com wrote: Hello Bejoy, Thanks for your reply. Isn't that exactly what I've done with my modifications to WordCount.java?  Could you have a look at the diff I

Re: Unable to set the heap size on Amazon elastic mapreduce

2012-04-05 Thread Bejoy Ks
final, so that it can't be overridden at per job level. Overriding at job level can lead to out of memory issues, if not handled wisely at job level. <property> <name>mapred.child.java.opts</name> <value>-Xmx2048m</value> <final>true</final> </property> Regards Bejoy KS On Thu, Apr 5, 2012 at 10:12 PM, kasi

Re: Including third party jar files in Map Reduce job

2012-04-04 Thread Bejoy Ks
are adding the jar in HADOOP_HOME/lib , you need to add this at all nodes. Regards Bejoy KS On Wed, Apr 4, 2012 at 12:55 PM, Utkarsh Gupta utkarsh_gu...@infosys.comwrote: Hi Devaraj, I have already copied the required jar file in $HADOOP_HOME/lib folder. Can you tell me where to add generic

Re: What determines the map task / reduce task capacity? average task per node?

2012-04-03 Thread Bejoy Ks
</name> <value>4</value> </property> Regards Bejoy KS On Tue, Apr 3, 2012 at 1:45 PM, Fang Xin nusfang...@gmail.com wrote: Hi all, of course it's sensible that number of nodes in the cluster will influence map / reduce task capacity, but what determines average task per node? Can the number
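The per-node slot counts discussed here are set in mapred-site.xml on each TaskTracker; the values below are examples only:

```xml
<!-- mapred-site.xml (Hadoop 1.x property names); values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```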

Re: What determines the map task / reduce task capacity? average task per node?

2012-04-03 Thread Bejoy Ks
*physical cores (it is an approximate number) Hope it helps!... Regards Bejoy KS On Tue, Apr 3, 2012 at 1:58 PM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi Xin Yes, the number of worker nodes do count on the map and reduce capacity of the cluster. The map and reduce task capacity/slots

Re: how to overwrite output in HDFS?

2012-04-03 Thread Bejoy Ks
Hi Xin In a very simple way, just include a few lines of code in your driver class to check whether the output dir exists in hdfs and, if it exists, delete it. Regards Bejoy KS On Tue, Apr 3, 2012 at 4:09 PM, Christoph Schmitz christoph.schm...@1und1.de wrote: Hi Xin, you can derive your own
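A sketch of that driver-side check (the output path is hypothetical):

```java
// Sketch: delete the job's output dir before submission if it already exists.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path out = new Path("/user/demo/output");   // hypothetical output path
if (fs.exists(out)) {
    fs.delete(out, true);                   // true = recursive delete
}
```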

Re: Mappers only job, output sorted?

2012-03-24 Thread Bejoy Ks
Hi Radim You are correct. If there is no reduce process then there won't be any sort and shuffle phase. The output from the mappers is written directly to hdfs. Regards Bejoy KS 2012/3/24 Radim Kolar h...@filez.com i have mappers only job - number of reducers set to 0. Its hadoop 0.22

Re: basic doubt on number of reduce tasks

2012-03-02 Thread Bejoy Ks
Vamshi If you have set the number of reduce slots in a node to 5 and if you have 4 nodes, then your cluster can run a max of 5*4 = 20 reduce tasks at a time. If more reduce tasks are present those have to wait till reduce slots become available. For reducers, the data locality is not

Re: reduce output compression of Terasort

2012-02-17 Thread Bejoy Ks
Hi Juwei What is the value for mapred.output.compression.codec? It'd be better to determine whether the output files are compressed by checking their codec, and not just from the size of the files. Regards Bejoy.K.S On Fri, Feb 17, 2012 at 12:07 PM, Juwei Shi shiju...@gmail.com wrote:

Re: num of reducer

2012-02-17 Thread Bejoy Ks
Hi Tamizh MultiFileInputFormat / CombineFileInputFormat is typically used where the input files are relatively small (typically less than a block size). When you use these, there is some loss in data locality, as all the splits a mapper processes won't be on the same node.

Identify splits processed by each mapper

2012-01-16 Thread Bejoy Ks
Hi Experts A quick question. I have quite a few map reduce jobs running on my cluster. One job's input itself has a large number of files, and I'd like to know which split was processed by each map task without doing any custom logging (for successful, failed and killed tasks). I tried digging

Re: What is the right way to do map-side joins in Hadoop 1.0?

2012-01-15 Thread Bejoy Ks
Hi Mark Have a look at CompositeInputFormat. I guess it is what you are looking for to achieve map side joins. If you are fine with a Reduce side join go in with MultipleInputFormat. I have tried the same sort of joins using MultipleInputFormat and have scribbled something on the same.

Re: Newbie Question On Next Generation Map Reduce

2012-01-13 Thread Bejoy Ks
Hi AFAIK on a map reduce application developer perspective there won't be many changes. The APIs that you use are gonna be the same. This ensures that your existing map reduce applications can be deployed on a Yarn based cluster without any code change. Regards Bejoy On Fri, Jan 13, 2012 at

Re: hadoop - increase number of map tasks.

2012-01-12 Thread Bejoy Ks
0.20.203.0 -- *From:* Satish Setty (HCL Financial Services) *Sent:* Tuesday, January 10, 2012 8:57 AM *To:* Bejoy Ks *Cc:* mapreduce-user@hadoop.apache.org *Subject:* RE: hadoop Hi Bejoy, Thanks for help. Changed values mapred.min.split.size=0

Re: hadoop

2012-01-09 Thread Bejoy Ks
-- *From:* Satish Setty (HCL Financial Services) *Sent:* Monday, January 09, 2012 1:21 PM *To:* Bejoy Ks *Cc:* mapreduce-user@hadoop.apache.org *Subject:* RE: hadoop Hi Bejoy, In hdfs I have set block size - 40bytes . Input Data set is as below data1 (5*8=40 bytes) data2 .. data10

Re: Is it possible to user hadoop archive to specify third party libs

2012-01-07 Thread Bejoy Ks
Eyal Hope you are looking for this one http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ Regards Bejoy.K.S On Sat, Jan 7, 2012 at 12:25 PM, Eyal Golan egola...@gmail.com wrote: hi, can you please point out link to Cloudera's article?

Re: hadoop

2012-01-07 Thread Bejoy Ks
application [jobtracker web UI ] does this require deployment or application server container comes inbuilt with hadoop? Regards -- *From:* Bejoy Ks [bejoy.had...@gmail.com] *Sent:* Friday, January 06, 2012 12:54 AM *To:* mapreduce-user@hadoop.apache.org

Re: hadoop

2012-01-05 Thread Bejoy Ks
Hi Satish Please find some pointers in line (a) How do we know number of map tasks spawned? Can this be controlled? We notice only 4 jvms running on a single node - namenode, datanode, jobtracker, tasktracker. As we understand depending on number of splits that many map tasks are

Re: Map Reduce Phase questions:

2011-12-17 Thread Bejoy Ks
Ann Adding on to the responses, the map outputs are transferred to the corresponding reducer over HTTP, not through raw TCP. The available hardware definitely decides on the max num of tasks that a node can handle; it depends on the number of cores, available physical memory etc. But that

Re: Re: About slots of tasktracker and munber of map taskers

2011-12-12 Thread Bejoy Ks
Hi Tan Adding on to Harsh's response. *Map Reduce Slots* It is maximum number of map and reduce tasks that can run concurrently on your cluster/nodes. Say if you have a 10 node cluster(10 data nodes), each node would be assigned a specific number of map and reduce tasks it can

Re: Are the values available in job.xml the actual values used for job

2011-12-10 Thread Bejoy Ks
'final'). Arun On Dec 8, 2011, at 10:44 PM, Bejoy Ks wrote: Hi experts I have a query with the job.xml file in map reduce.I set some value in mapred-site.xml and *marked as final*, say mapred.num.reduce.tasks=15. When I submit my job I explicitly specified the number of reducers

Are the values available in job.xml the actual values used for job

2011-12-08 Thread Bejoy Ks
Hi experts I have a query with the job.xml file in map reduce. I set some value in mapred-site.xml and *marked as final*, say mapred.num.reduce.tasks=15. When I submit my job I explicitly specified the number of reducers as -D mapred.num.reduce.tasks=4. Now as expected my job should
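For context, a property is locked against per-job overrides by marking it final in mapred-site.xml, e.g. (using the corrected property name mapred.reduce.tasks; the value is illustrative):

```xml
<!-- mapred-site.xml: <final>true</final> blocks -D overrides at job submission -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>15</value>
  <final>true</final>
</property>
```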

Re: Are the values available in job.xml the actual values used for job

2011-12-08 Thread Bejoy Ks
Correction: not mapred.num.reduce.tasks but mapred.reduce.tasks. :) On Fri, Dec 9, 2011 at 12:14 PM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi experts I have a query with the job.xml file in map reduce. I set some value in mapred-site.xml and *marked as final*, say

Re: Running a job continuously

2011-12-05 Thread Bejoy Ks
Burak If you have a continuous inflow of data, you can choose flume to aggregate the files into larger sequence files or so if they are small, and when you have a substantial chunk of data (equal to hdfs block size) you can push that data on to hdfs. Based on your SLAs you need to schedule

Re: determining what files made up a failing task

2011-12-04 Thread Bejoy Ks
Hi Mat I'm not sure of an implicit mechanism in hadoop that logs the input splits (file names) each mapper is processing. To analyze that you may have to do some custom logging. Just log the input file name at the start of the map method. The full file path in hdfs can be obtained from the

Performance test practices for hadoop jobs - capturing metrics

2011-11-14 Thread Bejoy Ks
Hi Experts I'm currently working out to incorporate a performance test plan for a series of hadoop jobs. My entire application consists of map reduce, hive and flume jobs chained one after another, and I need to do some rigorous performance testing to ensure that it would never break under

Re: Next Gen map reduce(MRV2) - Tutorial

2011-10-14 Thread Bejoy KS
with https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf for an overview and then start looking at various docs under http://people.apache.org/~acmurthy/yarn/. -- Hitesh On Oct 13, 2011, at 7:23 AM, Bejoy KS wrote: Hi Experts I'm really

Next Gen map reduce(MRV2) - Tutorial

2011-10-13 Thread Bejoy KS
Hi Experts I'm really interested in understanding the end to end flow, functionality, components and protocols in MRv2. Currently I don't know anything on MRv2, so I require some document that would lead me from scratch. Could anyone help me in pointing to some good documents on the same?

Re: Hadoop file uploads

2011-10-04 Thread Bejoy KS
Hi Sadak You really don't need to fire a map reduce job to copy files from a local file system to hdfs. You can do it in two easy ways. *Using linux CLI* - if you are going in with a shell script, the most convenient and handy option: hadoop fs -copyFromLocal <file/dir in lfs> <destination>

Re: Out of heap space errors on TTs

2011-09-19 Thread Bejoy KS
John, Did you try out map join with hive? It uses the Distributed Cache and hash maps to achieve the goal. set hive.auto.convert.join = true; I have tried the same over joins involving huge tables and a few smaller tables. My smaller tables were less than 25MB (configuration tables) and it

Re: Issues starting TaskTracker

2011-09-14 Thread Bejoy KS
/SUPPORT/Cloudera%27s+Hadoop+Demo+VM#Cloudera%27sHadoopDemoVM-DemoVMWareImage) for VMware and vmware player. The VM is 64 bit but my OS is 32 bit. What can be the solution? Regards, Shreya *From:* Bejoy KS [mailto:bejoy.had...@gmail.com] *Sent

Re: How to sort key,value pair by value(In ascending)

2011-09-14 Thread Bejoy KS
Shashi Here you'd definitely need a map reduce process to do the aggregation of values on the reducer. Now for sorting the output, in very simple terms, use another map reduce job where the map output key would be the value of the first map reduce's output and the map output value
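The swap-and-sort idea behind that second job can be illustrated in plain Java (no Hadoop needed; in the real job the framework's sort phase does the ordering):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByValue {
    // Mimics the second MR job: swap (word, count) into (count, word),
    // then sort ascending on the new key, as the MR sort phase would.
    public static List<Map.Entry<Integer, String>> sortByValue(Map<String, Integer> counts) {
        List<Map.Entry<Integer, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            swapped.add(new AbstractMap.SimpleEntry<>(e.getValue(), e.getKey()));
        }
        swapped.sort(Map.Entry.comparingByKey());
        return swapped;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hadoop", 7);
        counts.put("hive", 2);
        counts.put("pig", 5);
        System.out.println(sortByValue(counts)); // [2=hive, 5=pig, 7=hadoop]
    }
}
```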

Re: Hadoop Streaming job Fails - Permission Denied error

2011-09-13 Thread Bejoy KS
dumbo (but I don't think you are) the above solution may not work but I can send you a pointer. J On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS bejoy.had...@gmail.com wrote: Thanks Jeremy. I tried with your first suggestion and the mappers ran into completion. But then the reducers failed

Hadoop Streaming job Fails - Permission Denied error

2011-09-12 Thread Bejoy KS
Hi I wanted to try out hadoop streaming and got the sample python code for mapper and reducer. I copied both into my lfs and tried running the streaming job as mentioned in the documentation. Here is the command I used to run the job hadoop jar

Hive and Hbase not working with cloudera VM

2011-09-08 Thread Bejoy KS
Hi I was using the cloudera training VM to test out my map reduce codes, which was working really well. Now I do have some requirements to run hive, hbase and Sqoop as well on this VM for testing purposes. For hive and hbase I'm able to log in to the cli client, but none of the commands are

No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Hi I'm having a query here. Is it possible to have no mappers but reducers alone? AFAIK if we need to avoid the triggering of reducers we can set numReduceTasks to zero, but such a setting on the mapper side won't work. So how can it be achieved, if possible? Thank You Regards Bejoy.K.S

Perl Mapper with Java Reducer

2011-09-07 Thread Bejoy KS
Hi Is it possible to have my mapper in Perl and my reducer in Java? In my existing legacy system some large processes are being handled by Perl and the business logic of those is really complex. It is a herculean task to convert all the Perl to Java. But the reducer business logic, which is

Re: No Mapper but Reducer

2011-09-07 Thread Bejoy KS
records=0 11/09/07 14:24:22 INFO mapred.JobClient: Reduce input records=0 /me takes off troll mask. On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS bejoy.had...@gmail.com wrote: Thanks Sonal. I was just thinking of some weird design and wanted to make sure whether there is a possibility like

Hadoop Mapreduce 0.20 - reduce job not executing as desired

2011-09-06 Thread Bejoy KS
Hi Experts I was working with the Hadoop mapreduce 0.18 API for some time. Now I just tried to migrate an existing application to the hadoop mapreduce 0.20 API. But after the migration, it seems like the reduce logic is not working. Map output records and reduce output records show the same