Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Sonal Goyal
Which Hadoop version are you on? You can alternatively try using hiho from https://github.com/sonalgoyal/hiho to get your data from the db. Please write to me directly if you need any help there. Thanks and Regards, Sonal Connect Hadoop with databases, Salesforce, FTP servers and others

Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Rohit Kelkar
you could try out this piece of code before job.waitForCompletion():

    FileSystem dfs = FileSystem.get(conf);
    long fileSize = dfs.getFileStatus(new Path(hdfsFile)).getLen();
    long maxSplitSize = fileSize / NUM_OF_MAP_TASKS; // in your case NUM_OF_MAP_TASKS = 4
    conf.setLong("mapred.max.split.size", maxSplitSize);
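[Editor's note] Rohit's arithmetic can be checked without a cluster. A minimal, Hadoop-free sketch of the same calculation (the file size and task count below are illustrative values, not taken from the thread; on a real cluster the size would come from FileSystem.getFileStatus()):

```java
// Sketch of the split-size arithmetic from Rohit's snippet, with no
// Hadoop dependency. Values are illustrative.
public class SplitSizeCalc {
    // Divide the total input size by the desired number of map tasks.
    static long maxSplitSize(long fileSize, int numMapTasks) {
        return fileSize / numMapTasks;
    }

    public static void main(String[] args) {
        long fileSize = 1L << 30;   // pretend the input is 1 GiB
        int numMapTasks = 4;        // as in Joan's case
        System.out.println(maxSplitSize(fileSize, numMapTasks)); // 268435456 (256 MiB)
    }
}
```

Note that mapred.max.split.size is an upper bound on split size, so the actual number of splits also depends on the HDFS block size and the input format.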

Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Joan
Hi Sonal, I put both configurations:

    job.getConfiguration().set("mapreduce.job.maps","4");
    job.getConfiguration().set("mapreduce.map.tasks","4");

But neither configuration takes effect. I also tried setting "mapred.map.task", but it doesn't work either. Joan 2011/1/20 Sonal Goyal > Joan, > >

Re: use counter to statistics file row number

2011-01-19 Thread venkatesh kavuluri
(Bcc general@. This is for Hadoop project level discussions. Including mapreduce-user@) Liu, If you want the count of the number of records in your input data set, the map/reduce framework provides a default counter "Map input records". The only caution to follow regarding the custom counters is to n

Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Sonal Goyal
Joan, You should be able to set the mapred.map.tasks property to the maximum number of mappers you want. This can control parallelism. Thanks and Regards, Sonal Connect Hadoop with databases, Salesforce, FTP servers and others

Re: how to write custom object using M/R

2011-01-19 Thread David Rosenstrauch
Maybe change "id" to be an IntWritable, and "str" to be a Text? HTH, DR On 01/19/2011 09:36 AM, Joan wrote: Hi Lance, My custom object implements Writable, but I don't override the toString method?

    public class MyWritable implements DBWritable, Writable, Cloneable {
        int id;
        String str;

Re: cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Ashutosh Chauhan
Pig has a built-in CROSS operator:

    grunt> a = load 'file1';
    grunt> b = load 'file2';
    grunt> c = cross a, b;
    grunt> store c into 'file3';

Ashutosh > On Wed, Jan 19, 2011 at 03:35, Rohit Kelkar wrote: >> I have two files, A and D, containing (vectorId, vector) on each line. >> |D| = 100,000 a

Re: cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Jason
I am afraid that by reading an HDFS file manually in your mapper, you are losing data locality. You can try putting the smaller vectors into the distributed cache and preloading them all in memory in the mapper setup. This implies that they can fit in memory and also that you can change your m/r to run ov
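[Editor's note] The structure Jason describes can be sketched in plain Java without Hadoop: the smaller side (A) is held in a hash map, as it would be after loading it from the distributed cache in Mapper.setup(), and each record of the larger side (D) is paired against it, as would happen per call to map(). The vectors and keys below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of a map-side cross: the small side (A) lives in
// memory, the large side (D) streams past it. In a real job, A would
// be loaded once in Mapper.setup() from the distributed cache and
// each vector of D would arrive via map(); here both are hard-coded.
public class MapSideCross {
    // Dot product of two equal-length vectors.
    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    public static void main(String[] args) {
        // "setup()": load the small file A into memory once.
        Map<String, double[]> a = new HashMap<>();
        a.put("a1", new double[]{1, 2});
        // "map()": one record of D is paired with every vector of A.
        double[] d1 = {3, 4};
        for (Map.Entry<String, double[]> e : a.entrySet()) {
            System.out.println(e.getKey() + "\t" + dot(e.getValue(), d1)); // a1	11.0
        }
    }
}
```

With |A| = 1000 vectors of dimension 100, the in-memory side is well under a megabyte, so the fits-in-memory assumption Jason mentions clearly holds for this thread's sizes.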

How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Joan
Hi, I want to reduce the number of splits because I think I am getting too many. While my job is running I can see:

    INFO mapreduce.Job: map ∞% reduce 0%

I'm using DataDrivenDBInputFormat: setInput: public static void setInput(Job

Re: how to write custom object using M/R

2011-01-19 Thread Joan
Hi, I tried but it didn't work. I don't understand why it doesn't work; I only want the first reducer to write my object into HDFS and the second mapper to read this object from HDFS. I'm trying to write the object with SequenceFileOutputFormat and I have my own Writable; obviously my object implements Writable

Re: how to write custom object using M/R

2011-01-19 Thread Joan
2011/1/18 David Rosenstrauch

> I assumed you were already doing this but yes, Alain is correct, you need
> to set the output format too.
>
> I initialize writing to sequence files like so:
>
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> FileOutputFormat.setOutputName(job, dataSour

Re: how to write custom object using M/R

2011-01-19 Thread Joan
Hi Lance, My custom object implements Writable, but I don't override the toString method?

    public class MyWritable implements DBWritable, Writable, Cloneable {
        int id;
        String str;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getInt(1);
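[Editor's note] The serialization contract Joan's class needs to satisfy can be exercised with only java.io, since Hadoop's Writable methods take the standard DataOutput/DataInput interfaces. A minimal sketch of the round trip (field names mirror her MyWritable; the DBWritable/JDBC side is omitted, and the class name is illustrative):

```java
import java.io.*;

// Minimal sketch of the Hadoop Writable contract using only java.io.
// Hadoop calls write()/readFields() in exactly this way when it
// serializes values, e.g. into a SequenceFile.
public class MyWritableSketch {
    int id;
    String str;

    // Corresponds to Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(str);
    }

    // Corresponds to Writable.readFields(DataInput):
    // read the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        str = in.readUTF();
    }

    @Override
    public String toString() {  // useful for text output and debugging
        return id + "\t" + str;
    }

    public static void main(String[] args) throws IOException {
        MyWritableSketch w = new MyWritableSketch();
        w.id = 7;
        w.str = "hello";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        w.write(new DataOutputStream(buf));

        MyWritableSketch r = new MyWritableSketch();
        r.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(r);  // 7	hello
    }
}
```

The key point for the thread: readFields(DataInput) must mirror write(DataOutput) field for field, and toString() only matters for text-based output formats, not for SequenceFile serialization.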

cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Rohit Kelkar
I have two files, A and D, containing (vectorId, vector) on each line. |D| = 100,000 and |A| = 1000. Dimensionality of the vectors = 100. Now I want to execute the following:

    for eachItem in A:
        for eachElem in D:
            dot_product = eachItem * eachElem
            save(dot_product)

What I tried