Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Sonal Goyal
Which Hadoop version are you on? You can alternatively try using hiho from https://github.com/sonalgoyal/hiho to get your data from the db. Please write to me directly if you need any help there. Thanks and Regards, Sonal Connect Hadoop with databases, Salesforce, FTP servers and others

Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Rohit Kelkar
you could try out this piece of code before job.waitForCompletion():

    FileSystem dfs = FileSystem.get(conf);
    long fileSize = dfs.getFileStatus(new Path(hdfsFile)).getLen();
    long maxSplitSize = fileSize / NUM_OF_MAP_TASKS; // in your case NUM_OF_MAP_TASKS = 4
    conf.setLong("mapred.max.split.size", maxSplitSize);
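[Editor's note] Rohit's arithmetic can be checked without a cluster. A minimal, Hadoop-free sketch of the same calculation (the file size and task count below are illustrative values, not taken from the thread; on a real cluster the size would come from FileSystem.getFileStatus()):

```java
// Sketch of the split-size arithmetic from Rohit's snippet, with no
// Hadoop dependency. Values are illustrative.
public class SplitSizeCalc {
    // Divide the total input size by the desired number of map tasks.
    static long maxSplitSize(long fileSize, int numMapTasks) {
        return fileSize / numMapTasks;
    }

    public static void main(String[] args) {
        long fileSize = 1L << 30;   // pretend the input is 1 GiB
        int numMapTasks = 4;        // as in Joan's case
        System.out.println(maxSplitSize(fileSize, numMapTasks)); // 268435456 (256 MiB)
    }
}
```

Note that mapred.max.split.size is an upper bound on split size, so the actual number of splits also depends on the HDFS block size and the input format.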

Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Joan
Hi Sonal, I put both configurations:

    job.getConfiguration().set("mapreduce.job.maps","4");
    job.getConfiguration().set("mapreduce.map.tasks","4");

But neither configuration takes effect. I also tried setting "mapred.map.task", but it doesn't work either. Joan 2011/1/20 Sonal Goyal > Joan, > >

Re: use counter to statistics file row number

2011-01-19 Thread venkatesh kavuluri
(Bcc general@. This is for Hadoop project level discussions. Including mapreduce-user@) Liu, If you want the count of the number of records in your input data set, the map/reduce framework provides a default counter "Map input records". The only caution to follow regarding the custom counters is to n

Re: How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Sonal Goyal
Joan, You should be able to set the mapred.map.tasks property to the maximum number of mappers you want. This can control parallelism. Thanks and Regards, Sonal Connect Hadoop with databases, Salesforce, FTP servers and others

Re: how to write custom object using M/R

2011-01-19 Thread David Rosenstrauch
Maybe change "id" to be an IntWritable, and "str" to be a Text? HTH, DR On 01/19/2011 09:36 AM, Joan wrote: Hi Lance, My custom object implements Writable, but I don't override the toString method?

    public class MyWritable implements DBWritable, Writable, Cloneable {
        int id;
        String str;

Re: cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Ashutosh Chauhan
Pig has a built-in CROSS operator:

    grunt> a = load 'file1';
    grunt> b = load 'file2';
    grunt> c = cross a, b;
    grunt> store c into 'file3';

Ashutosh > On Wed, Jan 19, 2011 at 03:35, Rohit Kelkar wrote: >> I have two files, A and D, containing (vectorId, vector) on each line. >> |D| = 100,000 a

Re: cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Jason
I am afraid that by reading an HDFS file manually in your mapper, you are losing data locality. You can try putting the smaller vectors into the distributed cache and preloading them all in memory in the mapper setup. This implies that they can fit in memory and also that you can change your m/r to run ov
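[Editor's note] The structure Jason describes can be sketched in plain Java without Hadoop: the smaller side (A) is held in a hash map, as it would be after loading it from the distributed cache in Mapper.setup(), and each record of the larger side (D) is paired against it, as would happen per call to map(). The vectors and keys below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of a map-side cross: the small side (A) lives in
// memory, the large side (D) streams past it. In a real job, A would
// be loaded once in Mapper.setup() from the distributed cache and
// each vector of D would arrive via map(); here both are hard-coded.
public class MapSideCross {
    // Dot product of two equal-length vectors.
    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    public static void main(String[] args) {
        // "setup()": load the small file A into memory once.
        Map<String, double[]> a = new HashMap<>();
        a.put("a1", new double[]{1, 2});
        // "map()": one record of D is paired with every vector of A.
        double[] d1 = {3, 4};
        for (Map.Entry<String, double[]> e : a.entrySet()) {
            System.out.println(e.getKey() + "\t" + dot(e.getValue(), d1)); // a1	11.0
        }
    }
}
```

With |A| = 1000 vectors of dimension 100, the in-memory side is well under a megabyte, so the fits-in-memory assumption Jason mentions clearly holds for this thread's sizes.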

How to reduce number of splits in DataDrivenDBInputFormat?

2011-01-19 Thread Joan
Hi, I want to reduce the number of splits because I think I am getting too many. While my job is running I can see:

    INFO mapreduce.Job: map ∞% reduce 0%

I'm using DataDrivenDBInputFormat: setInput: public static void setInput(Job

Re: how to write custom object using M/R

2011-01-19 Thread Joan
Hi, I tried but it didn't work. I don't understand why it doesn't work; I only want the first reducer to write my object into HDFS and the second mapper to read this object from HDFS. I'm trying to write the object with SequenceFileOutputFormat and I have my own Writable; obviously my object implements Writable

Re: how to write custom object using M/R

2011-01-19 Thread Joan
2011/1/18 David Rosenstrauch

> I assumed you were already doing this but yes, Alain is correct, you need
> to set the output format too.
>
> I initialize writing to sequence files like so:
>
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> FileOutputFormat.setOutputName(job, dataSour

Re: how to write custom object using M/R

2011-01-19 Thread Joan
Hi Lance, My custom object implements Writable, but I don't override the toString method?

    public class MyWritable implements DBWritable, Writable, Cloneable {
        int id;
        String str;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getInt(1);
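[Editor's note] The serialization contract Joan's class needs to satisfy can be exercised with only java.io, since Hadoop's Writable methods take the standard DataOutput/DataInput interfaces. A minimal sketch of the round trip (field names mirror her MyWritable; the DBWritable/JDBC side is omitted, and the class name is illustrative):

```java
import java.io.*;

// Minimal sketch of the Hadoop Writable contract using only java.io.
// Hadoop calls write()/readFields() in exactly this way when it
// serializes values, e.g. into a SequenceFile.
public class MyWritableSketch {
    int id;
    String str;

    // Corresponds to Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(str);
    }

    // Corresponds to Writable.readFields(DataInput):
    // read the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        str = in.readUTF();
    }

    @Override
    public String toString() {  // useful for text output and debugging
        return id + "\t" + str;
    }

    public static void main(String[] args) throws IOException {
        MyWritableSketch w = new MyWritableSketch();
        w.id = 7;
        w.str = "hello";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        w.write(new DataOutputStream(buf));

        MyWritableSketch r = new MyWritableSketch();
        r.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(r);  // 7	hello
    }
}
```

The key point for the thread: readFields(DataInput) must mirror write(DataOutput) field for field, and toString() only matters for text-based output formats, not for SequenceFile serialization.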

cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Rohit Kelkar
I have two files, A and D, containing (vectorId, vector) on each line. |D| = 100,000 and |A| = 1000. Dimensionality of the vectors = 100. Now I want to execute the following:

    for eachItem in A:
        for eachElem in D:
            dot_product = eachItem * eachElem
            save(dot_product)

What I tried