Does anyone know how to use caffe through hadoop MR jobs?

2017-01-13 Thread Madhav Sharan
if someone who has experience with caffe and hadoop can share their thoughts on this. Our project is at [0] in case more detail on use case is required. [0] - https://github.com/USCDataScience/hadoop-pot/ Thanks -- Madhav Sharan

Re: merging small files in HDFS

2016-11-03 Thread Madhav Sharan
- https://github.com/USCDataScience/hadoop-pot/blob/master/hadoop-pot-core/src/main/java/org/pooledtimeseries/seqfile/TextVectorsToSequenceFile.java [1] - Blog for handling small files - http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ Cheers! -- Madhav Sharan On Thu, Nov 3, 2016 at 6

Re: Fast way to read thousands of double value in hadoop jobs

2016-08-18 Thread Madhav Sharan
in SeqFile and my map jobs are faster. -- Madhav Sharan On Wed, Aug 17, 2016 at 11:07 PM, Daniel Haviv <danielru...@gmail.com> wrote: > Store them within a sequencefile > > > On Thursday, 18 August 2016, Madhav Sharan <msha...@usc.edu> wrote: > >> Hi , can so

Fast way to read thousands of double value in hadoop jobs

2016-08-17 Thread Madhav Sharan
/USCDataScience/hadoop-pot/blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596 -- Madhav Sharan

Pairwise similarity using map reduce

2016-08-10 Thread Madhav Sharan
so that I read the file only once and my mapper jobs receive the contents of the file rather than the file path. Can someone please share any technique they have used in the past that might help? Thanks -- Madhav Sharan

Re: All nodes are not used

2016-08-09 Thread Madhav Sharan
Thanks Mahesh. So far I have not been able to run the whole job within a limited time period, so I am looking for optimizations and better resource utilization. Maybe I can try tweaking the input split size if it helps. Thanks for your help; it explains the behaviour. -- Madhav Sharan On Tue, Aug 9, 2016 at 1:28

Re: All nodes are not used

2016-08-09 Thread Madhav Sharan
was doing experiments, and if I split the input file into N files, where N = number of cores, then my job starts running on all cores. So maybe I need to look at the split size. Any trick to set split size = number of cores? I can try adjusting mapred.min.split.size manually otherwise. -- Madhav Sharan
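
A rough sketch of the arithmetic behind that idea, not code from the thread. It assumes a single large, splittable input file and mirrors the rule FileInputFormat uses, where the split size is max(minSize, min(maxSize, blockSize)) and the last split may be up to 10% larger than the others. The function names and the example sizes (1 GiB file, 128 MiB blocks, 16 cores) are made up for illustration.

```python
import math

# Slack factor FileInputFormat allows on the final split (assumed 1.1).
SPLIT_SLOP = 1.1


def compute_split_size(block_size, min_size, max_size):
    """Split size rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def target_split_size(file_size, num_cores):
    """Hypothetical helper: split size that yields about one split per core."""
    return math.ceil(file_size / num_cores)


def count_splits(file_size, split_size):
    """Count the splits produced for one splittable file."""
    remaining = file_size
    splits = 0
    # Keep cutting full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    # Whatever is left becomes the final (possibly slightly larger) split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    file_size = 1024 * 1024 * 1024   # 1 GiB input file (illustrative)
    block_size = 128 * 1024 * 1024   # 128 MiB HDFS block (illustrative)
    cores = 16                       # illustrative cluster size

    # With default bounds the split size equals the block size.
    default_size = compute_split_size(block_size, 1, 2**63 - 1)
    print("default splits:", count_splits(file_size, default_size))

    # Pin both min and max split size to the per-core target.
    target = target_split_size(file_size, cores)
    tuned_size = compute_split_size(block_size, target, target)
    print("tuned splits:", count_splits(file_size, tuned_size))
```

If the target is smaller than the block size, it is the maximum split size that has to come down; mapred.min.split.size alone only helps when the target is larger than a block. In newer Hadoop releases the two settings are named mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize.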

Re: All nodes are not used

2016-08-08 Thread Madhav Sharan
/MeanChiSquareDistanceCalculation.java#L135 -- Madhav Sharan

All nodes are not used

2016-08-08 Thread Madhav Sharan
/MeanChiSquareDistanceCalculation.java#L135 -- Madhav Sharan

Reduce time reading files from HDFS in mapper jobs

2016-08-02 Thread Madhav Sharan
ttp://blog.cloudera.com/blog/2009/02/the-small-files-problem/ -- Madhav Sharan

Re: Output File could only be replicated to 0 nodes

2016-08-01 Thread Madhav Sharan
: 7336382464 (6.83 GB) Non DFS Used: 60541867008 (56.38 GB) DFS Remaining: 6155422720 (5.73 GB) DFS Used%: 9.91% DFS Remaining%: 8.31% Configured Cache Capacity: 0 (0 B) Cache Used: 0 (0 B) Cache Remaining: 0 (0 B) Cache Used%: 100.00% Cache Remaining%: 0.00% Xceivers: 847 -- Madhav Sharan On Mon, Jul

Re: Reading video files from HDFS using OpenCV

2016-07-25 Thread Madhav Sharan
Thanks a lot Ron. It helps -- Madhav Sharan On Sun, Jul 24, 2016 at 2:19 PM, Ron Gonzalez <zlgonza...@yahoo.com> wrote: > In a manner of speaking. I would imagine that you would like to take > advantage of resource management that comes with yarn. If you're planning > to make

Output File could only be replicated to 0 nodes

2016-07-24 Thread Madhav Sharan
$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449) -- Madhav Sharan

Re: Reading video files from HDFS using OpenCV

2016-07-24 Thread Madhav Sharan
Hi Ron, Thanks for replying. Unfortunately I could not find a VideoCapture method accepting stream input. I will look into the second option. Will it be similar to copying the file from HDFS to a tmp directory and then using the tmp file? -- Madhav Sharan On Sun, Jul 24, 2016 at 12:08 PM, Ron's Yahoo

Reading video files from HDFS using OpenCV

2016-07-24 Thread Madhav Sharan
s.java#L66 -- Madhav Sharan