Re: how to create mapreduce program for matrix computation

2012-03-30 Thread Jagat
Dear Fang, There are many tutorials available for doing matrix operations. For example, a quick Google search for "hadoop matrix multiplication" has given this result: http://www.norstad.org/matrix-multiply/index.html Is there any specific matrix operation you are looking for? If yo
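The tutorial linked above describes the classic one-step MapReduce matrix multiplication: the map phase emits, for every element of A and B, one record per output cell that element contributes to, and the reduce phase joins the two sides on the inner index k and sums the products. As a minimal sketch of that key design, here it is simulated in plain Java collections (no Hadoop dependencies; class and method names are illustrative):

```java
import java.util.*;

// Plain-Java simulation of one-step MapReduce matrix multiplication.
// Map phase: a[i][k] is emitted to every cell (i, j); b[k][j] to every
// cell (i, j). Reduce phase: per cell, join on k and sum the products.
public class MatrixMultiplySketch {
    public static int[][] multiply(int[][] a, int[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        // Simulated shuffle: output cell "i,j" -> list of (tag, k, value)
        Map<String, List<int[]>> shuffled = new HashMap<>();
        for (int i = 0; i < m; i++)          // map over A
            for (int k = 0; k < n; k++)
                for (int j = 0; j < p; j++)
                    shuffled.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                            .add(new int[]{0, k, a[i][k]}); // tag 0 = A-side
        for (int k = 0; k < n; k++)          // map over B
            for (int j = 0; j < p; j++)
                for (int i = 0; i < m; i++)
                    shuffled.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                            .add(new int[]{1, k, b[k][j]}); // tag 1 = B-side
        int[][] c = new int[m][p];
        for (Map.Entry<String, List<int[]>> e : shuffled.entrySet()) {
            String[] ij = e.getKey().split(",");
            int i = Integer.parseInt(ij[0]), j = Integer.parseInt(ij[1]);
            int[] aVals = new int[n], bVals = new int[n];
            for (int[] rec : e.getValue())   // reduce: join on k
                if (rec[0] == 0) aVals[rec[1]] = rec[2];
                else bVals[rec[1]] = rec[2];
            for (int k = 0; k < n; k++) c[i][j] += aVals[k] * bVals[k];
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
        // prints [[19, 22], [43, 50]]
    }
}
```

In a real Hadoop job the shuffled map would be replaced by emitted (Text, Writable) pairs and the per-cell loop by a Reducer, but the data flow is the same.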

Re: I cannot post message

2012-03-30 Thread Fang Xin
I see, thank you very much

On Sat, Mar 31, 2012 at 12:13 PM, Subir S wrote:
> You won't receive your post, others will. You are able to post; proof is my reply.
>
> On 3/31/12, Fang Xin wrote:
>> All,
>>
>> Sorry to bother; as a new user, it seems that I cannot post anything.
>> I've tried twice

Re: I cannot post message

2012-03-30 Thread Subir S
You won't receive your post, others will. You are able to post; proof is my reply.

On 3/31/12, Fang Xin wrote:
> All,
>
> Sorry to bother; as a new user, it seems that I cannot post anything.
> I've tried twice yesterday, but I didn't receive my own post...
>
> Can anyone enlighten me? Thanks

I cannot post message

2012-03-30 Thread Fang Xin
All, sorry to bother; as a new user, it seems that I cannot post anything. I've tried twice yesterday, but I didn't receive my own post... Can anyone enlighten me? Thanks

Re: Read key and values from HDFS

2012-03-30 Thread Harsh J
Hi,

Just use "hadoop fs -text <file>". It would read most of these files without breaking a sweat :) You can look at its implementation inside FsShell.java if you want to implement/reuse things in Java.

On Fri, Mar 30, 2012 at 11:01 PM, kasi subrahmanyam wrote:
> Hi Pedro, I am not sure we have a sing
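As a usage sketch (the paths below are illustrative): unlike `-cat`, `-text` detects SequenceFiles and common compression codecs and decodes them before printing.

```shell
# Dump an output file as text; SequenceFiles and gzip/deflate files
# are decoded automatically (illustrative paths)
hadoop fs -text /user/hadoop/output/part-00000

# Works with globs too, e.g. to peek at the first few records
hadoop fs -text '/user/hadoop/output/part-*' | head -n 20
```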

Re: Best practices configuring libraries on the backend.

2012-03-30 Thread Dmitriy Lyubimov
Yes, JAVA_LIBRARY_PATH seems to be the approach that works (rather than just putting it into the TaskTracker opts etc.). Thanks.

On Wed, Mar 28, 2012 at 9:57 PM, Harsh J wrote:
> George,
>
> This ought to work. Did you restart all your TTs to have it take effect?
>
> Also, the right way to do thi
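As a hedged configuration sketch for Hadoop 1.x (the library path is illustrative; verify property names against your version):

```shell
# conf/hadoop-env.sh: make native libraries visible to the daemons
# (TaskTracker, DataNode, etc.) themselves
export JAVA_LIBRARY_PATH=/opt/hadoop/native/lib
```

Note that child map/reduce task JVMs are separate processes; if they also need the native libraries, the usual route is passing `-Djava.library.path=/opt/hadoop/native/lib` via `mapred.child.java.opts` in mapred-site.xml.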

Re: Read key and values from HDFS

2012-03-30 Thread kasi subrahmanyam
Hi Pedro, I am not sure we have a single method for reading the data in output files for different output formats. But for sequence files we can use the SequenceFile.Reader class in the API to read them.

On Fri, Mar 30, 2012 at 10:49 PM, Pedro Costa wrote:
> The ReduceTask can save t
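As a sketch of that API (this assumes the Hadoop 1.x-era jars on the classpath, and the path argument is illustrative), SequenceFile.Reader records the key and value classes in the file header, so they can be instantiated reflectively without knowing the types in advance:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]); // e.g. an output part file on HDFS
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // Key/value classes come from the file header
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```

This is essentially what `hadoop fs -text` does internally for SequenceFiles.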

Re: Read key and values from HDFS

2012-03-30 Thread GUOJUN Zhu
I believe there is no requirement for an OutputFormat to save both key and value; therefore, it is not guaranteed that you can extract (key, value) pairs from a file generated by an arbitrary OutputFormat.

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineeri

Making gzip splittable for Hadoop

2012-03-30 Thread Niels Basjes
Hi, In many Hadoop production environments you get gzipped files as the raw input, usually Apache HTTPD logfiles. When putting these gzipped files into Hadoop you are stuck with exactly one map task per input file. In many scenarios this is fine. However, when doing a lot of work in this ve
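For context, this splittable-gzip approach was tracked as HADOOP-7076 and later distributed as a pluggable codec. A hedged configuration sketch follows; the codec class name is an assumption based on that project and should be verified against the version you actually install:

```xml
<!-- core-site.xml fragment (sketch): register the splittable gzip codec
     ahead of the built-in GzipCodec so .gz inputs can be split across
     multiple map tasks. Class name is an assumption; verify it. -->
<property>
  <name>io.compression.codecs</name>
  <value>nl.basjes.hadoop.io.compress.SplittableGzipCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```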

Re: Performance improvement-Cluster vs Pseudo

2012-03-30 Thread Markus Jelsma
A Hadoop cluster of low-end machines (2 cores, 2 GB RAM) can, with a parsing fetcher, proper configuration, and evenly distributed fetch lists, process up to ~15 URLs per second per node. Such a machine runs a single mapper and a single reducer because of the limited memory. On Friday 30 M

Re: Performance improvement-Cluster vs Pseudo

2012-03-30 Thread ashish vyas
@Christoph: Thanks for replying. I will try with more nodes and a larger URL set to see how much improvement in processing time I get from the cluster. @mapreduce-mailing-community: It would be great if anybody could help me with a Nutch benchmark on a small cluster, since it would help me in determining no. of

AW: Performance improvement-Cluster vs Pseudo

2012-03-30 Thread Christoph Schmitz
Hi Ashish, IMHO your numbers (2 machines, 10 URLs) are way too small to outweigh the natural overhead of a distributed computation (distributing the program code, coordinating the distributed file system, making sure everybody is starting and stopping, etc.). Also, if you're web c

Performance improvement-Cluster vs Pseudo

2012-03-30 Thread ashish vyas
Hi, I have set up a Hadoop cluster (2-node cluster) and I am running a Nutch crawl on it. I am trying to compare results and the improvement in processing time when I crawl with 10 URLs and depth 2. When I am running the crawl on the cluster, it is taking more time than the pseudo cluster, which in turn