Re: Spark vs Tez

2014-10-17 Thread kartik saxena
I did a performance benchmark during my summer internship (I am currently a grad student). I can't reveal much about the specific project, but Spark is still faster than Tez at around the 4th or 5th iteration of the same query on the same dataset. By iteration I mean a repeated run that utilizes the hot-container property of Apache Tez.

Re: The future of MapReduce

2014-07-01 Thread kartik saxena
Spark https://spark.apache.org/ is also getting a lot of attention with its in-memory computation and caching features. Performance-wise, it is being touted as better than Mahout because machine learning involves iterative computations and Spark could cache these computations in-memory for faster

Re: HDFS data transfer!

2009-06-10 Thread kartik saxena
I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB file. Secura On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar sugandha@gmail.com wrote: Hello! If I try to transfer a 5 GB VDI file from a remote host (not part of the Hadoop cluster) into HDFS, and get it back, how

Indexing on top of Hadoop

2009-06-10 Thread kartik saxena
Hi, I have a huge LDIF file, on the order of gigabytes, spanning some million user records. I am running the example Grep job on that file. The search results have not really been up to expectations because it is a basic per-line, brute-force search. I was thinking of building some indexes inside HDFS for