I ran a performance benchmark during my summer internship (I am currently
a grad student). I can't reveal much about the specific project, but Spark
was still faster than roughly the 4th or 5th iteration of Tez on the same
query and dataset. By iteration I mean repeated runs that take advantage of
the hot-container property of Apache Tez.
Spark (https://spark.apache.org/) is also getting a lot of attention for its
in-memory computation and caching features. Performance-wise it is being
touted as better than Mahout, because machine learning involves iterative
computations and Spark can cache these computations in memory for faster
I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB
file.
Secura
On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
sugandha@gmail.com wrote:
Hello!
If I try to transfer a 5 GB VDI file from a remote host (not part of the
Hadoop cluster) into HDFS, and get it back, how
Hi,
I have a huge LDIF file, on the order of GBs, spanning some million user
records. I am running the example Grep job on that file. The search results
have not really been up to expectations because it is a basic per-line,
brute-force search.
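
To make the brute-force cost concrete, here is a minimal Python sketch. It
is not the Hadoop Grep example itself: the sample records and the function
names are made up for illustration, and everything runs in memory on one
machine. It contrasts a per-line scan, which touches every line on every
query, with a lookup in an index that is built once.

```python
from collections import defaultdict

# Hypothetical sample standing in for a multi-GB LDIF file.
LDIF_LINES = [
    "dn: uid=alice,ou=people,dc=example,dc=com",
    "uid: alice",
    "mail: alice@example.com",
    "",
    "dn: uid=bob,ou=people,dc=example,dc=com",
    "uid: bob",
    "mail: bob@example.com",
]


def brute_force_grep(lines, needle):
    """Scan every line for every query, like a basic per-line grep."""
    return [i for i, line in enumerate(lines) if needle in line]


def build_index(lines):
    """Map each (attribute, value) pair to the line numbers where it occurs."""
    index = defaultdict(list)
    for i, line in enumerate(lines):
        attr, sep, value = line.partition(": ")
        if sep:  # skip blank lines and lines without an "attr: value" shape
            index[(attr, value)].append(i)
    return index


def indexed_lookup(index, attr, value):
    """Answer an exact-match query without rescanning the file."""
    return index.get((attr, value), [])


index = build_index(LDIF_LINES)
print(brute_force_grep(LDIF_LINES, "uid: bob"))
print(indexed_lookup(index, "uid", "bob"))
```

The scan does substring matching while the index only answers exact
attribute/value queries, so the two are not interchangeable; the index
trades flexibility and build time for cheap repeated lookups.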
I was thinking of building some indexes inside HDFS for