The conjugate gradient method has been shown to be very efficient at
solving the least squares error problem in matrix factorization:
http://www.benfrederickson.com/fast-implicit-matrix-factorization/.
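For anyone who hasn't seen it, here is a minimal sketch of the idea, not the implementation from the linked post. It assumes a plain regularized least squares problem (no implicit-feedback confidence weights), and the names conjugate_gradient, normal_equations, Y, p and reg are made up for illustration. The normal equations (Y^T Y + reg*I) x = Y^T p are symmetric positive definite, so conjugate gradient can solve them iteratively without ever inverting the matrix.

```python
def conjugate_gradient(A, b, x0=None, iterations=50, tol=1e-10):
    """Solve A x = b for a symmetric positive-definite matrix A (list of lists)."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    # Initial residual r = b - A x; the first search direction is the residual.
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(iterations):
        if rs_old < tol:
            break  # squared residual norm is small enough, stop early
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        # New direction: current residual plus a multiple of the old direction.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


def normal_equations(Y, p, reg):
    """Build A = Y^T Y + reg*I and rhs = Y^T p for min ||Y x - p||^2 + reg*||x||^2."""
    k = len(Y[0])
    A = [[sum(row[a] * row[c] for row in Y) + (reg if a == c else 0.0)
          for c in range(k)] for a in range(k)]
    rhs = [sum(row[a] * p[i] for i, row in enumerate(Y)) for a in range(k)]
    return A, rhs


# Hypothetical toy data: three observations, two latent factors.
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = [1.0, 2.0, 3.0]
A, rhs = normal_equations(Y, p, reg=0.0)
x = conjugate_gradient(A, rhs)
print(x)  # close to [1.0, 2.0], the exact least squares solution here
```

In matrix factorization the same solve is repeated once per user and per item, and a few CG iterations per solve are usually enough, which is where the speedup comes from.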
This post is motivated by:
We started playing with Ignite to back Hadoop, Hive and Spark services, and
are looking to move to it as our default for deployment going forward. Still
early, but so far it's been pretty nice, and we're excited about the
flexibility it will provide for our particular use cases.
Would say in general it's worth
, Spark SQL, MLlib
*Use case:* We are using Spark to support analytics on both our relational
and event data, to build data products, and for big data processing.
Thanks!
-Nate
Maybe Flink benefits from some of the points they outline here:
http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
If the benchmarks were re-run with the 1.5/Tungsten line, it would probably
close the gap a bit (or a lot), with Spark moving towards a similar style of
off-heap memory management,
Might also want to look at the Y! post; it looks like they are experimenting
with similar efforts in large-scale word2vec:
http://yahooeng.tumblr.com/post/118860853846/distributed-word2vec-on-top-of-pistachio
-Original Message-
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Tuesday,
Spark SQL supports JDBC/ODBC connectivity, so if that's the route you
need or want to connect through, you could do so from Java/PHP apps. Haven't
used either, so can't speak to the developer experience; I assume it's pretty
good, as it would be the preferred method for lots of third-party enterprise
The Ignite guys spoke at the Bigtop workshop last week at SCALE and posted
slides here:
https://cwiki.apache.org/confluence/display/BIGTOP/SCALE13x
A couple of main points from comments made during the presentation: although
it is incubating at Apache (first code drop was last week, I believe), the
tech is battle-tested with
Can't speak to the internals of SparkSubmit or how to reproduce it sans JVM;
I guess it would depend on whether you want/need to support various
deployment environments (standalone, Mesos, YARN, etc.).
If you just need YARN, or are looking for a starting point, you might want to
look at the capabilities of the YARN API: