This is great news! I know that Twitter has done something similar with UDFs for Pig, as described in this paper: http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
I'm glad to see the same thing start with Hive.

Dean

On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <yuin...@gmail.com> wrote:
> Hello all,
>
> My employer, AIST, has given the thumbs up to open source our machine
> learning library, named Hivemall.
>
> Hivemall is a scalable machine learning library running on Hive/Hadoop,
> licensed under the LGPL 2.1.
>
> https://github.com/myui/hivemall
>
> Hivemall provides machine learning functionality as well as feature
> engineering functions through Hive UDFs/UDAFs/UDTFs. It is designed
> to scale with both the number of training instances and the number
> of training features.
>
> Hivemall is very easy to use, as every machine learning step is done
> within HiveQL.
>
> -- Installation is as follows:
> add jar /tmp/hivemall.jar;
> source /tmp/define-all.hive;
>
> -- Logistic regression is performed by a query:
> SELECT
>   feature,
>   avg(weight) as weight
> FROM
>   (SELECT logress(features,label) as (feature,weight) FROM
>    training_features) t
> GROUP BY feature;
>
> You can find detailed examples on our wiki pages.
> https://github.com/myui/hivemall/wiki/_pages
>
> Although we believe Hivemall is much easier to use and more scalable
> than Mahout for classification/regression tasks, please check it for
> yourself. If you have a Hive environment, you can evaluate Hivemall
> within 5 minutes or so.
>
> Hope you enjoy the release! Feedback and pull requests are always welcome.
>
> Thank you,
> Makoto

-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
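One plausible reading of the two-level logistic regression query: the inner SELECT trains a model on each task's slice of `training_features` and emits (feature, weight) pairs, and the outer `avg(weight) ... GROUP BY feature` averages each weight across those per-task models. The Python below is a minimal, hypothetical sketch of that pattern, not Hivemall's code. The per-partition trainer is plain SGD logistic regression standing in for `logress`, and the `name:value` feature format, the learning rate `eta`, `epochs`, and the helper names `parse`, `train_partition`, and `average_models` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed row shape: (features, label), where features is a list of
# hypothetical "name:value" strings and label is 0 or 1.
Row = tuple[list[str], int]


def parse(feature: str) -> tuple[str, float]:
    """Split a hypothetical 'name:value' string; a bare name counts as 1.0."""
    name, _, value = feature.partition(":")
    return name, float(value) if value else 1.0


def train_partition(rows: list[Row], eta: float = 0.1, epochs: int = 5) -> dict[str, float]:
    """Plain SGD logistic regression over one partition (a stand-in for logress)."""
    w: dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            xs = [parse(f) for f in features]
            margin = sum(w[name] * x for name, x in xs)
            p = 1.0 / (1.0 + math.exp(-margin))  # predicted P(label = 1)
            grad = label - p  # log-likelihood gradient w.r.t. the margin
            for name, x in xs:
                w[name] += eta * grad * x
    # Only features seen in this partition are emitted, like (feature, weight) rows.
    return dict(w)


def average_models(models: list[dict[str, float]]) -> dict[str, float]:
    """Mimics: SELECT feature, avg(weight) ... GROUP BY feature."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for model in models:
        for feature, weight in model.items():
            sums[feature] += weight
            counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}
```

Calling `average_models([train_partition(p) for p in partitions])` on a list of row partitions mimics the whole query. As with `GROUP BY`, each feature's average is taken only over the partitions that actually emitted it.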