This is great news! I know that Twitter has done something similar with
UDFs for Pig, as described in this paper:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf

I'm glad to see the same thing start with Hive.

Dean


On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <yuin...@gmail.com> wrote:

> Hello all,
>
> My employer, AIST, has given the thumbs up to open source our machine
> learning library, named Hivemall.
>
> Hivemall is a scalable machine learning library running on Hive/Hadoop,
> licensed under the LGPL 2.1.
>
>   https://github.com/myui/hivemall
>
> Hivemall provides machine learning functionality as well as feature
> engineering functions through Hive UDFs/UDAFs/UDTFs. It is designed
> to scale with both the number of training instances and the number
> of training features.
>
> Hivemall is very easy to use as every machine learning step is done
> within HiveQL.
>
> -- Installation is just the following two statements:
> add jar /tmp/hivemall.jar;
> source /tmp/define-all.hive;
>
> -- Logistic regression is performed by a query.
> SELECT
>   feature,
>   avg(weight) as weight
> FROM
>  (SELECT logress(features,label) as (feature,weight)
>   FROM training_features) t
> GROUP BY feature;
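
A rough sketch of what the outer part of that query computes, in plain
Python rather than HiveQL. It assumes the inner SELECT emits one
(feature, weight) row per feature from each parallel task, which the
message does not spell out. The feature names, the weights, and the
average_weights helper are made up for illustration and are not part of
Hivemall.

```python
from collections import defaultdict


def average_weights(rows):
    """Average the weight per feature, mirroring
    SELECT feature, avg(weight) ... GROUP BY feature."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, weight in rows:
        totals[feature] += weight
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


# Hypothetical (feature, weight) rows, as if emitted by two parallel tasks.
rows = [
    ("age", 0.5), ("income", -1.0),  # rows from task 1 (assumed)
    ("age", 1.5), ("clicks", 2.0),   # rows from task 2 (assumed)
]

model = average_weights(rows)
print(model)
```

A feature seen by only one task keeps that task's weight, while a
feature seen by several tasks gets the mean of their weights.
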
>
> You can find detailed examples on our wiki pages.
> https://github.com/myui/hivemall/wiki/_pages
>
> We believe Hivemall is much easier to use and more scalable than Mahout
> for classification/regression tasks, but please check for yourself.
> If you have a Hive environment, you can evaluate Hivemall within
> 5 minutes or so.
>
> Hope you enjoy the release! Feedback and pull requests are always welcome.
>
> Thank you,
> Makoto
>



-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
