Re: [ANNOUNCE] Mahout Con 2020 (A sub-track of ApacheCon @ Home)

2020-12-13 Thread Eric Link
Unsubscribe

On Wed, Aug 12, 2020, 7:59 AM Trevor Grant wrote:
> Hey all,
>
> We got enough people to volunteer for talks that we are going to be putting
> on our very own track at ApacheCon (@Home) this year!
>
> Check out the schedule here:
>

Re: [ANNOUNCE] Apache Mahout 14.1 Release

2020-12-13 Thread Eric Link
Unsubscribe

On Thu, Oct 8, 2020, 9:14 AM Andrew Musselman wrote:
> The Apache Mahout PMC is pleased to announce the release of Mahout 14.1.
> Mahout's goal is to create an environment for quickly creating
> machine-learning applications that scale and run on the highest-performance
> parallel

Re: Error spark-mahout when spark-submit mode cluster

2018-08-08 Thread Eric Link
> cutor 0): java.lang.IllegalStateException: unread block data
>     at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2773)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1599)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:301)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Thanks a lot for your time.
> Cheers.

--
Eric Link
214.641.5465

Re: "LLR with time"

2017-11-19 Thread Eric Link
> like 3 day, week, or month buckets for “hot”. This is to remove cyclical
> effects from the frequencies as much as possible since you need 3 buckets
> to see the change in change, 2 for the change, and 1 for the event volume.
>
>
> On Nov 10, 2017, at 4:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> So your idea is to find anomalies in event frequencies to detect “hot”
> items?
>
> Interesting, maybe Ted will chime in.
>
> What I do is take the frequency and its first and second derivatives as
> measures of popularity, increasing popularity, and increasingly increasing
> popularity. Put another way: popular, trending, and hot. This is simple to
> do by taking 1, 2, or 3 time buckets and looking at the number of events,
> derivative (difference), and second derivative. Ranking all items by these
> values gives various measures of popularity or its increase.
>
> If your use is in a recommender you can add a ranking field to all items
> and query for “hot” by using the ranking you calculated.
>
> If you want to bias recommendations by hotness, query with user history
> and boost by your hot field. I suspect the hot field will tend to overwhelm
> your user history in this case, as it would if you used anomalies, so you’d
> also have to normalize the hotness to some range closer to the one created
> by the user history matching score. I haven’t found a very good way to mix
> these in a model, so use hot as a method of backfill if you cannot return
> enough recommendations or in places where you may want to show just hot
> items. There are several benefits to this method of using hot to rank all
> items, including the fact that you can apply business rules to them just as
> with normal recommendations, so you can ask for hot in “electronics” if you
> know categories, or hot "in-stock" items, or ...
>
> Still, anomaly detection does sound like an interesting approach.
>
>
> On Nov 10, 2017, at 3:13 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:
>
> Hi "all",
>
> I am wondering what would be the best way to incorporate event time
> information into the calculation of the G-Test.
>
> There is a claim here
> https://de.slideshare.net/tdunning/finding-changes-in-real-data
>
> saying "Time aware variant of G-Test is possible"
>
> I remember I experimented with exponentially decayed counts some years ago
> and this involved changing the counts to doubles, but I suspect there is
> some smarter way. What I don't get is the relation to a data structure like
> T-Digest when working with a lot of counts / cells for every combination of
> items. Keeping a t-digest for every combination seems unfeasible.
>
> How would one incorporate event time into recommendations to detect
> "hotness" of certain relations? Glad if someone has an idea...
>
> Cheers,
>
> Johannes

--
Eric Link
214.641.5465
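
For readers of the archive, the bucket approach described in the quoted message (event count, its difference, and the difference of differences as "popular", "trending", and "hot") might be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread or from Mahout: the function names, the (item, timestamp) event format, and the choice of exactly three equal-width buckets are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def bucket_counts(
    events: Iterable[Tuple[str, float]],
    now: float,
    bucket_seconds: float,
    n_buckets: int = 3,
) -> Dict[str, List[int]]:
    """Count events per item in the n most recent time buckets.

    events is an iterable of (item, timestamp) pairs (assumed format).
    In the result, counts[item][0] is the newest bucket and
    counts[item][-1] the oldest.
    """
    counts: Dict[str, List[int]] = {}
    for item, ts in events:
        age = now - ts
        if age < 0:
            continue  # ignore events stamped in the future
        idx = int(age // bucket_seconds)
        if idx >= n_buckets:
            continue  # older than the window we look at
        counts.setdefault(item, [0] * n_buckets)[idx] += 1
    return counts


def popularity_scores(c: List[int]) -> Tuple[int, int, int]:
    """Return (popular, trending, hot) for counts [newest, previous, oldest].

    popular  = event volume in the newest bucket (1 bucket needed)
    trending = first difference (2 buckets needed)
    hot      = second difference, the change in change (3 buckets needed)
    """
    popular = c[0]
    trending = c[0] - c[1]
    hot = (c[0] - c[1]) - (c[1] - c[2])
    return popular, trending, hot


def rank(counts: Dict[str, List[int]], which: str) -> List[str]:
    """Rank items, highest first, by 'popular', 'trending', or 'hot'."""
    index = {"popular": 0, "trending": 1, "hot": 2}[which]
    return sorted(
        counts,
        key=lambda item: popularity_scores(counts[item])[index],
        reverse=True,
    )
```

The resulting rank (or a normalized version of the score) could then be stored as a field on each item and used for backfill or boosting, as the message suggests. How to normalize it against a user-history score is left open here, as it is in the thread.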

Re: Algorithms of prediction

2016-02-29 Thread Eric Link
unsubscribe

On Thu, Feb 25, 2016 at 2:47 PM Keith Aumiller <keith.aumil...@stlouisintegration.com> wrote:
> I use h2o and it's good with an easy interface to learn for a new user.
> Even without the R libraries
>
>
> On Thu, Feb 25, 2016 at 10:54 AM, Ted Dunning wrote:

Re: clusterpp is only writing directories for about half of my clusters.

2012-10-20 Thread Eric Link
We are looking at using Mahout in our organization. We have a need to do statistical analysis and do clustering and make recommendations. What is the 'sweet spot' for doing this with Mahout? Meaning, what types of data sets and data volumes are the best fit for using a tool like Mahout,

Re: Recommendations for new users

2012-10-12 Thread Eric Link
Do you have a link to your Stack Overflow answer? Thx.

- Eric

On Oct 12, 2012, at 10:54 AM, Sean Owen sro...@gmail.com wrote:

See my answer on StackOverflow. Yes it is important.

On Oct 12, 2012 4:23 PM, Ahmet Ylmaz ahmetyilmazefe...@yahoo.com wrote:

Hi, We are planning to use Mahout for