On Wed, Aug 12, 2020, 7:59 AM Trevor Grant wrote:
> Hey all,
>
> We got enough people to volunteer for talks that we are going to be putting
> on our very own track at ApacheCon (@Home) this year!
>
> Check out the schedule here:
>
On Thu, Oct 8, 2020, 9:14 AM Andrew Musselman wrote:
> The Apache Mahout PMC is pleased to announce the release of Mahout 14.1.
> Mahout's goal is to create an environment for quickly creating
> machine-learning applications that scale and run on the highest-performance
> parallel
cutor 0): java.lang.IllegalStateException: unread block data
>     at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2773)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1599)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:301)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Thanks a lot for your time.
> Cheers.
--
Eric Link
214.641.5465
> > ... like 3 day, week, or month buckets for "hot". This is to remove
> > cyclical effects from the frequencies as much as possible, since you
> > need 3 buckets to see the change in change, 2 for the change, and 1
> > for the event volume.
> >
> > On Nov 10, 2017, at 4:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> So your idea is to find anomalies in event frequencies to detect
> >> "hot" items?
> >>
> >> Interesting, maybe Ted will chime in.
> >>
> >> What I do is take the frequency and its first and second derivatives
> >> as measures of popularity, increasing popularity, and increasingly
> >> increasing popularity. Put another way: popular, trending, and hot.
> >> This is simple to do by taking 1, 2, or 3 time buckets and looking
> >> at the number of events, the derivative (difference), and the second
> >> derivative. Ranking all items by these values gives various measures
> >> of popularity or its increase.
> >>
> >> If your use is in a recommender you can add a ranking field to all
> >> items and query for "hot" using the ranking you calculated.
> >>
> >> If you want to bias recommendations by hotness, query with user
> >> history and boost by your hot field. I suspect the hot field will
> >> tend to overwhelm your user history in this case, as it would if you
> >> used anomalies, so you'd also have to normalize the hotness to some
> >> range closer to the one created by the user-history matching score.
> >> I haven't found a very good way to mix these in a model, so use hot
> >> as a method of backfill if you cannot return enough recommendations,
> >> or in places where you may want to show just hot items. There are
> >> several benefits to this method of using hot to rank all items,
> >> including the fact that you can apply business rules to them just as
> >> with normal recommendations: you can ask for hot in "electronics" if
> >> you know categories, or hot "in-stock" items, or ...
> >>
> >> Still, anomaly detection does sound like an interesting approach.
> >>
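The bucket-derivative ranking Pat describes above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not Mahout or Universal Recommender code:

```python
# Minimal sketch of the bucket-derivative ranking described above.
# Function and variable names are hypothetical; this is not Mahout code.

def hotness_scores(oldest, middle, newest):
    """Event counts for one item in three equal-width time buckets.

    Returns (popular, trending, hot): the event volume, its first
    derivative (difference), and its second derivative.
    """
    popular = newest                              # 1 bucket: event volume
    trending = newest - middle                    # 2 buckets: change
    hot = (newest - middle) - (middle - oldest)   # 3 buckets: change in change
    return popular, trending, hot

# Example: an item accelerating from 10 to 20 to 40 events per bucket.
print(hotness_scores(10, 20, 40))  # (40, 20, 10)
```

Sorting all items by each score, descending, yields the popular, trending, and hot rankings; day-, week-, or month-wide buckets damp the cyclical effects mentioned in the thread.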
> >> On Nov 10, 2017, at 3:13 PM, Johannes Schulte <
> >> johannes.schu...@gmail.com> wrote:
> >>
> >>> Hi "all",
> >>>
> >>> I am wondering what would be the best way to incorporate event time
> >>> information into the calculation of the G-Test.
> >>>
> >>> There is a claim here
> >>> https://de.slideshare.net/tdunning/finding-changes-in-real-data
> >>> saying "Time aware variant of G-Test is possible".
> >>>
> >>> I remember I experimented with exponentially decayed counts some
> >>> years ago, and this involved changing the counts to doubles, but I
> >>> suspect there is some smarter way. What I don't get is the relation
> >>> to a data structure like T-Digest when working with a lot of
> >>> counts/cells for every combination of items. Keeping a t-digest for
> >>> every combination seems unfeasible.
> >>>
> >>> How would one incorporate event time into recommendations to detect
> >>> "hotness" of certain relations? Glad if someone has an idea...
> >>>
> >>> Cheers,
> >>>
> >>> Johannes
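One simple reading of the "time aware G-Test" idea raised in this thread is to replace integer contingency-table cells with exponentially decayed floating-point counts and feed them into the usual log-likelihood ratio. A minimal sketch under that assumption (the half-life and 2x2 table layout are illustrative, not a Mahout API):

```python
import math

# Sketch of exponentially decayed counts feeding a 2x2 G-test (LLR).
# The decay constant and table layout are illustrative assumptions.

def decayed(count, dt, half_life):
    """Decay a floating-point count by the time elapsed since its last update."""
    return count * 0.5 ** (dt / half_life)

def g_test(k11, k12, k21, k22):
    """G statistic (log-likelihood ratio) for a 2x2 table of counts.

    Accepts floats, so decayed counts can be used directly.
    """
    def xlx(k):  # k * ln(k), with the 0 * ln(0) = 0 convention
        return k * math.log(k) if k > 0 else 0.0
    n = k11 + k12 + k21 + k22
    return 2.0 * (xlx(k11) + xlx(k12) + xlx(k21) + xlx(k22) + xlx(n)
                  - xlx(k11 + k12) - xlx(k21 + k22)
                  - xlx(k11 + k21) - xlx(k12 + k22))

# Independent counts give G near 0; a strong diagonal gives a large G.
print(abs(g_test(10, 10, 10, 10)) < 1e-9)  # True
print(g_test(100, 10, 10, 100) > 0)        # True
print(decayed(100.0, 10.0, 10.0))          # 50.0 (one half-life elapsed)
```

Because the statistic only needs the four cell sums, decaying each cell on update keeps memory constant per item pair, avoiding the per-combination t-digest Johannes calls unfeasible.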
On Thu, Feb 25, 2016 at 2:47 PM Keith Aumiller <
keith.aumil...@stlouisintegration.com> wrote:
> I use h2o and it's good, with an easy interface for a new user to
> learn, even without the R libraries.
>
>
> On Thu, Feb 25, 2016 at 10:54 AM, Ted Dunning
> wrote:
We are looking at using Mahout in our organization. We need to do
statistical analysis, clustering, and recommendations. What is the
'sweet spot' for doing this with Mahout? Meaning, what types of data
sets and data volumes are the best fit for a tool like Mahout?
Do you have a link to your stack overflow answer? Thx. - Eric
On Oct 12, 2012, at 10:54 AM, Sean Owen sro...@gmail.com wrote:
See my answer on StackOverflow. Yes it is important.
On Oct 12, 2012 4:23 PM, Ahmet Ylmaz ahmetyilmazefe...@yahoo.com wrote:
Hi,
We are planning to use Mahout for