+1
From: Andrew Musselman
Sent: Thursday, January 7, 2021 2:46 PM
To: user@mahout.apache.org
Cc: Mahout Dev List
Subject: Re: PyMahout (incore) (alpha v0.1)
Thanks Trevor, looking forward to trying it out.
On Wed, Jan 6, 2021 at 5:30 PM Peng Zhang wrote:
>
Fixing some typos and mistyping:
Hi All,
I've finished RC4 of Apache Mahout 0.14. Unfortunately, right off the bat, I can
see an issue which will require a new RC: only SHA-1 checksums have been
included. Apache requires at least SHA-256, so I will -1 it myself. I'd drop
it and fix that
Hi All,
I've finished RC4 of Apache Mahout 0.14. Unfortunately, right off the bat, I can
see an issue which will require a new RC: only SHA-1 checksums have been
included. Apache requires at least SHA-256, so I will -1 it myself. I'd drop
it and fix that now, but it's late.
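For anyone checking a release candidate locally, here is a minimal Python sketch of producing a SHA-256 checksum file for an artifact. The function names, the `.sha256` suffix and the `<hex>  <filename>` layout are assumptions for illustration only; this is not the Mahout release tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path, algorithm="sha256"):
    """Write '<hex>  <filename>' next to the artifact (hypothetical layout)."""
    artifact = Path(path)
    checksum_file = artifact.with_name(artifact.name + "." + algorithm)
    checksum_file.write_text(f"{file_digest(artifact, algorithm)}  {artifact.name}\n")
    return checksum_file
```

Passing `algorithm="sha512"` gives the stronger digest with the same code path.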
@mahout PMCs
Signs and hashes were not generated for some artifacts. Cancelling this vote.
Will send a link for RC2 soon.
--andy
From: Andrew Palumbo
Sent: Saturday, December 14, 2019 10:36 PM
To: d...@mahout.apache.org ; user@mahout.apache.org
; priv...@mahout.apache.org
Below please find the first candidate release for Mahout 14.1
https://repository.apache.org/content/repositories/orgapachemahout-1058
Off the bat I can see some trivial naming issues that we need to work out.
This is in part due to the nested module structure.. We must come to a
consensus on
Mahout meeting notes 12.6.2019
==
A meeting was held today, Friday 6 Dec 2019, to discuss the current
state of the project, planned releases and a general path forward.
Joe Olson, Andrew Palumbo and Trevor Grant met via Google Hangouts at 10:15 AM
My mistake, I had a typo in the unsubscribe address:
user-unsubscr...@mahout.apache.org
To unsubscribe, send an email there.
Thank you,
Andy
From: Andrew Palumbo
Sent: Saturday, January 12, 2019 4:34 PM
To: user@mahout.apache.org
Subject: Re: AbstractJob class
If you're still looking for the deprecated MapReduce version of AbstractJob, it
is no longer in the `core` module. It can be found in the `mahout-mr` module.
If you still wish to leave the list please send a message to:
user-unsubsubscr...@mahout.apache.org
Thanks,
Andy
Looking forward to it
Probably my calendar messed it up.
Thx
--andy
On Sep 3, 2018 10:32 AM, Andrew Musselman wrote:
Huh, it's supposed to be
"Friday, September 7
9:00 – 10:00am"
On Mon, Sep 3, 2018 at 10:28 AM Andrew Palumbo wrote:
> FYI, @andrew.. the calendar invite reads 9 am Sept 3 for me (android).
>
FYI, @andrew.. the calendar invite reads 9 am Sept 3 for me (android).
Thanks Andrew talk to you then!
Please join me in congratulating Andrew Musselman as the new Chair of the
Apache Mahout Project Management Committee. I would like to thank Andrew
for stepping up, all of us who have worked with him over the years know his
dedication to the project to be invaluable. I look forward to Andrew
Thanks guys!
My apologies, I was out of sync on our board report cycle. The Mahout board
report is due next month. I will file the report for the October meeting.
Thanks all, and sorry for any inconvenience.
--andy
From: Andrew Palumbo <ap@outlook.com>
A reminder - the Calendar Q3 board Meeting is 20 September.
We must submit a board report at least 1 week prior.
Please gather a record of recent talks, etc. for the report due 13 September.
I will post a Google doc in the upcoming days.
--andy
Fyi @here
Sent from my Verizon Wireless 4G LTE smartphone
Original message
From: Nic Lane
Date: 08/26/2017 3:33 AM (GMT-08:00)
To: Nic Lane
Subject: CFP: IEEE Computer Magazine -- Special Issue on Mobile and Embedded
Deep Learning
Hi
I should mention that the density threshold is currently set quite high, and
we've been discussing a user-defined setting for this. That is something we
have not worked in yet.
From: Andrew Palumbo <ap@outlook.com>
Sent: Monday, August 21, 2017 2:44:35 PM
To
We do currently have optimizations based on density analysis in use e.g.: in
AtB.
https://github.com/apache/mahout/blob/08e02602e947ff945b9bd73ab5f0b45863df3e53/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala#L431
+1 to PR. thanks for pointing this out.
--andy
Welcome again, Holden, Great to have you on board!
--andy
From: Trevor Grant
Sent: Tuesday, July 18, 2017 12:32:07 AM
To: user@mahout.apache.org; Mahout Dev List
Subject: New Committer: Holden Karau
The Project Management Committee
Please note: with the exception of newly created issues for which email
notifications will continue to be sent to d...@mahout.apache.org, all JIRA
email notifications (comments, closing, etc.) have been moved to:
iss...@mahout.apache.org.
All committers and developers, please ensure
Holden Karau <holden.ka...@gmail.com>, user@mahout.apache.org, Dmitriy
Lyubimov <dlie...@gmail.com>, Andrew Palumbo <apalu...@apache.org>
Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos
IIRC these all fit sbt’s conventions?
On Jul 7, 2017, at 2:05 PM, T
As a reminder:
Jira email comments will be moved to:
iss...@mahout.apache.org
all committers, devs and anyone else interested in Bug Fix, New Feature and
issue planning comments must subscribe:
issues-subscr...@mahout.apache.org
-- Andy
Pat - I just want to clear one point up.. Trevor volunteering to head up this
release and the git-flow plans are independent of each other. The 0.13.1
release was originally planned as a quick follow up to 0.13.0 for each
scala/spark conf combo I think this will be 6 artifacts.. spark 1.6.x -
Hello Sebastian,
> - macOS Sierra 10.12.4 (MacBook Pro, Retina, 13-inch, Early 2015)
The native code in the javacpp modules does not have a build profile for MacOS.
We do currently have a jira open for this:
https://issues.apache.org/jira/browse/MAHOUT-1908
the fix build for mac should
der for time being?
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org
*"Fortunate is he, who is able to know the causes of things." -Virgil*
On Tue, May 9, 2017 at 10:17 AM, Andrew Palumbo <ap
I disagree with it being too bland- I find the open space and the formatting
much easier to navigate and read docs from.
From: Khurrum Nasim
Sent: Monday, May 8, 2017 2:36:54 PM
To: Mahout Dev List; user@mahout.apache.org;
Trevor, That link takes me back to the old m.a.o page.
From: Trevor Grant
Sent: Monday, May 8, 2017 1:53:46 PM
To: Mahout Dev List; user@mahout.apache.org
Subject: New Website is Staged
Hey all,
The new website is staged. You can view
+1 :)
Sent from my Verizon Wireless 4G LTE smartphone
Original message
From: "Scott C. Cote"
Date: 05/06/2017 2:43 PM (GMT-08:00)
To: user@mahout.apache.org
Cc: Trevor Grant , Mahout Dev List
Subject:
Welcome!!
Sent from my Verizon Wireless 4G LTE smartphone
Original message
From: Jim Jagielski
Date: 05/04/2017 10:33 AM (GMT-08:00)
To: d...@mahout.apache.org
Cc: priv...@mahout.apache.org, user@mahout.apache.org, Aditya
> Ubuntu 16.04.1 LTS
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>
>
+1 (binding)
Built and tested source distribution with both profiles -Pviennacl and
-Pviennacl-omp.
Ran SparseSparseDrmTimer.mscala through the shell in both pseudo cluster and
local[2] mode
Tested with several iterations and combinations of arguments eg:
-viennacl-omp.jar
I think that we need to try a different build command.
Andy
From: Andrew Palumbo <ap@outlook.com>
Sent: Wednesday, April 12, 2017 10:42:30 PM
To: user@mahout.apache.org
Subject: Re: [VOTE] Apache Mahout 0.13.0 Release Candidate
It look
It looks like we're missing some jars from the binary distro:
bin          mahout-integration-0.13.0.jar
conf         mahout-math-0.13.0.jar
derby.log    mahout-math-scala_2.10-0.13.0.jar
docs         mahout-mr-0.13.0.jar
examples
Pat-
What can we do from the mahout side? Would we need any new data structures?
Trevor and I were just discussing some of the troubles of near real time
matrix streaming.
From: Pat Ferrel
Sent: Monday, March 27, 2017 2:42:55 PM
To:
That's pretty cool.
Sent from my Verizon Wireless 4G LTE smartphone
Original message
From: Ted Dunning
Date: 03/24/2017 7:22 PM (GMT-08:00)
To: user@mahout.apache.org
Cc: Mahout Dev List
Subject: Re: Marketing
On Fri, Mar 24,
+1 on revamp.
Sent from my Verizon Wireless 4G LTE smartphone
Original message
From: Trevor Grant
Date: 03/23/2017 12:36 PM (GMT-08:00)
To: user@mahout.apache.org, d...@mahout.apache.org
Subject: Marketing
Hey user and dev,
With 0.13.0 the Apache
% mahoutVersion,
"org.apache.mahout" % "mahout-math" % mahoutVersion,
"org.apache.mahout" % "mahout-hdfs" % mahoutVersion,
BTW I’ve compiled Mahout locally with Scala 2.11 so it may just be a case for
someone having time to update the release process.
I’ll
I will verify keys tonight.
Sent from my Verizon Wireless 4G LTE smartphone
Original message
From: Andrew Musselman
Date: 03/01/2017 10:20 AM (GMT-08:00)
To: user@mahout.apache.org, d...@mahout.apache.org
Subject: Re: [VOTE] Apache Mahout 0.13.0
From: Trevor Grant
Sent: Wednesday, February 1, 2017 5:01 PM
To: d...@mahout.apache.org
Subject: Starter Issues
Hey all,
I know there are some folks on here who have been interested in getting
more involved with the project,
From: Isabel Drost
Sent: Wednesday, February 1, 2017 4:55 AM
To: Dmitriy Lyubimov
Cc: user@mahout.apache.org
Subject: Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation
On Tue, Jan 31, 2017 at 04:06:36PM -0800, Dmitriy Lyubimov
Raviteja,
Before opening a Jira, could you explain what changes you made on the
d...@mahout.apache.org list, and explain the errors that you're getting?
We don't use attachments so please include in your text.
Thanks,
Andy
From: Andrew Palumbo <
Hello Raviteja,
Could you start a JIRA issue for this, and post your output there?
Instructions are in the "Making Changes" section here:
http://mahout.apache.org/developers/how-to-contribute.html
Apache Mahout: Scalable machine learning and data
* per the response in the jira...
Original message
From: Andrew Palumbo <ap@outlook.com>
Date: 07/27/2016 9:40 PM (GMT-05:00)
To: user@mahout.apache.org
Subject: RE: Text clustering how to?
Right, so per the response in the software, maybe you would be inte
,
Andy
Original message
From: Raviteja Lokineni <raviteja.lokin...@gmail.com>
Date: 07/27/2016 9:30 PM (GMT-05:00)
To: user@mahout.apache.org
Subject: RE: Text clustering how to?
Already doing that.
On Jul 27, 2016 9:28 PM, "Andrew Palumbo" <ap@outlo
I don't think the response was completely sarcastic. The point is if you want
to learn more about the subject you might do well to dig in and get your hands
dirty updating the code. It's the push/kick in the right direction that you
asked for.
Original message
From:
+1 (binding) that is.. per last email tested MR wikipedia example and spark
document classifier without issue.
Original message
From: Andrew Palumbo <ap@outlook.com>
Date: 06/10/2016 10:44 PM (GMT-05:00)
To: d...@mahout.apache.org, user@mahout.apache.org
Subject: RE:
+1 ran classify-wikipedia.sh MR script, launched shell and ran
spark-document-classifier.mscala in standalone cluster mode.
Original message
From: Andrew Musselman
Date: 06/10/2016 9:23 PM (GMT-05:00)
To: user@mahout.apache.org
Cc: mahout
bit of
background about himself.
Congratulations and Welcome!
-Andrew Palumbo
On Behalf of the Mahout PMC
+1 (binding) tested a clean source build.
From: Suneel Marthi
Sent: Wednesday, May 18, 2016 6:23:57 PM
To: mahout; user@mahout.apache.org
Subject: Re: [VOTE] Apache Mahout 0.12.1 Release
Verified {src} * {tar, zip}
Ran a clean build
Hello, the elements of the vector are not actually probabilities, they are
scores, the classification is a winner takes all approach, assigning the
classification to the class with the max score.
See: http://mahout.apache.org/users/algorithms/spark-naive-bayes.html for an
overview of the
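The winner-takes-all rule described above can be sketched in a few lines of Python. The scores and labels below are made up for illustration; only the "pick the class with the max score" logic comes from the message.

```python
def classify(scores, labels):
    """Return the label whose score is highest (winner takes all)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


# Hypothetical score vector: these are scores, not probabilities,
# so they need not be positive or sum to one.
scores = [-112.4, -98.7, -105.1]
labels = ["sports", "politics", "science"]
print(classify(scores, labels))  # prints "politics"
```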
u Andy for stepping in!
On Wed, Apr 20, 2016 at 5:00 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> congrats!
>
> On Wed, Apr 20, 2016 at 4:55 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > Please join me in congratulating Andrew Palumbo on becoming our new
ran through mapreduce/spark examples from the binary .tar.gz distro. ran the
spark-shell and tested the spark-document-classifier.mscala script.
+1
From: Andrew Musselman
Sent: Monday, April 11, 2016 12:43 PM
To:
Built and tested src tar. Ran through classification and clustering examples
in the .zip and .tar binary distro covering Spark single machine, MapReduce
pseudo-cluster and MAHOUT_LOCAL. Ran the spark-shell with some simple
distributed matrix multiplication and I/O tests in single machine
> http://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785
> >
> >
> > On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan <
> > pavan.naraya...@gmail.com> wrote:
> >
> >> Andrew, can you please attach table of contents if you don't
uce-Dmitriy-Lyubimov/dp/1523775785
>>
>>
>> On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan <
>> pavan.naraya...@gmail.com> wrote:
>>
>> > Andrew, can you please attach table of contents if you don't mind.
>> > On Feb 25, 2016 8:05 AM, "Andrew Pal
The new book, "Apache Mahout: Beyond MapReduce" has been released. Written by
Mahout committers Dmitriy Lyubimov and Andrew Palumbo, this book covers
previously undocumented features of Mahout releases 0.10 and 0.11.
For more information please see the announcement
Please update to Mahout 0.11.1 for spark versions > 1.3.
Original message
From: Zhun Shen
Date: 02/23/2016 8:57 PM (GMT-05:00)
To: user@mahout.apache.org
Subject: mahout spark-itemsimilarity does not work on EMR 4.3
Hi,
mahout version: 0.11.0
EMR
thank you for reporting this. The "-el" option was removed from 'mahout
trainnb' in v0.10.0
It is now the default action.
That piece of documentation needs to be updated.
Andy
Original message
From: Andrew Musselman
Date: 02/04/2016 2:20 PM
s done , is that right ?
Thanks,
Alok Tanna
On Thu, Jan 14, 2016 at 4:26 PM, Andrew Palumbo
<ap@outlook.com<mailto:ap@outlook.com>> wrote:
The poor results you are seeing by testing are because you've run seq2sparse on
each set independently. This will create two different d
The poor results you are seeing by testing are because you've run seq2sparse on
each set independently. This will create two different dictionaries, which
serve as the vector index for each term in your vocabulary. You must use the
same dictionary that you trained your model on to vectorize
1. Downloaded and built {src} {tar}- all tests passed.
2. Started shell from {src} {bin} *{tar} distro and ran some distributed
algebra and I/O tests- no problems.
3. Ran MR Wikipedia example.
4. Ran Spark CLI naive bayes examples.
+1 (binding)
From:
The Mahout 0.11.0 Shell requires Spark 1.3. Please try with Spark 1.3.1.
On 10/08/2015 10:37 PM, go canal wrote:
> I tried Spark 1.4.1, same error. Then I saw the same error from shell
> command. So I suspect that it is the environment configuration problem.
> I have followed this
Verified source tar and zip, all tests pass.
Ran through all options of the classification and clustering examples in
the binary tar.gz distribution in pseudo-cluster mode for MR and Spark
without incident.
Ran through one option each in the .zip Classification and Clustering
examples in
Please note that mahout lucene2seq and all related classes will be
deprecated as of the upcoming Mahout 0.10.2 release.
Thank You,
Andy
. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar}
4. Suneel: Verify build and tests - {source} * {zip, tar}
5. Pat: Verify examples locally - {source} * {zip, tar}
The LICENSE and NOTICE files have not been updated this time and will be
addressed in future releases.
On Sat, May 30
It looks like you have a mahout 0.9 install trying to run the mahout 0.10.0
Naive Bayes script. The command line options have changed slightly for mahout
0.10.0 MapReduce trainnb.
mahout-examples-0.9-cdh5.3.0-job.jar
15/04/27 16:41:27 WARN
Sent from my Verizon Wireless 4G LTE smartphone
After testing examples locally from .tar and .zip distribution and
testing the staged mahout-math artifact in a java application, am happy
with this release
+1 (binding)
On 04/11/2015 11:45 AM, Suneel Marthi wrote:
After checking the {source} * {tar,zip} and running a few tests locally, I
am
If you vectorized your training data with seq2sparse, you'll need to use
the df-count and dictionary from the training set. You can then
tokenize a new document with a lucene analyzer and count the term
frequencies for all terms in the dictionary. You can then use the
TFIDF class:
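To illustrate the idea independently of Mahout's Java API, here is a minimal Python sketch of weighting a new document against the training dictionary and document-frequency counts. The function name, the data layout and the specific weighting (square-root term frequency with a smoothed log IDF) are assumptions for illustration; check the TFIDF class itself for the weighting Mahout actually applies.

```python
import math
from collections import Counter


def vectorize(tokens, dictionary, df_counts, num_docs):
    """Map a tokenized document to {term_index: weight}.

    dictionary: term -> index, built from the training set
    df_counts:  term index -> number of training documents containing the term
    num_docs:   number of training documents
    """
    # Terms that are not in the training dictionary are dropped.
    term_freqs = Counter(t for t in tokens if t in dictionary)
    vector = {}
    for term, tf in term_freqs.items():
        index = dictionary[term]
        # Illustrative weighting only (an assumption, see the note above).
        idf = math.log(num_docs / (df_counts[index] + 1)) + 1.0
        vector[index] = math.sqrt(tf) * idf
    return vector
```

The key point is that the term indices and document frequencies come from the training run, so the new vector lines up with the features the model was trained on.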
Hi,
I have created the JIRA MAHOUT-1635 with respect to this issue.
Thanks,
Suman.
-Original Message-
From: Andrew Palumbo [mailto:ap@outlook.com]
Sent: Tuesday, December 16, 2014 3:06 PM
To: user@mahout.apache.org
Subject: RE: Providing classification labels to Naive Bayes
Hi Suman,
Attachments don't come through on the user list. Would you mind starting a
Jira issue for this with a small example of your data and the error that
you're receiving? This may be a feature that was not fully implemented in the
most recent MapReduce version of Naive Bayes.
Thanks,
/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java
Thx
Jakub
On 1 December 2014 at 21:12, Andrew Palumbo ap@outlook.com wrote:
However the sequence of steps as described in Mahout Cookbook seems to me
incorrect as:
this is entirely possible
Hi Jakub,
The step that you are missing is `$mahout seqdir ...`. In this step, each file
in each directory (where the directory is the Category) is converted into a
sequence file of form Text,Text where the Text key is /Category/doc_id.
`$mahout seq2sparse ...` vectorizes the output of
All input is merged into single dir:
*cp -R ${WORK_DIR}/20news-bydate*/*/* ${WORK_DIR}/20news-all*
as well the above line should read as follows.
$ cp -R ${WORK_DIR}/20news-bydate/*/* ${WORK_DIR}/20news-all
see: http://mahout.apache.org/users/classification/twenty-newsgroups.html
me modified if need be.
Thanks
Jakub
On 1 December 2014 at 17:43, Andrew Palumbo ap@outlook.com wrote:
Hi Jakub,
The step that you are missing is `$mahout seqdir ...`. In this step, each
file in each directory (where the directory is the Category
The Naive Bayes model is serialized automatically as naiveBayesModel.bin
by TrainNaiveBayesJob.java [1] (the driver for $ mahout trainnb).
To give you an idea of how, the serialization code can be found in
NaiveBayesModel.java line 135 [2]
[1]
Hello Hersheeta,
Are you vectorizing the new text using the same dictionary as you used to train
the models? If not, this will likely severely impact the performance of the
classifier.
Date: Fri, 24 Oct 2014 21:28:06 +0530
Subject: Categorization of documents using clustering and
I just built with Java 1.6 and everything was successful. I tested with 1.7
before committing it and that was successful as well.
Date: Mon, 22 Sep 2014 09:05:46 -0700
Subject: Any idea why h20 module compilation is crashing?
From: dlie...@gmail.com
To: user@mahout.apache.org
Hello,
My $SCALA_HOME is actually unset for some reason. I do have 2.10.3 on my
machine, but it's not in my path. I'll try setting it and building again.
From: ap@outlook.com
To: user@mahout.apache.org
Subject: RE: Any idea why h20 module compilation is crashing?
Date: Mon, 22 Sep 2014 12:18:15
I'll get 2.10.4 and try with that.
From: ap@outlook.com
To: user@mahout.apache.org
Subject: RE: Any idea why h20 module compilation is crashing?
Date: Mon, 22 Sep 2014 12:24:19 -0400
My $SCALA_HOME is actually unset for some reason. I do have 2.10.3 on my
machine, but it's not in my
Yeah, it's building OK.
The h2o module seems to build fine here with 2.10.4. I'll double-check with the
full build.
[INFO]
[INFO] BUILD SUCCESS
[INFO]
be entirely possible that it is time for us to move to a larger
cluster. I am just curious how much disk space should we expect to use for NB
on full wiki dataset ?
Thanks!
Wei
Andrew Palumbo ---08/27/2014 05:17:56 PM---Subject: RE: any pointer to run
wikipedia bayes example To: user
to the vectorization, I am not sure if it is feasible ?
Thanks a lot !
Wei
Andrew Palumbo ---08/21/2014 02:28:45 PM---Hello, Yes, If you work off of the
current trunk, you can use the classify-wiki.sh example. There i
From: Andrew Palumbo ap@outlook.com
To: user
would just need to bypass the label data part and go directly
to the vectorization, I am not sure if it is feasible ?
Thanks a lot !
Wei
Andrew Palumbo ---08/21/2014 02:28:45 PM---Hello, Yes, If you work off of the
current trunk, you can use the classify-wiki.sh example
Thank you very much !
Wei
(2) Is it legit to use some other categories other than
Andrew Palumbo ---08/21/2014 02:28:45 PM---Hello, Yes, If you work off of the
current trunk, you can use the classify-wiki.sh example. There i
From: Andrew Palumbo ap@outlook.com
To: user
Hello,
Yes, if you work off of the current trunk, you can use the classify-wiki.sh
example. There is currently no documentation on the Mahout site for this.
You can run this script to build and test an NB classifier for option (1) 10
arbitrary countries or option (2) 2 countries (United
Hi Toyoharu,
Mahout Naive Bayes uses Laplace smoothing (the alpha_I parameter with
default=1) to deal with terms unseen by the training set. See Rennie et al.
sec. 2.3 [1].
Your modification will certainly work, and may in fact give better results for
the problem that you're working on.
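As a rough illustration of what additive (Laplace) smoothing does for unseen terms, here is a minimal Python sketch of a smoothed per-class term estimate. The function and variable names are hypothetical and this is not Mahout's implementation; it only shows why a term with zero training count still gets a non-zero weight when alpha_I = 1.

```python
def smoothed_term_probs(term_counts, vocabulary, alpha_i=1.0):
    """Estimate P(term | class) with additive smoothing.

    term_counts: term -> count of the term in the class's training documents
    vocabulary:  all terms known to the model
    alpha_i:     smoothing added to every term count (1.0 = Laplace smoothing)
    """
    total = sum(term_counts.get(term, 0) for term in vocabulary)
    denominator = total + alpha_i * len(vocabulary)
    return {
        term: (term_counts.get(term, 0) + alpha_i) / denominator
        for term in vocabulary
    }


# Hypothetical counts: "flink" never appears in this class's training data.
probs = smoothed_term_probs({"spark": 3, "hadoop": 1}, ["spark", "hadoop", "flink"])
```

With these counts the unseen term "flink" receives 1/7 rather than zero, so a single unseen term cannot zero out a class's score.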
Hi Subbu,
There is currently no way to update an already trained Naive Bayes Model.
You'd have to retrain on the full 2 million records.
You could probably hack TrainNaiveBayesJob.java [1] to meet your needs if you
anticipated this as something that you'd need to do in the future, but
Hi Namit,
The current Naive Bayes implementation is based on MapReduce and therefore
dependent on Hadoop. You could run mahout trainnb and mahout testnb scripts
locally by setting the environment variable MAHOUT_LOCAL=true.
This will keep everything on your local filesystem and prevent
, 6 May 2014 21:04:18 +0300
Subject: Re: Mahout Naive Bayes CSV Classification
To: user@mahout.apache.org
Yes
On Mon, May 5, 2014 at 10:51 PM, Andrew Palumbo ap@outlook.com wrote:
Jossef,
Does your training set have any features with a zero value for all
instances?
Date
from the CSV
i'm using can be found here:
https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a
On May 5, 2014 5:53 AM, Andrew Palumbo ap@outlook.com wrote:
Hi Jossef,
I can answer your first two questions for you:
1) Are these predicted values normal?
Yes, negative scores
Hi Jossef,
I can answer your first two questions for you:
1) Are these predicted values normal?
Yes, negative scores are normal.
2) For now, i'm assuming that the max value 'wins'. is that correct?
That is correct, NaiveBayes uses a winner takes all approach to class
assignment based
Congratulations Pat!
Subject: Re: Welcome Pat Ferrel as new committer on Mahout
From: andrew.mussel...@gmail.com
Date: Thu, 24 Apr 2014 06:44:43 -0700
CC: user@mahout.apache.org
To: d...@mahout.apache.org
Great news, welcome Pat!
On Apr 24, 2014, at 3:19 AM, Sebastian Schelter