Re: PyMahout (incore) (alpha v0.1)

2021-01-10 Thread Andrew Palumbo
+1 From: Andrew Musselman Sent: Thursday, January 7, 2021 2:46 PM To: user@mahout.apache.org Cc: Mahout Dev List Subject: Re: PyMahout (incore) (alpha v0.1) Thanks Trevor, looking forward to trying it out. On Wed, Jan 6, 2021 at 5:30 PM Peng Zhang wrote: >

[ANNOUNCE] Mahout 14.1 RC4

2020-02-01 Thread Andrew Palumbo
Fixing some typos and mistyping: Hi All, I've finished RC4 of Apache Mahout 0.14. Unfortunately, right off the bat I can see an issue which will require a new RC: only SHA-1 checksums have been included. Apache requires at least SHA-256, so I will -1 it myself. I'd drop it and fix that

[ANNOUNCE] Mahout 0.14 RC4

2020-02-01 Thread Andrew Palumbo
Hi All, I've finished RC4 of Apache Mahout 0.14. Unfortunately, right off the bat I can see an issue which will require a new RC: only SHA-1 checksums have been included. Apache requires at least SHA-256, so I will -1 it myself. I'd drop it and fix that now, but it's late. @mahout PMCs
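For anyone checking a release candidate by hand, the checksum requirement can be sketched as below. This is a minimal illustration using Python's standard `hashlib`, not the release tooling the project actually uses; the function name `file_digest` and the artifact path in the usage note are assumptions made for the example.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    `algorithm` is any name accepted by hashlib.new, e.g. "sha256" or "sha512".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `file_digest("some-artifact.tar.gz")` on a downloaded file (hypothetical name) gives a value that can be compared against the published `.sha256` file.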

Re: [ANNOUNCE] Apache Mahout 0.14.1 RC-1

2019-12-16 Thread Andrew Palumbo
Signs and hashes were not generated for some artifacts. Cancelling this vote. Will send a link for RC2 soon. --andy From: Andrew Palumbo Sent: Saturday, December 14, 2019 10:36 PM To: d...@mahout.apache.org ; user@mahout.apache.org ; priv...@mahout.apache.org

[ANNOUNCE] Apache Mahout 0.14.1 RC-1

2019-12-14 Thread Andrew Palumbo
Below please find the first candidate release for Mahout 0.14.1 https://repository.apache.org/content/repositories/orgapachemahout-1058 Off the bat I can see some trivial naming issues that we need to work out. This is in part due to the nested module structure. We must come to a consensus on

[MEETING NOTES] 10 AM Friday 6 Jan 2019. Google Hangouts

2019-12-06 Thread Andrew Palumbo
Mahout meeting notes 12.6.2019 == A meeting was held today, Friday 6 Dec 2019 to discuss the current state of the project, planned releases and a general path forward. Joe Olson, Andrew Palumbo and Trevor Grant met via Google Hangouts at 10:15 AM

Re: AbstractJob class not found exception

2019-01-13 Thread Andrew Palumbo
My mistake, I had a typo in the unsubscribe address: user-unsubscr...@mahout.apache.org To unsubscribe, send an email there. Thank you, Andy From: Andrew Palumbo Sent: Saturday, January 12, 2019 4:34 PM To: user@mahout.apache.org Subject: Re: AbstractJob class

Re: AbstractJob class not found exception

2019-01-12 Thread Andrew Palumbo
If you're still looking for the deprecated MapReduce version of AbstractJob it is no longer in the `core` module. It can be found in the `mahout-mr` module. If you still wish to leave the list please send a message to: user-unsubsubscr...@mahout.apache.org Thanks, Andy

Re: Planning moved to 28th

2018-09-14 Thread Andrew Palumbo
Looking forward to it

Re: Friday hangout

2018-09-03 Thread Andrew Palumbo
Probably my calendar messed it up. Thx --andy On Sep 3, 2018 10:32 AM, Andrew Musselman wrote: Huh, it's supposed to be "Friday, September 7 9:00 – 10:00am" On Mon, Sep 3, 2018 at 10:28 AM Andrew Palumbo wrote: > FYI, @andrew.. the calendar invite reads 9 am Sept 3 for me (android). >

Re: Friday hangout

2018-09-03 Thread Andrew Palumbo
FYI, @andrew.. the calendar invite reads 9 am Sept 3 for me (android).

Re: Friday hangout

2018-09-03 Thread Andrew Palumbo
Thanks Andrew talk to you then!

[ANNOUNCE] Andrew Musselman, New Mahout PMC Chair

2018-07-18 Thread Andrew Palumbo
Please join me in congratulating Andrew Musselman as the new Chair of the Apache Mahout Project Management Committee. I would like to thank Andrew for stepping up; all of us who have worked with him over the years know his dedication to the project to be invaluable. I look forward to Andrew

Re: Congrats Palumbo and Holden

2018-05-04 Thread Andrew Palumbo
Thanks guys!

Re: [REMINDER] Calendar Q3 Board report due 9/13

2017-09-13 Thread Andrew Palumbo
My apologies, I was out of sync on our board report cycle. The Mahout board report is due next month. I will file the report for the October meeting. Thanks all, and sorry for any inconvenience. --andy From: Andrew Palumbo <ap@outlook.com>

[REMINDER] Calendar Q3 Board report due 9/13

2017-09-02 Thread Andrew Palumbo
A reminder - the Calendar Q3 board Meeting is 20 September. We must submit a board report at least 1 week prior. Please gather a record of recent talks, etc. for the report due 13 September. I will post a Google doc in the upcoming days. --andy

Fwd: CFP: IEEE Computer Magazine -- Special Issue on Mobile and Embedded Deep Learning

2017-08-26 Thread Andrew Palumbo
Fyi @here Sent from my Verizon Wireless 4G LTE smartphone Original message From: Nic Lane Date: 08/26/2017 3:33 AM (GMT-08:00) To: Nic Lane Subject: CFP: IEEE Computer Magazine -- Special Issue on Mobile and Embedded Deep Learning Hi

Re: spark-itemsimilarity scalability / Spark parallelism issues (SimilarityAnalysis.cooccurrencesIDSs)

2017-08-21 Thread Andrew Palumbo
I should mention that the density is currently set quite high, and we've been discussing a user-defined setting for this. Something that we have not worked in yet. From: Andrew Palumbo <ap@outlook.com> Sent: Monday, August 21, 2017 2:44:35 PM To

Re: spark-itemsimilarity scalability / Spark parallelism issues (SimilarityAnalysis.cooccurrencesIDSs)

2017-08-21 Thread Andrew Palumbo
We do currently have optimizations based on density analysis in use e.g.: in AtB. https://github.com/apache/mahout/blob/08e02602e947ff945b9bd73ab5f0b45863df3e53/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala#L431 +1 to PR. thanks for pointing this out. --andy

Re: New Committer: Holden Karau

2017-07-18 Thread Andrew Palumbo
Welcome again, Holden, Great to have you on board! --andy From: Trevor Grant Sent: Tuesday, July 18, 2017 12:32:07 AM To: user@mahout.apache.org; Mahout Dev List Subject: New Committer: Holden Karau The Project Management Committee

[ANNOUNCE] JIRA email notifications moved to iss...@mahout.apache.org

2017-07-13 Thread Andrew Palumbo
Please note: with the exception of newly created issues for which email notifications will continue to be sent to d...@mahout.apache.org, all JIRA email notifications (comments, closing, etc.) have been moved to: iss...@mahout.apache.org. All committers and developers, please ensure

RE: [DISCUSS] Naming convention for multiple spark/scala combos

2017-07-08 Thread Andrew Palumbo
Holden Karau <holden.ka...@gmail.com>, user@mahout.apache.org, Dmitriy Lyubimov <dlie...@gmail.com>, Andrew Palumbo <apalu...@apache.org> Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos IIRC these all fit sbt’s conventions? On Jul 7, 2017, at 2:05 PM, T

[REMINDER] Jira emails to be moved off d...@mahout.apache.org

2017-07-06 Thread Andrew Palumbo
As a reminder: Jira email comments will be moved to: iss...@mahout.apache.org all committers, devs and anyone else interested in Bug Fix, New Feature and issue planning comments must subscribe: issues-subscr...@mahout.apache.org -- Andy

RE: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Andrew Palumbo
Pat - I just want to clear one point up.. Trevor volunteering to head up this release and the git-flow plans are independent of each other. The 0.13.1 release was originally planned as a quick follow up to 0.13.0 for each scala/spark conf combo I think this will be 6 artifacts.. spark 1.6.x -

Re: UnsatisfiedLinkError: jniViennaCL

2017-05-09 Thread Andrew Palumbo
Hello Sebsastian, > - macOS Sierra 10.12.4 (MacBook Pro, Retina, 13-inch, Early 2015) The native code in the javacpp modules does not have a build profile for MacOS. We do currently have a jira open for this: https://issues.apache.org/jira/browse/MAHOUT-1908 the fix build for mac should

Re: New Website is Staged

2017-05-09 Thread Andrew Palumbo
der for time being? Trevor Grant Data Scientist https://github.com/rawkintrevo http://stackexchange.com/users/3002022/rawkintrevo http://trevorgrant.org *"Fortunate is he, who is able to know the causes of things." -Virgil* On Tue, May 9, 2017 at 10:17 AM, Andrew Palumbo <ap

Re: New Website is Staged

2017-05-08 Thread Andrew Palumbo
I disagree with it being too bland- I find the open space and the formatting much easier to navigate and read docs from. From: Khurrum Nasim Sent: Monday, May 8, 2017 2:36:54 PM To: Mahout Dev List; user@mahout.apache.org;

Re: New Website is Staged

2017-05-08 Thread Andrew Palumbo
Trevor, That link takes me back to the old m.a.o page. From: Trevor Grant Sent: Monday, May 8, 2017 1:53:46 PM To: Mahout Dev List; user@mahout.apache.org Subject: New Website is Staged Hey all, The new website is staged. You can view

RE: New logo

2017-05-06 Thread Andrew Palumbo
+1 :) Sent from my Verizon Wireless 4G LTE smartphone Original message From: "Scott C. Cote" Date: 05/06/2017 2:43 PM (GMT-08:00) To: user@mahout.apache.org Cc: Trevor Grant , Mahout Dev List Subject:

RE: Welcome our GSoC Student Aditya Sarma

2017-05-04 Thread Andrew Palumbo
Welcome!! Sent from my Verizon Wireless 4G LTE smartphone Original message From: Jim Jagielski Date: 05/04/2017 10:33 AM (GMT-08:00) To: d...@mahout.apache.org Cc: priv...@mahout.apache.org, user@mahout.apache.org, Aditya

Re: [RESULT] [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-04-16 Thread Andrew Palumbo
> Ubuntu 16.04.1 LTS > > > > Trevor Grant > Data Scientist > https://github.com/rawkintrevo > http://stackexchange.com/users/3002022/rawkintrevo > http://trevorgrant.org > > *"Fortunate is he, who is able to know the causes of things." -Virgil* > >

Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-04-15 Thread Andrew Palumbo
+1 (binding) Built and tested source distribution with both profiles -Pviennacl and -Pviennacl-omp. Ran SparseSparseDrmTimer.mscala through the shell in both pseudo cluster and local[2] mode Tested with several iterations and combinations of arguments eg:

Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-04-12 Thread Andrew Palumbo
-viennacl-omp.jar I think that we need to try a different build command. Andy From: Andrew Palumbo <ap@outlook.com> Sent: Wednesday, April 12, 2017 10:42:30 PM To: user@mahout.apache.org Subject: Re: [VOTE] Apache Mahout 0.13.0 Release Candidate It look

Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-04-12 Thread Andrew Palumbo
It looks like we're missing some jars from the binary distro: bin mahout-integration-0.13.0.jar conf mahout-math-0.13.0.jar derby.log mahout-math-scala_2.10-0.13.0.jar docs mahout-mr-0.13.0.jar examples

Re: Lambda and Kappa CCO

2017-04-09 Thread Andrew Palumbo
Pat- What can we do from the mahout side? Would we need any new data structures? Trevor and I were just discussing some of the troubles of near real time matrix streaming. From: Pat Ferrel Sent: Monday, March 27, 2017 2:42:55 PM To:

RE: Marketing

2017-03-25 Thread Andrew Palumbo
That's pretty cool. Sent from my Verizon Wireless 4G LTE smartphone Original message From: Ted Dunning Date: 03/24/2017 7:22 PM (GMT-08:00) To: user@mahout.apache.org Cc: Mahout Dev List Subject: Re: Marketing On Fri, Mar 24,

RE: Marketing

2017-03-23 Thread Andrew Palumbo
+1 on revamp. Sent from my Verizon Wireless 4G LTE smartphone Original message From: Trevor Grant Date: 03/23/2017 12:36 PM (GMT-08:00) To: user@mahout.apache.org, d...@mahout.apache.org Subject: Marketing Hey user and dev, With 0.13.0 the Apache

Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-03-04 Thread Andrew Palumbo
% mahoutVersion, "org.apache.mahout" % "mahout-math" % mahoutVersion, "org.apache.mahout" % "mahout-hdfs" % mahoutVersion, BTW I’ve compiled Mahout locally with Scala 2.11 so it may just be a case for someone having time to update the release process. I’ll

RE: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-03-01 Thread Andrew Palumbo
I will verify keys tonight. Sent from my Verizon Wireless 4G LTE smartphone Original message From: Andrew Musselman Date: 03/01/2017 10:20 AM (GMT-08:00) To: user@mahout.apache.org, d...@mahout.apache.org Subject: Re: [VOTE] Apache Mahout 0.13.0

Fw: Starter Issues

2017-02-01 Thread Andrew Palumbo
From: Trevor Grant Sent: Wednesday, February 1, 2017 5:01 PM To: d...@mahout.apache.org Subject: Starter Issues Hey all, I know there are some folks on here who have been interested in getting more involved with the project,

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-01 Thread Andrew Palumbo
From: Isabel Drost Sent: Wednesday, February 1, 2017 4:55 AM To: Dmitriy Lyubimov Cc: user@mahout.apache.org Subject: Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation On Tue, Jan 31, 2017 at 04:06:36PM -0800, Dmitriy Lyubimov

Re:

2016-08-04 Thread Andrew Palumbo
Raviteja, Before opening a Jira, could you explain what changes you made on the d...@mahout.apache.org list, and explain the errors that you're getting? We don't use attachments so please include in your text. Thanks, Andy From: Andrew Palumbo <

Re: Text clustering how to?

2016-08-04 Thread Andrew Palumbo
Hello Raviteja, Could you start a JIRA issue for this, and post your output there? Instructions are in the "Making Changes" section here: http://mahout.apache.org/developers/how-to-contribute.html Apache Mahout: Scalable machine learning and data

RE: Text clustering how to?

2016-07-27 Thread Andrew Palumbo
* per the response in the jira... Original message From: Andrew Palumbo <ap@outlook.com> Date: 07/27/2016 9:40 PM (GMT-05:00) To: user@mahout.apache.org Subject: RE: Text clustering how to? Right, so per the response in the software, maybe you would be inte

RE: Text clustering how to?

2016-07-27 Thread Andrew Palumbo
, Andy Original message From: Raviteja Lokineni <raviteja.lokin...@gmail.com> Date: 07/27/2016 9:30 PM (GMT-05:00) To: user@mahout.apache.org Subject: RE: Text clustering how to? Already doing that. On Jul 27, 2016 9:28 PM, "Andrew Palumbo" <ap@outlo

RE: Text clustering how to?

2016-07-27 Thread Andrew Palumbo
I don't think the response was completely sarcastic. The point is if you want to learn more about the subject you might do well to dig in and get your hands dirty updating the code. It's the push/kick in the right direction that you asked for. Original message From:

RE: [VOTE] Mahout 0.12.2 Release Candidate 2

2016-06-10 Thread Andrew Palumbo
+1 (binding) that is.. per last email tested MR wikipedia example and spark document classifier without issue. Original message From: Andrew Palumbo <ap@outlook.com> Date: 06/10/2016 10:44 PM (GMT-05:00) To: d...@mahout.apache.org, user@mahout.apache.org Subject: RE:

RE: [VOTE] Mahout 0.12.2 Release Candidate 2

2016-06-10 Thread Andrew Palumbo
+1 ran classify-wikipedia.sh MR script, launched shell and ran spark-document-classifier.mscala in standalone cluster mode. Original message From: Andrew Musselman Date: 06/10/2016 9:23 PM (GMT-05:00) To: user@mahout.apache.org Cc: mahout

Welcome Trevor Grant as a new Mahout Committer

2016-05-23 Thread Andrew Palumbo
bit of background about himself. Congratulations and Welcome! -Andrew Palumbo On Behalf of the Mahout PMC

Re: [VOTE] Apache Mahout 0.12.1 Release

2016-05-18 Thread Andrew Palumbo
+1 (binding) tested a clean source build. From: Suneel Marthi Sent: Wednesday, May 18, 2016 6:23:57 PM To: mahout; user@mahout.apache.org Subject: Re: [VOTE] Apache Mahout 0.12.1 Release Verified {src} * {tar, zip} Ran a clean build

Re: Negative probabilities

2016-05-11 Thread Andrew Palumbo
Hello, the elements of the vector are not actually probabilities; they are scores. The classification is a winner-takes-all approach, assigning the classification to the class with the max score. See: http://mahout.apache.org/users/algorithms/spark-naive-bayes.html for an overview of the
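The winner-takes-all rule described above can be sketched in a few lines. This is an illustration only, not Mahout's classifier code; the function name `classify` and the sample labels are assumptions made for the example.

```python
def classify(scores, labels):
    """Pick the label whose score is largest (winner takes all).

    Scores may be negative and need not sum to one; only their order matters.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Index of the maximum score; ties resolve to the first maximum.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]
```

For example, `classify([-120.5, -98.2, -143.0], ["sports", "politics", "science"])` selects the second label, because -98.2 is the largest of the three scores even though all of them are negative.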

RE: Congratulations to our new Chair

2016-04-20 Thread Andrew Palumbo
u Andy for stepping in! On Wed, Apr 20, 2016 at 5:00 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > congrats! > > On Wed, Apr 20, 2016 at 4:55 PM, Suneel Marthi <smar...@apache.org> wrote: > > > Please join me in congratulating Andrew Palumbo on becoming our new

Re: [VOTE] Apache Mahout 0.12.0 Release Candidate

2016-04-11 Thread Andrew Palumbo
ran through mapreduce/spark examples from the binary .tar.gz distro. ran the spark-shell and tested the spark-document-classifier.mscala script. +1 From: Andrew Musselman Sent: Monday, April 11, 2016 12:43 PM To:

Re: [VOTE] Apache Mahout 0.11.2 Release Candidate

2016-03-11 Thread Andrew Palumbo
Built and tested src tar. Ran through classification and clustering examples in the .zip and .tar binary distro covering Spark single machine, MapReduce pseudo-cluster and MAHOUT_LOCAL. Ran the spark-shell with some simple distributed matrix multiplication and I/O tests in single machine

Re: New Mahout "Samsara" Book

2016-02-25 Thread Andrew Palumbo
> http://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785 > > > > > > On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan < > > pavan.naraya...@gmail.com> wrote: > > > >> Andrew, can you please attach table of contents if you don't

Re: New Mahout "Samsara" Book

2016-02-25 Thread Andrew Palumbo
uce-Dmitriy-Lyubimov/dp/1523775785 >> >> >> On Thu, Feb 25, 2016 at 9:55 AM, Pavan K Narayanan < >> pavan.naraya...@gmail.com> wrote: >> >> > Andrew, can you please attach table of contents if you don't mind. >> > On Feb 25, 2016 8:05 AM, "Andrew Pal

New Mahout "Samsara" Book

2016-02-25 Thread Andrew Palumbo
The new book, "Apache Mahout: Beyond MapReduce" has been released. Written by Mahout committers Dmitriy Lyubimov and Andrew Palumbo, this book covers previously undocumented features of Mahout releases 0.10 and 0.11. For more information please see the announcement

RE: mahout spark-itemsimilarity does not work on EMR 4.3

2016-02-23 Thread Andrew Palumbo
Please update to Mahout 0.11.1 for spark versions > 1.3. Original message From: Zhun Shen Date: 02/23/2016 8:57 PM (GMT-05:00) To: user@mahout.apache.org Subject: mahout spark-itemsimilarity does not work on EMR 4.3 Hi, mahout version: 0.11.0 EMR

RE: Mahout error : seq2sparse

2016-02-04 Thread Andrew Palumbo
Thank you for reporting this. The "-el" option was removed from 'mahout trainnb' in v0.10.0. It is now the default action. That piece of documentation needs to be updated. Andy Original message From: Andrew Musselman Date: 02/04/2016 2:20 PM

Re: Mahout : 20-newsgroups Classification Example : Split command

2016-01-14 Thread Andrew Palumbo
s done, is that right? Thanks, Alok Tanna On Thu, Jan 14, 2016 at 4:26 PM, Andrew Palumbo <ap@outlook.com> wrote: The poor results you are seeing by testing are because you've run seq2sparse on each set independently. This will create two different d

Re: Mahout : 20-newsgroups Classification Example : Split command

2016-01-14 Thread Andrew Palumbo
The poor results you are seeing by testing are because you've run seq2sparse on each set independently. This will create two different dictionaries, which serve as the vector index for each term in your vocabulary. You must use the same dictionary that you trained your model on to vectorize
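The dictionary mismatch described above can be illustrated with a toy sketch. It does not reproduce seq2sparse; `build_dictionary` and `vectorize` are hypothetical helpers, and the point is only that a term index is meaningful solely relative to the dictionary that produced it.

```python
from collections import Counter


def build_dictionary(docs):
    """Assign each distinct term an integer index in order of first appearance."""
    vocab = {}
    for tokens in docs:
        for term in tokens:
            if term not in vocab:
                vocab[term] = len(vocab)
    return vocab


def vectorize(tokens, dictionary):
    """Map a tokenized document to {term_index: count}, dropping unknown terms."""
    counts = Counter(t for t in tokens if t in dictionary)
    return {dictionary[t]: c for t, c in counts.items()}


train_docs = [["spark", "mahout"], ["mahout", "bayes"]]
test_docs = [["bayes", "spark", "hadoop"]]

train_dict = build_dictionary(train_docs)  # built once, from the training set
test_dict = build_dictionary(test_docs)    # a second, incompatible dictionary

# Same term, different index: a model trained against train_dict would
# misread vectors produced with test_dict.
print(train_dict["bayes"], test_dict["bayes"])
print(vectorize(test_docs[0], train_dict))
```

Vectorizing the test document with `train_dict` keeps the indices aligned with what the model saw during training; the unseen term `hadoop` is simply dropped.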

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Andrew Palumbo
1. Downloaded and built {src} {tar}- all tests passed. 2. Started shell from {src} {bin} *{tar} distro and ran some distributed algebra and I/O tests- no problems. 3. Ran MR Wikipedia example. 4. Ran Spark CLI naive bayes examples. +1 (binding) From:

Re: Exception in thread "main" java.lang.IllegalArgumentException: Unable to read output from "mahout -spark classpath"

2015-10-08 Thread Andrew Palumbo
The Mahout 0.11.0 Shell requires Spark 1.3. Please try with Spark 1.3.1. On 10/08/2015 10:37 PM, go canal wrote: > I tried Spark 1.4.1, same error. Then I saw the same error from shell > command. So I suspect that it is the environment configuration problem. > I have followed this

Re: [VOTE] Apache Mahout 0.10.2 Release Candidate

2015-08-01 Thread Andrew Palumbo
Verified source tar and zip, all tests pass. Ran through all options of the classification and clustering examples in the binary tar.gz distribution in pseudo-cluster mode for MR and Spark without incident. Ran through one option each in the .zip Classification and Clustering examples in

deprecation of lucene2seq

2015-07-03 Thread Andrew Palumbo
Please note that mahout lucene2seq and all related classes will be deprecated as of the upcoming Mahout 0.10.2 release. Thank You, Andy

Re: [VOTE] Mahout 0.10.1 Release Candidate

2015-05-31 Thread Andrew Palumbo
. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar} 4. Suneel: Verify build and tests - {source} * {zip, tar} 5. Pat: Verify examples locally - {source} * {zip, tar} The LICENSE and NOTICE files have not been updated this time and will be addressed in future releases. On Sat, May 30

RE: trainnb labelindex not found error - help requested

2015-04-27 Thread Andrew Palumbo
It looks like you have a mahout 0.9 install trying to run the mahout 0.10.0 Naive Bayes script.  The command line options have changed slightly for mahout 0.10.0 MapReduce trainnb. mahout-examples-0.9-cdh5.3.0-job.jar 15/04/27 16:41:27 WARN  Sent from my Verizon Wireless 4G LTE smartphone

Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-11 Thread Andrew Palumbo
After testing examples locally from .tar and .zip distribution and testing the staged mahout-math artifact in a java application, am happy with this release +1 (binding) On 04/11/2015 11:45 AM, Suneel Marthi wrote: After checking the {source} * {tar,zip} and running a few tests locally, I am

Re: Importing tfidf from training set

2015-03-17 Thread Andrew Palumbo
If you vectorized your training data with seq2sparse, you'll need to use the df-count and dictionary from the training set. You can then tokenize a new document with a lucene analyzer and count the term frequencies for all terms in the dictionary. You can then use the TFIDF class:
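The procedure described above (reuse the training dictionary and document frequencies, count term frequencies in the new document, then weight them) can be sketched as follows. This uses the textbook weight tf * log(N / df), which is an assumption for illustration and is not claimed to match the weighting of Mahout's TFIDF class; `tfidf_vector` and its parameters are hypothetical names.

```python
import math
from collections import Counter


def tfidf_vector(tokens, dictionary, df_counts, num_docs):
    """Weight a new document using statistics from the training set only.

    dictionary: term -> index, from the training vectorization
    df_counts:  term -> number of training documents containing the term
    num_docs:   number of training documents
    """
    # Terms absent from the training dictionary are ignored.
    tf = Counter(t for t in tokens if t in dictionary)
    return {
        dictionary[t]: freq * math.log(num_docs / df_counts[t])
        for t, freq in tf.items()
    }
```

A term that appears in every training document gets weight zero under this formula, while a rare term that occurs often in the new document gets the largest weight.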

RE: Providing classification labels to Naive Bayes

2014-12-18 Thread Andrew Palumbo
Hi, I have created the JIRA MAHOUT-1635 with respect to this issue. Thanks, Suman. -Original Message- From: Andrew Palumbo [mailto:ap@outlook.com] Sent: Tuesday, December 16, 2014 3:06 PM To: user@mahout.apache.org Subject: RE: Providing classification labels to Naive Bayes

RE: Providing classification labels to Naive Bayes

2014-12-16 Thread Andrew Palumbo
Hi Suman, Attachments don't come through on the user list. Would you mind starting a Jira issue for this with an small example of your data and the error that you're receiving? This may be a feature that was not fully implemented in the most recent MapReduce version of Naive Bayes. Thanks,

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-02 Thread Andrew Palumbo
/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java Thx Jakub On 1 December 2014 at 21:12, Andrew Palumbo ap@outlook.com wrote: However the sequence of steps as described in Mahout Cookbook seems to me incorrect as: this is entirely possible

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Andrew Palumbo
Hi Jakub, The step that you are missing is `$mahout seqdir ...`. In this step, each file in each directory (where the directory is the Category) is converted into a sequence file of form Text,Text where the Text key is /Category/doc_id. `$mahout seq2sparse ...` vectorizes the output of
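The key layout described above can be sketched as follows. This is not the seqdir implementation and does not write Hadoop sequence files; it only shows how a `/Category/doc_id` key relates to the directory layout. The helpers `seqdir_pairs` and `label_from_key` are hypothetical names.

```python
from pathlib import Path


def seqdir_pairs(root):
    """Yield (key, text) pairs with keys of the form /Category/doc_id.

    Assumes a layout of root/<Category>/<doc_id>, one document per file.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            category = path.parent.name  # the directory name is the category
            yield f"/{category}/{path.name}", path.read_text(errors="replace")


def label_from_key(key):
    """Recover the category from a /Category/doc_id key."""
    return key.split("/")[1]
```

With a file at `20news-all/sci.space/1001` (hypothetical path), the generated key would be `/sci.space/1001`, and `label_from_key` returns `sci.space`.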

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Andrew Palumbo
All input is merged into single dir: *cp -R ${WORK_DIR}/20news-bydate*/*/* ${WORK_DIR}/20news-all* as well the above line should read as follows. $ cp -R ${WORK_DIR}/20news-bydate/*/* ${WORK_DIR}/20news-all see: http://mahout.apache.org/users/classification/twenty-newsgroups.html

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Andrew Palumbo
me modified if need be. Thanks Jakub On 1 December 2014 at 17:43, Andrew Palumbo ap@outlook.com wrote: Hi Jakub, The step that you are missing is `$mahout seqdir ...`. in this step each file in each directory (where the directory is the Category

RE: Naive Bayes Classification

2014-11-11 Thread Andrew Palumbo
The Naive Bayes model is serialized automatically as naiveBayesModel.bin by TrainNaiveBayesJob.java [1] (the driver for $ mahout trainnb). To give you an idea of how, the serialization code can be found in NaiveBayesModel.java line 135 [2] [1]

RE: Categorization of documents using clustering and classification

2014-10-24 Thread Andrew Palumbo
Hello Hersheeta, Are you vectorizing the new text using the same dictionary as you used to train the models? If not, this will likely severely impact the performance of the classifier. Date: Fri, 24 Oct 2014 21:28:06 +0530 Subject: Categorization of documents using clustering and

RE: Any idea why h20 module compilation is crashing?

2014-09-22 Thread Andrew Palumbo
I just built with Java 1.6 and everything was successful. I tested with 1.7 before committing it and that was successful as well. Date: Mon, 22 Sep 2014 09:05:46 -0700 Subject: Any idea why h20 module compilation is crashing? From: dlie...@gmail.com To: user@mahout.apache.org Hello,

RE: Any idea why h20 module compilation is crashing?

2014-09-22 Thread Andrew Palumbo
My $SCALA_HOME is actually unset for some reason. I do have 2.10.3 on my machine, but it's not in my path. I'll try setting it and building again. From: ap@outlook.com To: user@mahout.apache.org Subject: RE: Any idea why h20 module compilation is crashing? Date: Mon, 22 Sep 2014 12:18:15

RE: Any idea why h20 module compilation is crashing?

2014-09-22 Thread Andrew Palumbo
I'll get 2.10.4 and try with that. From: ap@outlook.com To: user@mahout.apache.org Subject: RE: Any idea why h20 module compilation is crashing? Date: Mon, 22 Sep 2014 12:24:19 -0400 my $SCALA_HOME is actually unset for some reason. I do have 2.10.3 on my machine. But its not in my

RE: Any idea why h20 module compilation is crashing?

2014-09-22 Thread Andrew Palumbo
yeah its building ok.. the h2o module seems to build ok here with 2.10.4. I'll double check with the full build.. [INFO] [INFO] BUILD SUCCESS [INFO]

RE: any pointer to run wikipedia bayes example

2014-09-05 Thread Andrew Palumbo
be entirely possible that it is time for us to move to a larger cluster. I am just curious how much disk space should we expect to use for NB on full wiki dataset ? Thanks! Wei Andrew Palumbo ---08/27/2014 05:17:56 PM---Subject: RE: any pointer to run wikipedia bayes example To: user

RE: any pointer to run wikipedia bayes example

2014-08-27 Thread Andrew Palumbo
to the vectorization, I am not sure if it is feasible ? Thanks a lot ! Wei Andrew Palumbo ---08/21/2014 02:28:45 PM---Hello, Yes, If you work off of the current trunk, you can use the classify-wiki.sh example. There i From: Andrew Palumbo ap@outlook.com To: user

RE: any pointer to run wikipedia bayes example

2014-08-27 Thread Andrew Palumbo
would just need to bypass the label data part and go directly to the vectorization, I am not sure if it is feasible ? Thanks a lot ! Wei Andrew Palumbo ---08/21/2014 02:28:45 PM---Hello, Yes, If you work off of the current trunk, you can use the classify-wiki.sh example

RE: any pointer to run wikipedia bayes example

2014-08-22 Thread Andrew Palumbo
Thank you very much ! Wei (2) Is it legit to use some other categories other than Andrew Palumbo ---08/21/2014 02:28:45 PM---Hello, Yes, If you work off of the current trunk, you can use the classify-wiki.sh example. There i From: Andrew Palumbo ap@outlook.com To: user

RE: any pointer to run wikipedia bayes example

2014-08-21 Thread Andrew Palumbo
Hello, Yes, if you work off of the current trunk, you can use the classify-wiki.sh example. There is currently no documentation on the Mahout site for this. You can run this script to build and test an NB classifier for option (1) 10 arbitrary countries or option (2) 2 countries (United

RE: Naive Bayes Classifier Bug ?

2014-06-21 Thread Andrew Palumbo
Hi Toyoharu, Mahout Naive Bayes uses Laplace smoothing (the alpha_I parameter with default=1) to deal with terms unseen by the training set. See Rennie et al. sec. 2.3 [1]. Your modification will certainly work, and may in fact give better results for the problem that you're working on.
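As a rough illustration of the smoothing mentioned above, the standard additive (Laplace) estimate can be written as below. This is the textbook form, not code taken from Mahout; the function name `smoothed_term_prob` and its parameters are assumptions, with `alpha` standing in for the alpha_I parameter.

```python
def smoothed_term_prob(term_count, total_count, vocab_size, alpha=1.0):
    """Additive (Laplace) smoothing of a per-class term probability.

    term_count:  occurrences of the term in the class
    total_count: occurrences of all terms in the class
    vocab_size:  number of distinct terms in the dictionary
    """
    # Adding alpha to every term count keeps unseen terms from getting
    # probability zero; the denominator grows by alpha * vocab_size so the
    # estimates still sum to one over the vocabulary.
    return (term_count + alpha) / (total_count + alpha * vocab_size)
```

With a two-term vocabulary and eight observed tokens in a class, a term never seen in that class gets (0 + 1) / (8 + 2) = 0.1 rather than zero.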

RE: Using existing model to train again

2014-05-26 Thread Andrew Palumbo
Hi Subbu, There is currently no way to update an already trained Naive Bayes Model. You'd have to retrain on the full 2 million records. You could probably hack TrainNaiveBayesJob.java [1] to meet your needs if you anticipated this as something that you'd need to do in the future, but

RE: Using existing model to train again

2014-05-26 Thread Andrew Palumbo
Hi Namit, The current Naive Bayes implementation is based on MapReduce and therefore dependent on Hadoop. You could run mahout trainnb and mahout testnb scripts locally by setting the environment variable MAHOUT_LOCAL=true. This will keep everything on your local filesystem and prevent

RE: Mahout Naive Bayes CSV Classification

2014-05-06 Thread Andrew Palumbo
, 6 May 2014 21:04:18 +0300 Subject: Re: Mahout Naive Bayes CSV Classification To: user@mahout.apache.org Yes On Mon, May 5, 2014 at 10:51 PM, Andrew Palumbo ap@outlook.com wrote: Jossef, Does your training set have any features with a zero value for all instances? Date

RE: Mahout Naive Bayes CSV Classification

2014-05-05 Thread Andrew Palumbo
from the CSV i'm using can be found here: https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a On May 5, 2014 5:53 AM, Andrew Palumbo ap@outlook.com wrote: Hi Jossef, I can answer your first two questions for you: 1) Are these predicted values normal? Yes, negative scores

RE: Mahout Naive Bayes CSV Classification

2014-05-04 Thread Andrew Palumbo
Hi Jossef, I can answer your first two questions for you: 1) Are these predicted values normal? Yes, negative scores are normal. 2) For now, i'm assuming that the max value 'wins'. is that correct? That is correct, NaiveBayes uses a winner takes all approach to class assignment based

RE: Welcome Pat Ferrel as new committer on Mahout

2014-04-24 Thread Andrew Palumbo
Congratulations Pat! Subject: Re: Welcome Pat Ferrel as new committer on Mahout From: andrew.mussel...@gmail.com Date: Thu, 24 Apr 2014 06:44:43 -0700 CC: user@mahout.apache.org To: d...@mahout.apache.org Great news, welcome Pat! On Apr 24, 2014, at 3:19 AM, Sebastian Schelter