tf.idf transform variants for vectors / SMART notation

2008-10-01 Thread Allen Day
I've privately implemented a few combinations of these for Vector: http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html Now I'm considering writing and contributing a more generic class based on SMART notation to do this. Usage might be something like: // i
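
For readers unfamiliar with SMART notation, the following is a minimal sketch of one common combination, "ltc" (logarithmic term frequency, idf document frequency, cosine normalization). It is not the usage proposed in the message above, which is cut off in this listing. The class name `SmartLtc`, the method `ltc`, the plain-array inputs, and the choice of natural logarithm are all assumptions made for illustration; nothing here is Mahout API.

```java
public class SmartLtc {
    // Hypothetical helper, not part of Mahout.
    // tf[i]   = raw count of term i in the document
    // df[i]   = number of documents containing term i (assumed > 0)
    // numDocs = total number of documents in the collection
    static double[] ltc(int[] tf, int[] df, int numDocs) {
        double[] w = new double[tf.length];
        double sumSquares = 0.0;
        for (int i = 0; i < tf.length; i++) {
            // "l": 1 + log(tf), or 0 when the term is absent
            double l = tf[i] > 0 ? 1.0 + Math.log(tf[i]) : 0.0;
            // "t": log(N / df)
            double t = Math.log((double) numDocs / df[i]);
            w[i] = l * t;
            sumSquares += w[i] * w[i];
        }
        // "c": divide by the Euclidean length of the weighted vector
        double norm = Math.sqrt(sumSquares);
        if (norm > 0.0) {
            for (int i = 0; i < w.length; i++) {
                w[i] /= norm;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = ltc(new int[] {3, 0, 1}, new int[] {10, 5, 50}, 100);
        for (double x : w) {
            System.out.println(x);
        }
    }
}
```

A generic class of the kind described would presumably select each of the three components from its letter code instead of hard-coding them as this sketch does.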

[jira] Updated: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

2008-09-19 Thread Allen Day (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Day updated MAHOUT-77: Attachment: sparse.patch Added DistanceMeasure tests. Moved patch level generation up to capture tests

[jira] Commented: (MAHOUT-78) HBase RowResult/BatchUpdate access via Mahout Vector interface

2008-09-19 Thread Allen Day (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632802#action_12632802 ] Allen Day commented on MAHOUT-78: - Grant, I'd be happy to add tests, but I'

[jira] Updated: (MAHOUT-78) HBase RowResult/BatchUpdate access via Mahout Vector interface

2008-09-19 Thread Allen Day (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Day updated MAHOUT-78: Attachment: hbase.patch > HBase RowResult/BatchUpdate access via Mahout Vector interf

[jira] Created: (MAHOUT-78) HBase RowResult/BatchUpdate access via Mahout Vector interface

2008-09-19 Thread Allen Day (JIRA)
Reporter: Allen Day Priority: Minor Attachments: hbase.patch An adapter class is attached that allows read/write operations on HBase rows using the Vector interface. This allows, e.g. canopy clustering of rows in an HBase table. -- This message is automatically

[jira] Updated: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

2008-09-19 Thread Allen Day (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Day updated MAHOUT-77: Attachment: sparse.patch > DistanceMeasure calculation slow for SparseVec

[jira] Created: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

2008-09-19 Thread Allen Day (JIRA)
Reporter: Allen Day Priority: Minor Attachments: sparse.patch ManhattanDistanceMeasure and TanimotoDistanceMeasure assume all vector indices up to cardinality() must be compared. We can speed this up for SparseVectors (and others) because Vector implements Iterable
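
The idea in this report, comparing only the stored elements instead of every index up to the cardinality, can be sketched as follows. This is an illustration under stated assumptions and not the contents of sparse.patch: a `Map<Integer, Double>` stands in for a sparse vector, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SparseManhattan {
    // Hypothetical stand-in for a sparse vector: index -> non-zero value.
    static double manhattan(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        // Visit only the entries stored in a; a missing entry in b counts as 0.
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            double other = b.getOrDefault(e.getKey(), 0.0);
            sum += Math.abs(e.getValue() - other);
        }
        // Add the entries that appear only in b.
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                sum += Math.abs(e.getValue());
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(1, 1.0);
        a.put(1000000, 2.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(1, 3.0);
        b.put(7, 4.0);
        System.out.println(manhattan(a, b)); // prints 8.0
    }
}
```

The work done is proportional to the number of stored entries in the two vectors, not to the cardinality, which is where the speed-up for sparse data would come from.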

using HBase matrices/vectors from Mahout

2008-09-04 Thread Allen Day
Hi, I'm writing some adapter classes that allow HBase tables and rows to be used from Mahout. I have a working HBase row / vector adapter written, and am now going to write another for table / matrix. I'd like to contribute this to Mahout. A couple of questions: * what package should this code be in

[jira] Updated: (MAHOUT-76) Singular Value Decomposition for SparseMatrix / DenseMatrix

2008-08-20 Thread Allen Day (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Day updated MAHOUT-76: Attachment: SVD.patch > Singular Value Decomposition for SparseMatrix / DenseMat

[jira] Created: (MAHOUT-76) Singular Value Decomposition for SparseMatrix / DenseMatrix

2008-08-20 Thread Allen Day (JIRA)
Components: Matrix Environment: N/A Reporter: Allen Day Priority: Minor Adding a new class and test harness for a SVD implementation derived from JAMA -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue

SVD for Mahout

2008-08-15 Thread Allen Day
Hi, I've modified the JAMA SingularValueDecomposition class to work with the Mahout SparseMatrix and DenseMatrix, and I'd like to contribute it to the project if it's wanted. What's the procedure? Do I create an item on JIRA with a patch attached? -Allen

Re: asFormatString tests fail

2008-08-08 Thread Allen Day
Writing out in Harwell-Boeing format is another case where you will want to have it sorted. -Allen On Fri, Aug 8, 2008 at 12:11 AM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote: > If there is never a need to keep it in order except for the unit test, then > yes, I agree with you. In that case,

asFormatString tests fail

2008-08-07 Thread Allen Day
Here's a relevant snippet: [junit] Testcase: testAsFormatString(org.apache.mahout.matrix.TestSparseVector): FAILED [junit] format expected:<[s5, [2:2.2, 1:1.1], 3:3.3, ] > but was:<[s5, [1:1.1, 2:2.2], 3:3.3, ] > [junit] junit.framework.ComparisonFailure: format expected:<[s5, [2:2.2,
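
A failure of this kind is what one would expect when the formatted string follows the iteration order of an unordered map. As a hedged sketch (not the Mahout implementation), sorting the entries by index before writing them makes the output deterministic. The class name `SortedFormat` and the exact string layout are assumptions loosely modeled on the test output above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SortedFormat {
    // Hypothetical formatter: writes entries in ascending index order.
    static String asFormatString(int cardinality, Map<Integer, Double> entries) {
        // Copying into a TreeMap sorts the entries by index.
        Map<Integer, Double> sorted = new TreeMap<>(entries);
        StringBuilder out = new StringBuilder();
        out.append("[s").append(cardinality).append(", ");
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            out.append(e.getKey()).append(':').append(e.getValue()).append(", ");
        }
        return out.append("]").toString();
    }

    public static void main(String[] args) {
        Map<Integer, Double> entries = new HashMap<>();
        entries.put(2, 2.2);
        entries.put(3, 3.3);
        entries.put(1, 1.1);
        System.out.println(asFormatString(5, entries)); // prints [s5, 1:1.1, 2:2.2, 3:3.3, ]
    }
}
```

Sorting costs extra time on every call, so it is a trade-off between a stable, testable format and raw speed.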

Re: OutOfMemory Exception !

2008-08-07 Thread Allen Day
Hey, This is still too low. 256m is sufficient for my tests to pass. I was getting failures with jdk 1.6.0_10-beta on a 64bit system. -Allen On Mon, Jun 2, 2008 at 6:18 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > Yep I saw the same thing. You can either change that settings up your > maximum he

Re: getting started with mahout, failing tests

2008-06-21 Thread Allen Day
l <[EMAIL PROTECTED]> wrote: > Hmm, weird. Can you look at the test reports and see what the errors are? > > -Grant > > On Jun 21, 2008, at 2:00 AM, Allen Day wrote: > >> Hi, >> >> I finally had a chance to get mahout checked out and built today. I

getting started with mahout, failing tests

2008-06-20 Thread Allen Day
Hi, I finally had a chance to get mahout checked out and built today. I want to get up to speed so I can start using/contributing. I can get the "compile" target to build successfully, but I'm getting errors from the "test" target. [junit] Test org.apache.mahout.clustering.canopy.TestCanopy

Re: source of lots of images

2008-05-22 Thread Allen Day
hing? > Jeff > > > Allen Day wrote: >> >> There was an excellent presentation from Rob Fergus on this data set >> at last year's UCLA/NSF "Mathematics of Search Engines" workshop. >> >> https://www.ipam.ucla.edu/schedule.aspx?pc=sews2 >> >>

Re: source of lots of images

2008-05-22 Thread Allen Day
There was an excellent presentation from Rob Fergus on this data set at last year's UCLA/NSF "Mathematics of Search Engines" workshop. https://www.ipam.ucla.edu/schedule.aspx?pc=sews2 Scroll down to Tuesday 3pm to grab the slides and audio. -Allen On Thu, May 22, 2008 at 5:44 PM, Ted Dunning <

Re: Demos/Tutorials

2008-03-17 Thread Allen Day
Hi, I'll be trying out Mahout to do some microarray gene expression clustering pretty soon. I would be happy to do a small write-up. -Allen On Mon, Mar 17, 2008 at 7:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Now that we have some code in place for clustering, I think it would > be co