Re: In progress website

2010-05-11 Thread Pradeep Pujari
It is very nice. Keep it up. Pradeep. On Tue, May 11, 2010 at 4:33 PM, Robin Anil wrote: > I am off to neverland. So this is what I have at the moment. If you have > any > thoughts for the central block do tell. My thoughts go towards like > explaining clustering, recommendation etc pictorially

named entity recognition through Hadoop MapReduce framework

2010-05-11 Thread 张佳宝
Hi, I am working on named entity recognition through the Hadoop MapReduce framework using a large amount of website data. It is similar work to Mahout, so I want to know whether anyone has done this work. If you are interested in it, I can contribute it to you when I have completely finished it.

In progress website

2010-05-11 Thread Robin Anil
I am off to neverland. So this is what I have at the moment. If you have any thoughts for the central block do tell. My thoughts go towards like explaining clustering, recommendation etc pictorially along with some message http://robinanil.com/website/ Robin

Re: Items to complete the move

2010-05-11 Thread Grant Ingersoll
No, I said I would do it in the morning at a scheduled time to synchronize the process. On May 11, 2010, at 4:24 PM, Jeff Eastman wrote: > Do we have svn yet? > > # svn co http://svn.apache.org/repos/asf/mahout/trunk mahout > svn: URL 'http://svn.apache.org/repos/asf/mahout/trunk' doesn't exist

Re: Items to complete the move

2010-05-11 Thread Jeff Eastman
Do we have svn yet? # svn co http://svn.apache.org/repos/asf/mahout/trunk mahout svn: URL 'http://svn.apache.org/repos/asf/mahout/trunk' doesn't exist On 5/11/10 6:22 AM, Grant Ingersoll wrote: On May 11, 2010, at 9:20 AM, Benson Margulies wrote: Grant, just to be tiresome, I'll remind yo

Re: Items to complete the move

2010-05-11 Thread Grant Ingersoll
I hate to put the cart before the horse. I vote for leaving as is. Moving in SVN is trivial, let's do it when the time arises. -Grant On May 11, 2010, at 3:20 PM, Robin Anil wrote: > java ? like in lucene ? > > > On Wed, May 12, 2010 at 12:45 AM, Ted Dunning wrote: > >> I like some things

[jira] Resolved: (MAHOUT-391) Make vector more space efficient with variable-length encoding, et al

2010-05-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-391. -- Resolution: Fixed Yeah I think that's a solid win, and about what I'd expect on average in real life.

Re: Items to complete the move

2010-05-11 Thread Robin Anil
java ? like in lucene ? On Wed, May 12, 2010 at 12:45 AM, Ted Dunning wrote: > I like some things about this, dislike others. > > One thing I don't like is the code directory. Is it necessary? Or is this > just as good: > > mahout/ > ?main-project?/ > trunk, tags, branches > collect

[jira] Commented: (MAHOUT-391) Make vector more space efficient with variable-length encoding, et al

2010-05-11 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866283#action_12866283 ] Robin Anil commented on MAHOUT-391: --- 18237354 vs 16184485 That's an 11% improvement. Awesom
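The 11% figure can be checked with a few lines of arithmetic. This is only a sketch: reading the two quoted numbers as the "before" and "after" sizes is an assumption, since the comment does not say what they measure.

```python
# Figures quoted in the comment above. Treating the first as the size
# before the change and the second as the size after it is an assumption.
before = 18237354
after = 16184485

# Relative saving, as a fraction of the "before" figure.
saving = (before - after) / before
print(f"{saving:.1%}")  # prints 11.3%
```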

Re: Items to complete the move

2010-05-11 Thread Ted Dunning
I like some things about this, dislike others. One thing I don't like is the code directory. Is it necessary? Or is this just as good: mahout/ ?main-project?/ trunk, tags, branches collections/ trunk, tags, branches something-else-next-week trunk, tags, branches

[jira] Commented: (MAHOUT-391) Make vector more space efficient with variable-length encoding, et al

2010-05-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866280#action_12866280 ] Sean Owen commented on MAHOUT-391: -- Sorry my brain is out to lunch today. Trying to play t

[jira] Commented: (MAHOUT-391) Make vector more space efficient with variable-length encoding, et al

2010-05-11 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866278#action_12866278 ] Robin Anil commented on MAHOUT-391: --- It's failing for me. java.io.EOFException a

[jira] Updated: (MAHOUT-391) Make vector more space efficient with variable-length encoding, et al

2010-05-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-391: - Attachment: MAHOUT-391.patch Latest full patch which should properly save the bytes advertised. > Make v

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Ted Dunning
The best characterization I have heard recently distinguished between "traditional statistics" and "data mining". The key factor in the distinction was that in traditional statistics, you test hypotheses against data whereas in data mining you generate hypotheses (called models) from the data. In

Re: Mahout Mailing List Moved

2010-05-11 Thread Jeff Eastman
Did you send a message to dev-unsubscr...@m.a.o? On 5/11/10 10:43 AM, Ashutosh Singh wrote: how do I unsubscribe. I tried sending an unsubscribe mail in the past and it did not work. On Tue, May 11, 2010 at 4:02 AM, Grant Ingersoll wrote: The Mahout mailing lists have moved. All current s

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Grant Ingersoll
For the first Board report, I think it is useful to provide a bit of context about what Mahout is, but after that, Board reports should mostly be short and to the point, highlighting any key things that happened since the last report or anything that requires Board attn. -Grant On May 11, 20

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Sisir Koppaka
Genetic Algorithms are very specific instances of Evolutionary Algorithms inspired by genes...Evolutionary Computation encompasses the much broader class of algorithms including Evolutionary Algorithms(which include Quantum-Inspired Evolutionary Algos on which I work, and will hopefully port from M

Re: svn commit: r943126 - /lucene/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/IndexIndexWritable.java

2010-05-11 Thread Robin Anil
put the patch up. I will verify it on reuters once. On Tue, May 11, 2010 at 11:10 PM, Sean Owen wrote: > Well I did a purer, local test and results are more reasonable. > Writing 1 random-access sparse vectors, 1000 entries, each a > random number to 10, takes 5.4s before versus 4.7s wi

Re: Mahout Mailing List Moved

2010-05-11 Thread Grant Ingersoll
-unsubscr...@domain.com, as in dev-unsubscr...@mahout.apache.org. On May 11, 2010, at 1:43 PM, Ashutosh Singh wrote: > how do I unsubscribe. I tried sending an unsubscribe mail in the past and it > did not work. > > On Tue, May 11, 2010 at 4:02 AM, Grant Ingersoll wrote: > >> The Mahout maili

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Sean Owen
Nah I think it's fine to mention. The old "three Cs" meme (CF, Clustering, Classification) is outdated now so might as well fully update. If it were something that people just would like someone to support someday, I'd say let's not yet claim Mahout encompasses those topics. But yeah watchmaker is

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Sisir Koppaka
Ok...cool. It's just that if it's going on the website by any chance, mentioning evolutionary algorithms(org.apache.mahout.ga.watchmaker) might attract contributors from the area. There are quite interesting algorithms like NSGA-II , and several memetic alg

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Sean Owen
Sure, that's in scope. There's not much you could call evolutionary in the code base yet, compared to what you see for CF, clustering, classification, and maybe frequent item set mining, in terms of quantity and maturity. So I'm just trying to usefully express the reality of the project's current s

Re: Mahout Mailing List Moved

2010-05-11 Thread Ashutosh Singh
how do I unsubscribe. I tried sending an unsubscribe mail in the past and it did not work. On Tue, May 11, 2010 at 4:02 AM, Grant Ingersoll wrote: > The Mahout mailing lists have moved. All current subscribers were > automatically moved. > > The new lists are: > > dev@mahout.apache.org > u...@ma

Re: svn commit: r943126 - /lucene/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/IndexIndexWritable.java

2010-05-11 Thread Sean Owen
Well I did a purer, local test and results are more reasonable. Writing 1 random-access sparse vectors, 1000 entries, each a random number to 10, takes 5.4s before versus 4.7s with changes. That must be I/O savings since it takes a little more CPU -- and that's savings writing to an SSD. Im

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Sisir Koppaka
Hi, Please correct me if I am wrong - but isn't Mahout also into Evolutionary Algorithms and Programming? Is it missing in the report by mistake? Sisir

Re: Items to complete the move

2010-05-11 Thread Jeff Eastman
I generally don't like deep file hierarchies as they tend to bury things, but since we do have one subproject already I think an organization like Two has some long-term advantages. On 5/11/10 6:30 AM, Benson Margulies wrote: Two alternatives. One, ignore me. We'll end up like CXF, which loo

Re: svn commit: r943126 - /lucene/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/IndexIndexWritable.java

2010-05-11 Thread Sean Owen
I added tests to check it outputs the expected number of bytes. I checked that performance is fine. That checks out. So maybe it was a bad or misleading test. I haven't constructed a new one yet, should be easy though. On May 11, 2010 4:17 PM, "Robin Anil" wrote: Sean, Did you get to explore th

Re: svn commit: r943126 - /lucene/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/IndexIndexWritable.java

2010-05-11 Thread Robin Anil
Sean, did you get to explore the issue you found with Varint? Theoretically it should bring better savings than VInt and VLong, right? Robin
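For readers unfamiliar with the term, here is a minimal sketch of a base-128 variable-length integer encoding, the general idea behind a "varint". It is not the MAHOUT-391 patch and not Hadoop's VInt/VLong format; the function names `encode_varint` and `decode_varint` are hypothetical, and Python is used only for brevity. Each byte carries 7 data bits, and the high bit marks whether more bytes follow, so small values take fewer bytes than a fixed 4- or 8-byte field.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 data bits per byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # lowest 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7
    raise ValueError("truncated varint")
```

Under this scheme values below 128 occupy one byte and values below 16384 occupy two, which is where the space saving for mostly small integers comes from.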

Re: svn commit: r943118 - /lucene/mahout/pmc/board-reports/2010/board-report-may.txt

2010-05-11 Thread Jeff Eastman
+1, the best so far On 5/11/10 6:26 AM, sro...@apache.org wrote: Author: srowen Date: Tue May 11 13:26:57 2010 New Revision: 943118 URL: http://svn.apache.org/viewvc?rev=943118&view=rev Log: Edits from dev list Modified: lucene/mahout/pmc/board-reports/2010/board-report-may.txt Modified

Re: mahout test

2010-05-11 Thread Jeff Eastman
What kind of algorithm do you want to test? If it is pure Java you can write a unit test; look in one of the testing folders for many examples. If you want to test a MapReduce algorithm, you can write it in a unit test too, it is just a little more complicated. Look in the clustering tests for

Re: Items to complete the move

2010-05-11 Thread Benson Margulies
As far as javadoc is concerned, I think that Isabel had some stuff set up on the pattern of 'run the site plugin, check the results into svn, always populate p.a.o:/www from svn'. Or I'm mixing this up with some other Apache project I had to release a week or two ago. If we want to have both the c

Re: Items to complete the move

2010-05-11 Thread Drew Farris
On Tue, May 11, 2010 at 9:13 AM, Robin Anil wrote: > > > I was hoping for the same, I have seen some apache projects do that. > Would > be better to keep things dynamic. On the other hand making single landing > page is easier for me and faster to do before TLP is announced. Confluence > modding

Re: Items to complete the move

2010-05-11 Thread Robin Anil
On Tue, May 11, 2010 at 7:00 PM, Benson Margulies wrote: > Two alternatives. > > One, ignore me. We'll end up like CXF, which looks like: > > cxf/ > trunk > tags > branches > sandboxes > dosgi > web > osgi > > in other words, the subproject top-levels and such are just sitting > there next

Re: Items to complete the move

2010-05-11 Thread Benson Margulies
Two alternatives. One, ignore me. We'll end up like CXF, which looks like: cxf/ trunk tags branches sandboxes dosgi web osgi in other words, the subproject top-levels and such are just sitting there next to the main project's trunk/tag/branches. That gave me an itch, thus Tw

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Sean Owen
I don't think it hurts to add a sentence about that, will do. On Tue, May 11, 2010 at 2:09 PM, Drew Farris wrote: > Should we add anything about the move, getting the website set up, etc? It > is not directly related to the goals of the project, but it is work that's > being done. Not sure if thi

Re: Items to complete the move

2010-05-11 Thread Grant Ingersoll
On May 11, 2010, at 9:20 AM, Benson Margulies wrote: > Grant, just to be tiresome, I'll remind you that relocating the > existing mahout svn root down a directory level will reduce the effort > level of the later introductions of subprojects, sandboxes, etc. What's the structure that you propose

Re: Items to complete the move

2010-05-11 Thread Benson Margulies
Grant, just to be tiresome, I'll remind you that relocating the existing mahout svn root down a directory level will reduce the effort level of the later introductions of subprojects, sandboxes, etc. On Tue, May 11, 2010 at 8:46 AM, Grant Ingersoll wrote: > 1. Move the website to people.a.o: /ww

Re: Items to complete the move

2010-05-11 Thread Robin Anil
> > > > I think we need a static landing page still, no? I suppose we could just > do a redirect, too. Can we make the CWiki look nice? > > > I was hoping for the same, I have seen some apache projects do that. Would be better to keep things dynamic. On the other hand making single landing page i

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Drew Farris
Should we add anything about the move, getting the website set up, etc? It is not directly related to the goals of the project, but it is work that's being done. Not sure if this sort of thing goes in the board report or not. Drew On Mon, May 10, 2010 at 7:19 PM, Sean Owen wrote: > Our first bo

Re: Items to complete the move

2010-05-11 Thread Grant Ingersoll
On May 11, 2010, at 8:56 AM, Robin Anil wrote: > On Tue, May 11, 2010 at 6:16 PM, Grant Ingersoll wrote: > >> 1. Move the website to people.a.o: /www/mahout.apache.org. Update the >> site to reflect our new status and set up the Who We are, etc. Robin, any >> word on the new skin for Mahout? >

Re: Items to complete the move

2010-05-11 Thread Drew Farris
On Tue, May 11, 2010 at 8:56 AM, Robin Anil wrote: > > > 1. Move the website to people.a.o: /www/mahout.apache.org. Update the > > site to reflect our new status and set up the Who We are, etc. Robin, > any > > word on the new skin for Mahout? > > > I am confused. I thought we are going to use

Re: Items to complete the move

2010-05-11 Thread Grant Ingersoll
On May 11, 2010, at 8:49 AM, Drew Farris wrote: > On Tue, May 11, 2010 at 8:46 AM, Grant Ingersoll wrote: > >> >> 3. Move SVN. Let's schedule this. I will do the move. How about tomorrow >> at 8 AM EDT? This will mean moving the existing tree and leaving a >> placeholder in the old place to

Re: Items to complete the move

2010-05-11 Thread Robin Anil
On Tue, May 11, 2010 at 6:16 PM, Grant Ingersoll wrote: > 1. Move the website to people.a.o: /www/mahout.apache.org. Update the > site to reflect our new status and set up the Who We are, etc. Robin, any > word on the new skin for Mahout? > I am confused. I thought we are going to use the same

Re: Items to complete the move

2010-05-11 Thread Drew Farris
On Tue, May 11, 2010 at 8:46 AM, Grant Ingersoll wrote: > > 3. Move SVN. Let's schedule this. I will do the move. How about tomorrow > at 8 AM EDT? This will mean moving the existing tree and leaving a > placeholder in the old place to point people to the new place. > It doesn't appear that

Items to complete the move

2010-05-11 Thread Grant Ingersoll
1. Move the website to people.a.o: /www/mahout.apache.org. Update the site to reflect our new status and set up the Who We are, etc. Robin, any word on the new skin for Mahout? 2. Update the Lucene TLP site. We should put Mahout under the related projects item and also put a news item there.

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Sean Owen
No worries, it's already there. I checked. On Tue, May 11, 2010 at 1:38 PM, Grant Ingersoll wrote: > If you haven't (or infra hasn't) already, we'll need to setup an entry in the > committee-info.txt file. It's under the foundation private section. I'm not > entirely sure if it is ASF Members

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Grant Ingersoll
If you haven't (or infra hasn't) already, we'll need to setup an entry in the committee-info.txt file. It's under the foundation private section. I'm not entirely sure if it is ASF Members only or not. I'll send you a link privately. On May 11, 2010, at 8:12 AM, Sean Owen wrote: > Done, I

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Sean Owen
Done, I had kept it to 77 chars I thought but will double-check that before submitting. Subscribed, and started the process to add Mahout to apache.org/foundation (https://issues.apache.org/jira/browse/INFRA-2698), which was also listed on the chair duties to-do list. On Tue, May 11, 2010 at 12:1

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Grant Ingersoll
Looks good, Sean. Report should be formatted to no more than 80 chars wide, as the board seems to live in the dark ages still when it comes to screen width. You should subscribe to bo...@a.o (see the chair duties on www.apache.org/dev) I'll change the SVN karma to give you permission and also t

Mahout Mailing List Moved

2010-05-11 Thread Grant Ingersoll
The Mahout mailing lists have moved. All current subscribers were automatically moved. The new lists are: dev@mahout.apache.org u...@mahout.apache.org comm...@mahout.apache.org Cheers, Grant

Re: mahout.apache.org serving mailing list archives?

2010-05-11 Thread Grant Ingersoll
We need to put the site up in /www/mahout.apache.org on people.a.o and then it will propagate through the ASF mirroring system. On May 11, 2010, at 5:10 AM, Sean Owen wrote: > I just took a look at mahout.apache.org to see if it was serving > something yet since I see the new mailing lists seem

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Sean Owen
Sounds good, will qualify that too. (I'm too lazy at the moment, but the web site ought to say that too, then.) On Tue, May 11, 2010 at 10:08 AM, Robin Anil wrote: > Just a thought. "scalable machine learning and data-mining libraries" ?. > FPgrowth is not machine learning. Similarly LDA is not ma

mahout.apache.org serving mailing list archives?

2010-05-11 Thread Sean Owen
I just took a look at mahout.apache.org to see if it was serving something yet since I see the new mailing lists seem to be coming online. It seems to be serving up complete archives of the mailing lists: http://mahout.apache.org/mail/ Not a big deal, all of that is public already. But just flaggi

Re: Draft of May board report available; comments needed by Wednesday

2010-05-11 Thread Robin Anil
Just a thought: "scalable machine learning and data-mining libraries"? FPgrowth is not machine learning. Similarly, LDA is not machine learning but more like data modelling. I know, it's all fuzzy, and wish we had a better way to say it. "tools for understanding patterns from data and predicting fro