Re: Canopy implementation

2009-05-31 Thread Jeff Eastman
Hi Benson, We are in the same boat on old dogs and new tricks. Because of the potentially large volume of points that can be clustered by the MR implementation, storing the points with the canopies won't scale. Thus, the CanopyClusteringJob does two passes: the CanopyDriver.runJob pass proces

Re: Canopy implementation

2009-05-31 Thread Benson Margulies
Should ref implementations be in the examples project, in the core, or in some third location? On Sun, May 31, 2009 at 12:21 PM, Ted Dunning wrote: > Yes. > > On Sun, May 31, 2009 at 7:09 AM, Benson Margulies wrote: > > Should the ref implementation be in a class by itself?

Re: Canopy implementation

2009-05-31 Thread Benson Margulies
Jeff, I'm an old dog who has been taught a certain number of machine learning new tricks. There's a common thread to the Canopy and KMeans code that has me doing a certain amount of head-scratching. The Canopy class doesn't keep a reference to the points in the canopy. But someone must. Or is can

Re: Canopy implementation

2009-05-31 Thread Ted Dunning
Yes. On Sun, May 31, 2009 at 7:09 AM, Benson Margulies wrote: > Should the ref implementation be in a class by itself? -- Ted Dunning, CTO DeepDyve

Re: Canopy implementation

2009-05-31 Thread Benson Margulies
Question: What's the role of a reference implementation embedded in a GUI? I think I can patch up the implementation in DisplayKMeans easily enough. Should the ref implementation be in a class by itself? --be On Sat, May 30, 2009 at 8:22 PM, Jeff Eastman wrote: > I think you are actually corre

Re: Canopy implementation

2009-05-30 Thread Jeff Eastman
I think you are actually correct about the reference implementation that is used in the tests and that example. I was looking at the Canopy.addPointToCanopies() method which does add a new canopy if there are none that are strongly bound (suggest a fix? Jeff Benson Margulies wrote: I'll loo

Re: Canopy implementation

2009-05-30 Thread Benson Margulies
I'll look at the copy in DisplayKMeans again and see if it is missing that last test. On Sat, May 30, 2009 at 12:41 PM, Jeff Eastman wrote: > Canopy tests each point against the current set of canopies, adding the > point to each canopy that is within t1 and finally stopping when it finds > one w

Re: Canopy implementation

2009-05-30 Thread Jeff Eastman
Canopy tests each point against the current set of canopies, adding the point to each canopy that is within t1 and finally stopping when it finds one within t2. If all canopies are tested and none are within t2 then a new canopy is added with the point as its center. So, even if you set t1 and
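
For readers following the thread, the assignment rule described in this message (add the point to every canopy within t1, stop at the first canopy within t2, and create a new canopy centered on the point if none is within t2) might be sketched roughly as below. This is a minimal in-memory illustration, not the Mahout source: the class name CanopySketch, the method signatures, and the use of Euclidean distance are all assumptions made for the example. It also keeps each canopy's points in a list, which, as noted elsewhere in the thread, the MR implementation avoids because it won't scale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the rule described above; names are illustrative only.
class CanopySketch {

    /** A canopy: a fixed center plus the points bound to it (held in memory for illustration). */
    static final class Canopy {
        final double[] center;
        final List<double[]> points = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
            this.points.add(center); // the founding point belongs to its own canopy
        }
    }

    /** Euclidean distance (an assumption; any distance measure could be substituted). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Applies the t1/t2 rule to one point, given thresholds with t1 > t2. */
    static void addPointToCanopies(double[] point, List<Canopy> canopies, double t1, double t2) {
        boolean stronglyBound = false;
        for (Canopy canopy : canopies) {
            double d = distance(point, canopy.center);
            if (d < t1) {
                canopy.points.add(point); // loosely bound: within t1
            }
            if (d < t2) {
                stronglyBound = true;     // strongly bound: within t2, stop testing
                break;
            }
        }
        if (!stronglyBound) {
            canopies.add(new Canopy(point)); // no canopy within t2: point founds a new one
        }
    }

    /** Runs the rule over all points in order and returns the resulting canopies. */
    static List<Canopy> cluster(List<double[]> points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) {
            addPointToCanopies(p, canopies, t1, t2);
        }
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0.0, 0.0},
                new double[] {0.5, 0.0},
                new double[] {2.0, 0.0},
                new double[] {10.0, 0.0});
        for (Canopy c : cluster(points, 3.0, 1.0)) {
            System.out.println(Arrays.toString(c.center) + " holds " + c.points.size() + " point(s)");
        }
    }
}
```

Under this rule a point can land in several canopies (every one within t1 that is tested before the loop stops), and the result depends on the order in which points are presented.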