[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-06 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016640#comment-13016640 ] Hudson commented on MAHOUT-510: --- Integrated in Mahout-Quality #726 (See [https://hudson.apa

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-06 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016279#comment-13016279 ] Sean Owen commented on MAHOUT-510: -- It does quite little now, if not nothing. It essentia

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-05 Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016262#comment-13016262 ] Lance Norskog commented on MAHOUT-510: -- MatrixTest still has a testLabelBindingSerial
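The label-binding serialization test mentioned above round-trips a matrix's row and column labels. Below is a minimal sketch of one way to keep such bindings in a compact binary form, assuming they are held as a map from label to row or column index. The class `LabelBindingsCodec` and its layout (an entry count, then label/index pairs, with -1 meaning "no bindings") are hypothetical and are not Mahout's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Mahout's API: writes label bindings as an entry
// count followed by (label, index) pairs; a count of -1 encodes "no bindings".
final class LabelBindingsCodec {
  private LabelBindingsCodec() {}

  static void write(DataOutput out, Map<String, Integer> bindings) throws IOException {
    if (bindings == null) {
      out.writeInt(-1);
      return;
    }
    out.writeInt(bindings.size());
    for (Map.Entry<String, Integer> e : bindings.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeInt(e.getValue());
    }
  }

  static Map<String, Integer> read(DataInput in) throws IOException {
    int size = in.readInt();
    if (size < 0) {
      return null;
    }
    // LinkedHashMap keeps the order in which the entries were written.
    Map<String, Integer> bindings = new LinkedHashMap<>();
    for (int i = 0; i < size; i++) {
      String label = in.readUTF();
      int index = in.readInt();
      bindings.put(label, index);
    }
    return bindings;
  }

  // Serializes to bytes and reads them back, as a round-trip unit test would.
  static Map<String, Integer> roundTrip(Map<String, Integer> bindings) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      write(out, bindings);
      out.flush();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      return read(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A round-trip test would then only need to check that the map read back equals the map written, including the null case.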

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-05 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016178#comment-13016178 ] Hudson commented on MAHOUT-510: --- Integrated in Mahout-Quality #722 (See [https://hudson.apa

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-05 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015978#comment-13015978 ] Ted Dunning commented on MAHOUT-510: Yes. Preserving those is all that is needed. Th

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-05 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015846#comment-13015846 ] Sean Owen commented on MAHOUT-510: -- Ted are you saying it's simply sufficient to retain t

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-04 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015666#comment-13015666 ] Ted Dunning commented on MAHOUT-510: That stitching has to be kind of fancy because th

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-04 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015642#comment-13015642 ] Sean Owen commented on MAHOUT-510: -- No problem I can stitch that back in with a Writable-

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-04 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015623#comment-13015623 ] Ted Dunning commented on MAHOUT-510: The deletion of ModelSerializer in its entirety i

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-04 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015601#comment-13015601 ] Sean Owen commented on MAHOUT-510: -- Yes it's just a few wiki pages, it seems -- I just se

[jira] [Commented] (MAHOUT-510) Standardize serialization mechanisms

2011-04-04 Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015449#comment-13015449 ] Isabel Drost commented on MAHOUT-510: - > The only remaining concern was that this will

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Jeff Eastman
Ah, ok, now I see what you are talking about. This is a bit of laziness on my part that I forgot about. The ModelDistribution is produced from 3-4 argument values (modelFactory, modelPrototype, distanceMeasure, prototypeSize) from the command line. You could just pass those argument values (all
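A sketch of the approach Jeff describes, assuming the four values arrive as strings on the command line: only those plain values travel through the job configuration, and each task rebuilds the distribution from them instead of decoding a JSON string. A `Map<String, String>` stands in for Hadoop's `Configuration`, and the class `DistributionSpec` and the `dirichlet.*` keys are hypothetical names, not Mahout's.

```java
import java.util.Map;

// Hypothetical holder for the command-line values a ModelDistribution is
// built from. A Map<String, String> stands in for Hadoop's Configuration.
final class DistributionSpec {
  // Hypothetical configuration keys.
  static final String MODEL_FACTORY = "dirichlet.modelFactory";
  static final String MODEL_PROTOTYPE = "dirichlet.modelPrototype";
  static final String DISTANCE_MEASURE = "dirichlet.distanceMeasure";
  static final String PROTOTYPE_SIZE = "dirichlet.prototypeSize";

  final String modelFactory;
  final String modelPrototype;
  final String distanceMeasure;
  final int prototypeSize;

  DistributionSpec(String modelFactory, String modelPrototype,
                   String distanceMeasure, int prototypeSize) {
    this.modelFactory = modelFactory;
    this.modelPrototype = modelPrototype;
    this.distanceMeasure = distanceMeasure;
    this.prototypeSize = prototypeSize;
  }

  // Driver side: store plain strings rather than a JSON-serialized object.
  void writeTo(Map<String, String> conf) {
    conf.put(MODEL_FACTORY, modelFactory);
    conf.put(MODEL_PROTOTYPE, modelPrototype);
    conf.put(DISTANCE_MEASURE, distanceMeasure);
    conf.put(PROTOTYPE_SIZE, Integer.toString(prototypeSize));
  }

  // Task side: recover the same values; real code would then construct the
  // ModelDistribution from them (for example by reflection on class names).
  static DistributionSpec readFrom(Map<String, String> conf) {
    return new DistributionSpec(
        require(conf, MODEL_FACTORY),
        require(conf, MODEL_PROTOTYPE),
        require(conf, DISTANCE_MEASURE),
        Integer.parseInt(require(conf, PROTOTYPE_SIZE)));
  }

  // Fails fast if the driver forgot to set a value.
  private static String require(Map<String, String> conf, String key) {
    String value = conf.get(key);
    if (value == null) {
      throw new IllegalStateException("missing configuration key " + key);
    }
    return value;
  }
}
```

The configuration then holds only short, human-readable strings, so no extra serialization format is needed for this step.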

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Sean Owen
The idea was to remove all use of JSON in an attempt to reduce the number of different serialization approaches used. So at the moment I'm trying to figure out what happens when I delete everything related to JSON. Most of it goes quietly. The only use that seems, well, actively used is the bit in

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Ted Dunning
I am only trying to understand what Sean means when he mentions passing state via configuration. On Mon, Jan 17, 2011 at 9:17 AM, Jeff Eastman wrote: > Dirichlet uses Writable to serialize its iteration output state (to > clusters-n). I'm confused about what you're trying to do. > > > > On 1/17/11

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Jeff Eastman
Dirichlet uses Writable to serialize its iteration output state (to clusters-n). I'm confused about what you're trying to do. On 1/17/11 9:58 AM, Ted Dunning wrote: This sort of thing is what the distributed cache was designed for. On Mon, Jan 17, 2011 at 8:53 AM, Sean Owen wrote: Do you thi

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Ted Dunning
This sort of thing is what the distributed cache was designed for. On Mon, Jan 17, 2011 at 8:53 AM, Sean Owen wrote: > Do you think the way forward is to leave it, or use Writable and write the > model distribution to a file, or something else? >

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Sean Owen
The bit I wasn't able to take out easily was serializing ModelDistribution to a JSON string and then passing that via the Configuration object. Indeed everywhere else it was just another output option. Do you think the way forward is to leave it, or use Writable and write the model distribution to
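One way to sketch the Writable-plus-file option: the driver writes the state to a file with a Writable-style `write(DataOutput)` and puts only the file's path into the configuration; each task reads the path back and calls `readFields(DataInput)`. To keep the sketch runnable without Hadoop, `WritableLike` is a local stand-in with the same two methods as `org.apache.hadoop.io.Writable`, a local file stands in for HDFS or the distributed cache, and `ToyDistribution`, `SideFile`, and the `model.distribution.path` key are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Local stand-in with the same methods as org.apache.hadoop.io.Writable.
interface WritableLike {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical model state, not Mahout's ModelDistribution.
final class ToyDistribution implements WritableLike {
  double alpha;
  double[] prototype = new double[0];

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(alpha);
    out.writeInt(prototype.length);
    for (double v : prototype) {
      out.writeDouble(v);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    alpha = in.readDouble();
    prototype = new double[in.readInt()];
    for (int i = 0; i < prototype.length; i++) {
      prototype[i] = in.readDouble();
    }
  }
}

final class SideFile {
  // Hypothetical key: the configuration carries a path, not the object itself.
  static final String KEY = "model.distribution.path";

  private SideFile() {}

  // Driver side: write the state once, then record where it lives.
  static void publish(WritableLike state, Path file, Map<String, String> conf) {
    try (DataOutputStream out =
             new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
      state.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    conf.put(KEY, file.toString());
  }

  // Task side (e.g. in a mapper's setup): read the state into an empty instance.
  static <T extends WritableLike> T load(Map<String, String> conf, T empty) {
    Path file = Paths.get(conf.get(KEY));
    try (DataInputStream in =
             new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
      empty.readFields(in);
      return empty;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Keeping only a path in the configuration avoids putting the data object itself there; in a real job the file would live on HDFS or be shipped through the distributed cache, both of which come up elsewhere in this thread.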

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Jeff Eastman
Dirichlet supports JSON but uses Writable internally as do the rest of the clustering algorithms. On 1/17/11 8:50 AM, Sean Owen wrote: The idea behind MAHOUT-510 was to try to standardize serialization mechanisms as much as reasonable. It would be counterproductive to remove one and add another

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Sean Owen
The idea behind MAHOUT-510 was to try to standardize serialization mechanisms as much as reasonable. It would be counterproductive to remove one and add another, I think. There was some support for using Avro for text serialization instead of JSON, even though that has the same issue -- so I think

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Robin Anil
Protobufs are a good choice :)

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-17 Ted Dunning
I think that sequence file might be the next step but the current step is just a writable. I have a PolymorphicWritable static class that helps with that a bit. I am leaning toward protobufs as a replacement for JSON. There is human readable protobuf syntax. Avro is another option. I don't gen
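The PolymorphicWritable idea can be sketched roughly as follows (this is not Mahout's actual code): write the concrete class name ahead of the object's fields, so the reader can instantiate the right subclass by reflection before calling `readFields`. `SimpleWritable` is a local stand-in for Hadoop's `Writable`, and `PolymorphicIO`, `ConstantModel`, and `LinearModel` are hypothetical.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in with the same methods as org.apache.hadoop.io.Writable.
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Two hypothetical model types sharing the SimpleWritable base type.
final class ConstantModel implements SimpleWritable {
  double value;

  ConstantModel() {}  // no-arg constructor used for reflective instantiation

  ConstantModel(double value) {
    this.value = value;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(value);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readDouble();
  }
}

final class LinearModel implements SimpleWritable {
  double slope;
  double intercept;

  LinearModel() {}

  LinearModel(double slope, double intercept) {
    this.slope = slope;
    this.intercept = intercept;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(slope);
    out.writeDouble(intercept);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    slope = in.readDouble();
    intercept = in.readDouble();
  }
}

final class PolymorphicIO {
  private PolymorphicIO() {}

  // Prefix the payload with the concrete class name.
  static void write(DataOutput out, SimpleWritable value) throws IOException {
    out.writeUTF(value.getClass().getName());
    value.write(out);
  }

  // Read the class name, instantiate that class, then let it read its own fields.
  static <T extends SimpleWritable> T read(DataInput in, Class<T> baseType) throws IOException {
    String className = in.readUTF();
    try {
      Class<? extends T> concrete = Class.forName(className).asSubclass(baseType);
      T instance = concrete.getDeclaredConstructor().newInstance();
      instance.readFields(in);
      return instance;
    } catch (ReflectiveOperationException e) {
      throw new IOException("cannot instantiate " + className, e);
    }
  }
}
```

Storing the class name ties the binary format to class names staying stable across releases; a schema-based format such as protobufs or Avro, also raised in this thread, avoids that particular coupling.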

[jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-16 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982483#action_12982483 ] Robin Anil commented on MAHOUT-510: --- The changes are fine. What are we moving towards for

[jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-16 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982416#action_12982416 ] Ted Dunning commented on MAHOUT-510: (replied instead of commenting ... sorry for the d

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-16 Ted Dunning
Putting data objects in the Configuration is a bit of a misuse (it has been the subject of an argument on the hadoop mailing lists for a long time now). I would leave this use in place for now and later refactor to read from HDFS. That has more legs in any case when it comes to using the clusteri

[jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-16 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982278#action_12982278 ] Sean Owen commented on MAHOUT-510: -- (BTW I'm not committing this for some time.) I've man

[jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-15 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982197#action_12982197 ] Ted Dunning commented on MAHOUT-510: I think that this patch looks pretty reasonable.

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-11 Jake Mannix
+1 from me, the guy who is usually really conservative regarding killing code. -jake On Tue, Jan 11, 2011 at 4:22 PM, Ted Dunning wrote: > The book might need to change to track changes like that. Other than that, > the time I spent with GSON I consider to be lost hours of my life. Binary >

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-11 Sean Owen
Sounds good. I'll proceed but am going to sit on this change for a long while. On Wed, Jan 12, 2011 at 12:22 AM, Ted Dunning wrote: > The book might need to change to track changes like that. Other than that, > the time I spent with GSON I consider to be lost hours of my life. Binary > seriali

Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-11 Ted Dunning
The book might need to change to track changes like that. Other than that, the time I spent with GSON I consider to be lost hours of my life. Binary serialization with model inspection tools is really the only way to go for our classification needs. On Tue, Jan 11, 2011 at 3:20 PM, Sean Owen (JI

[jira] Commented: (MAHOUT-510) Standardize serialization mechanisms

2011-01-11 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980423#action_12980423 ] Sean Owen commented on MAHOUT-510: -- So, I'm starting to take a crack at this. I've identif