[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016640#comment-13016640
]
Hudson commented on MAHOUT-510:
---
Integrated in Mahout-Quality #726 (See
[https://hudson.apa
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016279#comment-13016279
]
Sean Owen commented on MAHOUT-510:
--
It does quite little now, if not nothing. It essentia
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016262#comment-13016262
]
Lance Norskog commented on MAHOUT-510:
--
MatrixTest still has a testLabelBindingSerial
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016178#comment-13016178
]
Hudson commented on MAHOUT-510:
---
Integrated in Mahout-Quality #722 (See
[https://hudson.apa
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015978#comment-13015978
]
Ted Dunning commented on MAHOUT-510:
Yes. Preserving those is all that is needed.
Th
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015846#comment-13015846
]
Sean Owen commented on MAHOUT-510:
--
Ted, are you saying it's simply sufficient to retain t
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015666#comment-13015666
]
Ted Dunning commented on MAHOUT-510:
That stitching has to be kind of fancy because th
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015642#comment-13015642
]
Sean Owen commented on MAHOUT-510:
--
No problem, I can stitch that back in with a Writable-
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015623#comment-13015623
]
Ted Dunning commented on MAHOUT-510:
The deletion of ModelSerializer in its entirety i
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015601#comment-13015601
]
Sean Owen commented on MAHOUT-510:
--
Yes it's just a few wiki pages, it seems -- I just se
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015449#comment-13015449
]
Isabel Drost commented on MAHOUT-510:
-
> The only remaining concern was that this will
Ah, ok, now I see what you are talking about. This is a bit of laziness
on my part that I forgot about. The ModelDistribution is produced from
3-4 argument values (modelFactory, modelPrototype, distanceMeasure,
prototypeSize) from the command line. You could just pass those argument
values (all
The idea was to remove all use of JSON in an attempt to reduce the number of
different serialization approaches used. So at the moment I'm trying to
figure out what happens when I delete everything related to JSON. Most of it
goes quietly.
The only use that seems, well, actively used is the bit in
I am only trying to understand what Sean means when he mentions passing
state via configuration.
On Mon, Jan 17, 2011 at 9:17 AM, Jeff Eastman wrote:
> Dirichlet uses Writable to serialize its iteration output state (to
> clusters-n). I'm confused about what you're trying to do.
>
>
>
> On 1/17/11
Dirichlet uses Writable to serialize its iteration output state (to
clusters-n). I'm confused about what you're trying to do.
On 1/17/11 9:58 AM, Ted Dunning wrote:
This sort of thing is what the distributed cache was designed for.
On Mon, Jan 17, 2011 at 8:53 AM, Sean Owen wrote:
Do you thi
This sort of thing is what the distributed cache was designed for.
On Mon, Jan 17, 2011 at 8:53 AM, Sean Owen wrote:
> Do you think the way forward is to leave it, or use Writable and write the
> model distribution to a file, or something else?
>
The bit I wasn't able to take out easily was serializing ModelDistribution to
a JSON string and then passing that via the Configuration object.
Indeed everywhere else it was just another output option.
Do you think the way forward is to leave it, or use Writable and write the
model distribution to
Dirichlet supports JSON but uses Writable internally as do the rest of
the clustering algorithms.
On 1/17/11 8:50 AM, Sean Owen wrote:
The idea behind MAHOUT-510 was to try to standardize serialization
mechanisms as much as reasonable. It would be counterproductive to remove
one and add another
The idea behind MAHOUT-510 was to try to standardize serialization
mechanisms as much as reasonable. It would be counterproductive to remove
one and add another, I think. There was some support for using Avro for text
serialization instead of JSON, even though that has the same issue -- so I
think
Protobufs are a good choice :)
I think that sequence file might be the next step but the current step is
just a writable. I have a PolymorphicWritable static class that helps with
that a bit.
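For readers unfamiliar with the idea, here is a minimal sketch of what a polymorphic Writable helper could look like. It is not the PolymorphicWritable class mentioned above, whose code is not shown in this thread. The Writable interface below is a local stand-in that mirrors the two methods of Hadoop's Writable, and ToyModel, writeWithClass, readWithClass, toBytes and fromBytes are hypothetical names used only for illustration. The assumed technique is to write the concrete class name ahead of the fields so that a reader can rebuild the right subclass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class PolymorphicWritableSketch {

  // Local stand-in mirroring the two methods of Hadoop's Writable interface.
  interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  // Hypothetical model type, used only for illustration.
  static class ToyModel implements Writable {
    int prototypeSize;
    double[] weights = new double[0];

    ToyModel() {} // no-arg constructor so the reader can create an empty instance

    ToyModel(int prototypeSize, double[] weights) {
      this.prototypeSize = prototypeSize;
      this.weights = weights;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(prototypeSize);
      out.writeInt(weights.length);
      for (double w : weights) {
        out.writeDouble(w);
      }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      prototypeSize = in.readInt();
      weights = new double[in.readInt()];
      for (int i = 0; i < weights.length; i++) {
        weights[i] = in.readDouble();
      }
    }
  }

  // Write the concrete class name first, then let the object write its own fields.
  static void writeWithClass(DataOutput out, Writable value) throws IOException {
    out.writeUTF(value.getClass().getName());
    value.write(out);
  }

  // Read the class name, create an empty instance reflectively, then fill it in.
  static <T extends Writable> T readWithClass(DataInput in, Class<T> expected)
      throws IOException {
    String className = in.readUTF();
    try {
      T value = Class.forName(className)
          .asSubclass(expected)
          .getDeclaredConstructor()
          .newInstance();
      value.readFields(in);
      return value;
    } catch (ReflectiveOperationException e) {
      throw new IOException("Cannot instantiate " + className, e);
    }
  }

  static byte[] toBytes(Writable value) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      writeWithClass(out, value);
    }
    return buffer.toByteArray();
  }

  static <T extends Writable> T fromBytes(byte[] bytes, Class<T> expected)
      throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return readWithClass(in, expected);
    }
  }

  public static void main(String[] args) throws IOException {
    ToyModel original = new ToyModel(3, new double[] {0.5, -1.25, 2.0});
    // The reader only knows the base type; the stream says which subclass to build.
    Writable copy = fromBytes(toBytes(original), Writable.class);
    System.out.println(copy.getClass().getSimpleName());
  }
}
```

The same bytes could in principle be written to a file rather than an in-memory buffer, which is roughly the "write the model to a file" option raised elsewhere in this thread.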
I am leaning toward protobufs as a replacement for JSON. There is a
human-readable protobuf syntax. Avro is another option.
I don't gen
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982483#action_12982483
]
Robin Anil commented on MAHOUT-510:
---
The changes are fine. What are we moving towards for
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982416#action_12982416
]
Ted Dunning commented on MAHOUT-510:
(replied instead of commenting ... sorry for the d
Putting data objects in the Configuration is a bit of a misuse (it has been
the subject of an argument on the hadoop mailing lists for a long time now).
I would leave this use in place for now and later refactor to read from
HDFS. That has more legs in any case when it comes to using the clusteri
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982278#action_12982278
]
Sean Owen commented on MAHOUT-510:
--
(BTW I'm not committing this for some time.)
I've man
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982197#action_12982197
]
Ted Dunning commented on MAHOUT-510:
I think that this patch looks pretty reasonable.
+1 from me, the guy who is usually really conservative regarding killing
code.
-jake
On Tue, Jan 11, 2011 at 4:22 PM, Ted Dunning wrote:
> The book might need to change to track changes like that. Other than that,
> the time I spent with GSON I consider to be lost hours of my life. Binary
>
Sounds good. I'll proceed but am going to sit on this change for a long
while.
On Wed, Jan 12, 2011 at 12:22 AM, Ted Dunning wrote:
> The book might need to change to track changes like that. Other than that,
> the time I spent with GSON I consider to be lost hours of my life. Binary
> seriali
The book might need to change to track changes like that. Other than that,
the time I spent with GSON I consider to be lost hours of my life. Binary
serialization with model inspection tools is really the only way to go for
our classification needs.
On Tue, Jan 11, 2011 at 3:20 PM, Sean Owen (JI
[
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980423#action_12980423
]
Sean Owen commented on MAHOUT-510:
--
So, I'm starting to take a crack at this. I've identif