To be honest, I always cancelled the sketching after a while because I
wasn't satisfied with the points-per-second speed. The version used is the
0.8 release.
If I find the time, I'm going to look at what is called when, where and how
often, and what the problem could be.
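A minimal timing harness for the points-per-second number discussed here, sketched under assumptions: `step` is a hypothetical stand-in for one sketching update (it is not a Mahout API), and wall-clock time comes from `System.nanoTime()`. For scale, Ted's figure quoted in this thread (1,000,000 points with 10 dimensions in about 20 seconds) works out to roughly 50,000 points per second.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of a points-per-second measurement; `step` is a hypothetical stand-in
// for one sketching update, not a Mahout API.
class Throughput {

    // Converts a point count and elapsed nanoseconds into points per second.
    static double pointsPerSecond(long points, long elapsedNanos) {
        return points / (elapsedNanos / 1e9);
    }

    // Runs `step` once per point and reports the observed throughput.
    static double measure(List<double[]> points, Consumer<double[]> step) {
        long start = System.nanoTime();
        for (double[] p : points) {
            step.accept(p);
        }
        long elapsed = Math.max(1, System.nanoTime() - start); // avoid dividing by zero
        return pointsPerSecond(points.size(), elapsed);
    }

    public static void main(String[] args) {
        // Reference point from the thread: ~1,000,000 points in ~20 seconds.
        System.out.println(pointsPerSecond(1_000_000, 20_000_000_000L)); // prints 50000.0
    }
}
```

Timing only the per-point step, rather than the whole job, separates the cost of the sketching update itself from I/O and setup.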
On Thu, Dec 26, 2013 at 8:22
Interesting. In Dan's tests on sparse data, he got about 10x speedup net.
You didn't run multiple sketching passes, did you?
Also, which version? There was a horrendous clone in there at one time.
On Wed, Dec 25, 2013 at 2:07 PM, Johannes Schulte <
johannes.schu...@gmail.com> wrote:
> ever
Happy Holidays everyone!!! :)
On Wed, Dec 25, 2013 at 8:09 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:
> Merry Christmas and a Happy New Year!
>
> > On Dec 24, 2013, at 3:36 PM, Stevo Slavić wrote:
> >
> > Happy Holidays Everyone!
> >
> >
> > On Tue, Dec 24, 2013 at 12:28 PM, Fra
[
https://issues.apache.org/jira/browse/MAHOUT-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856671#comment-13856671
]
Yexi Jiang commented on MAHOUT-1388:
[~smarthi] OK, I'll add it. Currently, it only s
Everybody should have the right to do
job.getConfiguration().set("mapred.reduce.child.java.opts", "-Xmx2G");
for that :)
For my problems, I always felt the sketching took too long. I put up a
simple comparison here:
g...@github.com:baunz/cluster-comprarison.git
It generates some sample vector
Not sure how that would work in a corporate setting wherein there's a fixed
systemwide setting that cannot be overridden.
> On Dec 25, 2013, at 9:44 AM, Sebastian Schelter wrote:
>
>> On 25.12.2013 14:19, Suneel Marthi wrote:
On Tuesday, December
On 25.12.2013 14:19, Suneel Marthi wrote:
>>On Tuesday, December 24, 2013 4:23 PM, Ted Dunning
>>wrote:
>>For reference, on a 16 core machine, I was able to run the sequential
>>version of streaming k-means on 1,000,000 points, each with 10 dimensions
>>in about 20 seconds. The map-reduce versions are comparable subject to
>>scal
[
https://issues.apache.org/jira/browse/MAHOUT-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suneel Marthi updated MAHOUT-1358:
--
Description:
Running StreamingKMeans Clustering with REDUCE_STREAMING_KMEANS = true and when
[
https://issues.apache.org/jira/browse/MAHOUT-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856592#comment-13856592
]
Suneel Marthi commented on MAHOUT-1388:
---
[~yxjiang] Also please provide adequate Lo
@Johannes, how many datapoints did you have in your test? Since Streaming
KMeans runs through a single reducer, how much memory did you have to allocate if
you had, say, a million data points? What was the expectedDistanceCutoff you had?
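To put rough numbers on the memory question, here is a back-of-envelope sketch. It assumes, as a worst case, that every centroid held by the reducer is a dense `double[]` with the full cardinality (100,000 in Johannes's case); the centroid counts are hypothetical, and JVM object overhead and all other heap usage are ignored.

```java
// Back-of-envelope heap estimate, assuming every centroid ends up stored as a dense
// double[] of full cardinality (a worst case; array/object headers are ignored).
class CentroidMemory {

    static long denseBytes(long numCentroids, long cardinality) {
        return numCentroids * cardinality * Double.BYTES; // 8 bytes per double
    }

    // How many dense centroids fit into a given heap budget.
    static long centroidsThatFit(long heapBytes, long cardinality) {
        return heapBytes / (cardinality * Double.BYTES);
    }

    public static void main(String[] args) {
        long d = 100_000;                     // dimensionality reported in the thread
        long[] sketchSizes = {1_000, 10_000}; // hypothetical centroid counts
        for (long c : sketchSizes) {
            System.out.printf("%,d centroids x %,d dims = %.1f GB%n", c, d, denseBytes(c, d) / 1e9);
        }
        long twoGiB = 2L * 1024 * 1024 * 1024; // the -Xmx2G setting mentioned in the thread
        System.out.println(centroidsThatFit(twoGiB, d)); // 2684
    }
}
```

Under these assumptions a 2 GB reducer heap holds only about 2,700 dense centroids of that cardinality, which is why sparsity of the centroids matters so much at this dimensionality.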
@All, My experience so far has been that once you are done wit
On Wednesday, December 25, 2013 5:20 AM, Sebastian Schelter
wrote:
Hi Johannes,
can you share some details about the dataset that you ran streaming
k-means on (number of datapoints, cardinality, etc.)?
@Ted/Suneel Shouldn't the approximate searching techniques (e.g.
projection search) he
@Johannes, reading your two emails, I didn't quite get whether Streaming KMeans
worked for you or not. What were the issues you identified with pending
additions and projection?
On Wednesday, December 25, 2013 5:40 AM, Johannes Schulte
wrote:
Hey Sebastian,
it was a text-like clustering problem with a dimensionality of 100,000; the
number of data points could have been a million, but I always cancelled
it after a while (I used the Java classes, not the command-line version, and
monitored the progress).
As for my statements above: The
Hi Johannes,
can you share some details about the dataset that you ran streaming
k-means on (number of datapoints, cardinality, etc.)?
@Ted/Suneel Shouldn't the approximate searching techniques (e.g.
projection search) help cope with high-dimensional inputs?
--sebastian
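Sebastian's suggestion can be illustrated with a simplified projection search. This is a sketch under stated assumptions, not Mahout's implementation: each stored point is indexed by its dot product with a few random Gaussian directions, and a query computes full distances only for points whose projections fall within `searchSize` positions of its own on each direction. The class and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

// Simplified projection search (illustrative only, not Mahout's ProjectionSearch):
// points are indexed by their dot product with a few random directions, and a query
// computes exact distances only for points whose projections lie near its own.
class ProjectionSearchSketch {
    private final double[][] directions;
    private final List<TreeMap<Double, List<Integer>>> indexes = new ArrayList<>();
    private final List<double[]> points = new ArrayList<>();
    private final int searchSize; // candidates taken on each side, per direction

    ProjectionSearchSketch(int dim, int numProjections, int searchSize, long seed) {
        Random rnd = new Random(seed);
        this.directions = new double[numProjections][dim];
        for (double[] dir : directions) {
            for (int j = 0; j < dim; j++) {
                dir[j] = rnd.nextGaussian();
            }
            indexes.add(new TreeMap<>());
        }
        this.searchSize = searchSize;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            s += a[j] * b[j];
        }
        return s;
    }

    void add(double[] point) {
        int id = points.size();
        points.add(point);
        for (int p = 0; p < directions.length; p++) {
            indexes.get(p).computeIfAbsent(dot(directions[p], point), k -> new ArrayList<>()).add(id);
        }
    }

    // Adds up to `limit` ids from the buckets, in iteration order.
    private static void take(Iterable<List<Integer>> buckets, int limit, Set<Integer> out) {
        int taken = 0;
        for (List<Integer> bucket : buckets) {
            for (int id : bucket) {
                if (taken++ >= limit) {
                    return;
                }
                out.add(id);
            }
        }
    }

    // Returns the id of the (approximately) nearest stored point, or -1 if empty.
    int nearest(double[] query) {
        Set<Integer> candidates = new HashSet<>();
        for (int p = 0; p < directions.length; p++) {
            double key = dot(directions[p], query);
            TreeMap<Double, List<Integer>> index = indexes.get(p);
            take(index.tailMap(key, true).values(), searchSize, candidates);
            take(index.headMap(key, false).descendingMap().values(), searchSize, candidates);
        }
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int id : candidates) {
            double[] x = points.get(id);
            double d = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - query[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = id;
            }
        }
        return best;
    }
}
```

Note what this does and does not save: computing the projections still costs O(d) per direction for dense vectors (proportional to the nonzeros with a sparse representation), but the number of full d-dimensional distance evaluations per query drops from the number of stored centroids to at most 2 · searchSize · numProjections.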
On 25.12.2013 10:42, Joh
Hi,
I also had problems getting up to speed, but I blamed the cardinality of the
vectors for that. I didn't do the math exactly, but while
streaming k-means improves over regular k-means by using log(k) and
(n / k) passes (n being the number of datapoints), the d (dimension) parameter from the
original k*d*
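The dimensionality argument can be made concrete with a rough, hypothetical cost model: count the multiply-adds for one brute-force nearest-centroid pass over dense vectors as n·k·d. The value of k below is hypothetical; n and the two values of d come from this thread. With sparse text vectors and a distance computation that exploits sparsity, d would effectively shrink toward the number of nonzeros per vector.

```java
// Rough cost model (an assumption, not a measurement): one brute-force assignment pass
// over dense vectors costs about n * k * d multiply-adds, so whatever a streaming
// algorithm saves on k or on the number of passes, d stays a multiplier.
class CostModel {

    static long multiplyAddsPerPass(long n, long k, long d) {
        return n * k * d;
    }

    public static void main(String[] args) {
        long n = 1_000_000; // points, as in the thread
        long k = 100;       // hypothetical number of clusters
        long lowDim = multiplyAddsPerPass(n, k, 10);       // Ted's 10-dimensional test data
        long textDim = multiplyAddsPerPass(n, k, 100_000); // Johannes's 100,000-dimensional vectors
        System.out.println(lowDim);           // 1000000000
        System.out.println(textDim);          // 10000000000000
        System.out.println(textDim / lowDim); // 10000
    }
}
```

Under this model the same number of points is 10,000 times more expensive per pass at 100,000 dense dimensions than at 10, which is consistent with a benchmark on 10-dimensional data not transferring to high-cardinality text vectors.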