Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Hello everyone, I'm using Mahout Streaming K Means multiple times in a loop, every time with the same input data and a different output path each run. Concretely, I'm increasing the number of clusters in each iteration. Currently it runs on a single machine. A couple of times (maybe 3 of 20 runs)
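
A minimal sketch of the loop being described, assuming the 0.9-era streamingkmeans driver and CLI flags; paths and cluster counts are placeholders, and other required options (distance measure, searcher class) are omitted:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.util.ToolRunner
    import org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansDriver

    object LoopedStreamingKMeans extends App {
      val conf = new Configuration()
      for (k <- Seq(5, 10, 20, 40)) {            // more clusters each iteration
        ToolRunner.run(conf, new StreamingKMeansDriver, Array(
          "-i", "/data/vectors",                 // same input every run
          "-o", s"/out/skm-k$k",                 // distinct output path per run
          "-k", k.toString,                      // final cluster count
          "-km", (3 * k * math.log(942)).ceil.toInt.toString)) // streaming-phase centroid budget
      }
    }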

Re: Streaming K Means exception without any reason

2014-10-09 Thread Suneel Marthi
I've seen this issue happen a few times before; there are a few edge conditions that need to be fixed in the Streaming KMeans code, and you are right that the generated clusters differ on successive runs given the same input. IIRC this stack trace is due to BallKMeans failing to read any input

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Suneel, thank you for your answer, this was rather strange to me. The number of points is 942. I have multiple runs; in each run I have a loop in which the number of clusters is increased in each iteration, and I multiply that number by 3, since I'm expecting log(n) initial centroids, before Ball
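
A worked check of that sizing, using the thread's numbers: n = 942 points, the k*log(n) heuristic for streaming-phase centroids, and the 3x factor mentioned above (k here is an assumed example value):

    val n = 942
    val k = 20
    val streamingCentroids = (3 * k * math.log(n)).ceil.toInt // 3 * 20 * ln(942) ≈ 411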

Re: Streaming K Means exception without any reason

2014-10-09 Thread Suneel Marthi
Heh, your data size is tiny indeed. One of the edge conditions I was alluding to was the failure of this implementation on tiny datasets. Do you see any output clusters? If so, how many points? Is it possible to share your dataset to troubleshoot? On Thu, Oct 9, 2014 at 9:18 AM, Marko Dinić

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Yes, it is small, but it is just a sample, so the dataset will probably be much bigger. So you think that this was the problem? Will this problem be avoided in the case of a larger dataset? I think that there were no output clusters, as I remember. I'm sending the dataset, if you want to take a

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Here is the dataset. On Thursday, 09 October 2014 16:53:25 CEST, Marko Dinić wrote: Yes, it is small, but it is just a sample, so the dataset will probably be much bigger. So you think that this was the problem? Will this problem be avoided in the case of a larger dataset? I think that there were

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Here is the dataset; I've just checked to be sure it is the right one. On 09.10.2014. 15:34, Suneel Marthi wrote: Heh, your data size is tiny indeed. One of the edge conditions I was alluding to was the failure of this implementation on tiny datasets. Do you see any output clusters? If so, how

Re: SSVD: lease conflict due to 2 attempts using the same dir

2014-10-09 Thread Yang
Yes, that's what I'm saying: I disabled speculative execution and it works for now (kind of a hack). Also, yes, this is Hadoop 2.0 with YARN. This has nothing to do with overwrite mode; the two attempts are run simultaneously because they are speculative runs. On Wed, Oct 8, 2014 at 12:07 AM, Serega Sheypak
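
A minimal sketch of that workaround, with the Hadoop 2/YARN property names; disabling speculative execution means two attempts of the same task never race on one output directory:

    import org.apache.hadoop.conf.Configuration

    val conf = new Configuration()
    conf.setBoolean("mapreduce.map.speculative", false)
    conf.setBoolean("mapreduce.reduce.speculative", false)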

SSVD Q-Job taking very long even after 100% ?

2014-10-09 Thread Yang
My Q-Job MR job shows as 100% mapper complete (it's a map-only job) very quickly, but the job itself does not finish until about 10 minutes later. This is rather surprising. My input is a sparse matrix of 37,000 rows, and the column count is 8,000, with each row usually having 10 elements set to
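
A back-of-envelope size of that input, which shows why a ten-minute tail after 100% map completion is surprising:

    val rows = 37000
    val nnzPerRow = 10                      // elements set per row, per the post
    val entries = rows * nnzPerRow          // 370,000 non-zeros
    val approxMB = entries * (4 + 8) / 1e6  // int index + double value ≈ 4.4 MB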

Re: SSVD: lease conflict due to 2 attempts using the same dir

2014-10-09 Thread Dmitriy Lyubimov
Wow, good to know. However, it would seem to me like a bug in MultipleOutputs? Either way, it doesn't seem to have anything to do with the Mahout code itself. On Thu, Oct 9, 2014 at 10:32 AM, Yang tedd...@gmail.com wrote: Yes, that's what I'm saying: I disabled speculative execution and it works for now (kind

Re: SSVD Q-Job taking very long even after 100% ?

2014-10-09 Thread Yang
It's possible that they are compressing the output. I'm now rebuilding the code after commenting out the setOutputCompress(true) call, and I will also run with the compression param set to false. Still, it's quite surprising that compression should take so long (8-10 minutes). On Thu, Oct 9, 2014
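
For reference, a sketch of the programmatic equivalent of that toggle; FileOutputFormat.setCompressOutput drives the same property (mapreduce.output.fileoutputformat.compress) being discussed:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    val job = Job.getInstance(new Configuration())
    FileOutputFormat.setCompressOutput(job, false)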

RE: SSVD: lease conflict due to 2 attempts using the same dir

2014-10-09 Thread Ken Krugler
The BtJob's BtMapper has some interesting logic in its setup routine, where it looks like it's creating a side-channel: /* actually this is kind of dangerous because this routine thinks we need to create file name for our current job and this will use -m- so it's

Re: SSVD Q-Job taking very long even after 100% ?

2014-10-09 Thread Yang
I commented out the code about compression, but the actual job console still shows mapreduce.output.fileoutputformat.compress as true. On Thu, Oct 9, 2014 at 11:40 AM, Yang tedd...@gmail.com wrote: It's possible that they are compressing the output. I'm now rebuilding the code after
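
If a rebuilt driver still reports compress=true, the value is probably coming from the cluster's default Configuration rather than the code. Since Mahout drivers run through ToolRunner, a generic-options override should win (a hedged suggestion, not a verified fix for this job):

    // on the command line:
    //   hadoop jar mahout-job.jar <driver> -Dmapreduce.output.fileoutputformat.compress=false ...
    // or forced in code before job submission:
    import org.apache.hadoop.conf.Configuration
    val conf = new Configuration()
    conf.setBoolean("mapreduce.output.fileoutputformat.compress", false)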

Re: Mahout 1.0: is DRM too file-bound?

2014-10-09 Thread Andrew Butkus
Correct me if I'm wrong, but this is done for distributed processing on large data sets, using the map-reduce principle and a common file type for distributed processing. Sent from my iPhone. On 9 Oct 2014, at 20:56, Reinis Vicups mah...@orbit-x.de wrote: Hello, I am currently looking into

RE: SSVD: lease conflict due to 2 attempts using the same dir

2014-10-09 Thread Dmitriy Lyubimov
This is using side input, yes, but it is standard practice; for example, map-side joins do basically the same. Specifically w.r.t. opportunistic (speculative) execution this should be fine: HDFS does not disallow opening the same file for reading by multiple tasks, IIRC. Sent from my phone. On Oct 9, 2014 1:02
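
A sketch of the side-input pattern under discussion: the HDFS lease applies to writers, so any number of concurrent tasks may open the same file for reading, e.g. in a mapper's setup(). The path is a placeholder:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    def openSideInput(conf: Configuration): Unit = {
      val fs = FileSystem.get(conf)
      val in = fs.open(new Path("/jobs/ssvd/side-data")) // many tasks may read this concurrently
      try { /* consume the side data */ } finally { in.close() }
    }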

Re: SSVD Q-Job taking very long even after 100% ?

2014-10-09 Thread Dmitriy Lyubimov
I don't remember the code well enough anymore to give you details, but a lot of the jobs are actually reduce-bound. Sent from my phone. On Oct 9, 2014 11:07 AM, Yang tedd...@gmail.com wrote: My Q-Job MR job shows as 100% mapper complete (it's a map-only job) very quickly, but the job itself does

Re: Mahout 1.0: is DRM too file-bound?

2014-10-09 Thread Dmitriy Lyubimov
A matrix defines structure, not necessarily where it can be imported from. You're right in the sense that the framework itself avoids defining APIs for custom partition formation, but you're wrong in implying that you cannot do it if you wanted to, or that you'd have to do anything as complex as you say. As
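
A hedged sketch of that point, assuming the 2014-era Spark bindings: a DRM can be formed from any RDD of (key, Mahout Vector) pairs via drmWrap, so custom partition formation needs no file at all (the rows below are illustrative):

    import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
    import org.apache.mahout.sparkbindings._
    import org.apache.spark.SparkContext

    def customDrm(sc: SparkContext) = {
      val rows = sc.parallelize(0 until 100).map { i =>
        val v: Vector = new RandomAccessSparseVector(1000)
        v.setQuick(i % 1000, 1.0)              // one non-zero per row, for illustration
        i -> v
      }
      drmWrap(rows, ncol = 1000)               // usable in the algebra DSL from here
    }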

Re: Mahout 1.0: is DRM too file-bound?

2014-10-09 Thread Dmitriy Lyubimov
Bottom line, some very smart people decided to do all that work in Spark and give it to us for free. Not sure why, but they did. If the capability is already found in Spark, there's no need for us to replicate it. W.r.t. NoSQL specifically, Spark can read HBase trivially. I also did a bit more advanced
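
A minimal sketch of "Spark can read HBase trivially," using the stock TableInputFormat; the table name is a placeholder:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.SparkContext

    def hbaseRows(sc: SparkContext) = {
      val conf = HBaseConfiguration.create()
      conf.set(TableInputFormat.INPUT_TABLE, "my_table")
      sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
        classOf[ImmutableBytesWritable], classOf[Result]) // RDD of (row key, Result)
    }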

Re: Mahout 1.0: is DRM too file-bound?

2014-10-09 Thread Pat Ferrel
There are also the Mahout Reader and Writer traits and classes that currently work with text-delimited file I/O. These were imagined as a general framework to support parallelized reads/writes to any format and store, using whatever method is expedient, including the ones Dmitriy mentions. I
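
Not the actual Mahout traits, just the shape of the abstraction being described: a format-agnostic reader/writer pair that each store (text-delimited HDFS, NoSQL, ...) implements however is expedient:

    trait Reader[T] { def readFrom(source: String): T }
    trait Writer[T] { def writeTo(dest: String, value: T): Unit }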

Re: Mahout 1.0: is DRM too file-bound?

2014-10-09 Thread Reinis Vicups
Guys, thank you very much for your feedback. I already have my own vanilla Spark-based implementation of row similarity that reads from and writes to NoSQL (in my case HBase). My intention is to profit from your effort to abstract the algebraic layer from the physical backend, because I find it a great