unable to find the job id.

2011-10-11 Thread Gaurav
Hello, When I try to kill a process, I am unable to find the process ID after using the command: hadoop job -list. It says no jobs are running. I am running the canopy clustering example by typing the following command: mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job --input --output

Cluster dumper crashes when run on a large dataset

2011-11-03 Thread gaurav redkar
to build the "result" map. Any idea how I can fix this? I am working on a dataset of size 40 MB. I tried increasing the heap space, but with no luck. Thanks Gaurav

Re: Cluster dumper crashes when run on a large dataset

2011-11-03 Thread gaurav redkar
the vectors might be too large. How many > dimensions are you having in your Vector? > > > On 04-11-2011 10:57, gaurav redkar wrote: > >> Hello, >> >> I am in a fix with the Clusterdumper utility. The clusterdump utility >> crashes when it tries to ou

Re: Cluster dumper crashes when run on a large dataset

2011-11-03 Thread gaurav redkar
ich will > populate only the dimensions which you are using. This can also decrease > memory consumption. > > > On 04-11-2011 11:19, gaurav redkar wrote: > >> Hi, >> >> yes Paritosh..even i think the same. actually i am using a test data set >> that has 5000 tup

Re: Cluster dumper crashes when run on a large dataset

2011-11-03 Thread gaurav redkar
lusterFilter > 0 , which might help in reducing > the number of clusters that you are getting as output, which, in turn, > might help in less memory usage. > > > On 04-11-2011 11:43, gaurav redkar wrote: > >> Actually i have to run the meanshift algorithm on a large dataset

Re: Cluster dumper crashes when run on a large dataset

2011-11-03 Thread gaurav redkar
sterId does not exist){ >create directory of name clusterId >} > writeVectorInDirectoryNamedClusterId(); > > } > > On 04-11-2011 12:09, gaurav redkar wrote: > >> Thanks a lot for ur help. Yes i will be running it on a hadoop cluster. >> Can >> u elab

meanshift clustering

2011-11-09 Thread gaurav redkar
Hi.. I am unable to identify where the clusterPoints() function in the MeanShiftCanopyClusterer.java file is called during the execution of the Meanshift job. What I need to know is where the files in the clusteredPoints and clusters-* directories are written when we run the job on hadoop. build

Re: meanshift clustering

2011-11-10 Thread gaurav redkar
Wed, Nov 9, 2011 at 11:27 PM, Jeff Eastman wrote: > See inline, > Jeff > > -----Original Message- > From: gaurav redkar [mailto:gauravred...@gmail.com] > Sent: Wednesday, November 09, 2011 4:09 AM > To: user@mahout.apache.org > Subject: meanshift clustering

inconsistent output while using clusterdumper

2011-11-11 Thread gaurav redkar
Hello, After using clusterdumper on the output generated by the meanshift algorithm, I see the following type of result. MSV-441 {n=2 c=[0.003, -0.002,0.005,0.001,etc MSV-770{n=1 c=[0:-0.025,1:0.011,2:0.032,..etc As seen above, in MSV-441 there is no ":" in the output whereas MSV-770 h

Mahout fpg missing patterns

2011-12-18 Thread gaurav singh
correct and do exist in the data set with correct value of their support. Can anyone please explain me the reason?? Thanks!! -- regards Gaurav Singh

Re: Mahout fpg missing patterns

2011-12-19 Thread gaurav singh
That seems to make sense. What do you mean by " Mahout will not report any of those unless the support is strictly greater than 3. " Is there a way for me to get all the patterns with support strictly greater than a particular value? Thanks Gaurav On Mon, Dec 19, 2011 at 4:58 PM,

Re: Mahout fpg missing patterns

2011-12-19 Thread gaurav singh
You were a real help Tom! Thanks Gaurav On Mon, Dec 19, 2011 at 5:33 PM, Tom Pierce wrote: > Maybe it's easiest to give an example. > > If you have input: > > a b c > a b c d > ac d > a b c > > You should expect Mahout to output (say, for support 2):
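
The notion of "support" used in this thread can be checked by hand on the four quoted transactions. What follows is a minimal sketch only: it counts support by brute force and is not Mahout's FPGrowth implementation. The class and method names are hypothetical, and the third transaction is assumed to read "a c d".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout: brute-force support counting.
final class SupportCount {

    // Splits a whitespace-separated transaction line into a set of items.
    static Set<String> items(String line) {
        return new HashSet<>(Arrays.asList(line.trim().split("\\s+")));
    }

    // Support = number of transactions that contain every item of the pattern.
    static int support(List<Set<String>> transactions, Set<String> pattern) {
        int count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(pattern)) {
                count++;
            }
        }
        return count;
    }

    // The four transactions from the quoted example (third one assumed to be "a c d").
    static List<Set<String>> exampleTransactions() {
        return Arrays.asList(items("a b c"), items("a b c d"), items("a c d"), items("a b c"));
    }

    public static void main(String[] args) {
        List<Set<String>> tx = exampleTransactions();
        System.out.println("support(a b c) = " + support(tx, items("a b c")));
        System.out.println("support(a c d) = " + support(tx, items("a c d")));
        System.out.println("support(a c)   = " + support(tx, items("a c")));
    }
}
```

Under that assumption the pattern "a b c" has support 3, "a c d" has support 2, and "a c" has support 4, so all three meet a minimum support of 2. Which of them Mahout actually reports is a separate question that this sketch does not answer.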

Help regarding ClusterOutputPostProcessor

2012-01-06 Thread gaurav redkar
Hello, when I ran the ClusterOutputPostProcessor on synthetic_control_data in mapreduce mode, I observed that one directory contained points belonging to 2 other clusters and the directories relating to those 2 clusters were not created as their "part-*" files were empty and the function "movePart

Mahout and Hadoop on Windows

2012-01-21 Thread gaurav singh
this class errors, like hadoop-0.19.1-core.jar,com.google.common.source_1.0.0.201004262004.jar etc to resolve many of the imported packages like com.google.common.io.Closeable etc. Did I do the right thing? Please any detailed light on its functioning would be great. Thanks everyone! -- regards Gaurav Singh

Re: Help regarding ClusterOutputPostProcessor

2012-01-25 Thread gaurav redkar
in the directory for each cluster. Any idea why this is happening? PS: the dataset on which I tested the algorithm has 1000 records with 200 attributes per record. I can share the dataset that I have used if needed. Thanks, Gaurav On Fri, Jan 6, 2012 at 6:12 PM, Paritosh Ranjan wrote

Re: Help regarding ClusterOutputPostProcessor

2012-01-30 Thread gaurav redkar
Hello. As Jeff mentioned, i created a JIRA issue. Kindly check out MAHOUT-966 <https://issues.apache.org/jira/browse/MAHOUT-966> and share your inputs. Thanks, Gaurav On Wed, Jan 25, 2012 at 8:51 PM, Jeff Eastman wrote: > Mean Shift accumulates the pointIds of every point assigned to

Re: Apache Mahout 0.6 Released

2012-02-06 Thread gaurav singh
Hi, When you say decision trees, you mean decision forest right? Is it possible to expect decision tree algorithm (C 4.5) to be in mahout, since the algorithm is pretty sequential in nature and won't be suitable for distributed processing. Regards Gaurav On Tue, Feb 7, 2012 at 2:49 AM, Sh

Re: How to get documents from the clusters?

2012-02-08 Thread gaurav redkar
Hi.. The clusteredPoints directory contains sequence files where each record is a pair. The format of each record is basically, this directory contains the mapping of each point in the dataset and the clusterID of the cluster to which it belongs. IntWritable is the clusterID of the cluster; We

Re: How to use clusterpp?

2012-02-17 Thread gaurav redkar
If that is the only thing that is contained in the part-r-* file, then the reducer responsible for writing that part-r-* file did not receive any input records to write to it. This happens because the program uses the default hash partitioner, which sometimes maps records belonging to different clus
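
The effect described in this message can be reproduced outside Hadoop. This is a minimal sketch, assuming integer cluster IDs as keys and the formula used by Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class name, the cluster IDs and the reducer count are hypothetical, and plain Integer keys stand in for the Hadoop key type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo, runs without Hadoop on the classpath.
final class HashPartitionDemo {

    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups cluster IDs by the reducer they would be sent to.
    static Map<Integer, List<Integer>> assign(int[] clusterIds, int numReduceTasks) {
        Map<Integer, List<Integer>> byReducer = new TreeMap<>();
        for (int r = 0; r < numReduceTasks; r++) {
            byReducer.put(r, new ArrayList<>());
        }
        for (int id : clusterIds) {
            byReducer.get(partition(id, numReduceTasks)).add(id);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Three hypothetical cluster IDs and three reducers (one per cluster).
        Map<Integer, List<Integer>> byReducer = assign(new int[] {441, 770, 12}, 3);
        for (Map.Entry<Integer, List<Integer>> e : byReducer.entrySet()) {
            System.out.println("reducer " + e.getKey() + " -> clusters " + e.getValue());
        }
    }
}
```

With these values, clusters 441 and 12 both land on reducer 0, cluster 770 lands on reducer 2, and reducer 1 receives nothing, so its part-r-* file would stay empty even though the number of reducers equals the number of clusters.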

Re: mahout pfp : isSubPatternof() function

2012-02-26 Thread gaurav singh
b 26, 2012 at 9:39 PM, tom wrote: > Hi Gaurav, > > The patterns are accumulated in a heap (see FrequentPatternMaxHeap), which > uses isSubPatternOf. > > That said, I do think the default implementation of PFPGrowth will get you > many redundant patterns under cer

Re: mahout pfp : isSubPatternof() function

2012-02-26 Thread gaurav singh
to hear if this persists after trying --useFPG2. > > -tom > > > > On 02/26/2012 12:06 PM, gaurav singh wrote: > >> Hi Tom, >> >> I don't understand, why do you say I will get a lot of redundant patterns? >> In each group dependent shard generates patterns wi

Re: Canopy estimator

2012-05-11 Thread gaurav redkar
ement in each column (ignoring the 0's in the diagonal), which will give me 1.4, 1.36, 1.36. To choose the value of t2, I intend to take the mean of all the minimum elements in each column, which gives t2=1.37. Any comments on the approach? Thanks Gaurav On Fri, May 1
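
The t2 heuristic described in this message can be written down in a few lines. This is a minimal sketch, not a Mahout utility: the class name is hypothetical, and the 3 x 3 distance matrix is made up so that its column minima match the quoted values 1.4, 1.36 and 1.36.

```java
import java.util.Locale;

// Hypothetical helper illustrating the heuristic from the message above.
final class CanopyT2Estimate {

    // Smallest off-diagonal entry of each column of a square distance matrix.
    static double[] columnMinimaIgnoringDiagonal(double[][] d) {
        int n = d.length;
        double[] minima = new double[n];
        for (int col = 0; col < n; col++) {
            double min = Double.POSITIVE_INFINITY;
            for (int row = 0; row < n; row++) {
                if (row != col) {
                    min = Math.min(min, d[row][col]);
                }
            }
            minima[col] = min;
        }
        return minima;
    }

    // t2 is taken as the mean of the per-column minima.
    static double estimateT2(double[][] d) {
        double sum = 0.0;
        double[] minima = columnMinimaIgnoringDiagonal(d);
        for (double m : minima) {
            sum += m;
        }
        return sum / minima.length;
    }

    // Made-up symmetric distances whose column minima are 1.4, 1.36, 1.36.
    static double[][] exampleDistances() {
        return new double[][] {
            {0.0, 1.4, 2.0},
            {1.4, 0.0, 1.36},
            {2.0, 1.36, 0.0}
        };
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "t2 = %.2f", estimateT2(exampleDistances())));
    }
}
```

The mean of 1.4, 1.36 and 1.36 is about 1.373, which rounds to the 1.37 quoted in the message.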

Named Entity Extraction.

2012-06-09 Thread Gaurav Sehgal
s for your help, Gaurav

Dense matrix with kmeans

2012-06-20 Thread gaurav singh
. I just wish to know if it can be directly used with kmeans or I will have to write or customize kmeans for my purpose? Thanks for any help offered! -- Regards Gaurav Singh

Re: Dense matrix with kmeans

2012-06-20 Thread gaurav singh
Dunning wrote: > Yeah... you can probably do this. It will involve storing your matrices as > vectors and probably requires that they be the same size. > > Can you say more about the matrices in terms of size and how you compute > distance? > > On Wed, Jun 20, 2012 at 1:42 AM,

Re: Dense matrix with kmeans

2012-06-20 Thread gaurav singh
subtracting one matrix from another and the elements of resulting matrix should be squared and added. On Wed, Jun 20, 2012 at 5:38 PM, gaurav singh wrote: > Hi, > > The matrix if sparse can be very large like 1000 X 1000 but it will only > have at most 20 non-zero elements. That is
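
The distance described here (subtract the matrices, square the elements, add them up) is the same as the squared Euclidean distance between the matrices flattened into vectors, which is why storing each matrix as a vector works when all matrices have the same size. The following is a minimal sketch using plain arrays; the class and method names are hypothetical and none of this is Mahout API.

```java
// Hypothetical helper: matrix distance versus distance on flattened vectors.
final class MatrixDistance {

    // Sum of squared element-wise differences of two equally sized matrices.
    static double squaredDifferenceSum(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (a[i].length != b[i].length) {
                throw new IllegalArgumentException("column counts differ in row " + i);
            }
            for (int j = 0; j < a[i].length; j++) {
                double diff = a[i][j] - b[i][j];
                sum += diff * diff;
            }
        }
        return sum;
    }

    // Row-major flattening of a rectangular matrix into one vector.
    static double[] flatten(double[][] m) {
        int cols = m.length == 0 ? 0 : m[0].length;
        double[] v = new double[m.length * cols];
        for (int i = 0; i < m.length; i++) {
            System.arraycopy(m[i], 0, v, i * cols, cols);
        }
        return v;
    }

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredEuclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{0, 2}, {1, 1}};
        System.out.println(squaredDifferenceSum(a, b));
        System.out.println(squaredEuclidean(flatten(a), flatten(b)));
    }
}
```

Both calls give 14.0 for the two small matrices above (differences 1, 0, 2 and 3), so a clustering job that uses a squared Euclidean measure on the flattened vectors would see the intended distance. For 1000 x 1000 matrices with at most 20 non-zero entries, a sparse vector representation would be the natural choice instead of the dense arrays used here.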

Re: Naive Bayes Classifier and probability calculation for a single test example

2011-05-03 Thread gaurav garg
Hi Svetlomir, You can use the ClassifierContext class ( https://builds.apache.org/hudson/job/Mahout-Quality/javadoc/index.html?org/apache/mahout/common/StringTuple.html) to get the top N matching results and their respective scores. Hope it helps. Thanks Gaurav

Re: Naive Bayes Classifier and probability calculation for a single test example

2011-05-04 Thread gaurav garg
ill work. Have you read Renny's paper? http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572 Thanks gaurav On Wed, May 4, 2011 at 6:16 PM, Svetlomir Kasabov < skasa...@smail.inf.fh-brs.de> wrote: > Hello Gaurav, >