Re: Mahout with Storm/Spark

2014-03-07 Thread vineet yadav
Hi Ted, It is Mahout 0.7. Thanks Vineet Yadav On Thu, Mar 6, 2014 at 11:58 PM, Ted Dunning wrote: > Which version are you using? > > > On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav wrote: > > > Hi, > > I am using Mahout LDA algorithm for Topic Modeling on a hu

Mahout with Storm/Spark

2014-03-06 Thread vineet yadav
topic modeling. I want to know if anyone has tried Storm or Spark with Mahout to speed up the process. Thanks Vineet Yadav

Regarding meanshift clustering

2013-04-15 Thread vineet yadav
Hi, Has anyone tried meanshift clustering? I am using meanshift clustering on numerical data. The dataset consists of 200k instances. I want to know what values would be appropriate for parameters like t1, t2, etc. Can anyone give pointers for that? Thanks Vineet Yadav

Re: Huge classification engine

2011-04-01 Thread vineet yadav
://www.linkedin.com/groups/Mining-Wikipedia-Hadoop-Pig-Natural-115439.S.39911336?goback=.gna_115439 Cheers Vineet Yadav On Fri, Apr 1, 2011 at 9:27 PM, Martin Provencher wrote: > Dan, I think what you propose make a lot of sense. I won't try to use Nutch > for now since we already have our craw

Re: Huge classification engine

2011-04-01 Thread vineet yadav
Hi, I suggest using MapReduce with a crawler architecture for crawling the local file system, since parsing HTML pages adds overhead and delays. Thanks Vineet Yadav On Fri, Apr 1, 2011 at 1:07 PM, Sreejith S wrote: > Mahout can handle huge amount of data set. As a personal experie

Re: SVM and logistic regression in Mahout

2011-03-26 Thread vineet yadav
a was dropped since SVM is scalable. You can use the Mahout SGD algorithm for logistic regression (https://cwiki.apache.org/confluence/display/MAHOUT/Logistic+Regression). Thanks Vineet Yadav On Sat, Mar 26, 2011 at 7:08 PM, Patrick Diviacco wrote: > Hello, > > I'm new to Mahout and I ne

Re: question about set up mahout

2011-03-26 Thread vineet yadav
slow. Also make sure that the JobTracker and TaskTracker are running and that all classpaths are set. Thanks Vineet Yadav On Sat, Mar 26, 2011 at 6:45 AM, Glworld Net wrote: > Hi all, > > I'm new to mahout and was trying to set up mahout on my single-node hadoop > cluster (0.20.2). > Aft

Re: GSOC

2011-03-21 Thread vineet yadav
-statistical) is used for parse ranking for the link-grammar parser and for chunk ranking. The second approach is to train dependency parsers like malt parser (http://maltparser.org/) or mst parser (http://sourceforge.net/projects/mstparser/) from a dependency treebank. Thanks Vineet Yadav On Tue, Mar 22, 2011 at 1:22 AM, Ted

Re: GSOC

2011-03-21 Thread vineet yadav
(http://wiki.opencog.org/w/Probabilistic_Logic_Networks) which can also be useful for your projects. Thanks Vineet Yadav On Mon, Mar 21, 2011 at 12:42 AM, Harsh wrote > I wanted to develop a natural language parser that can tell the theme of any > particular para in a given document. Thi

Re: clustering using n-grams

2011-03-18 Thread vineet yadav
Hi Sambhu, Are you using your own algorithm for n-gram generation? You can use Mahout seq2sparse (https://cwiki.apache.org/MAHOUT/collocations.html), which extracts n-grams and collocations, and use its output for k-means clustering. Thanks Vineet Yadav On Sat, Mar 19, 2011 at 10:43 AM

Re: clustering using n-grams

2011-03-16 Thread vineet yadav
Hi Sambhu, Check out Grant's article on Lucid Imagination: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ Thanks Vineet Yadav On Thu, Mar 17, 2011 at 11:39 AM, shambhusingh wrote: > I have created the lucene index for datab

Re: kmeans

2011-03-10 Thread vineet yadav
Hi Manish, Have you tried the Mahout k-means algorithm? You can find instructions at https://cwiki.apache.org/MAHOUT/k-means-clustering.html Thanks Vineet Yadav On Thu, Mar 10, 2011 at 3:39 PM, wrote: > Hii > I have been trying to run kmeans code in hadoop...I picked up the code from &

Re: mahout

2011-03-07 Thread vineet yadav
Hi devichand, You can also look at the first chapter of Mahout in Action (http://www.manning.com/owen/Mahout_MEAP_CH01.pdf) as a reference for Mahout installation. Thanks Vineet Yadav On Mon, Mar 7, 2011 at 2:05 PM, devichand saini wrote: > Hi > I am new to mahout. Can anyone tell me which configu

Re: Reg- mahout

2011-03-07 Thread vineet yadav
Hi sudha, Check out the k-means, fuzzy k-means, mean shift and canopy clustering algorithms on the Mahout wiki (https://cwiki.apache.org/MAHOUT/algorithms.html) Thanks Vineet Yadav On Mon, Mar 7, 2011 at 2:20 PM, sudha sadhasivam wrote: > Sir > We are exposed to hadoop. We like to use mahout for clus

Re: Regarding classification of URL's

2011-03-01 Thread vineet yadav
Hi Arjun, you need to scrape the content from the website for a given URL, and then prepare training datasets from the scraped content for Bayesian classification. Also check out the Mahout twenty newsgroups example for reference: https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html Thanks Vineet Yadav

Re: Mahout for Keyword Extraction

2011-02-03 Thread vineet yadav
semantic approaches. I think you are looking for a supervised (classification) algorithm for keyphrase extraction. I suggest looking at kea (http://www.nzdl.org/Kea/download.html) and maui-indexer (http://code.google.com/p/maui-indexer/) Thanks Vineet Yadav On Thu, Feb 3, 2011 at 6:51 PM, Joyce Babu

Re: Need help: beginner

2011-02-02 Thread vineet yadav
Hi sarath, Can you post the exact arguments you passed to call the job? Thanks Vineet Yadav On Thu, Feb 3, 2011 at 3:37 AM, sharath jagannath < sharathjagann...@gmail.com> wrote: > Hey All, > > It is again me with probably another stupid query but I am having hard time >

Re: Hadoop error running Wikipedia exercise

2011-02-01 Thread vineet yadav
Hi Lance, It is reading from the local file system and not from the Hadoop file system. Please check your Hadoop configuration. Since you are getting the error while creating the Wikipedia dataset, also make sure that you have enough disk space available on your system, because the Wikipedia dataset is huge. Thanks Vineet

Re: Recommeding on Dynamic Content

2011-02-01 Thread vineet yadav
Hi Ted, Yes, in the paper they mention that "locally optimized co-clustering gives poor result in iterative learning", so they used evolutionary co-clustering, which gives better results. Thanks Vineet Yadav On Wed, Feb 2, 2011 at 1:12 AM, Ted Dunning wrote: >

Re: Recommeding on Dynamic Content

2011-02-01 Thread vineet yadav
ng is used to cluster rows and columns (items and users) simultaneously. Also check the master's thesis "RECOMMENDING ARTICLES FOR AN ONLINE NEWSPAPER" (http://www.ilk.uvt.nl/downloads/pub/papers/hait/kneepkens2009.pdf). Thanks Vineet Yadav On Tue, Feb 1, 2011 at 10:32 PM, Ted Dunning

Re: Incremental data stream clustering.

2011-02-01 Thread vineet yadav
centers. But for better results you need to make sure the documents/posts in each pass are related. Thanks Vineet Yadav On Tue, Feb 1, 2011 at 11:58 PM, sharath jagannath < sharathjagann...@gmail.com> wrote: > Hey All, > > Another newbie to mahout. > I want to implement a sy