I am interested in contributing. The next two weeks will be a little
busy, as it is end of term, but I would be more than happy to work
over the summer on this project.
Perhaps you can also give me some advice on how to accomplish a few
tasks. Currently I am using NLineInputFormat to
the overall feasibility of
the proposal.
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
--
Key: MAHOUT-363
URL: https://issues.apache.org/jira/browse/MAHOUT-363
Project
Like Ted said, it's a bit late for a GSoC proposal, but I am excited at the
possibility of improving the frequent pattern mining package. Check out the
current Parallel FPGrowth implementation in the code; you can find more
explanation of its usage on the Mahout wiki. Apriori should be trivially
Timeline including Apache internal deadlines:
http://cwiki.apache.org/confluence/display/COMDEVxSITE/GSoC
Mentors, please also follow the link to the ranking explanation [1]
for more information on how to rank student proposals.
Isabel
[1]
like to apply for Mahout GSoC 2010. My proposal is to implement an
Association Mining algorithm utilizing the existing PFPGrowth implementation
(http://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html).
As for the Association Mining, I would like to implement a very general
algorithm
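If a frequent-itemset miner (such as the PFPGrowth job) supplies itemsets with support counts, one common way to derive association rules is to keep every split X -> Y of an itemset whose confidence support(X ∪ Y) / support(X) clears a threshold. The sketch below is plain confidence-based rule generation, not the GUHA method. It assumes the input is an in-memory dict of frozenset -> count (not PFPGrowth's actual output format), and the function name is hypothetical. Rules whose antecedent support is missing from the input are skipped rather than guessed.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (antecedent X, consequent Y, confidence) for a rule X -> Y.
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def association_rules(support: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Derive rules X -> Y with confidence support(X | Y) / support(X) >= min_conf.

    Hypothetical helper: `support` maps each frequent itemset to its count.
    """
    rules: List[Rule] = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                if x not in support:
                    continue  # antecedent support unknown: skip instead of guessing
                conf = count / support[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules
```

With a complete (downward-closed) set of frequent itemsets every antecedent is present, so nothing is skipped.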
plus for a GSOC project.
Robin
On Mon, Mar 29, 2010 at 1:46 AM, Lukáš Vlček lukas.vl...@gmail.com
wrote:
Hello,
I would like to apply for Mahout GSoC 2010. My proposal is to
implement an Association Mining algorithm utilizing the existing PFPGrowth
implementation
GSOC 2010 Proposal Implement Map/Reduce Enabled Neural Networks (mahout-342)
-
Key: MAHOUT-374
URL: https://issues.apache.org/jira/browse/MAHOUT-374
Project: Mahout
Lukas,
The strongest alternative for this kind of application (and the normal
choice for large scale applications) is on-line gradient descent learning
with an L_1 or L_1 + L_2 regularization. The typical goal is to predict
some outcome (click or purchase or signup) from a variety of large
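A minimal sketch of on-line gradient descent with L_1 (and optional L_2) regularization, to make the suggestion concrete. The model (logistic regression on sparse features), the names `sgd_logistic` and `sigmoid`, and the simple "shrink toward zero, clip at zero" L_1 step are assumptions for illustration, not Mahout code and not necessarily what was meant above. Only features present in the current example are updated, which is a simplification.

```python
import math
from typing import Dict, List, Tuple

# One sparse training example: {feature index: value} and a 0/1 label.
Example = Tuple[Dict[int, float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(examples: List[Example], num_features: int, rate: float = 0.1,
                 l1: float = 0.0, l2: float = 0.0, epochs: int = 10) -> List[float]:
    """On-line logistic regression: one gradient step per example.

    L2 acts as weight decay; L1 shrinks each updated weight toward zero and
    clips at zero, so weakly supported weights become exactly 0.
    """
    w = [0.0] * num_features
    for _ in range(epochs):
        for features, label in examples:
            p = sigmoid(sum(w[i] * x for i, x in features.items()))
            err = label - p  # derivative of the log-likelihood w.r.t. the score
            for i, x in features.items():
                w[i] += rate * (err * x - l2 * w[i])
                if w[i] > 0:  # L1 step: move toward zero, never cross it
                    w[i] = max(0.0, w[i] - rate * l1)
                elif w[i] < 0:
                    w[i] = min(0.0, w[i] + rate * l1)
    return w
```

Because each example is processed once per pass and then discarded, memory stays proportional to the number of weights rather than the data size; a larger `l1` trades accuracy for a sparser model.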
Ted,
do you think you can give some good links to papers or other resources about
the mentioned approaches? I would like to look at them after the weekend.
As far as I can see, the association mining (and the GUHA method in its
original form) is not meant to be a predictive method but rather data
Hello,
I just wanted to introduce myself. I am an MSc Computer Science
student at the University of Victoria. My research over the past year
has been focused on developing and implementing an Apriori based
frequent item-set mining algorithm for mining large data sets at low
support counts.
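For readers less familiar with Apriori, here is a textbook level-wise version: candidate k-itemsets are built from frequent (k-1)-itemsets, pruned if any (k-1)-subset is infrequent (the Apriori property), and then counted in one pass over the data. This is a generic in-memory sketch with a hypothetical function name; it is not the algorithm from the research described above and makes no attempt at low-support scalability.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_support` transactions."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets of size exactly k.
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The number of passes equals the size of the largest frequent itemset plus one, which is why low support thresholds (many long itemsets) make plain Apriori expensive on large data.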
Neal, I think that this might well be a useful contribution to Mahout, but,
if I am not mistaken, I think that the deadline for student proposals for
GSoC has just passed.
That likely means that making this contribution an official GSoC project is
not possible. I am sure that the Mahout
for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
--
Key: MAHOUT-363
URL: https://issues.apache.org/jira/browse/MAHOUT-363
Project: Mahout
Issue Type: Task
comments to this JIRA ticket,
instead of editing the original ticket itself, we'll be able to more easily
follow your thinking. Otherwise, we can't really see what has changed.
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
some of the wording; the overall
proposal structure wasn't changed. But I will certainly refrain from editing
the ticket itself.
Are there any other suggestions for making the proposal more viable?
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
how you get side-tracked after you
start. :-)
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
--
Key: MAHOUT-363
URL: https://issues.apache.org/jira/browse/MAHOUT-363
Hello,
I'm posting a draft for my proposal for this year's GSoC. I kindly ask for
your feedback on it.
I have also posted a JIRA ticket with it:
https://issues.apache.org/jira/browse/MAHOUT-365 .
Thank you in advance.
Cristi.
gradient.
In the Reducer class:
- There's a single reducer class that will combine all the partial
gradients from the Mappers to get the overall batch gradient.
- The final error gradient vector is written back to the FileSystem.
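The Mapper/Reducer split described above works because a batch gradient is a sum over training examples, so per-split partial sums can simply be added by a single reducer. A minimal single-process sketch, assuming a linear model with squared error as a stand-in for the neural network's error gradient; the function names are hypothetical and no Hadoop API is used.

```python
from functools import reduce
from typing import List, Sequence, Tuple

# One training row: (feature vector, target value).
Row = Tuple[List[float], float]


def partial_gradient(split: Sequence[Row], w: List[float]) -> List[float]:
    """'Map' side: gradient of 0.5 * squared error, summed over one input split."""
    g = [0.0] * len(w)
    for x, y in split:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """'Reduce' side: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def batch_gradient(splits: Sequence[Sequence[Row]], w: List[float]) -> List[float]:
    """One simulated job: a partial gradient per split, summed by one reducer."""
    partials = [partial_gradient(split, w) for split in splits]
    return reduce(add_vectors, partials)


def train(splits: Sequence[Sequence[Row]], w: List[float],
          rate: float = 0.01, iterations: int = 100) -> List[float]:
    """Driver loop: one map/reduce pass per batch-gradient-descent step."""
    for _ in range(iterations):
        g = batch_gradient(splits, w)
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w
```

Since the sum is associative, the same `add_vectors` function could also serve as a combiner to shrink the data sent from each mapper.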
** I propose to complete all of the following sub-tasks during GSoC
computer science degree from Georgia Tech, and after an
internship with IBM ExtremeBlue, I feel I am extremely adept at picking up new
frameworks quickly.
References
[1] Chakra Chennubhotla and Allan D. Jepson. Half-Lives of EigenFlows for
Spectral Clustering. NIPS 2002.
Proposal for GSoC 2010
code. I believe the k-means
you are looking to implement is already there; it will shave 2 weeks off your
GSOC :). Reading the code/wiki is a great exercise that will help you be more
realistic in your proposal.
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
, given its ease of implementation.
That's just my explanation; if you feel otherwise I'm happy to adjust my
proposal :)
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
--
Key: MAHOUT-363
understand
this method. Maybe start from a set of shopping cart transactions? A
great demo is a big plus for a GSOC project.
Robin
On Mon, Mar 29, 2010 at 1:46 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:
Hello,
I would like to apply for Mahout GSoC 2010. My proposal is to implement an
Association
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
--
Key: MAHOUT-363
URL: https://issues.apache.org/jira/browse/MAHOUT-363
Project: Mahout
Issue Type: Task
would certainly improve the feasibility of the project
timeline and allow me to further refine the overall algorithm. I will
absolutely adhere to your advice; I'll edit this ticket and my GSoC
application. Thank you again!
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
for a GSoC project. I
wish I had the time to help with mentoring this project, in fact.
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
--
Key: MAHOUT-363
URL: https
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline
Hi,
Can anyone please point me to a good data set on which I might try SimHash
clustering?
Thank you,
Cristi
On Tue, Mar 23, 2010 at 10:35 AM, cristi prodan
prodan.crist...@gmail.com wrote:
Hello again,
First of all, thank you all for taking time to answer my ideas. Based on
your thoughts, I
Why don't you try it on 20 Newsgroups? There are about 17-18 unique topics
and a couple of overlapping ones. You can easily find issues with the
clustering code with that dataset. Once it's done, you can try bigger datasets
like Wikipedia.
Robin
On Thu, Apr 1, 2010 at 12:02 PM, Cristian Prodan
Thanks Robin, I will try to have a look at that.
Cristi.
On Thu, Apr 1, 2010 at 9:36 AM, Robin Anil robin.a...@gmail.com wrote:
Why dont you try it on 20 newsgroups. There are about 17-18 unique topics
and couple of overlapping ones. You can easily find issues with the
clustering code with that
Hi
I want to work under MAHOUT-328 for my GSOC 2010 project. How do I apply?
Thanking You
Tanya
Hi
I would like a detailed project description for MAHOUT-328.
Thanking You
Tanya Gupta
Hi Tanya,
MAHOUT-328 is just a general stub. There is no detailed project
description other than what is given there. The idea is we let you propose
to implement a clustering algorithm in Mahout. Start here
http://cwiki.apache.org/MAHOUT/gsoc.html. Browse through the Wiki. Look at
:
Hi
I want to work under MAHOUT-328 for my GSOC 2010 project. How do I apply?
Thanking You
Tanya
Hello,
I would like to apply for Mahout GSoC 2010. My proposal is to implement an
Association Mining algorithm utilizing the existing PFPGrowth implementation
(http://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html).
As for the Association Mining, I would like to implement a very general
Since Sean already answered IDEA-2, I'll reply to IDEA 1.
Minhash (and shingling in general) are very efficient clustering techniques
that have traditionally been employed by search engines for near-duplicate
detection of web documents. They are known to be efficient and effective at
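A minimal sketch of the shingling + MinHash idea for near-duplicate detection. Assumptions: word 3-shingles, 64 hash functions simulated by seeding SHA-1, and hypothetical helper names; none of this is an existing Mahout API. The fraction of positions where two signatures agree estimates the Jaccard similarity of the two shingle sets.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Word k-shingles: every run of k consecutive words in the document."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(seed: int, item: str) -> int:
    """64-bit hash; each seed plays the role of an independent hash function."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each hash function, keep the minimum hash value over the set."""
    return [min(seeded_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of agreeing minhashes estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)
```

Signatures are fixed-size regardless of document length, which is what lets near-duplicate candidates be grouped (for example by bucketing on signature bands) without comparing full texts.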
Dear Mahout community,
My name is Cristi Prodan. I'm 23 years old and currently a 2nd-year student
pursuing an MSc degree in Computer Science.
I started studying machine learning in the past year, and during my research I
found out about the MapReduce model. Then I discovered Hadoop and Mahout. I
I think that's a fine project indeed. It sounds even a little
ambitious for a GSoC project. Understanding, implementing, and
parallelizing this approach is not trivial. If you want to propose it,
sure, but scaling it back a little is probably OK too. As always it's
best to propose a simple project
Minhash clustering is important for duplicate detection. You can also
do SimHash clustering
(http://simhash.googlecode.com/svn/trunk/paper/SimHashWithBib.pdf), which
might be simpler to implement. I can imagine an implementation where
the map generates the simhash and emits multiple copies keyed on
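A sketch of the SimHash fingerprint and the Hamming-distance comparison only; the map-side keying scheme mentioned above is cut off in this message, so it is not reproduced here. Assumptions: raw term counts as token weights, the first 8 bytes of MD5 as the 64-bit token hash, and hypothetical helper names. Similar documents tend to get fingerprints that differ in few bits.

```python
import hashlib
from collections import Counter
from typing import Dict


def token_hash64(token: str) -> int:
    """64-bit token hash taken from the first 8 bytes of MD5."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(weights: Dict[str, float]) -> int:
    """64-bit SimHash: weighted vote per bit position over all token hashes."""
    votes = [0.0] * 64
    for token, weight in weights.items():
        h = token_hash64(token)
        for bit in range(64):
            votes[bit] += weight if (h >> bit) & 1 else -weight
    fingerprint = 0
    for bit in range(64):
        if votes[bit] > 0:  # set the bit where the weighted vote is positive
            fingerprint |= 1 << bit
    return fingerprint


def term_weights(text: str) -> Dict[str, float]:
    """Raw term counts used as token weights."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Each document is reduced to a single 64-bit integer, so comparing two documents is one XOR and a bit count, independent of document length.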
to
change your current timeline to accurately reflect that.
I will post more queries about the design choice later
Robin
On Fri, Mar 12, 2010 at 4:18 PM, zhao zhendong zhaozhend...@gmail.com wrote:
Hi all,
The updated proposal for GSoC 2010 is as follows, any comment is welcome.
Title/Summary: Linear SVM Package (LIBLINEAR) for Mahout
Student: Zhen-Dong Zhao
Student e-mail: zha...@comp.nus.edu.sg
Student Major: Multimedia Information Retrieval / Computer Science
Student Degree
Dunning wrote:
Apache is definitely going to participate. If Mahout gets strong
candidates, we will probably get one or more slots.
On Mon, Mar 8, 2010 at 10:06 AM, zhao zhendong zhaozhend...@gmail.com
wrote:
Robin told me Mahout is going to apply for GSoC 2010 as a mentoring organization. Can anybody
On Mar 9, 2010, at 12:27 PM, zhao zhendong wrote:
Hi Robin, Ted and Grant,
Thank you very much.
To Grant:
One more thing, could you please tell us the link to the archives you
mentioned before?
There's a bunch of 'em, but my personal fav. is
http://search.lucidimagination.com ;-)
Just
Hi
Robin told me Mahout is going to apply for GSoC 2010 as a mentoring organization. Can anybody tell me
the answer? I would really appreciate this chance.
Thanks,
--
-
Zhen-Dong Zhao (Maxim)
Department of Computer Science
School of Computing
National
GSoC 2010 as a mentoring organization. Can anybody tell
me the answer? I would really appreciate this chance.
Thanks,
--
-
Zhen-Dong Zhao (Maxim)
Department of Computer Science
School of Computing
National University of Singapore
Apache is definitely going to participate. If Mahout gets strong
candidates, we will probably get one or more slots.
On Mon, Mar 8, 2010 at 10:06 AM, zhao zhendong zhaozhend...@gmail.com wrote:
Robin told me Mahout is going to apply for GSoC 2010 as a mentoring organization. Can anybody tell
me the answer? I
On Mon Robin Anil robin.a...@gmail.com wrote:
2. UIMA Integration with Mahout? (Maybe a good project if UIMA folks
are taking in GSOC students)
I guess one could easily split this one in two:
a) Using UIMA (whole pipeline or just the analysers if that is possible)
for data pre-processing
On Wed Robin Anil robin.a...@gmail.com wrote:
Greetings! Fellow GSOC alums, administrators and dear mentors, the
next edition is right here. Details are given in the link below.
https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f
Some
Some more wild and wacky ideas. They might be out of scope for GSoC, but they are
nice-to-have features for Mahout. I would like to encourage all of you to put
down your ideas here.
1. Data visualization tool backed by HDFS/HBase for inspecting clusters,
topic models, etc.
- It could have many
Greetings! Fellow GSOC alums, administrators and dear mentors, the next
edition is right here. Details are given in the link below.
https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f
Maybe we could identify key areas in Mahout which we need to