On Sat, Jul 31, 2010 at 5:41 AM, Gregg Williams greg...@innerpaths.net wrote:
I've begun work on a visual front-end to display such infocards, using
Clojure and the Piccolo graphics library (http://piccolo2d.org/). If
you (or anybody else reading this) find this larger project
interesting,
Daniel (and anyone else reading this)
I would like to correspond with you because I'm working on a project
for which your word graphing is a subset. I invented a
standardized electronic notecard (see http://infoml.org), with the
idea that writers and others could dump chunks of information (with
As others have said, there isn't an algorithm that does this. Useful
results depend on precise definitions of context and similarity.
The waters get deep quickly.
As a Clojure exercise, though, there are lots of good starting points.
For instance: get a set of words, create all pairs from the
What you describe is not Clojure-specific, so...
Check out the NLTK project. It is all in Python, and all of the big
book is written for learning to use the tools in Python. However, it
also contains a lot of talk about Natural Language Processing in
general.
http://www.nltk.org/book
I,
On Wed, Jul 28, 2010 at 2:58 PM, Daniel doubleagen...@gmail.com wrote:
I want to write a Clojure program that searches for similarities of
words in the English language and places them in a graph, where the
distance between nodes indicates their similarity. I don't mean
syntactical
I've done quite a lot of work in this area, although not in Clojure.
As Mark mentioned, WordNet is definitely a good place to start, but
it's short on proper nouns, which reduces its utility when
analyzing natural language. I ended up extending WordNet by data
mining Wikipedia dumps. The
I think there were some talks about this at the conference I went to
recently. Keywords might be "natural language processing". Linked are
the abstracts of the conference, which you might find useful.
http://www.insna.org/PDF/Sunbelt/4_ProgramPDF.pdf
One alternative I briefly considered is to
As others have said, this is a difficult problem, but a fascinating
one too. I'm currently nibbling on building some grouping-by-
similarity algorithms for Clojure, although I'm sticking to numerical
criteria for similarity or distance. New developments in text
analysis and the Learning by Reading
I think that a big part of the problem is that most approaches to word
similarity (especially thesaurus-based approaches like WordNet, but also the
significantly better distributional approaches) use very impoverished
representations of knowledge. As such, they are unable to make useful
On Thu, 2010-07-29 at 10:11 -0400, rob levy wrote:
Also, most of NLTK works in Jython*, and by extension in Jython
running in Clojure (which is why I started writing a convenience
wrapper to make it easier to use Python libraries:
http://code.google.com/p/clojure-python/).
*Actually
I want to write a Clojure program that searches for similarities of
words in the English language and places them in a graph, where the
distance between nodes indicates their similarity. I don't mean
syntactical similarity. Related contextual meaning is closer to the
mark.
For instance: fish
WordNet is the main existing thing that comes to mind as related to your
idea.
--
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be
This is a hard problem.
If you go by degrees and shades of synonymity, it can be (and has been)
done manually - see Visual Thesaurus (http://www.visualthesaurus.com/).
But for grouping based on the same semantic topics - that's pretty
difficult. You could do it based on co-location in a corpus,
On 7/28/10 5:34 PM, Mark Engelberg wrote:
WordNet is the main existing thing that comes to mind as related to your
idea.
You might also want to look into Freebase. Here's a Clojure client you
can use to query their data. http://github.com/rnewman/clj-mql
A very good place to start reading about edit distances between words, and
some related material, is Peter Norvig's site:
http://norvig.com/spell-correct.html
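
For anyone who has not met edit distance before, here is a minimal sketch in Python of plain Levenshtein distance. It is not code from the linked article, and the function name edit_distance is just an illustrative choice. Note that it measures how similar two spellings are, not how related their meanings are, so it only covers the "syntactical" side of the original question.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # previous[j] = distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for pair in [("fish", "dish"), ("kitten", "sitting")]:
        print(pair, edit_distance(*pair))
```

Only two rows of the table are kept at a time, so memory grows with the length of the second word rather than with the product of both lengths.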
Also, try to find some Wikipedia articles about the BM25 ranking algorithm. I
used Clojure for an assignment at school that