Hi all,
I'd like to make a very simple change to the idf computation, but I can't seem
to wrap my head around it.
There are very useful hints in the javadocs for "Changing Similarity" on
implementing new tf() and lengthNorm() behavior, but they are a little
blurrier for idf().
http://lucene.apache.org/java/3_0
Hmm
My Analyzer is a Dictionary-based Analyzer. And so, it only recognizes
tokens in its dictionary. Adding every url (or domain) is not a viable
solution.
So how could I include that in my analyzer? A Lucene Filter? A FilterReader?
Thanks,
--
Franz Allan Valencia See | Java Software Engineer
Are you asking how to get lucene.apache.org out of
http://lucene.apache.org/ or how to get apache.org out of
lucene.apache.org? The getHost() method of java.net.URL will give you
the former. Or use a regexp. I don't know an easy way to do the
latter, but depending on your requirements you could s
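For the first case, here is a minimal sketch of the getHost() approach mentioned above. The class name HostExtractor and the helper hostOf are made up for illustration, and returning null for input that does not parse as a URL is an assumption, not anything Lucene requires.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

public class HostExtractor {

    /**
     * Returns the lower-cased host part of a URL string,
     * or null if the text is not a URL with a host.
     * (Hypothetical helper, for illustration only.)
     */
    static String hostOf(String text) {
        try {
            // java.net.URL parses the string; getHost() drops the scheme, path and query
            String host = new URL(text).getHost();
            return host.isEmpty() ? null : host.toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            // no recognizable protocol, so treat it as an ordinary keyword
            return null;
        }
    }

    public static void main(String[] args) {
        // prints "lucene.apache.org"
        System.out.println(hostOf("http://lucene.apache.org/java/3_0"));
    }
}
```

The returned host could then be indexed and searched as its own term instead of passing the whole URL through the analyzer. Reducing lucene.apache.org further to apache.org is not covered here.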
How should I go about identifying the domain?
Thanks,
--
Franz Allan Valencia See | Java Software Engineer
franz@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see
On Fri, Jan 29, 2010 at 6:42 PM, Ian Lea wrote:
Instead of playing around with tf/idf, how about just indexing and
searching the domain.
--
Ian.
On Fri, Jan 29, 2010 at 3:43 AM, Franz Allan Valencia See
wrote:
Good day,
I am currently using Lucene for my searches. One of the problems that I'm
facing is when the keyword is a URL. Tokens such as http, https, ://, index,
html, etc. seem to be messing up our search results. The focus was
supposed to be only on the URL's domain.
The idea that I have i