Try a singleton pattern or a static field.
Stefan
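To make the suggestion concrete, here is a minimal sketch of a cached searcher held in a static field. It deliberately uses no Lucene types: `SearcherCache`, `Searchers`, `get` and `invalidate` are made-up names, and the type parameter `T` stands in for whatever searcher class you use. Take it as an illustration under those assumptions, not as a tested recipe.

```java
import java.util.function.Supplier;

/**
 * Caches one expensive-to-create object for the whole JVM.
 * T stands in for the searcher type; nothing here is Lucene API.
 */
final class SearcherCache<T> {
    private final Supplier<T> factory; // hypothetical: code that opens the index
    private volatile T cached;

    SearcherCache(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the cached instance, creating it on first use. */
    T get() {
        T local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    local = factory.get();
                    cached = local;
                }
            }
        }
        return local;
    }

    /** Drops the cached instance, e.g. after the index was rebuilt. */
    synchronized void invalidate() {
        cached = null;
    }
}

/** Holds the one shared cache in a static field. */
final class Searchers {
    // Object is a placeholder for the real searcher type.
    static final SearcherCache<Object> SHARED = new SearcherCache<>(Object::new);

    private Searchers() {}
}
```

Every caller then asks `Searchers.SHARED.get()` instead of constructing a new searcher, and `invalidate()` is called whenever the index changes.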
Michael Celona wrote:
I am creating new IndexSearchers... how do I cache my IndexSearcher...
Michael
-Original Message-
From: David Townsend [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 11:00 AM
To: Lucene Users List
Subject:
+ the Lucene in Action book. :-)
+ scholar.google.com
+ the acm.org IR group
+ ieee.org has an IR group as well
Maybe you will find http://searchenginewatch.com/ useful as well.
HTH
Stefan
On 26.01.2005 at 23:18, Xiaohong Yang (Sharon) wrote:
Hi all,
I am looking for good review articles or books regar
Hi,
Do you optimize the index?
Have you tried implementing your own hit collector?
Stefan
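As a rough idea of what such a collector could look like, here is a sketch that keeps only the n best hits instead of materializing every match. The `collect(int doc, float score)` callback is modelled on the shape of a hit collector, but `TopHitsCollector` and `ScoredDoc` are stand-in classes written for this example, not Lucene API. Whether it helps in your case is something you would have to measure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** One (document id, score) pair. Stand-in type, not a Lucene class. */
final class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

/** Keeps only the n best-scoring documents it is shown. */
final class TopHitsCollector {
    private final int n;
    // Min-heap on score: the weakest hit kept so far sits at the head.
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));

    TopHitsCollector(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    /** Called once per matching document. */
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the weakest hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    /** Returns the kept hits, best score first. */
    List<ScoredDoc> topHits() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

Memory use stays proportional to n, however many documents match.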
On 25.01.2005 at 01:01, Peter Hollas wrote:
I am working on a publicly accessible, Struts-based species database
project where the number of species names is currently at 2.3 million,
and in the near future will be
Do you know:
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html ?
Interesting - is there any code available to draw the maps?
The algorithm is described here:
http://www.cis.hut.fi/research/som-research/book/
A short summary and some sample code is available here:
http://davis.wpi.edu/
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/MultiSearcher.html
100% right.
Personally, I find code samples more interesting than just Javadoc.
That is why I gave the hint; here is the code snippet from Nutch:
/** Construct given a number of indexed segments. */
public IndexSearcher(Fil
Possibly a silly question - but how would I go about searching multiple
indexes using lucene? Do I need to basically repeat the code I use to
search one index for each one, or is there a better way to do it?
Take a look at the nutch.org source code. It does what you are looking
for.
HTH
Stefan
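For readers who want the idea without reading the Nutch source: one searcher per index is wrapped behind a single search call that queries each index and merges the hits. The sketch below shows only that idea. `Hit`, `SingleIndex` and `CombinedSearcher` are invented for this example and are not Lucene or Nutch classes, and sorting by raw score assumes the scores of the different indexes are comparable, which may not hold in practice.

```java
import java.util.ArrayList;
import java.util.List;

/** A hit plus the name of the index it came from. Hypothetical type. */
final class Hit {
    final String index;
    final String docId;
    final float score;

    Hit(String index, String docId, float score) {
        this.index = index;
        this.docId = docId;
        this.score = score;
    }
}

/** Anything that can answer a query; stands in for one searcher per index. */
interface SingleIndex {
    List<Hit> search(String query);
}

/** Presents several indexes as one by searching each and merging the hits. */
final class CombinedSearcher implements SingleIndex {
    private final List<SingleIndex> parts;

    CombinedSearcher(List<SingleIndex> parts) {
        this.parts = new ArrayList<>(parts);
    }

    @Override
    public List<Hit> search(String query) {
        List<Hit> merged = new ArrayList<>();
        for (SingleIndex part : parts) {
            merged.addAll(part.search(query));
        }
        // Best score first, regardless of which index produced the hit.
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}
```

The calling code stays the same whether it talks to one index or to the combined searcher, so the search code does not have to be repeated per index.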
Dave,
cool stuff, think about contributing that to Nutch. ;-)
Do you know:
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html ?
Cheers,
Stefan
On 01.07.2004 at 23:28, David Spencer wrote:
Inspired by these guys who put results from Google into a treemap...
http://google.hivegro
Hi,
I noticed something strange (1.4-rc4).
When I add an empty text to my index,
where text is "" or null:
IndexWriter indexWriter = getIndexWriter();
document.add(Field.Text(Corpus.TEXT, text, true));
indexWriter.addDocument(document);
I see this on stdout: "No tvx file"
Furthermore IndexReade
Hi,
another question, but first many thanks for the last hint, the new term
frequency functionality of lucene is just GREAT! ;)
I have indexed a set of documents with different metadata, Language = DE
or Language = EN.
Now I wish to get term frequencies for DE and EN. The easiest solution
would b
Hi,
sorry, a stupid question:
is there a best practice for getting the term vector of a document?
Does anyone have experience with feature selection for
dimensionality reduction, e.g. using Zipf's law or the tf/idf of a term over the
complete corpus?
Thanks for any hints.
Stefan
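To pin down what is being asked for, here is a small sketch of a tf/idf feature vector computed outside of any index. It assumes documents are already tokenized and uses one common weighting, tf * ln(N / df). The `TfIdf` class and its method names are made up for this example, and the formula is not claimed to be what Lucene uses for scoring.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class TfIdf {
    private TfIdf() {}

    /** Counts how often each term occurs in one tokenized document. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Counts in how many documents of the corpus each term occurs. */
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Weights each term of doc by tf * ln(N / df) over the given corpus. */
    static Map<String, Double> vector(List<String> doc, List<List<String>> corpus) {
        Map<String, Integer> df = documentFrequencies(corpus);
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(doc).entrySet()) {
            // Terms unseen in the corpus are treated as occurring in one document.
            int docsWithTerm = df.getOrDefault(e.getKey(), 1);
            double idf = Math.log((double) corpus.size() / docsWithTerm);
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }
}
```

A term that occurs in every document gets weight 0, which gives a simple starting point for feature selection: drop the lowest-weighted terms to reduce the dimension.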
Hi Erik,
in case we meet some day, and I am sure we will since the world is small, I
have to buy you a beer! :-)
Thanks, your suggestion works; recreating the index solved the problem...
Stefan
On 11.06.2004 at 12:12, Erik Hatcher wrote:
On Jun 11, 2004, at 5:51 AM, Stefan Groschupf wrote:
Hi
Hi,
I'm having a strange problem since upgrading from Lucene 1.3 to 1.4 rc4.
I'm using a third-party component that includes the old Lucene 1.3, but I
need to run the new 1.4 rc4 in the same VM.
So I unpack the component jar, remove all Lucene 1.3 classes, repack
it, and just add the new lucene
Lucene can't help you.
Search for text classification or text clustering.
Browse the tools section at www.text-mining.org; there you may find
tools that can help you with this task.
Some general keywords for your further search:
Feature extraction from text.
Data mining algorithms for
On 30.01.2004 at 22:11, Stefan Groschupf wrote:
JBoss Group http://jboss.org/
Does jboss really support maven?
Sorry, doing 2 things at the same time is not good.
Should be: "Does jboss really support lucene?"
Stefan
JBoss Group http://jboss.org/
Does jboss really support maven?
I think they are more focused on their J2EE server.
open technology: www.media-style.com
open source: www.weta-group.net
open discussion:www.text-mining.org
--
I will not, but I would work to get a degree from mit.edu. B-)
Just kidding, I wouldn't do that.
http://www.ai.mit.edu/research/sponsors/sponsors.shtml
Peace!
Stefan
I am willing as well.
Scott
On Jan 29, 2004, at 12:04 PM, Boris Goldowsky wrote:
Strangely, the web site does not seem to list
If you browse the CVS of nutch.org, you will find an implementation.
HTH
Stefan
On 10.01.2004 at 19:43, [EMAIL PROTECTED] wrote:
Hi group,
would it be possible to implement an Analyzer that filters the HTML code out
of an HTML page? As a result I would have only the text, free of any tagging.
Is is m
Perhaps we'd better continue this on lucene-dev.
OK, I will subscribe to that list and ask again.
Thanks!
Stefan
Ype,
It's a bug, and there is a fix for this in the latest CVS
near the end of the QueryParser.jj file:
// avoid boosting null queries, such as those caused by stop words
if (q != null) {
  q.setBoost(f);
}
I checked out the latest sources from the public CVS. The posted cod
Damian Gajda wrote:
BTW, I can send you the partly working Lucene with Dmitry's code patched
in.
Yeah, that would be very helpful.
Thanks!
Hi,
I noticed something really strange.
I just tried the "document to query" thing with term frequencies and
term boosting based on the term frequency.
Writing the code itself took maybe 3 minutes, but I spent around 2 hours
hunting a NullPointerException I got in this line.
query = QueryParser.p
Damian Gajda wrote:
Hello, I already have some experience with Dmitry's implementation.
Can you point me to Dmitry's code so that I can take a look? I have only
read about it.
Feel free to contact me.
I will! ;)
Thanks!
Stefan
A few people have asked, a few people have expressed interest.
I have to do some work for Nutch, but since I need the feature vector
stuff for a commercial project I will try to implement it.
Does anyone wish to join me? ;)
Stefan
Just to be sure, since there was a lot of discussion on the lists:
there is currently no solution available to get a term vector for a
document or a TF/IDF feature vector for a document, is there?
Has anyone worked on such things?
Does anyone wish to work on such things?
Stefan
Hi Jing,
do you work on the task of document similarity?
I see nobody has answered your question.
Creating a query out of a document would be very easy, but would it
provide good results?
Document term vectors would provide more possibilities to use different
data mining algorithms for cluste
Otis,
based on this discussion:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03350.html
Stefan
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Hi there,
is Damian's patch in the CVS or in the latest Lucene release?
Does this patch allow retrieving the term vector of a document?
Thanks!
Stefan
Tun Lin wrote:
Does anyone know a search engine that supports XML formats?
http://jakarta.apache.org/lucene/docs/lucene-sandbox/
See the SAX/DOM XML demo.
Herb,
On Friday 14 November 2003 13:39, Chong, Herb wrote:
you're describing ad-hoc solutions to a problem that have an effect, but
not one that is easily predictable. one can concoct all sorts of
combinations of the query operators that would have something of the effect
that i am describing.
PA,
But Lucene is a low-level indexing library.
I'm sure most people here will agree that Lucene is much more than a
_low-level_ indexing library.
Maybe it is just a library, but it is definitely the *highest-level* search
technology available on the web for free.
You ride roughshod over the hard
Really cool stuff!!!
maurits van wijland wrote:
Hi All and Marc,
There is the carrot project :
http://www.cs.put.poznan.pl/dweiss/carrot/
The carrot system consists of webservices that can easily be fed by a lucene
resultlist. You simply have to create a JSP that creates this XML file and
create
Marcel Stor wrote:
Stefan Groschupf wrote:
Hi,
How is document clustering different/related to text categorization?
Clustering: you try to find categories of your own and put the matching
documents into them. You group all documents with minimal distance together.
Would I be correct to say
Hi,
How is document clustering different/related to text categorization?
Clustering: you try to find categories of your own and put the matching documents into them.
You group all documents with minimal distance together.
Classification: you already have categories and samples for them, which help you to match other
Hi Marc,
I'm working on it, classification and clustering as well.
I was planning to do it for nutch.org, but some guys there
broke some important basic work I had already done, so maybe I will
not contribute it there.
However, it will be open source and I can notify you if something
Wouldn't mind joining in a joint approach, only problem is timing - it would
probably be late December before we could start putting the hours in.
We all do this just for fun, so no rush! However, more people means less work
for everybody and faster results.
We only need a generic API, but I have done som
alternate solution for pdfs.
I'd be interested in knowing whether anyone is working on a pure Java
solution that would give us a single method for handling MS Office
documents / PDFs / etc.
Cheers
Pete
- Original Message -
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
T
I wrote to this list some days ago to announce a way to
parse 182 file formats.
There was a tiny bug report some days ago; I hope I can fix it.
Browse the archive to find out more.
Cheers
Stefan
Marcel Stor wrote:
Hi all,
I'm thinkin' about writing a search tool for my filesyste
Hi,
sorry, a very stupid question: does Lucene apply Zipf's law during indexing?
Thanks
Stefan
Hi there,
just to let you know, I have implemented a plugin for the Nutch project
that can parse 182 file formats, including MS Office.
I simply use OpenOffice and its available Java API.
It is really straightforward to use.
You can find some info and a link to the open source code here:
http://so
William W wrote:
Hi Folks,
Is there any Lucene best practice ?
www.nutch.org ;)
Elsa Hernandez wrote:
Hi, I would like to know if someone has used Jmeter to prove/test the
performance of your web applications, or if someone could suggest a
tool/application that they have used. Thank you.
http://eclipsecolorer.sourceforge.net/index_profiler.html
It is the best I have ever found.
Hi there,
I wish to run a pre-analyzer that helps me choose the right analyzer
to run on my stream.
For instance, I wish to detect the language of my text and then choose
a language-dependent stop word remover.
Since it is a token stream and my language detection needs the whole text
t