See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
and http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
and http://wiki.apache.org/lucene-java/ConcurrentAccessToIndex
Murat Yakici
Department of Computer Information Sciences
University of Strathclyde
Glasgow, UK
Yes, I agree with you - I also tried this approach in the past and it was
terribly slow - looping on the term vectors.
What I have done is divide the indexing into steps, which, of course, it
would be more than great to avoid if possible!
As for my problem - it was a code problem, I solved it,
Paul Taylor wrote:
Hi, I was using a RAMDirectory and this was working fine, but I have now
moved over to a filesystem directory to preserve space. The directory
is just initialized once
directory = new RAMDirectory();
directory =
Thanks a lot!
I will definitely go over these thoroughly
2009/4/27 Murat Yakici murat.yak...@cis.strath.ac.uk
See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
and http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
and
hi,
I am trying to perform a search using Lucene. The keyword: national india
This phrase exists inside the content. I tried searching for it using Lucene
and it fails to return any results. Then I tried searching for it using Luke,
with the quotes, and it also fails to return results.
Why is that happening?
What does query.toString() say? Are you using standard analyzers with
standard lowercasing, stop words etc?
Knocking up a very simple program/index that demonstrates the problem
usually helps: either it will work and help you spot the problem with
your existing code, or if you post it here
It's fine if one thread is changing the index (w/ IndexWriter) while
other threads are opening IndexReaders.
Though, if you have IndexWriter opening with create=true at the same
time that an IndexReader is attempting to open the index, that would
explain this. Is it possible that's happening?
Michael McCandless wrote:
It's fine if one thread is changing the index (w/ IndexWriter) while
other threads are opening IndexReaders.
Though, if you have IndexWriter opening with create=true at the same
time that an IndexReader is attempting to open the index, that would
explain this. Is it
Another possible factor: if you are using the omitTf feature, phrase
queries won't work.
Koji
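To illustrate why that would be: a phrase match needs to know where each term occurs, not just that it occurs. The following is a minimal plain-Java sketch, not Lucene code; the class and method names are hypothetical, and it assumes (as I understand Lucene 2.4) that omitTf drops positions along with term frequencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not a Lucene class.
final class PhraseMatchSketch {

    // Builds a tiny positional index for one document: term -> token positions.
    static Map<String, List<Integer>> indexPositions(String text) {
        Map<String, List<Integer>> postings = new HashMap<>();
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            postings.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return postings;
    }

    // True only if the terms occur at consecutive positions, in order.
    // Without stored positions this check cannot be made at all.
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String... terms) {
        if (terms.length == 0) {
            return false;
        }
        List<Integer> starts = postings.get(terms[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                List<Integer> positions = postings.get(terms[i]);
                ok = positions != null && positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

With positions available, "national india" matches only where the two words are adjacent and in that order; a document that merely contains both words somewhere does not match.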
Ian Lea wrote:
What does query.toString() say? Are you using standard analyzers with
standard lowercasing, stop words etc?
Knocking up a very simple program/index that demonstrates the
hi
Thanks for your reply. Please suggest what I should do now.
I want to index a document which contains multiple languages.
I am really waiting to complete this with your help.
Please, please help me
Erick Erickson wrote:
I'm puzzled why you say
By the above output we can say that
Hello all,
following the link to SemanticVectors - related research, there is this
link:
Magnus Sahlgren. An introduction to random indexing.
http://www.sics.se/%7Emange/papers/RI_intro.pdf
I would like to point out that Magnus Sahlgren has completed a PhD in this
area which is both very readable
Hi,
I have built an index using the StandardAnalyzer, and am now querying
that index, also with StandardAnalyzer. However, I get results I don't
understand.
I know there are a few documents about Brazil in my corpus. My corpus
is in Dutch, and the Dutch term used is Brazilië.
I query
Well, you can always implement your own HitCollector and just take
the end of the list.
But perhaps a fuller explanation of why you need to do this would
lead to a better answer
Best
Erick
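A rough sketch of the "take the end of the list" idea, in plain Java rather than against the Lucene API. The class below is hypothetical; its collect(int doc, float score) method only mirrors the shape of HitCollector.collect so the logic could be moved into a real HitCollector subclass. It keeps the N lowest-scoring hits seen so far.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in; a real version would extend Lucene's HitCollector.
final class LowestRankedCollector {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int limit;

    // Max-heap on score: the head is the highest-scoring hit currently kept,
    // i.e. the first one to evict when a lower-scoring hit arrives.
    private final PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));

    LowestRankedCollector(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    // Called once per matching document, in any order.
    void collect(int doc, float score) {
        if (heap.size() < limit) {
            heap.add(new Hit(doc, score));
        } else if (score < heap.peek().score) {
            heap.poll();
            heap.add(new Hit(doc, score));
        }
    }

    // Returns the kept hits, lowest score first.
    List<Hit> lowestFirst() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(a.score, b.score));
        return out;
    }
}
```

Memory stays bounded by the limit regardless of how many documents match, which matters if the collector is called for every hit in a large index.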
On Sun, Apr 26, 2009 at 11:41 PM, samd sdoyl...@yahoo.com wrote:
I have 2500 documents and need to
Hi
The problem may well lie with the reading of the queries from disk
into your program.
Using an InputStreamReader with correct encoding (UTF-8?) should solve it.
--
Ian.
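A minimal sketch of that suggestion, assuming the queries are stored one per line in a UTF-8 file (the class name, method name, and file layout are assumptions for illustration, not something from the original post):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads one query per line from a text file.
final class QueryFileReader {

    static List<String> readQueries(String path) throws IOException {
        List<String> queries = new ArrayList<>();
        // Name the charset explicitly instead of relying on the platform
        // default, so characters such as "ë" are decoded correctly.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String query = line.trim();
                if (!query.isEmpty()) {
                    queries.add(query);
                }
            }
        }
        return queries;
    }
}
```

If the file was actually saved in another encoding (ISO-8859-1, for instance), the charset passed to InputStreamReader would need to match that instead.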
On Mon, Apr 27, 2009 at 12:32 PM, Laura Hollink lau...@cs.vu.nl wrote:
Hi,
I have built an index using the
Because no matter how low the rank, they should still be available in search
results for our case.
For example, if you want to search based on an event name, it shouldn't
matter whether the event name is of low rank or not; you still want to find
a match.
Erick Erickson wrote:
Well, you can always
One more thing: it's not just about ranking pieces. All of them should be
available, no matter what the rank.
Erick Erickson wrote:
Well, you can always implement your own HitCollector and just take
the end of the list.
But perhaps a fuller explanation of why you need to do this would
Thanks.
I have a couple quick questions.
1. when I call DocSetFactory.getDocSetInstance(0, -1, -1,
DocSetFactory.FOCUS.SPACE);
with very large numbers instead of -1, I get OpenBitSets back, which I
found surprising (I get p4 sets back when I pass in the -1 values).
Also, is there any
I'm really having a hard time understanding what your requirements
are. To have all the results available for event, just search on that
field (assuming you have an event field for each of your docs).
Use a HitCollector to get all of them rather than the (deprecated)
Hits object, I'd suggest
I am looking into setting custom scoring using Similarity
(org.apache.lucene.search.Similarity). Searcher has a method to set
similarity as follows - searcher.setSimilarity(Similarity) and
retrieve the same too. I am looking at a case where I can have just
one Searcher instance and use different
As it turns out it was more an issue with the tokenization and how fields
were being stored which contain numeric values. I need to be sure that
fields are tokenized on numeric values as well.
Thanks
Erick Erickson wrote:
I'm really having a hard time understanding what your requirements
Hi everyone,
There has been a lot of discussion regarding Lucene+NFS pitfalls. I'm
not sure how to proceed with a more distributed operation.
I'm trying to take the indexing load off of our search server. I can do
this either by building a new server which hosts the Indexer and the
Index, or a
In theory either solution will work, from Lucene's standpoint. But
this is not well-explored territory: you are a pioneer! Please report
back on your results :)
Usually (and this may be different in your app), search performance
trumps indexing performance, so my guess is you'd want the index
What is the best way to handle this sort of situation? My inclination
is to build a new Search Server (with fast HDDs and lots of Memory for
tomcat) and leave the indexer on the old server connected via NFS.
- Our current development is on similar lines. Almost no deletes, but
only lots of
Thanks Otis!
I've got java APIs that could be used.
Sincerely,
Sithu D Sudarsan
Off: 301-796-2587
sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, April 24, 2009 12:09 PM
To:
Hi--
I just got kamikaze somewhat integrated into a project of mine. I'm
having problems growing the DocIdSets, though. Up to the point where the
first regrow happens, everything is fine. Once the regrow happens, I get
an ArrayIndexOutOfBoundsException. The following unit test will exhibit this
Hi all.
One of our users has seen an error like this:
java.lang.ArrayIndexOutOfBoundsException: -1030685
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:210)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
at
When I print out the query, it looks like this:
(url:terror india^2.0 anchor:terror india^0.0 content:terror india
title:terror india^1.5 host:terror india^2.0 site:terror india^10.0)
I don't understand at all: only the phrase query has a problem, even my slop
has no problem at all. I have exact