Hi,
I'm facing some problems using Lucene. The index I am using is
constructed like this:
try {
Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
Directory dir = MMapDirectory.open(index);
IndexWriter writer = new IndexWriter(dir, analyzer,
MaxFieldLength.LIMITED);
Hi Ian,
thank you for your quick response. I am running Lucene on Ubuntu 10.04, 64
bit. I switched from MMapDirectory to NIOFSDirectory without any significant
changes in performance. The Lucene version running is 3.0.2. I followed your
advice and opened the IndexSearcher after I added all
Hi,
is there a way to store additional metadata with fields?
My problem is as follows:
I'm extracting extended html with tika. This extended html contains references
to pages, x,y values of the text etc. I want to be able to retrieve those
values when text was found while searching.
So when
Many times when you run a search for the first time it has to load all field
values IF the field is being sorted on. Subsequent searches use that cache
and are faster. Does that happen in your case? From your description it
doesn't look like you are sorting, although this kind of performance
Payload!!
2010/10/14 Christoph Hermann herm...@informatik.uni-freiburg.de
Hi,
is there a way to store additional metadata with fields?
My problem is as follows:
I'm extracting extended html with tika. This extended html contains references
to pages, x,y values of the text etc. I want to
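A Lucene payload is an opaque byte array attached to a term position, so the page and x/y values from the question would first have to be packed into bytes. The sketch below shows one possible fixed-width encoding using only the JDK. The class name `PositionPayload` and the 12-byte layout are assumptions made for illustration; they are not part of Lucene or of the original thread, and attaching the bytes to tokens in a TokenFilter is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs (page, x, y) into a fixed 12-byte array
// that could be used as a payload value, and unpacks it again.
final class PositionPayload {
    static final int SIZE = 3 * Integer.BYTES; // three 4-byte ints

    // Encode page, x and y as three big-endian ints.
    static byte[] encode(int page, int x, int y) {
        return ByteBuffer.allocate(SIZE).putInt(page).putInt(x).putInt(y).array();
    }

    // Decode a 12-byte array back into {page, x, y}.
    static int[] decode(byte[] payload) {
        if (payload.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt() };
    }
}
```

At search time the decoded triple would tell the caller on which page, and at which coordinates, the matching term occurred.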
OK, so it looks like we're down to a more general "why is searching
slow" question.
The number of docs is not very large by Lucene standards.
Work through http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
If that still doesn't help, pick a slow query and post again with:
. the output of
Hey Guys
Whenever I try to view open issues in Hudson it doesn't display any information.
Does anyone know why this is the case or how I could fix it?
Thanks in advance
-Dave Clarke
On Thursday, 14 October 2010, 12:29:43, you wrote:
Hello,
is there a way to store additional metadata with fields?
Example:
I have the following content:
<html><body>
<span page=1 x=1, y=1>This is a very</span>
<span page=1 x=1, y=2>interesting text.</span>
<span page=2 x=1, y=1>This is
OK, I read the wiki page on improving searching speed and adopted
some of the advice. Some of the slow queries are quite simple. Here are a few:
plaintext:guid
107.0 ms
resultSet.totalHits = 1
plaintext:allianc
51.0 ms
resultSet.totalHits = 1
plaintext:engin
46.0 ms
resultSet.totalHits = 1
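Single timings like the ones above can be dominated by first-run effects such as cold caches and JIT compilation. Here is a minimal sketch of a harness that runs a few warm-up rounds and then reports the median of several measured runs. `QueryTimer` is a hypothetical name, and the `Runnable` merely stands in for whatever search call is being measured; neither comes from the thread.

```java
import java.util.Arrays;

// Hypothetical timing helper: warms up, then measures several runs
// and returns the median duration in milliseconds.
final class QueryTimer {
    static double medianMillis(Runnable query, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Warm-up rounds are executed but not measured.
        for (int i = 0; i < warmups; i++) {
            query.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        // Middle element; for an even number of runs this is the upper median.
        return nanos[runs / 2] / 1_000_000.0;
    }
}
```

Reporting a median rather than a single measurement makes it easier to tell a consistently slow query from a one-off cold start.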
Hello
I would like to store data retrieved hourly from RSS feeds in a database or in
Lucene so that the text can be easily
indexed for word frequencies.
I need to get the text from the title and description elements of RSS items.
Ideally, for each hourly retrieval from a given feed, I would
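Whichever store is chosen, the word-frequency step itself is small. This sketch assumes the title and description of an item have already been extracted as plain strings. `WordFrequency` and its lowercase, split-on-non-alphanumerics tokenization are illustrative assumptions; a Lucene analyzer would tokenize (and possibly stem) differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts word occurrences in one RSS item's
// title and description.
final class WordFrequency {
    static Map<String, Integer> count(String title, String description) {
        Map<String, Integer> freq = new TreeMap<>();
        String text = (title + " " + description).toLowerCase(Locale.ROOT);
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }
}
```

Per-retrieval counts produced this way could be stored alongside a feed identifier and timestamp in either a database or an index.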
On Oct 14, 2010, at 10:17 AM, app...@dsl.pipex.com wrote:
Hello
I would like to store data retrieved hourly from RSS feeds in a database or
in Lucene so that the text can be easily
indexed for word frequencies.
I need to get the text from the title and description elements of RSS
I have two indexes, A and B. Can two documents, doc1 (in index A) and doc2 (in
index B), have a common field? doc1 and doc2 have the same document ID.
Hey Grant,
Fair point on the next(). In this case I'm iterating through the terms returned
from a PrefixTermEnum so I know they're in the index.
The analyser I'm using looks like this:
public class TypeSavingAnalyzer extends StandardAnalyzer {
public TypeSavingAnalyzer(Version version) {
Background: I've been trying to enable hit highlighting of XML documents
in such a way that the highlighting preserves the well-formedness of the
XML.
I thought I could get this to work by implementing a CharFilter that
extracts text from XML (somewhat like HTMLStripCharFilter, except I am
No. And you don't even want to try... Document IDs are NOT invariant.
Particularly when you delete a document and optimize an index, all the
documents that come after the deleted one get new doc IDs. Trying to keep
these two indexes in synch will be a nightmare.
Perhaps you could explain what
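The renumbering described above can be illustrated without Lucene. In this toy model, list positions play the role of doc IDs, a delete only marks a slot, and "optimize" compacts the list. `ToyIndex` and its methods are hypothetical and mimic only the observable effect, not Lucene's actual merge logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy model: a document's ID is simply its position in the list.
final class ToyIndex {
    private final List<String> docs = new ArrayList<>();

    // Adds a document and returns the doc ID it was assigned.
    int add(String key) {
        docs.add(key);
        return docs.size() - 1;
    }

    // Marks a document as deleted; other doc IDs are unchanged for now.
    void delete(int docId) {
        docs.set(docId, null);
    }

    // Compacts the list: every document after a deleted slot moves down.
    void optimize() {
        docs.removeIf(Objects::isNull);
    }

    // Current doc ID of the document with this key, or -1 if absent.
    int docIdOf(String key) {
        return docs.indexOf(key);
    }
}
```

After adding three documents, deleting the middle one and optimizing, the third document's ID drops from 2 to 1, which is why an ID recorded in a second index would silently point at the wrong document.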
Hey Erick, Sure.
*What I am trying to achieve:*
A) Update a field in Index A
B) When searching for that old field, it should be a miss.
*How I achieved it*
*Index 1*
Doc 1 - Field1, Value 1
Doc 2 - Field1, Value 1
*Index 2*
Doc 1 - Field1, Modified_Value 1
Doc 2 - EMPTY
Add index 2
This seems like far too much work if I'm reading things right. You can't
update a field, but you #can# update a document, which actually re-indexes
that document under the covers (you have to have a way to uniquely identify
the doc). Then, when you reopen your index reader, you'll only see the new
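In Lucene the operation described above is `IndexWriter.updateDocument(Term, Document)`, which deletes every document containing the given term and then adds the new document. The sketch below imitates that delete-then-add behaviour with plain collections so that it runs without Lucene. `ToyWriter`, the `id` key field and the map-based documents are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index writer: documents are field-name -> value maps.
final class ToyWriter {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc)); // defensive copy
    }

    // Delete all documents whose keyField equals keyValue, then add newDoc.
    void updateDocument(String keyField, String keyValue, Map<String, String> newDoc) {
        docs.removeIf(d -> keyValue.equals(d.get(keyField)));
        addDocument(newDoc);
    }

    // Number of documents whose field has exactly this value.
    int hits(String field, String value) {
        int count = 0;
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                count++;
            }
        }
        return count;
    }
}
```

Because the whole document is replaced, every field that should remain searchable has to be supplied again in the new document, not just the modified one.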
Any case where it would break?
If a query uses multiple fields it would break. That is, usually all the
fields need to be in doc in index 2 - not just the modified one.
On Fri, Oct 15, 2010 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote:
This seems like far too much work if I'm