Hi Igor,
About your performance problem with SpanQueries and Payloads:
Try filtering with the corresponding BooleanQuery, and use a profiler.
You probably have an I/O bottleneck from reading position and payload
information per document.
Possibly it would help if you first filter off the obviously
Hi Nariman,
In my understanding, ComplexPhraseQueryParser is no longer
supported.
http://issues.apache.org/jira/browse/LUCENE-1486#action_12782254
Instead, with Lucene 3.1 the new
org.apache.lucene.queryParser.standard.parser.StandardSyntaxParser will do
this job.
Hi,
I have a problem with the checkedRepeats in SloppyPhraseScorer.
This feature is for phrases with a repeated term, like "1st word 2nd word".
Without this feature the result would be the same as for "1st word 2nd".
OK.
But I have an index with more than one token at the same position.
The German sentence Die käuflichen
Hi Dave,
facets:
in your case a solution with a single
int[IndexReader.maxDoc()]
fits. For each document number you can store an integer which represents
the facet value.
This is what org.apache.solr.request.UnInvertedField will store in your
case.
(*John* : is there something similar in
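The int[maxDoc] idea above can be sketched in plain Java (no Lucene
dependency; the class and method names here are illustrative, not Lucene
API): with one facet ordinal stored per document number, counting facets
over a hit set is a single array pass.

```java
import java.util.Arrays;
import java.util.BitSet;

// Sketch: docToOrd[doc] holds the ordinal of the single facet value of
// that document, sized like int[IndexReader.maxDoc()]. Counting the
// facets of a result set is then one pass over the matching doc numbers.
public class FacetCounts {

    public static int[] count(int[] docToOrd, BitSet hits, int numFacetValues) {
        int[] counts = new int[numFacetValues];
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            counts[docToOrd[doc]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] docToOrd = {0, 1, 1, 2, 0};      // 5 docs, 3 facet values
        BitSet hits = new BitSet();
        hits.set(0); hits.set(2); hits.set(3); // docs 0, 2, 3 matched
        System.out.println(Arrays.toString(count(docToOrd, hits, 3))); // [1, 1, 1]
    }
}
```

This is essentially what an uninverted field gives you: the cost is one
int per document in main memory, and facet counting no longer touches
the index at query time.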
Hi David,
correct: you should avoid reading the content of a document inside a
HitCollector.
Normally that means caching everything you need in main memory. Very
simple and fast is a facet with only 255 possible values and exactly one
value per document. In this case you need only an
Hi Dave,
searching and sorting in Lucene are two separate functions (if you don't
want to sort by relevance).
You will not lose performance if you first search with a BitSet as
HitCollector and then sort the result by the DateField.
But it is easier to extend TopFieldDocCollector/TopFieldCollector to a
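The filter-then-sort step can be sketched in plain Java (names are
illustrative; the long[] stands in for per-document date values cached in
main memory, the way a FieldCache-backed sort works):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Sketch: first collect matching doc ids into a BitSet, then sort the
// matches by a per-document date value from a cached array.
public class FilterThenSort {

    public static Integer[] sortByDate(BitSet hits, long[] dateByDoc) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        // sort the hit list by the cached date of each document
        docs.sort((a, b) -> Long.compare(dateByDoc[a], dateByDoc[b]));
        return docs.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        long[] dates = {20080101L, 20070101L, 20090101L};
        BitSet hits = new BitSet();
        hits.set(0, 3); // all three docs matched
        System.out.println(Arrays.toString(sortByDate(hits, dates))); // [1, 0, 2]
    }
}
```

The search itself only flips bits, so it stays cheap; the sort touches
only the matching documents.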
Hi John,
I intended to compare xtf with hierarchical facet browsing in browseengine
(selection expansion).
I found PathFacetCountCollector/PathFacetHandler#getFacetsForPath, and I
think that the implementation in xtf has a lot of advantages.
So I suggest you reuse the xtf source for that
Hi ilwes,
Did you notice the thread
http://www.nabble.com/Lucene-vs.-Database-td19755932.html
?
I think it is useful for the question of using Lucene stored fields
even if you already have the information in the DB.
Best regards
Karsten
ilwes wrote:
Hello,
I googled, searched this
Hi Murali,
I think a search with 4 * 5 = 20 boolean clauses will not be a performance
problem
(at least if you have only one optimized index folder).
You could also use one field which contains the content of all other fields,
with a boost factor for each term (a different boost for content from
Hi John,
I will take a look at the bobo-browse source code at the weekend.
Do you know the xtf implementation of faceted browsing:
starting point is
org.cdlib.xtf.textEngine.facet.GroupCounts#addDoc
?
(It works with millions of facet values on millions of hits)
What is the starting point in
Hi Glen,
possibly you will find this thread interesting:
http://groups.google.com/group/xtf-user/browse_thread/thread/beb62f5ff9a16a3a/16044d1009511cda
It was about a taxonomy like the one in your example.
Also take a look at the faceted browsing on dates in
Hi Dipak,
What kind of taxonomy?
What is the difference from faceted browsing in your case?
best regards
Karsten
Kesarkar, Dipak wrote:
Hi
I want to include a taxonomy feature in my search.
Does Lucene support taxonomies? How?
If not, is there a different way to add taxonomy
Hi buFka,
take a look at
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
e.g. your example does not set mergeFactor or RAMBufferSizeMB.
I also like the last tip: "Run a Java profiler."
In my case, the performance problem vanished after I switched from
JDOM to Saxon.
(we are
Hi Zender,
please take a look at
http://www.nabble.com/Lucene-vs.-Database-td19755932.html#a19757274
You shouldn't use a Lucene field to store such huge data, at least not a
Lucene field in your main search index.
You can use Lucene as a repository, but I would advise you to use a
separate index
Hi csantos,
most likely this is not about Lucene:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/AbstractMethodError.html
GermanAnalyzer is not part of the normal Lucene jar (it is part of
lucene-analyzers).
In an application server the position of jar files can be important.
Please try your
Hi Ohsang,
are you looking for
http://lucene.apache.org/java/2_4_0/fileformats.html
?
Best regards
Karsten
Kwon, Ohsang wrote:
I want to know how Lucene stores the data in the index internally.
(Lucene's index format changes very often.)
I cannot find this information in
Hi Blured,
if you are asking about integration of Lucene and a DBMS, possibly Compass
is something for you:
http://www.nabble.com/Lucene-vs.-Database-tp19755932p19758736.html
if you are thinking about using Hibernate: I think there already exists a
Lucene connector, so you don't have to use JDBC.
if
Hi Blured,
sorry, I don't know anything about Eclipse BIRT.
I recommend starting a new thread "Eclipse BIRT with Lucene" where you
describe your problem again in detail.
Be aware that Lucene doesn't know numerical values; Lucene only knows
strings.
best regards
Karsten
blured blured wrote:
Hi Chris,
most likely this is not a Lucene problem.
Did you look with Luke at the stored fields of your document?
Please take a second look with Luke at the terms of your field 'unique_id'
(with "Show top terms"):
What do you see?
Best regards
Karsten
btw: why do you use the prefix search? This
Hi Brian,
I don't know the internals of highlighting ("explanation") in Lucene.
But I know that XTF (
http://xtf.wiki.sourceforge.net/underHood_Documents#tocunderHood_Documents5
) can handle very large documents (above 100 MB) with highlighting very
fast. The difference from your approach is,
Hi spring,
the unit of retrieval in Lucene is a document.
There are no joins between document sets like in SQL.
What you can do is collect all hits for each term query at the folder level
and then implement the logical "and" or "or" on your own.
For this you could reuse the existing
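The "implement the logical and/or yourself" step amounts to combining one
hit set per folder with bit operations. A minimal plain-Java sketch
(class and method names are illustrative):

```java
import java.util.BitSet;

// Sketch: one BitSet of matching doc ids per term query / folder,
// combined with BitSet.and / BitSet.or to emulate the missing join.
public class FolderJoin {

    public static BitSet and(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone(); // keep the inputs reusable
        r.and(b);
        return r;
    }

    public static BitSet or(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    public static void main(String[] args) {
        BitSet term1 = new BitSet(); term1.set(1); term1.set(2);
        BitSet term2 = new BitSet(); term2.set(2); term2.set(3);
        System.out.println(and(term1, term2)); // {2}
        System.out.println(or(term1, term2));  // {1, 2, 3}
    }
}
```

Both operations are linear in the number of words of the bit sets, so
this stays fast even for large result sets.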
Hi agatone,
I agree with markharw00 that highlighting is the main reason to store
fields in Lucene.
I want to remind Sascha Fahl that the stored fields in Lucene are not
inside the inverted index structure.
The implementation of stored fields is very simple:
a (.fdt) file with the pairs
Hi Luther,
your question:
"Is there a way to ask Lucene to search starting from a fixed position?"
The answer: no, not with a standard search.
But you don't want to use your field for scoring, so this is a field to
filter results.
You could easily change RangeFilter for this purpose, but the new
Hi Antony,
I decided to first delete all duplicates from the master (iW) and then to
insert all temporary indices (other).
Any other opinions?
Best regards
Karsten
code:
public static synchronized void merge(IndexWriter iW, Directory[] other,
        final String uniqueID_FieldName) throws IOException {
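The delete-duplicates-then-insert strategy can be sketched with plain Java
collections standing in for the indices (with Lucene itself this would be
IndexWriter.deleteDocuments followed by addIndexes; the class here is
illustrative, not the real merge method):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: the master and each temporary index are modeled as maps from
// uniqueID to document. First delete every master document whose uniqueID
// also appears in a temporary index, then append the temporary documents.
public class DedupMerge {

    public static void merge(Map<String, String> master, List<Map<String, String>> temps) {
        for (Map<String, String> temp : temps) {
            master.keySet().removeAll(temp.keySet()); // delete duplicates first
        }
        for (Map<String, String> temp : temps) {
            master.putAll(temp);                      // then insert the new docs
        }
    }

    public static void main(String[] args) {
        Map<String, String> master = new HashMap<>(Map.of("a", "old", "b", "keep"));
        merge(master, List.of(Map.of("a", "new")));
        System.out.println(master.get("a") + " " + master.get("b")); // new keep
    }
}
```

Deleting before inserting guarantees that, for a duplicated uniqueID, the
version from the temporary index wins.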
queries on that? If
Lucene isn't the right tool for this job, maybe some other toolkit would be
more useful (possibly on top of Lucene)
Thanks in advance for any suggestions and comments. I would appreciate any
ideas and directions to look into.
On Tue, Sep 2, 2008 at 11:46 AM, Karsten F
Hi Markus,
hopefully someone will tell you the predefined filter for this.
I only want to agree that a filter is the correct place for this, and that
you should be aware of the token positions (after your filter you must have
two tokens at the same position).
I think WordDelimiterFilter is a
Hi Leonid,
what kind of query is your use case?
Complex scenario:
You need all the hierarchical structure information in one query. This
means you want to search with XPath in a real XML database (like: all
documents with a subtitle XY which contain directly after this subtitle a
table with
Hi John,
I am not sure about the way Solr implements range queries.
But it looks like Solr is using
org.apache.lucene.search.ConstantScoreRangeQuery
which itself is using
org.apache.lucene.search.RangeFilter
So Solr does not rewrite the query to a large boolean SHOULD, but it is
reading all
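The filter-based approach can be sketched in plain Java (a TreeMap stands
in for Lucene's sorted term dictionary; names are illustrative): every term
in the range is visited once and its documents are OR-ed into one bit set,
with no per-term scoring and no boolean clause limit.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Sketch: a constant-score range filter walks all terms in [lo, hi] of a
// sorted term dictionary and ORs their posting bits into one result set.
public class RangeFilterSketch {

    public static BitSet docsInRange(TreeMap<String, BitSet> termDict, String lo, String hi) {
        BitSet result = new BitSet();
        for (BitSet docs : termDict.subMap(lo, true, hi, true).values()) {
            result.or(docs); // membership only: every match scores the same
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, BitSet> dict = new TreeMap<>();
        BitSet d1 = new BitSet(); d1.set(0);
        BitSet d2 = new BitSet(); d2.set(1);
        BitSet d3 = new BitSet(); d3.set(2);
        dict.put("2007", d1); dict.put("2008", d2); dict.put("2009", d3);
        System.out.println(docsInRange(dict, "2008", "2009")); // {1, 2}
    }
}
```

This is why the filter variant avoids the TooManyClauses problem of the
BooleanQuery rewrite: the number of terms in the range only affects the
scan, not the query structure.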
Hi John,
about integrating other index implementations:
sounds like you need a DBMS with some Lucene features.
There was a post about using Lucene in Oracle:
http://www.nabble.com/Using-lucene-as-a-database...-good-idea-or-bad-idea--to18703473.html#a18741137
and
Hi David,
this is not true; please take a look at
IndexWriter#setRAMBufferSizeMB
and
IndexWriter#setMaxBufferedDocs
But you can produce 9 segments (each with only one document) if you call
IndexWriter#flush
or
IndexWriter#commit
after each addDocument.
So from my knowledge about Lucene there
Hi Bill,
you should not use a prefix query (*), because in a first step Lucene would
generate a list of all terms in this field and then search for all of these
terms, which is pointless.
I would suggest inserting a new field "myFields" which contains as its
value the names of all fields for this
Hi A.
the starting point of xtf was the TEI format. I am very curious whether you
find anything missing for your needs.
(I have already used it with Cocoon.)
I never saw a better implementation of XML-aware searching: each hit knows
its exact position inside the indexed (= source) XML file :-)
If you dive into
hi Martin,
I think you are searching for
DuplicateFilter
http://www.nabble.com/how-to-get--all-unique--documents-based-on-keyword-feild-to18807014.html
best regards
Karsten
wysiecki wrote:
Hello,
thanks for help in advance.
my example docs:
two fields, company_id and content
Hi,
I want to agree with the advice of using only one index.
And I want to add two reasons:
1. Sorting and caching work with the Lucene document numbers.
In Lucene's case, warming up means that a lot of int arrays and bitsets
are stored in main memory.
If you use different MultiReader
Hi Nico Krijnen,
I think it is OK to store a filter for each user session in memory.
And I think that a cached filter is the correct approach for permissions
(extra memory usage = one bit for each user and each document).
Hopefully someone with more experience will also answer your question.
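The session-scoped filter cache can be sketched in plain Java (names and
the Supplier-based lookup are illustrative, not a Lucene API): the
permission BitSet is computed once per session and reused for every query.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: one BitSet per user session, one bit per document. The filter
// is built on first use and then served from the cache.
public class PermissionFilterCache {

    private final Map<String, BitSet> cache = new HashMap<>();

    // computeAllowed stands in for whatever permission lookup you have
    public BitSet filterFor(String sessionId, Supplier<BitSet> computeAllowed) {
        return cache.computeIfAbsent(sessionId, id -> computeAllowed.get());
    }

    public static void main(String[] args) {
        PermissionFilterCache cache = new PermissionFilterCache();
        BitSet allowed = new BitSet(); allowed.set(0); allowed.set(2);
        BitSet f1 = cache.filterFor("session-1", () -> allowed);
        // second lookup must not recompute: the supplier would throw
        BitSet f2 = cache.filterFor("session-1", () -> { throw new IllegalStateException(); });
        System.out.println(f1 == f2); // true: computed once, then cached
    }
}
```

In a real setup the cache would also need invalidation when permissions
change or the index reader is reopened, since the bits are tied to the
document numbers of one reader.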
Hi Grant,
you mentioned Jackrabbit as an example of storing data in Lucene.
I did not find anything like that in the source code. I found
LocalFileSystem and DatabaseFileSystem.
(I found Lucene used for indexing and searching.)
Have I overlooked something?
Best regards
Karsten
Grant
Hi Ganesh,
in this thread nobody said that Lucene is a good storage server,
only that it could be used as a storage server (Grant: "Connect data
storage with simple, fast lookup and Lucene..").
I don't know about automatic retention.
But for the rest of your list of features I suggest taking a deep
Hi Fayyaz,
again, this is about the SAX handler, not about Lucene.
My understanding of what you want:
1. one Lucene document for each SPEECH element (already implemented)
2. one Lucene document for each SCENE-COMMENTARY element (not implemented
yet).
Correct?
If yes, you can write
Hi Fayyaz,
From my point of view, this is not a Lucene question.
If I understand your SAX handler correctly, you start a document with each
speech start tag and you end this document with each lines close tag.
So if you know that the SCENE-COMMENTARY elements and the speech elements
are
Hi,
just to be sure:
you know IndexModifier.deleteDocument(int)?
It is deprecated because you should use
IndexWriter.deleteDocuments(Term[]).
What do you mean by "index is committed"?
If you mean optimize(), the document numbers will change (so there is a
side effect ;-)
best regards
Karsten
Hi,
my question: how did eBay solve this problem?
Take a look at the faceted browsing in the Mark Twain project:
http://www.marktwainproject.org/xtf/search?keyword=Berlinstyle=mtp
http://tinyurl.com/5cvb3c
This solution is open source and from the xtf project (they use Lucene).