Shai,
This is the code snippet I use inside my class...
public class MySorter extends Sorter {
  @Override
  public DocMap sort(AtomicReader reader) throws IOException {
    final Map<Integer, BytesRef> docVsId = loadSortTerm(reader);
    final Sorter.DocComparator comparator = new
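For readers unfamiliar with what `sort()` returns: as I understand the Lucene 4.x sorter API, a DocMap translates each old doc ID to its new position and back. The plain-Java sketch below has no Lucene dependency. The class name `DocMapSketch` is hypothetical, and a `String` key stands in for the `BytesRef` sort term. (The two orders agree for ASCII keys.) It builds both directions by sorting doc IDs on a per-document key.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: String stands in for the BytesRef sort term; no Lucene classes are used.
class DocMapSketch {
    final int[] newToOld; // newToOld[newDoc] = doc ID before sorting
    final int[] oldToNew; // oldToNew[oldDoc] = position after sorting

    DocMapSketch(Map<Integer, String> docVsId, int maxDoc) {
        Integer[] order = new Integer[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            order[doc] = doc;
        }
        // Arrays.sort on objects is stable, so docs with equal keys keep their old relative
        // order; docs with no key sort last.
        Comparator<Integer> byKey = Comparator.comparing(
                docVsId::get, Comparator.nullsLast(Comparator.<String>naturalOrder()));
        Arrays.sort(order, byKey);

        newToOld = new int[maxDoc];
        oldToNew = new int[maxDoc];
        for (int newDoc = 0; newDoc < maxDoc; newDoc++) {
            int oldDoc = order[newDoc];
            newToOld[newDoc] = oldDoc;
            oldToNew[oldDoc] = newDoc;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> docVsId = new HashMap<>();
        docVsId.put(0, "c");
        docVsId.put(1, "a");
        docVsId.put(2, "b");
        DocMapSketch map = new DocMapSketch(docVsId, 3);
        System.out.println(Arrays.toString(map.newToOld)); // [1, 2, 0]
        System.out.println(Arrays.toString(map.oldToNew)); // [2, 0, 1]
    }
}
```

Both arrays have one slot per document in the segment, which is the per-doc memory the thread is discussing.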
Hi,
Thanks, Uwe. I tried this path and I did not find any .cfs files.
Lucene 3 and Lucene 4 indexes do not necessarily contain CFS files,
especially not if they are optimized. This depends on the merge policy. The
index upgrader uses the default one, which creates no CFS files for the
Hi,
We are in the process of upgrading from Lucene 3.6.0 to Lucene 4.7.2,
and our tests show a significant search performance degradation on the Windows platform.
While trying to figure this out, we noticed a couple of points:
Any suggestions/thoughts will be greatly appreciated.
Thanks!
1) Running
I am afraid the DocMap still maintains doc-id mappings till merge and I am
trying to avoid it...
I think Lucene itself has a MergedIterator in the o.a.l.util package.
A MergePolicy can wrap a simple MergedIterator for iterating docs across
different AtomicReaders in the correct sort order for a given
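What is being proposed here amounts to a k-way merge over per-segment streams that are each already in sort order. Below is a plain-Java sketch of such a merge. It is my own illustration, not Lucene's utility class, and `SortedMergeIterator` is a hypothetical name. A priority queue holds the current head of each segment. Ties are broken by segment order, so the output is deterministic.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge; each source iterator must already be sorted.
class SortedMergeIterator<T extends Comparable<T>> implements Iterator<T> {
    // Current head element of one source, remembered with the source's position in the input list.
    private static final class Head<E> {
        E value;
        final int sourceIndex;
        final Iterator<E> source;

        Head(E value, int sourceIndex, Iterator<E> source) {
            this.value = value;
            this.sourceIndex = sourceIndex;
            this.source = source;
        }
    }

    private final PriorityQueue<Head<T>> queue;

    SortedMergeIterator(List<Iterator<T>> sources) {
        // Order by value; on ties prefer the earlier source, so the output is deterministic.
        queue = new PriorityQueue<>(Math.max(1, sources.size()), (a, b) -> {
            int cmp = a.value.compareTo(b.value);
            return cmp != 0 ? cmp : Integer.compare(a.sourceIndex, b.sourceIndex);
        });
        for (int i = 0; i < sources.size(); i++) {
            Iterator<T> it = sources.get(i);
            if (it.hasNext()) {
                queue.add(new Head<>(it.next(), i, it));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !queue.isEmpty();
    }

    @Override
    public T next() {
        Head<T> top = queue.poll();
        if (top == null) {
            throw new NoSuchElementException();
        }
        T result = top.value;
        if (top.source.hasNext()) { // refill from the same source and re-queue it
            top.value = top.source.next();
            queue.add(top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> segments = List.of(
                List.of(1, 4, 7).iterator(),
                List.of(2, 5).iterator(),
                List.of(3, 6, 8).iterator());
        SortedMergeIterator<Integer> merged = new SortedMergeIterator<>(segments);
        StringBuilder out = new StringBuilder();
        while (merged.hasNext()) {
            out.append(merged.next()).append(' ');
        }
        System.out.println(out.toString().trim()); // 1 2 3 4 5 6 7 8
    }
}
```

Each `next()` costs O(log k) for k segments. The sketch only decides the order in which documents are visited. By itself, it does not rewrite postings or stored fields.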
I am afraid the DocMap still maintains doc-id mappings till merge and I am
trying to avoid it...
What do you mean 'till merge'? The method OneMerge.getMergeReaders() is
called only when the merge is executed, not when the MergePolicy decided to
merge those segments. Therefore the DocMap is
Therefore the DocMap is initialized only when the
merge actually executes ... what is there more to postpone?
Agreed. However, what I am asking is: if there were an alternative to DocMap,
would that be better? Please read on.
And besides, if the segments are already sorted, you should return a
Hi,
Thanks again!
This time, I indexed data with the following specs. It takes 40
seconds for FastTaxonomyFacetCounts to compute all the facets. Is this in line
with your measurements? Subsequent runs fare much better, probably because of the
Windows file system cache. How can I speed
Hi,
I'm migrating from Lucene 4.6.1 to 4.8.1, and I noticed that some Facet API
changes happened in 4.7.0, probably mostly related to this ticket:
http://issues.apache.org/jira/browse/LUCENE-5339
Here are a few questions about some customizations/extensions we made that
don't seem to have a direct
I used Lucene 4.4 to create an index for some documents. One of the indexed
fields is a BinaryDocValuesField. After I changed the dependency to Lucene 4.5,
the index size for 1 million documents increased from 293MB to 357MB. If I did
not use BinaryDocValuesField, the index size increased only about
Again, because merging is based on byte size, you have to be careful how
you measure (hint: use LogDocMergePolicy).
Otherwise you are comparing apples and oranges.
Separately, your configuration is using experimental codecs like
disk/memory, which aren't as heavily benchmarked, etc., as the default
Hi
40 seconds for faceted search is ... crazy. Also, note how the times don't
differ much even though the number of hits is much higher (29K vs 15.1M)
... That, together with your observation that subsequent queries are much faster (a few
seconds), suggests that something is seriously messed up with your
OK, I think I now understand what you're asking :). It's unrelated to
SortingMergePolicy, though. You propose to do the merge part of a merge-sort,
since we know the indexes are already sorted, right?
This is something we've considered in the past, but it is very tricky (see
below) and we went with
That said... if we generate the global DocMap up front, there's no reason
not to execute the merge of the segments more efficiently, i.e. without
wrapping them in a SlowCompositeReaderWrapper.
But that's not work for SortingMergePolicy; it's either a special
SortingAtomicReader that wraps a
Hi,
Thanks for your response. It does sound pretty bad, which is why I am not sure
whether the issue is with the code, the index, the searcher, or just the
machine, as you say.
I will try with another machine just to make sure and post the results.
Meanwhile, can you tell me if there is
Nothing suspicious ... code looks fine. The call to FastTaxoFacetCounts
actually computes the counts ... that's the expensive part of faceted
search.
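Here is a toy model of what a taxonomy-based counter does per query. It shows why counting is the expensive part. As I understand it, FastTaxonomyFacetCounts (4.7+) decodes each hit's category ordinals from the `$facets` doc-values field. It keeps one int counter per category ordinal and increments it for every ordinal of every hit. In the sketch below, the plain int arrays and the class name `FacetCountSketch` are hypothetical stand-ins, not the Lucene API.

```java
import java.util.Arrays;

// Hypothetical toy model of taxonomy facet counting; plain arrays stand in for Lucene's doc values.
class FacetCountSketch {
    /**
     * @param taxonomySize number of categories, i.e. ordinals 0..taxonomySize-1
     * @param ordsPerDoc   category ordinals recorded for each doc (indexed by doc ID)
     * @param hits         doc IDs that matched the query
     */
    static int[] count(int taxonomySize, int[][] ordsPerDoc, int[] hits) {
        int[] counts = new int[taxonomySize]; // one counter per category in the taxonomy
        for (int doc : hits) {                // work grows with the number of hits...
            for (int ord : ordsPerDoc[doc]) { // ...times the categories recorded per doc
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] ordsPerDoc = { {0, 2}, {1}, {0, 1, 2} };
        int[] counts = count(3, ordsPerDoc, new int[] {0, 2});
        System.out.println(Arrays.toString(counts)); // [2, 1, 2]
    }
}
```

In this model the work is roughly (number of hits) × (categories per doc), and the counts array has one slot per taxonomy ordinal. That is why taxonomy size and facets per document are the natural things to ask about.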
How big is your taxonomy (number of categories)?
Is it hierarchical (i.e. are your dimensions flat, or deep like A/1/2/3/)?
What does your
Hi,
I'm experiencing puzzling behaviour with the QueryParser and was hoping
someone around here can help me.
I have a very simple Analyzer that tries to replace forward slashes (/) with
spaces. Because QueryParser forces me to escape strings with slashes before
parsing, I added a MappingCharFilter
If I am counting correctly, the $facets field in the index shows a count of
approx. 28k. That does not sound like much, I guess. All my facets are flat and
the FacetsConfig only defines a couple of them as multi-valued.
Let me know if I am not counting the taxonomy size correctly. The
You can get the size of the taxonomy by calling taxoReader.getSize(). What
does the 28K of the $facets field denote - the number of terms
(drill-down)? If so, that sounds like your taxonomy is of that size.
And indeed, this is a tiny taxonomy ...
How many facets do you record per document? This
Yeah, this is kind of tricky and confusing! Here's what happens:
1. The query parser parses the input string into individual source terms,
each delimited by white space. The escape is removed in this process, but...
no analyzer has been called at this stage.
2. The query parser (generator)
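Step 1 can be modeled in a few lines. This is a toy sketch, not Lucene's QueryParser, and the class `QueryParseSketch` is hypothetical. Stage 1 mirrors the whitespace split and escape removal described in step 1. Stage 2 is an assumption made for illustration: each source term is analyzed on its own by an analyzer that maps `/` to a space, like the MappingCharFilter mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the two stages; this is NOT Lucene's QueryParser.
class QueryParseSketch {
    // Stage 1: split the raw input on whitespace and drop backslash escapes.
    // No analyzer has run yet, so "foo\/bar" simply becomes the source term "foo/bar".
    static List<String> sourceTerms(String input) {
        List<String> terms = new ArrayList<>();
        for (String raw : input.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            StringBuilder term = new StringBuilder();
            for (int i = 0; i < raw.length(); i++) {
                char c = raw.charAt(i);
                if (c == '\\' && i + 1 < raw.length()) {
                    c = raw.charAt(++i); // keep the escaped character, drop the backslash
                }
                term.append(c);
            }
            terms.add(term.toString());
        }
        return terms;
    }

    // Stage 2 (assumed): analyze each source term on its own with an analyzer that maps '/'
    // to ' ' (like a MappingCharFilter) and then splits on whitespace.
    static List<String> analyze(String sourceTerm) {
        List<String> tokens = new ArrayList<>();
        for (String token : sourceTerm.replace('/', ' ').trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal below is the user input: foo\/bar baz
        for (String term : sourceTerms("foo\\/bar baz")) {
            System.out.println(term + " -> " + analyze(term));
        }
        // foo/bar -> [foo, bar]
        // baz -> [baz]
    }
}
```

In this model the escape is already gone before any analysis happens, so the char filter only ever sees `foo/bar`, one source term at a time.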