CfsExtractor tool
-
Key: LUCENE-770
URL: https://issues.apache.org/jira/browse/LUCENE-770
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.1
Reporter: Otis
[
https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic updated LUCENE-770:
Attachment: LUCENE-770.patch
CfsExtractor tool
-
Key:
[
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic updated LUCENE-741:
Attachment: LUCENE-741.patch
Field norm modifier (CLI tool)
[
https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463872
]
Michael McCandless commented on LUCENE-140:
---
Phew! I'm glad we finally got to the bottom of this one.
[
https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463875
]
Michael McCandless commented on LUCENE-140:
---
Actually, this reminds me that, as of lockless commits,
[
https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-140.
---
Resolution: Fixed
Fix Version/s: 2.1
Resolving this now, finally (I'll move
Hi Jeff,
Wondering if you (and/or others) would be interested in taking a look
at https://issues.apache.org/jira/browse/LUCENE-662 and vetting the
new interfaces, etc. to see if you could come up w/ a prototype
implementation. This would help move along 662 as it would sort out
some of
Hi,
What do people here think about adding forceOptimize() to IndexWriter?
public synchronized void forceOptimize() throws IOException {
    flushRamSegments();
    // Merge the last mergeFactor segments, or all of them if there are fewer.
    int minSegment = segmentInfos.size() - mergeFactor;
    mergeSegments(minSegment < 0 ? 0 : minSegment);
}
I need it
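To make the clamp in the proposed method concrete, here is a toy model in Java. It assumes, as the method above implies, that mergeSegments(minSegment) merges every segment from minSegment to the end into one new segment. Segments are reduced to document counts, and the class and helper names are hypothetical, not IndexWriter internals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of forceOptimize(): segments are represented only by their doc counts.
public class ForceOptimizeSketch {

    // Hypothetical stand-in for mergeSegments(minSegment): replaces segments
    // [minSegment, size) with a single segment holding all of their documents.
    static List<Integer> mergeFrom(List<Integer> segments, int minSegment) {
        List<Integer> result = new ArrayList<>(segments.subList(0, minSegment));
        int merged = 0;
        for (int docs : segments.subList(minSegment, segments.size())) {
            merged += docs;
        }
        result.add(merged);
        return result;
    }

    // Mirrors the proposed method: merge the last mergeFactor segments,
    // clamping the start index at 0 when there are fewer than mergeFactor.
    static List<Integer> forceOptimize(List<Integer> segments, int mergeFactor) {
        int minSegment = segments.size() - mergeFactor;
        return mergeFrom(segments, minSegment < 0 ? 0 : minSegment);
    }

    public static void main(String[] args) {
        // Fewer segments than mergeFactor: everything collapses into one segment.
        System.out.println(forceOptimize(List.of(100, 40, 7), 10)); // [147]
        // Already a single segment: it is still rewritten (merged on its own).
        System.out.println(forceOptimize(List.of(147), 10)); // [147]
        // More segments than mergeFactor: only the tail is merged.
        System.out.println(forceOptimize(List.of(5, 4, 3, 2, 1), 2)); // [5, 4, 3, 3]
    }
}
```

The second call is the case the "force" variant is about: even an index that already has one segment gets that segment rewritten.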
Marvin Humphrey wrote:
I've finished integrating the lockless commits concept into KinoSearch,
and I wanted to pop in and say that it's a very nice piece of work.
Real outside-the-box thinking -- or at least outside my box. :)
Nothing better than an innovation which solves long-standing
On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
I too am happy that we have no more commit lock :)
Not just that. :)
No more lock directory, since we can put write.lock in the index
directory itself.
No more lock file name munging, since lock files from different
indexes no
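A minimal sketch of why a write.lock kept inside the index directory needs no name munging, assuming a lock is simply a file created with java.io.File.createNewFile(). The class and method names are hypothetical, not Lucene's Lock API. Because the lock path is derived from the index directory itself, two indexes can never collide on the same lock file. The File.createNewFile() javadoc recommends java.nio.channels.FileLock rather than this method for real file locking, so this only illustrates the path layout.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical per-directory lock: one write.lock file inside each index directory.
public class WriteLockSketch {
    static final String LOCK_NAME = "write.lock";

    // Returns true if this caller created the lock file, i.e. obtained the lock.
    static boolean obtain(File indexDir) throws IOException {
        return new File(indexDir, LOCK_NAME).createNewFile();
    }

    // Releases the lock by deleting the file; returns true if it existed.
    static boolean release(File indexDir) {
        return new File(indexDir, LOCK_NAME).delete();
    }

    public static void main(String[] args) throws IOException {
        File indexA = Files.createTempDirectory("indexA").toFile();
        File indexB = Files.createTempDirectory("indexB").toFile();
        System.out.println(obtain(indexA)); // true
        System.out.println(obtain(indexA)); // false: already held
        System.out.println(obtain(indexB)); // true: different directory, different lock
        release(indexA);
        release(indexB);
        indexA.delete();
        indexB.delete();
    }
}
```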
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Artem Vasiliev updated LUCENE-769:
--
Attachment: StoredFieldSorting.patch
[PATCH] Performance improvement for some cases of sorted
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463996
]
Artem Vasiliev commented on LUCENE-769:
---
Renamed classes as Hoss proposed. Tried to hide
I agree with the boolean addition.
optimize(false) is a request to maybe optimize; optimize(true) should always
optimize to a single segment.
optimize(false) might check some parameter as to the maximum number
of segments allowed before an actual optimize is performed.
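A hedged sketch of the optimize(boolean) semantics described above, modeled on a bare segment count. maxSegmentsBeforeOptimize is a hypothetical name for the unnamed "some parameter", and the class is illustrative, not IndexWriter.

```java
// Illustrative model of optimize(boolean force) on a simple segment count.
public class OptimizeFlagSketch {
    private int segmentCount;
    private final int maxSegmentsBeforeOptimize; // hypothetical threshold

    public OptimizeFlagSketch(int segmentCount, int maxSegmentsBeforeOptimize) {
        this.segmentCount = segmentCount;
        this.maxSegmentsBeforeOptimize = maxSegmentsBeforeOptimize;
    }

    /**
     * force == true: always optimize to a single segment, even if the index
     * already has exactly one.
     * force == false: a request to "maybe optimize"; act only when the segment
     * count exceeds the threshold. Returns true if a merge was performed.
     */
    public boolean optimize(boolean force) {
        if (!force && segmentCount <= maxSegmentsBeforeOptimize) {
            return false;
        }
        segmentCount = 1; // placeholder for the real merge work
        return true;
    }

    public int segments() {
        return segmentCount;
    }

    public static void main(String[] args) {
        OptimizeFlagSketch writer = new OptimizeFlagSketch(3, 5);
        System.out.println(writer.optimize(false)); // false: 3 <= 5, nothing done
        System.out.println(writer.optimize(true));  // true: forced down to 1 segment
        System.out.println(writer.optimize(true));  // true: rewritten although already at 1
        System.out.println(new OptimizeFlagSketch(8, 5).optimize(false)); // true: 8 > 5
    }
}
```

With a boolean flag, callers never see the segment count itself, which is the exposure concern raised about optimize(int).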
On Jan 11, 2007,
I also got the same question. It seems very hard to do phrase-based
queries efficiently.
I think most search engines do phrase-based queries, or at least appear to.
So, as in Google, the query result must contain all the words the user
searched on.
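The "must contain all the words" requirement is a conjunctive (AND) query. As a hedged sketch, assuming posting lists are stored sorted by document ID (held here as plain int arrays, a simplification of real on-disk postings), such a query reduces to a linear intersection of the lists. Class and method names are hypothetical. Phrase matching would additionally need to compare term positions inside each matching document, which this sketch does not do.

```java
import java.util.Arrays;

// Conjunctive query over doc-ID-sorted posting lists via linear intersection.
public class ConjunctiveIntersectSketch {

    // Linear merge of two ascending doc-ID arrays; keeps IDs present in both.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++; // advance whichever list is behind
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Documents containing every term; each array must be sorted ascending.
    public static int[] intersectAll(int[]... postings) {
        if (postings.length == 0) {
            return new int[0];
        }
        int[] result = postings[0];
        for (int k = 1; k < postings.length && result.length > 0; k++) {
            result = intersect(result, postings[k]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] termA = {1, 3, 5, 8, 13};
        int[] termB = {2, 3, 8, 13, 21};
        int[] termC = {3, 4, 13};
        System.out.println(Arrays.toString(intersectAll(termA, termB, termC))); // [3, 13]
    }
}
```

The merge relies on doc-ID order; a list ordered by impact instead cannot be intersected this way, which is the tension this thread goes on to discuss.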
It seems to me that the impacted-sorted
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464055
]
Chuck Williams commented on LUCENE-769:
---
Robert,
Could you attach your current implementation of reopen() as
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
robert engels updated LUCENE-769:
-
Attachment: IndexReaderUtils.java
[PATCH] Performance improvement for some cases of sorted
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464056
]
robert engels commented on LUCENE-769:
--
The IndexReaderUtils I posted is not compilable - there are a few more
On Jan 11, 2007, at 2:30 PM, jian chen wrote:
It seems to me that the impacted-sorted list makes sense if you are trying
to do pure vector space based ranking. This is from what I have read in
the research papers. They all talk about how to optimize the vector space
model using this
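For pure vector space ranking, one well-known way to exploit impact-sorted lists is Fagin's Threshold Algorithm (TA), swapped in here for illustration; the thread does not say which algorithm the cited papers use. The sketch assumes a document's score is the sum of its per-term impacts, each list is sorted by impact in descending order, and a per-term map gives random access to any document's impact. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Top-k over impact-sorted posting lists with Fagin's Threshold Algorithm.
public class ImpactSortedTopK {
    // One posting: a document ID and this term's precomputed score contribution.
    public static final class Posting {
        final int doc;
        final float impact;

        public Posting(int doc, float impact) {
            this.doc = doc;
            this.impact = impact;
        }
    }

    // Returns up to k document IDs with the highest summed impact, best first.
    public static List<Integer> topK(List<List<Posting>> lists, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Random-access view of each list: doc -> impact.
        List<Map<Integer, Float>> lookup = new ArrayList<>();
        for (List<Posting> list : lists) {
            Map<Integer, Float> byDoc = new HashMap<>();
            for (Posting p : list) {
                byDoc.put(p.doc, p.impact);
            }
            lookup.add(byDoc);
        }
        Map<Integer, Float> scored = new HashMap<>(); // fully scored docs
        // Min-heap holding the best k documents seen so far.
        PriorityQueue<Integer> best =
            new PriorityQueue<>(Comparator.comparingDouble((Integer d) -> scored.get(d)));
        for (int depth = 0; ; depth++) {
            float threshold = 0f;
            boolean anyLeft = false;
            for (List<Posting> list : lists) {
                if (depth >= list.size()) {
                    continue; // exhausted: unseen docs score 0 in this list
                }
                anyLeft = true;
                Posting p = list.get(depth);
                threshold += p.impact;
                if (!scored.containsKey(p.doc)) {
                    float total = 0f;
                    for (Map<Integer, Float> byDoc : lookup) {
                        total += byDoc.getOrDefault(p.doc, 0f);
                    }
                    scored.put(p.doc, total);
                    best.add(p.doc);
                    if (best.size() > k) {
                        best.poll(); // drop the current lowest score
                    }
                }
            }
            // Stop when all lists are exhausted, or when no unseen document
            // (whose score is at most threshold) can beat the k-th best.
            if (!anyLeft || (best.size() == k && scored.get(best.peek()) >= threshold)) {
                break;
            }
        }
        result.addAll(best);
        result.sort(Comparator.comparingDouble((Integer d) -> scored.get(d)).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Posting>> lists = List.of(
            List.of(new Posting(1, 0.9f), new Posting(2, 0.8f), new Posting(3, 0.1f)),
            List.of(new Posting(3, 0.9f), new Posting(2, 0.7f), new Posting(4, 0.2f)));
        System.out.println(topK(lists, 2)); // [2, 3]
    }
}
```

The early-exit test is where the disk savings come from on long lists: once the k-th best score is at least the sum of the impacts at the current depth, no unread posting can change the answer.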
: I think we should deprecate the create argument to
: FSDirectory.getDirectory(*) and leave only the create argument in
: IndexWriter's constructors. Am I missing something? Is there a
: reason not to do this?
I actually wonder about the problem from the opposite direction: to me it makes
: Chuck Williams commented on LUCENE-769:
: ---
:
: Robert,
:
: Could you attach your current implementation of reopen() as well? The
: attachment did not come through in your java-dev message today, or the
: one from 12/11. I'd like to look at an incremental
Yeah, I actually had:
public int segments() { return segmentInfos.size(); }
in my IndexReader, but then erased it precisely because I thought this was
exposing too much about the impl.
I think optimize(int) that Chris mentioned exposes too much. I thought about
having optimize(boolean force)
Doron,
Maybe my browser is misbehaving, but I don't see your comments in
http://issues.apache.org/jira/browse/LUCENE-741 . Didn't see the JIRA email
with them either...
Otis
- Original Message
From: Doron Cohen [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Thursday, January
Marvin,
Several posts back on this thread, I talked about an
algorithm using impact-sorted posting lists for
conjunctive boolean queries. Your concerns about
impact-sorting in the boolean retrieval model are valid.
But practically, the approximation (as in my original
post) should work well enough for large
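The approximation idea can be illustrated with a hypothetical sketch. It is not the algorithm from the original post, which is not reproduced here: it reads only the first prefixLength entries of each impact-sorted list, keeps documents that appear in every prefix, and ranks them by summed impact. Documents whose postings fall outside a prefix are silently missed, which is the accuracy cost of reading less of each list. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical prefix-only approximation of an AND query over impact-sorted lists.
public class ApproxConjunctiveSketch {
    public static final class Posting {
        final int doc;
        final float impact;

        public Posting(int doc, float impact) {
            this.doc = doc;
            this.impact = impact;
        }
    }

    // Docs found in the first prefixLength entries of every list, best first.
    // Assumes each doc appears at most once per list.
    public static List<Integer> approxAnd(List<List<Posting>> lists, int prefixLength) {
        Map<Integer, Float> sum = new HashMap<>();    // doc -> summed impact over prefixes
        Map<Integer, Integer> hits = new HashMap<>(); // doc -> number of prefixes containing it
        for (List<Posting> list : lists) {
            int n = Math.min(prefixLength, list.size());
            for (int i = 0; i < n; i++) {
                Posting p = list.get(i);
                sum.merge(p.doc, p.impact, Float::sum);
                hits.merge(p.doc, 1, Integer::sum);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
            if (e.getValue() == lists.size()) { // present in every prefix
                result.add(e.getKey());
            }
        }
        result.sort(Comparator.comparingDouble((Integer d) -> sum.get(d)).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Posting>> lists = List.of(
            List.of(new Posting(1, 0.9f), new Posting(2, 0.8f), new Posting(3, 0.1f)),
            List.of(new Posting(3, 0.9f), new Posting(2, 0.7f), new Posting(1, 0.2f)));
        System.out.println(approxAnd(lists, 2)); // [2]: docs 1 and 3 are missed
        System.out.println(approxAnd(lists, 3)); // [2, 1, 3]: the exact AND result
    }
}
```

With short prefixes the answer can omit true matches; lengthening the prefix trades disk reads for recall until it equals the exact conjunctive result.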
: I think optimize(int) that Chris mentioned exposes too much. I thought
: about having optimize(boolean force) in place of optimize(), but then
: we'd have to deprecate, so I opted for forceOptimize(), which I feel
: exposes a little less.
i have no strong feelings about exposing the number of
Some days I read email in a different order, and I miss replies like this.
If optimize(boolean force) looks more attractive than forceOptimize(), that's
fine by me. I just want to be able to force the cfs index to expand, even if
it's already optimized. Getting it to have a single segment is just a
[
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464105
]
Doron Cohen commented on LUCENE-741:
I was looking at what it would take to make this work with .nrm file as
Otis Gospodnetic [EMAIL PROTECTED] wrote on 11/01/2007 20:17:31:
Doron,
Maybe my browser is misbehaving, but I don't see your comments in
http://issues.apache.org/jira/browse/LUCENE-741 . Didn't see the
JIRA email with them either...
Otis
Otis, your browser is perfect, just that I was
[
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-741:
---
Attachment: for.nrm.patch
Field norm modifier (CLI tool)
--
[
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-741:
---
Attachment: (was: for.nrm.patch)
Field norm modifier (CLI tool)
--
[
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-741:
---
Attachment: for.nrm.patch
Field norm modifier (CLI tool)
--
[
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464109
]
Doron Cohen commented on LUCENE-741:
The attached for.nrm.patch was very noisy, so I replaced it with one created
On Jan 11, 2007, at 8:37 PM, Ming Lei wrote:
But practically, the approximation (as in my original
post) should work well enough for large corpus and
relevancy-driven retrieval.
The savings in disk access for a large corpus (which implies
very long posting lists) will be huge with impact-sorted
posting