[
https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749450#action_12749450
]
Chuck Williams commented on LUCENE-600:
---
I contributed the first patch to make flush
[
https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749656#action_12749656
]
Chuck Williams commented on LUCENE-600:
---
The version attached here is from over 3
[
https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749660#action_12749660
]
Chuck Williams commented on LUCENE-600:
---
Erratum: deletion changes doc-id's
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544055
]
Chuck Williams commented on LUCENE-1052:
I agree a general configuration system would be much better. Doug
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544136
]
Chuck Williams commented on LUCENE-1052:
I can report that in our application having a formula is critical
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chuck Williams updated LUCENE-1052:
---
Attachment: termInfosConfigurer.patch
termInfosConfigurer.patch extends
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543383
]
Chuck Williams commented on LUCENE-1052:
I believe this needs to be a formula as a reasonable bound
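The kind of formula discussed for LUCENE-1052 could derive the term index interval from a segment's term count, so that the number of index terms held in memory stays bounded no matter how large the segment grows. The following is a minimal sketch of that idea; the class and parameter names are illustrative, not Lucene's actual API.

```java
// Sketch: compute a termIndexInterval that caps the number of indexed
// (in-memory) terms at maxIndexedTerms, never going below the default.
public class TermIndexIntervalFormula {
    public static int interval(long numTerms, int defaultInterval, long maxIndexedTerms) {
        // ceil(numTerms / maxIndexedTerms): the smallest interval that keeps
        // the in-memory index at or under the cap
        long needed = (numTerms + maxIndexedTerms - 1) / maxIndexedTerms;
        return (int) Math.max(defaultInterval, needed);
    }
}
```

Small segments keep the default interval (and its lookup speed); pathological segments with enormous term dictionaries automatically get a sparser index instead of exhausting the heap.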
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543306
]
Chuck Williams commented on LUCENE-1052:
Michael, thanks for creating an excellent production version
Doug Cutting wrote on 11/07/2007 09:26 AM:
Hadoop's MapFile is similar to Lucene's term index, and supports a
feature where only a subset of the index entries are loaded
(determined by io.map.index.skip). It would not be difficult to add
such a feature to Lucene by changing
Hi All,
We are experiencing OOM's when binary data contained in text files
(e.g., a base64 section of a text file) is indexed. We have extensive
recognition of file types but have encountered binary sections inside of
otherwise normal text files.
We are using the default value of 128 for
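Binary runs such as base64 sections blow up the term dictionary because nearly every "word" is unique. One mitigation, independent of any index-side bound, is a heuristic filter that rejects binary-looking tokens before they are indexed. The sketch below is purely illustrative (the class, method, and thresholds are assumptions, not Lucene code); it flags tokens that are implausibly long or that have the dense digit/case-transition pattern typical of base64.

```java
// Sketch: crude detector for tokens that probably came from binary data.
public class BinaryTokenHeuristic {
    public static boolean looksBinary(String token, int maxLength) {
        if (token.length() > maxLength) return true;   // no natural word is this long
        if (token.length() >= 20) {
            int digitsOrCaseFlips = 0;
            for (int i = 1; i < token.length(); i++) {
                char prev = token.charAt(i - 1), c = token.charAt(i);
                // base64 runs are dense in digits and lower->upper transitions
                if (Character.isDigit(c)
                        || (Character.isUpperCase(c) && Character.isLowerCase(prev))) {
                    digitsOrCaseFlips++;
                }
            }
            return digitsOrCaseFlips * 3 >= token.length();
        }
        return false;
    }
}
```

Such a predicate could be wrapped into a TokenFilter so the binary sections never reach the term dictionary at all.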
Type: Bug
Components: Index
Affects Versions: 2.0.1
Environment: Windows Server 2003
Reporter: Chuck Williams
In testing a reboot during active indexing, upon restart this exception
occurred:
Caused by: java.io.IOException: term out of order
(ancestorForwarders
How about a direct solution with a reference count scheme?
Segments files could be reference-counted, as well as individual
segments either directly, possibly by interning SegmentInfo instances,
or indirectly by reference counting all files via Directory.
The most recent checkpoint and snapshot
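The reference-count scheme proposed above can be sketched with a small registry: every commit point or open reader that uses a file increments its count, and the file becomes physically deletable only when the count drops to zero. Class and method names here are illustrative, not the eventual Lucene implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: per-file reference counts for index files.
public class FileRefCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    public synchronized void incRef(String file) {
        counts.merge(file, 1, Integer::sum);
    }

    // Returns true when the last reference was released, i.e. the file
    // may now be removed from the Directory.
    public synchronized boolean decRef(String file) {
        int c = counts.merge(file, -1, Integer::sum);
        if (c < 0) throw new IllegalStateException("decRef without incRef: " + file);
        if (c == 0) {
            counts.remove(file);
            return true;
        }
        return false;
    }
}
```

The appeal of this design is that deletion policy falls out for free: a file shared by an old snapshot and the current checkpoint simply has count two and survives until both release it.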
I don't see how to do commits without at least some new methods.
There needs to be some way to roll back changes rather than committing
them. If the commit action is IndexWriter.close() (even if just an
interface) the user still needs another method to roll back.
There are reasons to close an
Grant Ingersoll wrote on 01/17/2007 01:42 AM:
Also, I'm curious as to how many people use NFS in live systems.
I've got the requirement to support large indexes and collections of
indexes on NAS devices, which from linux pretty much means NFS or CIFS.
This doesn't seem unusual.
Chuck
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465240
]
Chuck Williams commented on LUCENE-756:
---
I may have the only app that will be broken by the 10-day backwards
Yonik Seeley wrote on 01/16/2007 11:29 AM:
On 1/16/07, robert engels [EMAIL PROTECTED] wrote:
You have the same problem if there is an existing reader open, so
what is the difference? You can't remove the segments there either.
The disk space for the segments is currently removed if no one
robert engels wrote on 01/15/2007 08:01 AM:
Is your parallel adding code available?
There is an early version in LUCENE-600, but without the enhancements
described. I didn't update that version because it didn't capture any
interest and requires Java 1.5, so it seems it will not be committed.
Ning Li wrote on 01/15/2007 06:29 PM:
On 1/14/07, Michael McCandless [EMAIL PROTECTED] wrote:
* The support deleteDocuments in IndexWriter (LUCENE-565) feature
could have a more efficient implementation (just like Solr) when
autoCommit is false, because deletes don't need to be
The alternative of a UID-based BooleanQuery would have similar
challenges unless the postings were sorted by UID. But hey, that's
permanent doc-ids.
Chuck
On Jan 15, 2007, at 11:49 PM, Chuck Williams wrote:
My interest is transactions, not making doc-id's permanent.
Specifically
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464055
]
Chuck Williams commented on LUCENE-769:
---
Robert,
Could you attach your current implementation of reopen
[
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463729
]
Chuck Williams commented on LUCENE-769:
---
The test case uses only tiny documents, and the reported timings
[
https://issues.apache.org/jira/browse/LUCENE-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463322
]
Chuck Williams commented on LUCENE-767:
---
Isn't maxDoc always the same as the docCount of the segment, which
[
https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462122
]
Chuck Williams commented on LUCENE-510:
---
Has an improvement been made to eliminate the reported 20% indexing
[
http://issues.apache.org/jira/browse/LUCENE-762?page=comments#action_12461460 ]
Chuck Williams commented on LUCENE-762:
---
Hi Grant,
Maybe even better would be to have an appropriate method on
FieldSelectorResult. E.g
Components: Store
Affects Versions: 2.1
Reporter: Chuck Williams
Sometimes an application would like to know how large a document is before
retrieving it. This can be important for memory management or choosing between
algorithms, especially in cases where documents might be very large
[ http://issues.apache.org/jira/browse/LUCENE-762?page=all ]
Chuck Williams updated LUCENE-762:
--
Attachment: SizeFieldSelector.patch
[PATCH] Efficiently retrieve sizes of field values
[
http://issues.apache.org/jira/browse/LUCENE-754?page=comments#action_12459763 ]
Chuck Williams commented on LUCENE-754:
---
Cool! This should solve at least part of my problem. Trying this now (along
with finalizer removal patch
[
http://issues.apache.org/jira/browse/LUCENE-754?page=comments#action_12459791 ]
Chuck Williams commented on LUCENE-754:
---
This patch, together with LUCENE-750 (already committed) solved our problem
completely. It sped up simultaneous
org.apache.lucene.index.SegmentTermDocs.init(SegmentTermDocs.java:45)
Thanks,
Chuck
Chuck Williams wrote on 12/15/2006 08:22 AM:
Yonik and Robert, thanks for the suggestions and pointer to the patch!
We've looked at the synchronization involved with finalizers and don't
see how it could cause the issue as running
Hi All,
I've had a bizarre anomaly arise in an application and am wondering if
anybody has ever seen anything like this. Certain queries, in cases that
are not easy to reproduce, take 15-20 minutes to execute rather than a few
seconds. The same query is fast some times and anomalously slow
others. This
Surprising but it looks to me like a bug in Java's collation rules for
en-US. According to
http://developer.mimer.com/collations/charts/UCA_latin.htm, \u00D8
(which is Latin Capital Letter O With Stroke) should be before U,
implying -1 is the correct result. Java is returning 1 for all
strengths
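The observation above can be reproduced directly with java.text.Collator. Which side of U the stroked O lands on depends on the JDK's collation tables (older JDKs shipped their own; newer ones default to CLDR data), so the sketch only surfaces the sign of the comparison rather than asserting a particular ordering.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch: compare \u00D8 (Latin Capital Letter O With Stroke) against "U"
// under the en-US collator at a chosen strength.
public class StrokeOCollation {
    public static int compareStrokeOToU(int strength) {
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(strength);                       // PRIMARY..IDENTICAL
        return Integer.signum(c.compare("\u00D8", "U"));
    }
}
```

A result of -1 matches the UCA chart cited above (Ø before U); +1 reproduces the behavior reported in this thread.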
Thanks Ning. This is all very helpful. I'll make sure to be consistent
with the new merge policy and its invariant conditions.
Chuck
Ning Li wrote on 12/05/2006 08:01 AM:
An old issue (http://issues.apache.org/jira/browse/LUCENE-325 new
method expungeDeleted() added to IndexWriter)
Mike Klaas wrote on 12/05/2006 11:38 AM:
On 12/5/06, negrinv [EMAIL PROTECTED] wrote:
Chris Hostetter wrote:
If the code was not already in the core, and someone asked about
adding it
I would argue against doing so on the grounds that some helpful
utility
methods (possibly in a
Hi All,
I'd like to open up the API to mergeSegments() in IndexWriter and am
wondering if there are potential problems with this.
I use ParallelReader and ParallelWriter (in jira) extensively as these
provide the basis for fast bulk updates of small metadata fields.
ParallelReader requires that
Michael Busch wrote on 11/22/2006 08:47 AM:
Ning Li wrote:
A possible design could be:
First, in addDocument(), compute the byte size of a ram segment after
the ram segment is created. In the synchronized block, when the newly
created segment is added to ramSegmentInfos, also add its byte
[ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]
Chuck Williams updated LUCENE-709:
--
Attachment: ramDirSizeManagement.patch
This one should be golden as it addresses all the issues that have been raised
and I believe the synchronization
[
http://issues.apache.org/jira/browse/LUCENE-723?page=comments#action_12451849 ]
Chuck Williams commented on LUCENE-723:
---
+1
With this could also come negative-only queries, e.g.
-foo
as a shortcut for
*:* -foo
QueryParser support
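The QueryParser support suggested above amounts to detecting a query whose clauses are all prohibited and supplying a match-all clause to subtract from. The sketch below is string-level and purely illustrative; it is not the actual QueryParser implementation, just the rewrite rule it would apply.

```java
// Sketch: rewrite a purely negative query ("-foo") into "*:* -foo".
public class NegativeQueryRewrite {
    public static String rewrite(String query) {
        String[] clauses = query.trim().split("\\s+");
        for (String clause : clauses) {
            if (!clause.startsWith("-")) {
                return query;          // has a positive clause: leave unchanged
            }
        }
        return "*:* " + query.trim();  // all clauses prohibited: add match-all
    }
}
```

In real Lucene terms the equivalent would be a BooleanQuery holding a MatchAllDocsQuery SHOULD/MUST clause plus the MUST_NOT clauses.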
[ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]
Chuck Williams updated LUCENE-709:
--
Attachment: ramDirSizeManagement.patch
I've just attached my version of this patch. It includes a multi-threaded test
case. I believe it is sound
[
http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450894 ]
Chuck Williams commented on LUCENE-709:
---
I didn't see Yonik's new version or comments until after my attach.
Throwing IOExceptions when files that should
[
http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450260 ]
Chuck Williams commented on LUCENE-709:
---
Not synchronizing on the Hashtable, even if using an Enumeration, creates
problems as the contents of the hash table
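The point above is that although each individual Hashtable method is synchronized, a full enumeration is a sequence of calls, so a concurrent writer can mutate the table mid-traversal. Holding the table's own monitor for the whole loop makes the traversal atomic with respect to every other synchronized accessor. A minimal sketch (the sizing use case here is assumed for illustration):

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Sketch: total the buffered file sizes while excluding concurrent writers
// for the duration of the enumeration.
public class SafeEnumeration {
    public static long sumValues(Hashtable<String, Long> sizes) {
        long total = 0;
        synchronized (sizes) {              // Hashtable locks on itself internally
            Enumeration<Long> e = sizes.elements();
            while (e.hasMoreElements()) {
                total += e.nextElement();
            }
        }
        return total;
    }
}
```

Without the outer synchronized block the sum could mix pre- and post-update values, or skip entries entirely, whenever another thread adds or removes files concurrently.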
[
http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450301 ]
Chuck Williams commented on LUCENE-709:
---
I hadn't considered the case of such large values for maxBufferedDocs, and
agree that the loop execution time
Doug Cutting wrote on 11/13/2006 10:50 AM:
Chuck Williams wrote:
I followed this same logic in ParallelWriter and got burned. My first
implementation (still the version submitted as a patch in jira) used
dynamic threads to add the subdocuments to the parallel subindexes
simultaneously
[
http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12448923 ]
Chuck Williams commented on LUCENE-709:
---
Mea Culpa! Bad bug on my part. Thanks for spotting it!
I believe the solution is simple. RAMDirectory.files
[ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]
Chuck Williams updated LUCENE-709:
--
Attachment: ramDirSizeManagement.patch
[PATCH] Enable application-level management of IndexWriter.ramDirectory size
Hi All,
Does anybody have experience dynamically varying maxBufferedDocs? In my
app, I can never truncate docs and so work with maxFieldLength set to
Integer.MAX_VALUE. Some documents are large, over 100 MBytes. Most
documents are tiny. So a fixed value of maxBufferedDocs to avoid OOM's
is
wrote on 11/09/2006 08:37 AM:
On 11/9/06, Chuck Williams [EMAIL PROTECTED] wrote:
My main concern is that the mergeFactor escalation merging logic will
somehow behave poorly in the presence of dynamically varying initial
segment sizes.
Things will work as expected with varying segments sizes
Yonik Seeley wrote on 11/09/2006 08:50 AM:
For best behavior, you probably want to be using the current
(svn-trunk) version of Lucene with the new merge policy. It ensures
there are mergeFactor segments with size = maxBufferedDocs before
triggering a merge. This makes for faster indexing in
Chuck Williams wrote on 11/09/2006 08:55 AM:
Yonik Seeley wrote on 11/09/2006 08:50 AM:
For best behavior, you probably want to be using the current
(svn-trunk) version of Lucene with the new merge policy. It ensures
there are mergeFactor segments with size = maxBufferedDocs before
triggered when
bufferedDocs==maxBufferedDocs *or* the size of the bufferedDocs >=
maxBufferSize. I made these changes based on the new merge policy
Yonik mentioned, so if anyone is interested I could open a Jira issue
and submit a patch.
- Michael
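The dual trigger Michael describes can be sketched as a small policy object: a flush is due when either the buffered document count or the buffered byte size reaches its limit. Names and structure here are illustrative, not the actual patch.

```java
// Sketch: flush when bufferedDocs hits maxBufferedDocs OR the buffered
// byte size hits maxBufferSize, whichever comes first.
public class FlushPolicy {
    private final int maxBufferedDocs;
    private final long maxBufferSize;
    private int bufferedDocs;
    private long bufferedBytes;

    public FlushPolicy(int maxBufferedDocs, long maxBufferSize) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferSize = maxBufferSize;
    }

    // Called after each added document; returns true when a flush is due,
    // resetting the counters as a flush would.
    public boolean addDocument(long docBytes) {
        bufferedDocs++;
        bufferedBytes += docBytes;
        if (bufferedDocs >= maxBufferedDocs || bufferedBytes >= maxBufferSize) {
            bufferedDocs = 0;
            bufferedBytes = 0;
            return true;
        }
        return false;
    }
}
```

This is exactly what makes a mixed workload of tiny and 100 MB documents safe: small documents flush on the count bound, huge ones flush on the byte bound long before the count is reached.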
Yonik Seeley wrote:
On 11/9/06, Chuck Williams
Michael Busch wrote on 11/09/2006 09:56 AM:
This sounds good. Michael, I'd love to see your patch,
Chuck
Ok, I'll probably need a few days before I can submit it (have to code
unit tests and check if it compiles with the current head), because
I'm quite busy with other stuff right now.
Issue Type: Improvement
Components: Index
Affects Versions: 2.0.1
Environment: All
Reporter: Chuck Williams
IndexWriter currently only supports bounding the in-memory index cache
using maxBufferedDocs, which limits it to a fixed number of documents
[ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]
Chuck Williams updated LUCENE-709:
--
Attachment: ramDirSizeManagement.patch
[PATCH] Enable application-level management of IndexWriter.ramDirectory size
Doug Cutting wrote on 11/03/2006 12:18 PM:
Chuck Williams wrote:
Why would a thread pool be more controversial? Dynamically creating and
discarding threads has many downsides.
The JVM already pools native threads, so mostly what's saved by thread
pools is the allocation initialization
Chris Hostetter wrote on 11/03/2006 09:40 AM:
: Is there any timeline for when Java 1.5 packages will be allowed?
I don't think i'll incite too much rioting to say no there is no
timeline
.. I may incite some rioting by saying my guess is 1.5 packages will be
supported when the patches
Vic Bancroft wrote on 10/17/2006 02:44 AM:
In some of my group's usage of lucene over large document collections,
we have split the documents across several machines. This has led to
a concern of whether the inverse document frequency was appropriate,
since the score seems to be dependent on
David Balmain wrote on 10/10/2006 08:53 PM:
On 10/11/06, Chuck Williams [EMAIL PROTECTED] wrote:
I personally would always store term vectors since I use a
StandardTokenizer and Stemming. In this case highlighting matches in
small documents is not trivial. Ferret's highlighter matches even
Reuven Ivgi wrote on 10/02/2006 09:32 PM:
I want to divide a document to paragraphs, still having proximity search
within each paragraph
How can I do that?
Is your issue that you want the paragraphs to be in a single document,
but you want to limit proximity search to find matches only?
Thanks in advance
Reuven Ivgi
-Original Message-
From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 03, 2006 10:58 AM
To: java-dev@lucene.apache.org
Subject: Re: Define end-of-paragraph
Reuven Ivgi wrote on 10/02/2006 09:32 PM:
I want to divide a document
and the recovery code forgot to turn that off prior to the
optimize! Thus a .cfs file was created, which confused the bulk updater
-- it did not see a segment that was inside the cfs.
Sorry for the false alarm and thanks to all who helped with the original
question/concern,
Chuck
Chuck Williams wrote
Paul Elschot wrote on 09/10/2006 09:15 PM:
On Monday 11 September 2006 02:24, Chuck Williams wrote:
Hi All,
An application of ours under development had a memory leak that caused
it to slow interminably. On linux, the application did not respond to
kill -15 in a reasonable time, so
robert engels wrote on 09/11/2006 07:34 AM:
A kill -9 should not affect the OS's writing of dirty buffers
(including directory modifications). If this were the case, massive
system corruption would almost always occur every time a kill -9 was
used with any program.
The only thing a kill -9
some custom code for
index writing? (Maybe the NewIndexModified stuff)? Possibly there is
an issue there. Do you maybe have your own cleanup code that attempts
to remove unused segments from the directory? If so, that appears to
be the likely culprit to me.
On Sep 11, 2006, at 2:56 PM, Chuck
Hi All,
An application of ours under development had a memory leak that caused
it to slow interminably. On linux, the application did not respond to
kill -15 in a reasonable time, so kill -9 was used to forcibly terminate
it. After this the segments file contained a reference to a segment
I presume your search steps are anded, as in typical drill-downs?
From a Lucene standpoint, each sequence of steps is a BooleanQuery of
required clauses, one for each step. To add a step, you extend the
BooleanQuery with a new clause. To not re-evaluate the full query,
you'd need some query
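The incremental drill-down idea above can be modeled with plain collections: each step contributes one required clause, and only the newest clause is applied to the running result set, rather than re-evaluating the whole conjunction. This is an illustrative stdlib sketch, not Lucene's BooleanQuery machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch: narrow a candidate set step by step, one required clause at a time.
public class DrillDown<T> {
    private List<T> current;

    public DrillDown(List<T> all) {
        this.current = new ArrayList<>(all);
    }

    // Apply only the newest clause to the previous step's survivors.
    public List<T> addStep(Predicate<T> requiredClause) {
        List<T> next = new ArrayList<>();
        for (T hit : current) {
            if (requiredClause.test(hit)) next.add(hit);
        }
        current = next;
        return current;
    }
}
```

In Lucene terms, "current" plays the role of a cached filter (e.g. a doc-id bitset) carried between steps so each new clause intersects against it instead of the whole index.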
Andrzej Bialecki wrote on 08/28/2006 09:19 AM:
Chuck Williams wrote:
I presume your search steps are anded, as in typical drill-downs?
From a Lucene standpoint, each sequence of steps is a BooleanQuery of
required clauses, one for each step. To add a step, you extend the
BooleanQuery
Issue Type: Bug
Components: Analysis
Affects Versions: 2.0.1, 2.1
Environment: Any
Reporter: Chuck Williams
Attachments: PerFieldAnalyzerWrapper.patch
The attached patch causes PerFieldAnalyzerWrapper to delegate calls to
getPositionIncrementGap
Hi All,
There is a strange treatment of positionIncrementGap in
DocumentWriter.invertDocument(). The gap is inserted between all
values of a field, except it is not inserted between values if the
prefix of the value list up to that point has not yet generated a token.
For example, if a field
Chris Hostetter wrote on 08/11/2006 09:08 AM:
(using lower case
to indicate no tokens produced and upper case to indicate tokens were
produced) ...
1) a b C _gap_ D ...results in: C _gap_ D
2) a B _gap_ C _gap_ D ...results in: B _gap_ C _gap_ D
3) A _gap_ b _gap_ c
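The behavior being described, with the gap inserted between values only once some earlier value has produced a token, can be simulated directly. The sketch below is an illustration of the reported semantics, not the DocumentWriter code itself; each inner list holds the tokens one field value produced (empty list = value produced no tokens).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: assign token positions the way invertDocument() is described to,
// skipping the gap while the prefix of values has produced no tokens.
public class GapSimulation {
    public static List<Integer> positions(List<List<String>> values, int gap) {
        List<Integer> result = new ArrayList<>();
        int pos = -1;
        boolean produced = false;   // has any earlier value emitted a token?
        for (List<String> tokens : values) {
            if (produced) pos += gap;        // gap only after a producing prefix
            for (String t : tokens) {
                pos += 1;                    // normal position increment of 1
                result.add(pos);
            }
            if (!tokens.isEmpty()) produced = true;
        }
        return result;
    }
}
```

Note the asymmetry: a non-producing value at the start contributes nothing at all, but a non-producing value in the middle still triggers a gap before the next producing value, matching cases 1 and 2 above.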
I have built such a system, although not with Lucene at the time. I
doubt you need to modify anything in Lucene to achieve this.
You may want to index words, stems and/or concepts from the ontology.
Concepts from the ontology may relate to words or phrases. Lucene's
token structure is
--
*Chuck Williams*
Manawiz
Principal
V: (808)885-8688
C: (415)846-9018
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
Skype: manawiz
AIM: hawimanawiz
Yahoo: jcwxx
David Balmain wrote on 07/10/2006 01:04 AM:
The only problem I could find with this solution is that
fields are no longer in alphabetical order in the term dictionary but
I couldn't think of a use-case where this is necessary although I'm
sure there probably is one.
So presumably fields are
Chris Hostetter wrote on 07/10/2006 02:06 AM:
As near as I can tell, the large issue can be summarized with the following
sentiment:
Performance gains could be realized if Field
properties were made fixed and homogeneous for
all Documents in an index.
This is certainly
Yonik Seeley wrote on 07/10/2006 09:27 AM:
I'll rephrase my original question:
When implementing NewIndexModifier, what type of efficiencies do we
get by using the new protected methods of IndexWriter vs using the
public APIs of IndexReader and IndexWriter?
I won't comment on Ning's
[
http://issues.apache.org/jira/browse/LUCENE-509?page=comments#action_12419926 ]
Chuck Williams commented on LUCENE-509:
---
LUCENE-545 does resolve this in a more general way, although the code to get
precisely one field value efficiently is slightly
Marvin Humphrey wrote on 07/08/2006 11:13 PM:
On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global semantics,
i.e., if properties like text vs. binary, Index, Store, TermVector, the
appropriate Analyzer, the assignment of Directory
David Balmain wrote on 07/09/2006 06:44 PM:
On 7/10/06, Chuck Williams [EMAIL PROTECTED] wrote:
Marvin Humphrey wrote on 07/08/2006 11:13 PM:
On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global
semantics,
i.e., if properties
Many things would be cleaner in Lucene if fields had a global semantics,
i.e., if properties like text vs. binary, Index, Store, TermVector, the
appropriate Analyzer, the assignment of Directory in ParallelReader (or
ParallelWriter), etc. were a function of just the field name and the
index. This
Doug Cutting wrote on 07/08/2006 09:41 AM:
Chuck Williams wrote:
I only work in 1.5 and use its features extensively. I don't think
about 1.4 at all, and so have no idea how heavily dependent the code in
question is on 1.5.
Unfortunately, I won't be able to contribute anything substantial
karl wettin wrote on 07/08/2006 10:27 AM:
On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global semantics,
Has this been considered before? Are there good reasons this path has
not been followed?
I've been
karl wettin wrote on 07/08/2006 12:27 PM:
On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote:
Karl, do you have specific reasons or use cases to normalize fields at
Document rather than at Index?
Nothing more than that the way the API looks it implies features that
does
DM Smith wrote on 07/07/2006 07:07 PM:
Otis,
First let me say, I don't want to rehash the arguments for or
against Java 1.5.
This is an emotional issue for people on both sides.
However, I think you have identified that the core people need to
make a decision and the rest of us
robert engels wrote on 07/06/2006 12:24 PM:
I guess we just chose a much simpler way to do this...
Even with your code changes, to see the modifications made using the
IndexWriter, it must be closed, and a new IndexReader opened.
So a far simpler way is to get the collection of updates first,
Robert,
Either you or I are missing something basic. I'm not sure which.
As I understand things, an IndexWriter and an IndexReader cannot both
have the write lock at the same time (they use the same write lock file
name). Only an IndexReader can delete and only an IndexWriter can add.
So to
would manage this for you.
On Jul 6, 2006, at 9:16 PM, Chuck Williams wrote:
Robert,
Either you or I are missing something basic. I'm not sure which.
As I understand things, an IndexWriter and an IndexReader cannot both
have the write lock at the same time (they use the same write lock
I'd suggest forcing gc after each n iteration(s) of your loop to
eliminate the garbage factor. Also, you can run a profiler to see which
objects are leaking (e.g., the netbeans profiler is excellent). Those
steps should identify any issues quickly.
Chuck
robert engels wrote on 07/03/2006 07:40
IMHO, Hits is the worst class in Lucene. Its atrocities are numerous,
including the hardwired 50 and the strange normalization of dividing
all scores by the top score if the top score happens to be greater than
1.0 (which destroys any notion of score values having any absolute
meaning, although
[
http://issues.apache.org/jira/browse/LUCENE-609?page=comments#action_12417188 ]
Chuck Williams commented on LUCENE-609:
---
I'm late to the discussion and have only read the patch file, but it seems
invalid to me. Won't getField() get a class cast
the percentage of 1.4 users need to be,
before we can have 1.5 in Lucene?
Thanks,
Otis
Ray Tsang wrote on 06/19/2006 09:06 AM:
On 6/17/06, Chuck Williams [EMAIL PROTECTED] wrote:
Ray Tsang wrote on 06/17/2006 06:29 AM:
I think the problem right now isn't whether we are going to have 1.5
code or not. We will eventually have to have 1.5 code anyways. But
we need a sound
[
http://issues.apache.org/jira/browse/LUCENE-561?page=comments#action_12416790 ]
Chuck Williams commented on LUCENE-561:
---
Christian,
That is a different bug than this one. This bug has been fixed.
Chuck
ParallelReader fails on deletes
[
http://issues.apache.org/jira/browse/LUCENE-398?page=comments#action_12416837 ]
Chuck Williams commented on LUCENE-398:
---
Christian,
I'm going to open a new issue on this in order to rename it, post a revised
patch, and hopefully get the attention
[ http://issues.apache.org/jira/browse/LUCENE-607?page=all ]
Chuck Williams updated LUCENE-607:
--
Attachment: ParallelTermEnum.patch
ParallelTermEnum is BROKEN
--
Key: LUCENE-607
URL: http
[
http://issues.apache.org/jira/browse/LUCENE-398?page=comments#action_12416838 ]
Chuck Williams commented on LUCENE-398:
---
Revised patch posted in LUCENE-607
ParallelReader crashes when trying to merge into a new index
JMA wrote on 06/17/2006 10:16 PM:
1) Is there a way to find a document that has null fields?
For example, if I have two fields (FIRST_NAME, LAST_NAME) for World Cup
players:
FIRST_NAME: Brian LAST_NAME: McBride
FIRST_NAME: Agustin LAST_NAME: Delgado
FIRST_NAME: Zinha
Tatu Saloranta wrote on 06/17/2006 06:54 AM:
And it's a bit curious as to what the current mad rush regarding
migration is -- beyond the convenience and syntactic
sugar, only the concurrency package seems like a
tempting immediate reason?
The only people who keep bringing up these
[ http://issues.apache.org/jira/browse/LUCENE-602?page=all ]
Chuck Williams updated LUCENE-602:
--
Attachment: TokenSelectorSoloAll.patch
TokenSelectorSoloAll.patch applies against today's svn head. It only requires
Java 1.4.
[PATCH] Filtering
[ http://issues.apache.org/jira/browse/LUCENE-602?page=all ]
Chuck Williams updated LUCENE-602:
--
Attachment: TokenSelectorAllWithParallelWriter.patch
TokenSelectorAllWithParallelWriter.patch contains ParallelWriter as well
(LUCENE-600) as it is also
Chuck Williams (JIRA) wrote:
[ http://issues.apache.org/jira/browse/LUCENE-600?page=all ]
Chuck Williams updated LUCENE-600:
--
Attachment: ParallelWriter.patch
Patch to create and integrate ParallelWriter, Writable and
TestParallelWriter -- also modifies build
You can try that approach, but I think you will find it more difficult.
E.g., all of the primitive query classes are written specifically to use
doc-ids. So, you either need to do your searches separately on each
subindex and then write your own routine to join the results, or you
would need to
Hi Wu,
The simplest solution is to synchronize calls to a
ParallelWriter.addDocument() method that calls IndexWriter.addDocument()
for each sub-index. This will work assuming there are no exceptions and
assuming you never refresh your IndexReader within
ParallelWriter.addDocument(). If