The problem may be that you don't want to restore the peeked token into
the TokenFilter's attributes itself. It looks like you want to have a Token
instance returned from peek, but the current Stream should not reset to this
Token (you only want to "look" into the next Token and then possibly
Hi,
I'm new to this site. My question is:
Articles in Wikipedia can be edited by everyone and may or may not be
accurate. If one contributor writes an article and another contributor
then deletes certain content from it, that would indicate the article is
controversial.
I need to start o
32 bit JVM, 1.3G allocated heap size, Lucene 2.4.1.
In my opinion, this exception should not be caused by running out of memory or
out of system file handles, because a different exception would be thrown in
those two cases.
Any hint?
Thanks,
Fang, Li
-----Original Message-----
From: Uwe Schindler [mail
Fair enough. This Directory impl is really only useful in conjunction with
other usage of the jdbm embedded database -- I can't imagine people layering
other Lucene-dependent projects on top (like Solr or whatever).
I suppose if that time ever comes, I'll revisit the issue. :-)
- Chas
hossman
: new TermsFilter( termsList:[ new Term( 'id', '111' ) ] ).bits( indexReader
: ).cardinality()
...
: indexReader.docFreq( new Term( 'id', '111' ) )
...
: Which one is faster? Can I replace the 2nd one with the 1st and still get
: the same performance?
"the second", and "no"
-Ho
This is what I had in mind (completely untested!):
public class LookaheadTokenFilter extends TokenFilter {
  /** List of tokens that were peeked but not returned with next. */
  LinkedList peekedTokens = new LinkedList();
  /** The position of the next character that peek() will return in peekedTokens. */
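The buffering idea behind this sketch can be illustrated without Lucene's API at all. Below is a minimal, stdlib-only lookahead wrapper over any Iterator; the class and method names are illustrative, not Lucene's:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustrative lookahead wrapper: peek() inspects the next element
 *  without consuming it; next() drains the peek buffer first. */
public class LookaheadIterator<T> implements Iterator<T> {
    private final Iterator<T> in;
    /** Elements peeked but not yet returned by next(). */
    private final Deque<T> peeked = new ArrayDeque<T>();

    public LookaheadIterator(Iterator<T> in) { this.in = in; }

    /** Returns the next element without consuming it, or null if exhausted. */
    public T peek() {
        if (peeked.isEmpty()) {
            if (!in.hasNext()) return null;
            peeked.addLast(in.next());
        }
        return peeked.peekFirst();
    }

    public boolean hasNext() { return !peeked.isEmpty() || in.hasNext(); }

    public T next() {
        if (!peeked.isEmpty()) return peeked.pollFirst();
        if (!in.hasNext()) throw new NoSuchElementException();
        return in.next();
    }

    public static void main(String[] args) {
        LookaheadIterator<String> it =
            new LookaheadIterator<String>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek());  // inspects "a" without consuming it
        System.out.println(it.next());  // still returns "a"
        System.out.println(it.next());  // then "b"
    }
}
```

A real TokenFilter would buffer captured attribute State objects rather than elements, but the queue discipline is the same.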
Daniel,
take a look at the captureState() and restoreState() APIs in
AttributeSource and TokenStream. captureState() returns a State object
containing all attributes with their current values. restoreState(State)
takes a given State and copies its values back into the TokenStream. You
should b
After thinking about it, the only conclusion I reached was, instead of saving
the token, to save an iterator of Attributes and use that instead. It
may work.
Daniel Shane
Daniel Shane wrote:
Hi all!
I'm trying to port my Lucene code to the new TokenStream API and I
have a filter that I cannot se
Hi all!
I'm trying to port my Lucene code to the new TokenStream API and I have
a filter that I cannot seem to port using the current new API.
The filter is called LookaheadTokenFilter. It behaves exactly like a
normal token filter, except, you can call peek() and get information on
the next
I think you should do this instead (it will print the exception message
*and* the stack trace instead of only the message) :
throw new IndexerException("CorruptIndexException on doc: " + doc.toString(), ex);
Daniel Shane
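Chaining the original exception as the cause, as suggested above, is what keeps the underlying stack trace reachable. A small stdlib-only illustration; IndexerException here is a stand-in for the application's own exception class:

```java
/** Stand-in for an application-specific exception; passing the cause to
 *  the two-argument constructor does the chaining. */
public class IndexerException extends Exception {
    public IndexerException(String message, Throwable cause) {
        super(message, cause);
    }

    public static void main(String[] args) {
        try {
            try {
                throw new RuntimeException("simulated CorruptIndexException");
            } catch (RuntimeException ex) {
                // Wrapping with the message only would lose the trace;
                // passing ex as the cause preserves it.
                throw new IndexerException("failed on doc: doc-42", ex);
            }
        } catch (IndexerException e) {
            e.printStackTrace();  // prints this trace *and* a "Caused by:" trace
        }
    }
}
```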
Chris Bamford wrote:
Hi Grant,
I think you code there needs to sh
: Right, and those methods (IndexReader.lastModified and
: IndexCommit.lastModified) aren't used at all. I guess what I meant to say
they aren't used internally in Lucene, but they are part of the public API,
so client code (apps built using Lucene) may expect them to work for their
own internal
Hi,
I am putting some text into a field which we set to Field.Store.NO &
Field.Index.NOT_ANALYZED. We are then doing exact & fuzzy matches against
that text (about the length of an average sentence). Currently, we have a
single field that is being used for both exact and fuzzy matches while we
hav
: Subject: Question about IndexCommit
: In-Reply-To: <9ac0c6aa0909010403k3306307dxa7751ecff3fa2...@mail.gmail.com>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, inst
: Subject: Why perform optimization in 'off hours'?
: In-Reply-To:
: <5b20def02611534db08854076ce825d8032db...@sc1exc2.corp.emainc.com>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
Alex,
That's right, you'll have to roll your own if you want to do cross-language
search in Lucene. But some of the components you need are available.
Two possible cross-language search strategies don't scale well when the
document collection size is non-trivial: a) translating all documents i
Many thanks Steve for all that information.
I understand from your answer that cross-lingual search doesn't come
"out-of-the-box" in Lucene.
Cheers.
Alex
On Tue, Sep 1, 2009 at 6:46 PM, Steven A Rowe wrote:
> Hi Alex,
>
> What you want to do is commonly referred to as "Cross Language Informati
Hi Alex,
What you want to do is commonly referred to as "Cross Language Information
Retrieval". Doug Oard at the University of Maryland has a page of CLIR
resources here:
http://terpconnect.umd.edu/~dlrg/clir/
Grant Ingersoll responded to a similar question a couple of years ago on this
lis
Late reply, but do what Michael said. I didn't understand that you had so
many indexes, but opening/closing makes sense to me now; reusing
them wouldn't be very useful. Although I *can* imagine a "keep each open
for 5 minutes" sort of rule being useful, on the assumption that a user might
search sever
Hi,
I am new to Lucene so excuse me if this is a trivial question ..
I have data that I Index in a given language (English). My users will come
from different countries and my search screen will be internationalized. My
users will then probably query things in their own language. Is it possible
t
That's excellent.
Thanks very much for the explanations
- Original Message
> From: Michael McCandless
> To: java-user@lucene.apache.org
> Sent: Tuesday, September 1, 2009 8:26:45 AM
> Subject: Re: Question about IndexCommit
>
> Further, when IndexWriter writes new .del files, it's
Thanks Mike, I get what you mean now :-)
BTW I have tested the code with 1 open/close per search (rather than keeping
the IndexReader open between searches) and so far I have witnessed no change in
performance! I will experiment with bigger indexes, but early signs are very
encouraging :-)
Further, when IndexWriter writes new .del files, it's always to a new
(next generation) filename, so that the old .del file remains present.
This means if a fresh IndexReader is opened, it will load the old
.del file, and still not see any of IndexWriter's pending changes.
Mike
On Tue, Sep 1, 20
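The "write to a next-generation filename" pattern Mike describes can be sketched with plain files. The filenames below are illustrative, not Lucene's actual naming; the point is that a reader holding generation N is never disturbed by a writer producing generation N+1:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;

/** Illustrative sketch of generation-based files: a writer never
 *  overwrites del_N in place; it writes del_(N+1), so a reader that
 *  loaded del_N keeps a consistent point-in-time view. */
public class GenerationFiles {
    /** Writes content to del_<gen> in dir and returns the new file. */
    static File writeGeneration(File dir, int gen, String content) throws IOException {
        File f = new File(dir, "del_" + gen);
        FileWriter w = new FileWriter(f);
        w.write(content);
        w.close();
        return f;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("gen").toFile();
        File gen1 = writeGeneration(dir, 1, "doc 3 deleted");
        // A "reader" loads generation 1 into memory...
        String readerView = new String(Files.readAllBytes(gen1.toPath()));
        // ...then the writer commits a newer generation; del_1 is untouched.
        writeGeneration(dir, 2, "doc 3 and doc 7 deleted");
        System.out.println(readerView);  // still the generation-1 snapshot
    }
}
```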
If I'm not mistaken, IndexReader reads the .del file into memory, and
therefore subsequent updates to that file won't be visible to the reader.
Shai
On Tue, Sep 1, 2009 at 3:54 PM, Ted Stockwell wrote:
> Hi All,
>
> I am interested in using Lucene to index RDF (Resource Description Framework)
> data.
> Ultimatel
Hi All,
I am interested in using Lucene to index RDF (Resource Description Framework) data.
Ultimately I want to create a transactional interface to the data with proper
transaction isolation.
Therefore I am trying to educate myself on the details of index readers and
writers; I am using v2.9rc2.
setUseCompoundFile is an IndexWriter method. It already defaults to
"true", so you probably are already using compound file format. If
you look in your index directory and see only *.cfs (plus segments_N
and segments.gen) then you are using compound file format.
Mike
On Tue, Sep 1, 2009 at 8:20
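Checking for compound format by listing the index directory, as Mike suggests, is easy to script. This is a rough heuristic only; the extension names (.cfs, segments_N, segments.gen) come from the message above, and the helper name is made up:

```java
import java.io.File;

/** Illustrative check: an index directory in compound file format
 *  contains essentially only *.cfs files plus the segments_N and
 *  segments.gen metadata files. */
public class CompoundCheck {
    static boolean looksCompound(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) return false;  // not a directory
        for (File f : files) {
            String name = f.getName();
            if (name.endsWith(".cfs")) continue;
            if (name.startsWith("segments")) continue;  // segments_N, segments.gen
            return false;  // some other per-segment file means non-compound
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksCompound(new File(args[0])));
    }
}
```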
Hi Mike,
Thanks for the suggestions, very useful. I would like to adopt a combination
of setUseCompoundFile on the IndexReader and perform an open/close per search.
As a start, I just tried to set compound file format on the IndexSearcher's
underlying IndexReader, but it is not available as a m
In this approach it's expected you'll run out of file descriptors
when "enough" users attempt to search at the same time.
You can reduce the number of file descriptors required per IndexReader
by 1) using compound file format (it's the default for IndexWriter),
and 2) optimizing the index before
Hi Erick,
>>Note that for search speed reasons, you really, really want to share your
>>readers and NOT open/close for every request.
I have often wondered about this - I hope you can help me understand it better
in the context of our app, which is an email client:
When one of our users receives
Which Lucene version, 64 bit JVM?
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: fang...@emc.com [mailto:fang...@emc.com]
> Sent: Tuesday, September 01, 2009 12:04 PM
> To: java-user@lucene.apache.org
>
We are running on Windows 2003 Enterprise Edition with an NTFS file system on a
local disk. The JDK version is 1.5.0.12.
The problem was discussed before and there is no clear solution confirmed.
Thanks.
-----Original Message-----
From: Danil ŢORIN [mailto:torin...@gmail.com]
Sent: Tuesday, September
There should be no problem with large segments.
Please describe the OS, file system, and JDK you are running on.
There might be some problems with files >2GB on Win32/FAT, or on some
ancient Linuxes.
On Tue, Sep 1, 2009 at 12:37, wrote:
> I met a problem to open an index bigger than 8GB and the followi
I ran into a problem opening an index bigger than 8GB; the following
exception was thrown. There is a segment which is bigger than 4GB
already. After searching the internet, it seems that not using compound
index format may solve the problem.
The same exception was thrown when merging with another index happ
Hi Grant,
>>I think you code there needs to show the underlying exception, too, so
>>we can see that stack trace.
Ummm... isn't this code already doing that? What am I missing?
try {
indexWriter.addDocument(doc);
} catch (CorruptIndexException ex) {
Function queries should work here? (org.apache.lucene.search.function.*).
Mike
On Tue, Sep 1, 2009 at 2:24 AM, marquinhocb wrote:
>
> I would like to create a scorer that applies a score based on a value that is
> calculated during a query. More specifically, to apply a score based on
> geograph