Hello,
Let's say I have two documents, both containing field F.
document 0 has the string "a b" as F
document 1 has the string "b a" as F
I am trying to build a PhraseQuery like this:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("F", "a"));
pq.add(new Term("F", "b"));
I noticed this was because I was using a KeywordAnalyzer.
Is it possible to write a document with different analyzers in different fields?
Best.
On Tue, Sep 16, 2008 at 8:33 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Let's say I have two documents, both containing field F.
>
> document
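The KeywordAnalyzer observation comes down to tokenization. A PhraseQuery for "a" followed by "b" needs F to be indexed as separate tokens at consecutive positions. KeywordAnalyzer indexes the whole value "a b" as a single token, so a two-term phrase has nothing to match. In Lucene, different analyzers for different fields are normally configured with PerFieldAnalyzerWrapper: register each field with addAnalyzer(fieldName, analyzer) and pass the wrapper to IndexWriter (and QueryParser). The sketch below is a plain-Java toy model, not Lucene code. The two tokenizer methods only imitate KeywordAnalyzer and WhitespaceAnalyzer, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model, not Lucene code: why an ordered two-term phrase needs
// separate positional tokens in the indexed field.
class PhraseMatchDemo {

    // Imitates KeywordAnalyzer: the whole field value is one token.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Imitates WhitespaceAnalyzer: one token per whitespace-separated word;
    // a token's position is its index in the list.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the phrase terms appear at consecutive positions, in order
    // (an exact phrase, i.e. slop 0).
    static boolean matchesPhrase(List<String> tokens, String... phrase) {
        for (int start = 0; start + phrase.length <= tokens.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < phrase.length; i++) {
                if (!tokens.get(start + i).equals(phrase[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }
}
```

With whitespace-style tokens, only document 0 ("a b") matches the ordered phrase. With keyword-style tokens, neither document does.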
Yes, I made it that way, but I still have to port some of my code.
Thanks a lot.
On Tue, Sep 16, 2008 at 6:28 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> I think Daniel was suggesting you write your own HitCollector with its own
> "int hits" counter var.
>
> Otis
> --
> Sematext -- http://se
In cases where we don't know the possible number of hits, and since I want
to test the new 2.4 way of doing things,
could I use custom HitCollectors for everything? Is there any performance
penalty for this?
From what I understand, both TopDocCollector and TopDocs will try to
allocate an array of Integer.MAX_V
I think Daniel was suggesting you write your own HitCollector with its own "int
hits" counter var.
Otis
- Original Message
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, September 15,
Yes, I looked into implementing a custom collector that would return the
number of hits, but I could not.
collect() cannot access a local variable unless it is final, and a final
variable cannot be incremented.
Any ideas?
Best.
On Tue, Sep 16, 2008 at 6:05 AM, Daniel Noll <[EMAIL PROTECTED]> wrote:
> Cam Bazz wrote:
>
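An anonymous inner class can only use local variables that are final (or, in Java 8 and later, effectively final), so it cannot increment a captured counter. A field in a named HitCollector subclass has no such restriction, which is the approach suggested in this thread. Below is a minimal sketch. To keep it compilable without Lucene on the classpath, it uses a hypothetical stand-in, HitCollectorStandIn, with the same collect(int doc, float score) shape as org.apache.lucene.search.HitCollector. With Lucene, you would extend HitCollector directly and pass the instance to searcher.search(query, collector). The final one-element array shows the usual workaround if you prefer an anonymous class.

```java
// Hypothetical stand-in for org.apache.lucene.search.HitCollector so the
// sketch compiles without Lucene; the real class has the same collect shape.
abstract class HitCollectorStandIn {
    public abstract void collect(int doc, float score);
}

// A named subclass can keep a mutable field, so nothing needs to be final.
class CountingHitCollector extends HitCollectorStandIn {
    private int hits = 0;

    @Override
    public void collect(int doc, float score) {
        hits++; // count every matching document, store nothing else
    }

    public int getHits() {
        return hits;
    }
}

class CountingDemo {
    // Simulates a searcher feeding matching doc IDs to the collector.
    static int countMatches(int[] matchingDocs) {
        CountingHitCollector counter = new CountingHitCollector();
        for (int doc : matchingDocs) {
            counter.collect(doc, 1.0f);
        }
        return counter.getHits();
    }

    // Anonymous-class alternative: the array reference is final,
    // but its single element can still be incremented.
    static int countWithAnonymousClass(int[] matchingDocs) {
        final int[] hits = new int[1];
        HitCollectorStandIn collector = new HitCollectorStandIn() {
            @Override
            public void collect(int doc, float score) {
                hits[0]++;
            }
        };
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
        return hits[0];
    }
}
```

Because the counter keeps no per-hit state, it never allocates a result array, so the Integer.MAX_VALUE allocation problem does not arise.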
I don't think "exists vs. doesn't exist" matters (but I should really try
it and see) as much as whether you use Sort: if you sort, warm up with sorting
too, because sorting requires FieldCache loading.
Otis
- Original Message
> Fr
Cam Bazz wrote:
Hello,
Could it do any harm if I call
searcher.search(query, Integer.MAX_VALUE)?
I just need to make a query to get the number of hits in this case,
but I don't know what the max hits will be.
PriorityQueue will attempt to allocate an array of that size.
But if you only need to k
Otis Gospodnetic wrote:
Hi,
Check the Hits javadoc:
* @deprecated Hits will be removed in Lucene 3.0.
* Instead e. g. {@link TopDocCollector} and {@link TopDocs}
* can be used:
*
* TopDocCollector collector = new TopDocCollector(hitsPerPage);
* searcher.search(qu
Hello,
Could it do any harm if I call
searcher.search(query, Integer.MAX_VALUE)?
I just need to make a query to get the number of hits in this case,
but I don't know what the max hits will be.
Also, is topdocs.totalHits the same as topdocs.scoreDocs.length?
Best.
-C.A.
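On the totalHits question: TopDocs.totalHits counts every matching document, while scoreDocs holds at most the n hits you asked for. So scoreDocs.length equals min(n, totalHits), not totalHits. Also note that scoreDocs is an array, so the expression is scoreDocs.length, not length(). The toy model below is plain Java, not Lucene's TopDocCollector, and all names in it are hypothetical. It keeps only the n best hits in a bounded min-heap while still counting all of them. One caveat: java.util.PriorityQueue grows lazily, whereas Lucene's own PriorityQueue preallocates its heap array for the requested size, which is why asking it for Integer.MAX_VALUE hits is a problem.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of a top-N collector (not Lucene's TopDocCollector).
class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

class TopNResult {
    final int totalHits;             // every matching document is counted
    final List<ScoredDoc> scoreDocs; // at most n entries, best first

    TopNResult(int totalHits, List<ScoredDoc> scoreDocs) {
        this.totalHits = totalHits;
        this.scoreDocs = scoreDocs;
    }
}

class TopNCollectorModel {
    private static final Comparator<ScoredDoc> BY_SCORE =
            Comparator.comparingDouble(d -> d.score);

    // matchScores[i] is the score of matching document i.
    static TopNResult collect(float[] matchScores, int n) {
        // Min-heap on score: the weakest hit kept so far sits at the head.
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(BY_SCORE);
        int totalHits = 0;
        for (int doc = 0; doc < matchScores.length; doc++) {
            totalHits++;
            heap.offer(new ScoredDoc(doc, matchScores[doc]));
            if (heap.size() > n) {
                heap.poll(); // evict the lowest-scoring hit
            }
        }
        List<ScoredDoc> top = new ArrayList<>(heap);
        top.sort(BY_SCORE.reversed()); // highest score first
        return new TopNResult(totalHits, top);
    }
}
```

Collecting four matches with n = 2 gives totalHits == 4 but only two scoreDocs.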
---
Hello,
What kind of query is best to warm up a searcher? How many searches should I do?
Are we supposed to search for things we know exist, or is it better
to run queries for things we know don't exist?
Best.
-C.B.
-
To unsubscrib
Sorry, I was talking about the future (when we can get realtime search
working with Lucene).
You have to change your code below to open a new reader (or reopen the
reader from your IndexSearcher) and call isDeleted on the new reader to
see the deletion you made with the writer.
Or, you have
I am updating it to work with trunk.
On Mon, Sep 15, 2008 at 2:11 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Yes, probably out of sync with the 2.3.2 code. Have you tried applying it to
> the trunk?
>
> Otis
>
>
>
> - Or
Hello Karl;
This is good good good news. It works.
However, I added a document like
doc.add(new Field("f", "a", Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS));
and then searched. The score for the found document is about 0.3. Shouldn't
it be 1.0?
Also, it will find it when I search for "f","b" o
Well,
Document da = new Document();
da.add(new Field("word", "a", Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS));
writer.addDocument(da);
writer.commit();
searcher = new IndexSearcher(dir);
IndexReader reader = searcher.getIndexReader();
It will return true if the provided docID was deleted, by term or
query or docID (due to exception, privately) prior to when you asked
IndexWriter to give you a "realtime" IndexReader.
Mike
Cam Bazz wrote:
OK, but then under what circumstances will isDeleted() return true?
Best.
On Mon
We are in the [slowish] process of releasing 2.4 now; we are down to
three 2.4 issues:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310110&fixfor=12312681
Once these are resolved then we'll work
15 sep 2008 kl. 18.51 skrev Karl Wettin:
Are the adds reflected directly to the index?
Yes. An InstantiatedIndexReader is always current.
You will probably still have to reconstruct your searcher.
I never really looked into what happens if you don't.
The second statement was wrong. There
OK, but then under what circumstances will isDeleted() return true?
Best.
On Mon, Sep 15, 2008 at 10:57 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Until we can get realtime search integrated into Lucene (which I'm gradually
> trying to work on), I think the answer is no; for now yo
Out of curiosity, and somewhat unrelated to this thread: when can we
expect to see 2.4?
It seems so much has changed that people will want to port their code.
Best.
On Mon, Sep 15, 2008 at 10:56 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Cam Bazz wrote:
>
>> well, I did not understan
Hello Dipen,
I think what he meant is that if the power goes off, the last transaction
is lost, but your index is not corrupted.
Best.
On Mon, Sep 15, 2008 at 10:55 PM, Dipen <[EMAIL PROTECTED]> wrote:
> hi michael,
> this is rather hard for me to understand, if a system loses power
> (electricity), how can
The guarantee holds only if power is lost *after* the call to
IndexWriter.commit() has successfully returned.
commit() does not return until all newly written and referenced files
in the index have been successfully fsync'd (and the OS does not
return from fsync until all bytes
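The ordering described here is "write, fsync, then return". As a rough illustration only, here is the JDK call that asks the operating system to force a file's bytes to the storage device before a method reports success. This is not how Lucene implements commit(), and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Illustration of "write, fsync, then report success" with plain JDK I/O.
// Not Lucene's commit() implementation; names are hypothetical.
class DurableWrite {

    static void writeDurably(File file, String contents) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(contents.getBytes(StandardCharsets.UTF_8));
            // FileDescriptor.sync() returns only after the modified data for
            // this file has been written out to the storage device.
            out.getFD().sync();
        }
        // Only now is it safe to tell the caller the data is durable;
        // a crash before sync() returns may lose the write.
    }

    // Helper for trying it out: write to a temp file, then read it back.
    static String roundTrip(String contents) {
        try {
            File tmp = File.createTempFile("durable-write", ".tmp");
            tmp.deleteOnExit();
            writeDurably(tmp, contents);
            return new String(Files.readAllBytes(tmp.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the machine dies before sync() returns, the caller was never told the write succeeded. This mirrors the rule that the guarantee only applies once commit() has returned.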
Until we can get realtime search integrated into Lucene (which I'm
gradually trying to work on), I think the answer is no; for now
you have to keep your own record of which docIDs you've deleted.
Because IndexWriter allows deletes by query and term (and also by
docID, privately, when
Cam Bazz wrote:
Well, I did not understand this.
So is there no way of using the new constructor and specifying
autoCommit = false?
That's right, until 3.0.
I would prefer to have a new API, introduced in 2.4 and kept in 3.0,
that has autoCommit=false as its default (without being speci
Hi Michael,
this is rather hard for me to understand: if a system loses power
(electricity), how can it be ensured that the fsync() call will happen at all?
This commit function relies on fsync(), but what if the OS doesn't have the time
or power to actually call fsync() and synchronize? I read ab
Well, I did not understand this.
So is there no way of using the new constructor and specifying
autoCommit = false?
Best
On Mon, Sep 15, 2008 at 10:30 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Cam Bazz wrote:
>
>> However the documentation states that autoCommit=true.
>
> For now,
So, apart from the searcher, is there any way to access the deletion
marks in an IndexWriter?
I have a live cache, and I was keeping two caches: one for new adds,
the other for deletes.
I am trying to get rid of the deletes cache and ask the index whether a
fetched document is marked deleted.
Best.
-C.B.
Cam Bazz wrote:
However the documentation states that autoCommit=true.
For now, keep using the deprecated API and specify autoCommit=false.
Then in 3.0, when IndexWriter switches to autoCommit=false, remove the
boolean autoCommit from your constructor.
How do we disable this? In 2.3 I
You'll have to open a new IndexReader after the delete is committed.
An IndexReader (or IndexSearcher) only searches the point-in-time
snapshot of the index as of when it was opened.
Mike
Cam Bazz wrote:
Hello,
Here is what I am trying to do:
dir = FSDirectory.getDirectory("/test
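Here is a toy model of the point-in-time behaviour described above. Nothing in it is Lucene API: ToyIndex and ToyReader are hypothetical names, and deletions are just a java.util.BitSet. A reader copies the committed deletions when it is opened and never changes afterwards, so an already-open reader (or IndexSearcher) never sees a later delete. In Lucene terms, openReader() plays the role of opening a new IndexReader, or calling reopen() on the old one, after the writer has committed.

```java
import java.util.BitSet;

// Toy model, not Lucene API: readers see a point-in-time snapshot of deletions.
class ToyIndex {
    private final BitSet committedDeletes = new BitSet();
    private final BitSet pendingDeletes = new BitSet();

    // Writer side: the delete is buffered until commit().
    void deleteDocument(int docId) {
        pendingDeletes.set(docId);
    }

    // Makes buffered deletes visible to readers opened from now on.
    void commit() {
        committedDeletes.or(pendingDeletes);
        pendingDeletes.clear();
    }

    // Like opening a new IndexReader (or reopening an old one):
    // copy the committed state as it is right now.
    ToyReader openReader() {
        return new ToyReader((BitSet) committedDeletes.clone());
    }
}

class ToyReader {
    private final BitSet deletedAtOpen; // frozen when the reader was opened

    ToyReader(BitSet snapshot) {
        this.deletedAtOpen = snapshot;
    }

    boolean isDeleted(int docId) {
        return deletedAtOpen.get(docId);
    }
}
```

The old reader keeps answering false for the deleted document. Only a reader opened after commit() answers true.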
Great!
Well, I never use autoCommit=true.
However, the documentation states that autoCommit=true.
How do we disable this? In 2.3 I used to do:
writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
Would that totally disable autoCommit, or will it autoCommit when the
RAM usage reaches a ce
Cam Bazz wrote:
Hello,
I see that IndexWriter.flush() is deprecated in 2.4. What do we use?
Looks like you already found it, but the javadoc says this:
* @deprecated please call {@link #commit()} instead
Also I used to make a:
try {
nodeWriter = new I
Hello,
Here is what I am trying to do:
dir = FSDirectory.getDirectory("/test");
writer = new IndexWriter(dir, analyzer, true, new
IndexWriter.MaxFieldLength(2));
writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
Document da = new Document();
da.ad
Hello,
Thanks a bunch, Michael. I have wanted to upgrade to 2.4 for a long
time. It seems major changes have been made.
Best.
On Mon, Sep 15, 2008 at 9:49 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Oh and I just committed a fix to IndexWriter's javadocs -- commit(long) is a
> private met
Oh and I just committed a fix to IndexWriter's javadocs --
commit(long) is a private method that should never have been in the
javadocs. Thanks for raising this!
Mike
Cam Bazz wrote:
Hello,
What is the difference between flush in <2.4 and commit?
Also I have been looking over docs, an
There is no difference, unless your computer/OS crashes or loses power
shortly after you called the method.
In that case, there's a big difference: commit() guarantees your index
will be intact (assuming the storage system holding your index was not
damaged) but with flush(), which does
Hello,
I would like to take advantage of isDeleted. If I delete a document
from the index without committing, and the IndexSearcher is not re-instantiated,
how can I check whether a document is marked for deletion? I tried it
both with and without calling commit(); isDeleted(mydeleteddocid)
always returns f
Yes, probably out of sync with the 2.3.2 code. Have you tried applying it to
the trunk?
Otis
- Original Message
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, September 15, 2008 11:14
Hi,
Check the Hits javadoc:
* @deprecated Hits will be removed in Lucene 3.0.
* Instead e. g. {@link TopDocCollector} and {@link TopDocs}
* can be used:
*
* TopDocCollector collector = new TopDocCollector(hitsPerPage);
* searcher.search(query, collector);
* Scor
Hello,
What is the new preferred way of running a query?
I understand Hits will be deprecated. So how do we do it the new way?
With a HitCollector?
Best.
Hello,
What is the difference between flush in <2.4 and commit?
Also, I have been looking over the docs, and they mention commit(long), but
there is no commit(long) method, only commit().
Best.
Hello,
I see that IndexWriter.flush() is deprecated in 2.4. What do we use?
Also I used to make a:
try {
nodeWriter = new IndexWriter(nodeDir, true, analyzer, false);
} catch(FileNotFoundException e) {
nodeWriter = new IndexWriter(nodeDir, true, analyzer,
15 sep 2008 kl. 18.45 skrev Cam Bazz:
I have been looking at instantiated index in the trunk. Does this come
with a searcher?
Pass an InstantiatedIndexReader to the constructor of an IndexSearcher.
Are the adds reflected directly to the index?
Yes. An InstantiatedIndexReader is always cur
Hello,
I have been looking at instantiated index in the trunk. Does this come
with a searcher? Are the adds reflected directly in the index?
Or is it just an experimental thing with only a reader and a writer?
Best.
Hm, probably that is not needed.
I thought that tf would influence the score if I didn't set it to a
constant value, but it seems that it is sufficient to override just
lengthNorm.
-Original Message-
From: Karl Wettin [mailto:[EMAIL PROTECTED]
Sent: Monday, September 15, 2008 4:56 PM
To:
15 sep 2008 kl. 14.08 skrev Dragan Jotanovic:
I made simple Similarity implementation:
public float tf(float arg0) {
return 1f;
}
Why do you touch the term frequency? Is that perhaps unrelated to
what's discussed in this thread?
karl
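For reference, Lucene 2.x's DefaultSimilarity computes tf(freq) as sqrt(freq) and lengthNorm(fieldName, numTerms) as 1/sqrt(numTerms), so matches in longer fields score lower. Overriding lengthNorm to return a constant removes that length penalty without touching tf. The plain-Java sketch below only reproduces those formulas side by side, and the class name is hypothetical. In Lucene itself, you would subclass DefaultSimilarity, override lengthNorm, and install it with setSimilarity on both the IndexWriter and the searcher. Because lengthNorm is baked into the norms when documents are indexed, existing documents keep their old norms until they are reindexed.

```java
// Plain-Java copies of two Lucene 2.x DefaultSimilarity formulas, for
// comparison only; the class name is hypothetical.
class SimilarityModel {

    // DefaultSimilarity.tf: square root of the term frequency in the field.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // DefaultSimilarity.lengthNorm: 1 / sqrt(number of terms in the field),
    // so matches in longer fields are weighted down.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Effect of an override that returns a constant: length no longer matters.
    static float constantLengthNorm(int numTerms) {
        return 1.0f;
    }
}
```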
Filters aren't really specified per field. All they are is a bitmask, one
bit per *document*. You can construct the filter any way you want, in your
case by inspecting the date-time field and passing it along with your query.
You can even combine several fields into one filter by twiddling the bits
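Here is a toy illustration of the bitmask idea: one bit per document for each criterion, then a bitwise AND to scope the search to documents that satisfy both. This is plain java.util.BitSet, not a Lucene Filter, although older Lucene Filters did hand back a java.util.BitSet from bits(IndexReader). The field values, the yyyymmdd date encoding, and all class and method names are hypothetical. Any speedup generally depends on building such bitsets once and reusing them across queries (for example via CachingWrapperFilter), rather than rebuilding them per search.

```java
import java.util.BitSet;

// Toy model: a filter is one bit per document. Build one bitmask per
// criterion, then AND them. Field values here are hypothetical arrays
// indexed by doc ID.
class FilterBitsDemo {

    // Set the bit for every doc whose date falls within [from, to].
    static BitSet dateRange(long[] dates, long from, long to) {
        BitSet bits = new BitSet(dates.length);
        for (int doc = 0; doc < dates.length; doc++) {
            if (dates[doc] >= from && dates[doc] <= to) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Set the bit for every doc owned by the given user.
    static BitSet userIs(String[] users, String user) {
        BitSet bits = new BitSet(users.length);
        for (int doc = 0; doc < users.length; doc++) {
            if (users[doc].equals(user)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Docs that pass both filters; the inputs are left untouched.
    static BitSet and(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone();
        out.and(b);
        return out;
    }
}
```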
Hi All,
I am trying to use a Filter to see if I can get a bit more performance out
of my application, which searches a Lucene index of over 100 million documents.
All my documents have two fields over which I will have to scope my
searches. One is a date-time field (MMDDHHMMSS) and a user-i
Well, hello,
I applied the patch inside trunk/src, but I am getting 'failed' errors.
Does this mean lucene-1314 is buggy, or did I apply it to the
wrong version?
Best.
joker src # pwd
/root/lucene/lucene-2.3.2/src
joker src # patch -p0 < ../../lucene-1314.patch
patching file java/org/apache/l
We have been using Abbyy (FineReader) Index&Search Libraries and Morphology SDK
since 1999.
Our search strings look like this:
*borusan* | Soruşan* | bbrusan* | "borusan istanbul filarmo*" | "gürer
aykal*" | "borusan oda orkestras*" | "borusan sanat gale*" | "zehra *
*nurhan kocabıyık ilköretim*"*
The unsatisfactory answer is "because that's the
way it works".
I suspect that the underlying issue is what happens when
you try to expand phrase searches via wildcards. Wildcard
searches are already plagued by "TooManyClauses" exceptions,
which would only get worse with phrases. In fact, downright
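Here is a rough model of the expansion problem described above. A trailing wildcard such as borusan* is rewritten into one clause per matching term in the term dictionary. BooleanQuery refuses to go past its maximum clause count (1024 by default), which is where TooManyClauses comes from. If a phrase had wildcards in several positions, a naive rewrite into one exact phrase per combination would multiply those counts. This is plain Java with hypothetical names: a sorted set stands in for the term dictionary, and IllegalStateException stands in for BooleanQuery.TooManyClauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Toy model (not Lucene code) of why wildcard expansion gets expensive.
class WildcardExpansionDemo {
    // Mirrors BooleanQuery's default maximum clause count.
    static final int MAX_CLAUSES = 1024;

    // Expand "prefix*" against a sorted term dictionary: every term that
    // starts with the prefix becomes one clause.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> clauses =
                new ArrayList<>(terms.subSet(prefix, prefix + Character.MAX_VALUE));
        if (clauses.size() > MAX_CLAUSES) {
            // Stand-in for BooleanQuery.TooManyClauses.
            throw new IllegalStateException("too many clauses: " + clauses.size());
        }
        return clauses;
    }

    // Naive phrase expansion: one exact phrase per combination of expanded
    // terms, so the per-position counts multiply.
    static long phraseCombinations(SortedSet<String> terms, String... prefixes) {
        long combinations = 1;
        for (String p : prefixes) {
            combinations *= expandPrefix(terms, p).size();
        }
        return combinations;
    }
}
```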
I read it, but I didn't understand why not?
On Monday, 15 September 2008 at 16:56, Erick Erickson
<[EMAIL PROTECTED]> wrote:
> wildcards are NOT supported within double quotes, so if
> you are submitting your query
> "Technology Gunlugu*"
> WITH the double quotes, you are searching for
> that liter
wildcards are NOT supported within double quotes, so if
you are submitting your query
"Technology Gunlugu*"
WITH the double quotes, you are searching for
that literal phrase.
Best
Erick
P.S. See:
http://lucene.apache.org/java/docs/queryparsersyntax.html
the first line under "wildcard searches"