On 01.11.2012 15:09, Michael McCandless wrote:
On Thu, Nov 1, 2012 at 6:11 AM, Ivan Vasilev wrote:
Hi Guys,
I intend to extend DocumentStoredFieldVisitor class like this:
class DocumentStoredNonRepeatableFieldVisitor extends
DocumentStoredFieldVisitor {
@Override
public Status needsField(FieldInfo fieldInfo) throws IOException {
return fieldsToAdd == null || fieldsToAdd
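The truncated visitor above might be completed along these lines. This is only a sketch against the Lucene 4.0 API; the `fieldsToAdd` set is assumed to be a caller-supplied set of wanted field names (null meaning "all fields"), and the single-load logic is my guess at the intent of "non-repeatable":

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.index.FieldInfo;

class DocumentStoredNonRepeatableFieldVisitor extends DocumentStoredFieldVisitor {
  private final Set<String> fieldsToAdd;                    // null = load all fields
  private final Set<String> seen = new HashSet<String>();   // field names already accepted

  DocumentStoredNonRepeatableFieldVisitor(Set<String> fieldsToAdd) {
    this.fieldsToAdd = fieldsToAdd;
  }

  @Override
  public Status needsField(FieldInfo fieldInfo) throws IOException {
    boolean wanted = fieldsToAdd == null || fieldsToAdd.contains(fieldInfo.name);
    // seen.add(..) returns true only for the first occurrence, so a field
    // that is stored several times in the document is loaded only once.
    return wanted && seen.add(fieldInfo.name) ? Status.YES : Status.NO;
  }
}
```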
Hi Guys,
As suggested in the question "Lucene 4.0 delete by ID" from 29 Oct, I use
writer.tryDeleteDocument(..) instead of reader.delete(docID), but for some
reason it does not work.
My code is:
IndexWriterConfig iwc = new
IndexWriterConfig(Version.LUCENE
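For reference, tryDeleteDocument has a precondition that often explains "it does not work": the reader passed in must be a near-real-time reader obtained from that same writer, and the method returns false (rather than failing) when the segment holding the docID has already been merged away. A hedged sketch against the Lucene 4.0 API; the "id" field used in the fallback is an assumption:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

class TryDeleteSketch {
  // docID is assumed to be valid relative to the NRT reader opened below.
  static void deleteById(Directory dir, int docID) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40,
        new StandardAnalyzer(Version.LUCENE_40));
    IndexWriter writer = new IndexWriter(dir, iwc);
    // Must be a near-real-time reader from the SAME writer:
    DirectoryReader reader = DirectoryReader.open(writer, true);
    try {
      if (!writer.tryDeleteDocument(reader, docID)) {
        // Segment was merged away or the reader is stale; fall back to a
        // Term-based delete (assumes an indexed unique "id" field).
        writer.deleteDocuments(new Term("id", "some-unique-id"));
      }
      writer.commit();
    } finally {
      reader.close();
      writer.close();
    }
  }
}
```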
On Wed, Oct 31, 2012 at 9:42 AM, Ivan Vasilev wrote:
Hi Guys,
Is there some advantage in speed or index size to use this:
IntDocValuesField fld = new IntDocValuesField("fldName", 1);
StoredField
Thanks Mike.
On 31.10.2012 15:52, Michael McCandless wrote:
The big advantage of IntField is you can do NumericRangeQuery/Filter
on the field.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Oct 31, 2012 at 9:42 AM, Ivan Vasilev wrote:
Hi Guys,
Is there some advantage in speed
Hi Guys,
Is there some advantage in speed or index size to use this:
IntDocValuesField fld = new IntDocValuesField("fldName", 1);
StoredField fld = new StoredField("fldName", 1);
instead of this:
IntField fld = new IntField("fld", 1, Field.Store.YES);
Searching, sorting and retrieving data fr
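A small sketch of the trade-off Mike describes (Lucene 4.0 API; field name and values are illustrative): IntField gives you efficient NumericRangeQuery support, while a plain StoredField only stores the value:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

class IntFieldSketch {
  static Query demo() {
    Document doc = new Document();
    // Indexed with trie encoding AND stored, so the value can be both
    // range-searched and retrieved:
    doc.add(new IntField("size", 1, Field.Store.YES));
    // ...add doc to an IndexWriter...

    // Efficient numeric range (inclusive bounds) over the trie terms:
    return NumericRangeQuery.newIntRange("size", 0, 10, true, true);
  }
}
```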
Thanks Simon!
On 29.10.2012 21:38, Simon Willnauer wrote:
you should call currDocsAndPositions.nextPosition() before you call
currDocsAndPositions.getPayload() payloads are per positions so you
need to advance the pos first!
simon
On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote:
Hi
incremented from 1 to 4, and after each increment payloadAttr.setPayload(..)
is invoked, but strangely, when reading the
DocsAndPositionsEnum we see that those payloads (1 to 4) actually belong to
doc #1.
Am I making a mistake in invoking the setPayload(..) method, or is it a bug?
Cheers,
Ivan Vasil
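Simon's point can be sketched like this (Lucene 4.0 API; the field and term names are illustrative): payloads live at positions, so getPayload() is only meaningful after nextPosition():

```java
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

class PayloadSketch {
  static void dumpPayloads(IndexReader reader) throws Exception {
    DocsAndPositionsEnum dpe = MultiFields.getTermPositionsEnum(
        reader, MultiFields.getLiveDocs(reader), "content", new BytesRef("term"));
    while (dpe != null && dpe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      int freq = dpe.freq();
      for (int i = 0; i < freq; i++) {
        int pos = dpe.nextPosition();        // advance the position FIRST
        BytesRef payload = dpe.getPayload(); // payload for THIS position (may be null)
        System.out.println("doc=" + dpe.docID() + " pos=" + pos + " payload=" + payload);
      }
    }
  }
}
```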
Thanks Robert
On 26.10.2012 18:49, Robert Muir wrote:
On Fri, Oct 26, 2012 at 11:47 AM, Ivan Vasilev wrote:
If you want to not use jars, then it's not enough to add the
/src/java directories; you also need the /src/resources
directories in the classpath
perfield
What other Lucene packages do I need to include to avoid the exception?
I prefer adding source code instead of jar(s).
Cheers,
Ivan Vasilev
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional command
the answer to your second question.
--
Ian.
On Thu, Oct 25, 2012 at 2:50 PM, Ivan Vasilev wrote:
Hi Guys,
In previous versions of Lucene there was a class TermPositions that could be
obtained from an IndexReader.
Is there something that replaces it in Lucene 4.0.0?
Also is there some documentation t
available?
Cheers,
Ivan Vasilev
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Hi Guys,
I would like to fix a class in
contrib/misc/src/java/org/apache/lucene/index called IndexSplitter. It
has a bug: when it splits segments into a separate index, the segment
descriptor file contains wrong data; the number (the name) of the next
segment to generate is 0. Although it can not
On 18.1.2011 23:04, Grant Ingersoll wrote:
[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors th
to-level index information (see above).
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
Sent: Friday, October 08, 2010 8:35 AM
To: LUCENE MAIL LIST
Subject: detect
Hi Guys,
Is there a way to detect the org.apache.lucene.util.Version of an index given
an IndexReader or just an FSDirectory?
I know I can open the segments file and read the proper bytes according to
the rules of creating it, but is there a smarter way to do this without
using RandomAccessFile or something lik
That's fine, Andrzej :) Doing the split in just one pass really matters for
big indexes.
I hope we will use it in our application.
Thanks,
Ivan
Andrzej Bialecki wrote:
On 2010-05-12 14:29, Ivan Vasilev wrote:
Hi Michael,
Thanks for your answer.
What we do now:
1. Splitting indexes. We do it not
much
as it did before.
Can you explain in more detail what you are doing w/ Lucene that
requires the doc stores to not be shared? EG for splitting an index,
there is the multi-pass index splitter (in contrib/misc).
Mike
On Wed, May 12, 2010 at 5:33 AM, Ivan Vasilev wrote:
Hi Guys,
Can
Hi Guys,
Can anybody tell me how to avoid sharing of docStore files (term vectors
& stored fields)? I mean to avoid creation of cfx files.
This is important for us because we support some operations like
splitting index, updating index fields (via running optimization that
has some differenc
es to a
non-scored TermQuery. If you already changed QueryParser, you can also override
the method for exactMatches (newTermQuery).
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Ivan Vasilev [mailto:ivasi..
Hi Guys,
Is it possible to make exact searches on fields that are of the type
NumericField, and if yes, how?
In the LIA book part 2 I found only information about Range searches on
such fields and how to Sort them.
Example - I have field "size" that can take integers as values.
I want to get docs t
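One common answer (a hedged sketch; this is the standard NumericRangeQuery idiom rather than something confirmed by the truncated reply): an exact match on a NumericField is just a range whose inclusive bounds are both the wanted value:

```java
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

class ExactNumericSketch {
  static Query exactSize(int wanted) {
    // min == max with both ends inclusive => exact numeric match on "size"
    return NumericRangeQuery.newIntRange("size", wanted, wanted, true, true);
  }
}
```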
fields get automatically decompressed. But there is nothing to do from your
side!
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
Sent: Tuesday, December 29, 2009
i.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
Sent: Tuesday, December 29, 2009 11:50 AM
To: java-user@lucene.apache.org
Subject: Re: Compressing field content with Lucene 3.0
10x Uwe for your answer,
It is good news that data compr
Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
Sent: Monday, December 28, 2009 7:13 PM
To: LUCENE MAIL LIST
Subject: Compressing field content with Lucene 3.0
Hi Guys,
Could you give me advice how to deal with Lucene
Hi Guys,
Could you give me advice on how to deal, in Lucene 3.0, with 2.4 indexes
that contain compressed data?
Our case is the following: we have code like this:
Field.Store fieldStored = storedFieldsSet.contains(fieldName) ?
(fieldValue.length() >= COMPRESS_THRESHOLD ? Field.Store.COMPRESS :
Fi
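For context, Lucene 3.0 removed Field.Store.COMPRESS; the usual migration is to compress explicitly with CompressionTools and store the bytes as a binary field. A sketch only: COMPRESS_THRESHOLD and the field name come from the snippet above, and the threshold value here is an assumption:

```java
import org.apache.lucene.document.CompressionTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class CompressSketch {
  static final int COMPRESS_THRESHOLD = 4096;  // assumed threshold

  static void addField(Document doc, String fieldName, String fieldValue) {
    if (fieldValue.length() >= COMPRESS_THRESHOLD) {
      // Stored as binary; decompress at read time with
      // CompressionTools.decompressString(doc.getBinaryValue(fieldName)).
      doc.add(new Field(fieldName, CompressionTools.compressString(fieldValue)));
    } else {
      doc.add(new Field(fieldName, fieldValue, Field.Store.YES, Field.Index.ANALYZED));
    }
  }
}
```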
OK, thanks guys!
Grant Ingersoll wrote:
On Oct 16, 2009, at 6:05 AM, Uwe Schindler wrote:
I would recommend adapting your app to 2.9 and enabling deprecation
warnings.
As soon as all deprecation warnings disappear during compile, you are
able to
just go to 3.0 (just drop in the jars when available)
Hi Lucene Guys,
I am interested in what your planned date for releasing Lucene 3.0 is.
I am asking because, seeing the changes in Lucene 2.9 (especially the
changes in backward compatibility), I guess that it will be difficult for
us to adapt our app to Lucene 2.9. I see in your Jira there are not many
TermDocs termDocs = this.reader.termDocs(term);
int count = 0;
while (termDocs.next()) {
    count += termDocs.freq();
}
simon
On Mon, Aug 24, 2009 at 6:14 PM, Ivan Vasilev wrote:
Hi All,
We use faceting in our app but it is very slow for the indexes that our
clients use.
First I will say w
Hi All,
We use faceting in our app but it is very slow for the indexes that
our clients use.
First I will say what I understand by faceting: for each
term of a certain field, to obtain 1. the number of docs that contain it, and 2.
the total number of occurrences of the term in the index.
No
Thanks Guys for the answers!
Steven, I tried ".*" instead of "*" but it did not work as
desired. The ".*" does not match any symbol(s) in the query. I tested
with different Analyzers. Depending on the Analyzer, it is either omitted or ".*"
is treated just as normal symbols.
Mark, your clas
Hi Guys,
Does anybody know if there is a way to use wildcards in a SpanQuery?
My idea is, for example, instead of the query content:"expansive
computer"~10, to use the query content:"exp* comp*"~10, so that the
results of the first query are a subset of those of the second one.
I tried with parsing the above w
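In later Lucene versions (3.1+) there is a direct way to do this: SpanMultiTermQueryWrapper lets multi-term queries such as wildcards participate in span queries. A sketch approximating content:"exp* comp*"~10:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

class WildcardSpanSketch {
  static SpanQuery demo() {
    SpanQuery exp = new SpanMultiTermQueryWrapper<WildcardQuery>(
        new WildcardQuery(new Term("content", "exp*")));
    SpanQuery comp = new SpanMultiTermQueryWrapper<WildcardQuery>(
        new WildcardQuery(new Term("content", "comp*")));
    // slop 10, unordered - roughly "exp* comp*"~10
    return new SpanNearQuery(new SpanQuery[] { exp, comp }, 10, false);
  }
}
```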
:
Why don't you extend HitCollector and put all the logic you need into it?
Ivan Vasilev-2 wrote:
Hi All,
As the Hits class was deprecated in current Lucene and is expected to be
removed from Lucene 3.0, we decided to change our code to use the
TopDocs class.
Our app provides paging an
OK Guys Thanks ,
Thanks for your help. I really think that paging without caching will be
best in our case. I think in most cases users find results on the first
page. When not, I think they would not go through more than 2-3 more
pages, or will just narrow the search by adding more filter
Hi All,
As the Hits class was deprecated in current Lucene and is expected to be
removed from Lucene 3.0, we decided to change our code to use the
TopDocs class.
Our app provides paging and now we are wondering what is the best way to
do it with TopDocs. I can see only this possibility:
1.
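The simplest TopDocs paging scheme (a sketch; pageSize and pageNum are illustrative) is to re-run the search each time and request enough hits to cover the page being shown:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

class PagingSketch {
  // pageNum is 1-based.
  static void showPage(IndexSearcher searcher, Query query,
                       int pageNum, int pageSize) throws Exception {
    TopDocs td = searcher.search(query, pageNum * pageSize);
    int start = (pageNum - 1) * pageSize;
    int end = Math.min(td.scoreDocs.length, pageNum * pageSize);
    for (int i = start; i < end; i++) {
      Document doc = searcher.doc(td.scoreDocs[i].doc);
      // ...render the hit...
    }
  }
}
```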
Regards,
Ivan
Ivan Vasilev wrote:
Hi Guys,
Does anybody know if it is possible for the results to be sorted when
using the ParallelReader?
Best Regards,
Ivan
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL
Hi Guys,
Does anybody know if it is possible for the results to be sorted when
using the ParallelReader?
Best Regards,
Ivan
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
).
Also, this behavior isn't "promised" in the API, ie it could in theory
(though I think it unlikely) change in a future release of Lucene.
And remember when a merge completes (or, optimize), any deleted docs
will "collapse down" all docIDs after them.
Mike
Ivan Vasile
Hi Lucene Guys,
I have a question that is simple but important for me. I did not
find the answer in the javadoc, so I am asking here.
When adding Document-s via the method IndexWriter.addDocument(doc), do
the documents obtain Lucene IDs in the order in which they are added to the
IndexWriter? I
ng words forward and considers it exclusive
when counting backwards.
Darren Govoni wrote:
One interpretation of the query with ~5 is that your text has 5 words
and ~5 would imply a word in any position can match. Could it be this?
- Original Message - From: "Ivan Vasilev" <[EMA
Of course, in our system I can use SpanNearQuery instead of PhraseQuery.
My question is: are there known performance differences between the two
classes?
Ivan Vasilev wrote:
Hi Guys,
I make the following test – I create 2 files. File1.txt with content:
“apple 2 3 4 pear”
And File2.txt with
Hi Guys,
I make the following test – I create 2 files. File1.txt with content:
“apple 2 3 4 pear”
And File2.txt with content:
“pear 2 3 4 apple”
I made the following searching tests:
1. Using Luke Search tab.
1.1. When searching for:
content:"pear apple"~3
Then the File1.txt is returned.
1.2.
That's it! Now I finally got it :)
OK, I will use it only for integration testing for now (if there is time for
this :) ) and will await the third patch.
Have a nice time :)
Ivan
Mathieu Lecarme wrote:
Ivan Vasilev wrote:
Thanks Mathieu,
I tried to check it out, but without success. Anyway I can
nk/src/java': 200 OK
(https://admin.garambrogne.net)
Mathieu Lecarme wrote:
Ivan Vasilev wrote:
Thanks Mathieu for your help!
The contribution that you have made to Lucene by this patch seems to
be great, but the hunspell dictionary is under LGPL which the lawyer
of our company does not
ackages out of it. If possible could you give
a link from where to get these sources as they are?
Best Regards,
Ivan
Mathieu Lecarme wrote:
Ivan Vasilev wrote:
Hi Guys,
Has anybody integrated the Spell Checker contributed to Lucene?
http://blog.garambrogne.net/index.php?post/2008/03/07/A
Hi Guys,
Has anybody integrated the Spell Checker contributed to Lucene? I need
advice on where to get a free dictionary file (one that contains all the
words in English) that could be used to create an instance of the
PlainTextDictionary class. I currently use for my tests the corresponding files
from Jazzy a
ing in a doc the greatest bigram clusters
covering the phrase token.
Best Regards
Uwe
-----Original Message-----
From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
Sent: Friday, 21 March 2008 16:25
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.
Hi Uwe,
Could you tell us what Analyzer you used when you measured such a big indexing
speedup?
If you use StandardAnalyzer (which uses StandardTokenizer), the
reason may be in it. You can see the second-to-last report in the thread "Indexing
Speed: 2.3 vs 2.2 (real world numbers)". According to the repor
Hi Guys,
In the File Formats web page
(http://lucene.apache.org/java/2_3_0/fileformats.html) there is section
describing Segments File, where we read:
Segments --> Format, Version, NameCounter, ...
...
Format is -1 as of Lucene 1.4 and -3
(SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2
able to use our tools for
splitting the index. The only thing that we will have to do is to add (-1)
in the position of DocStoreOffset in the segments_N file.
Thanks,
Ivan
Michael McCandless wrote:
Ivan Vasilev wrote:
Hi Lucene Guys,
As I see on the Lucene web site, on the file formats page, version 2.3
Hi Lucene Guys,
As I see on the Lucene web site, on the file formats page, version 2.3
will have some changes in the file formats that are very important for us.
First I will say what we do and then I will ask my questions.
We distribute the index on some machines. The implementation is made so
that
Hi Lucene Guys,
Can you say approximately when Lucene 2.3 will be released? We have some
customizations in the source code of Lucene and we will have to
transfer them to the 2.3 release, so it is important for us to know
approximately when this will happen in order to make our plans.
Tha
Thanks once again :)
Best Regards,
Ivan
Steven Rowe wrote:
> Hi Ivan,
>
> Ivan Vasilev wrote:
>
>> But how to understand the meaning of this: “To overcome this, you
>> have to index chinese characters as single tokens (this will increase
>> recall, but decrease pre
overlapping bigrams: AB BC CD.
> Thus issuing a query containing one chinese sign will not retrieve any
> documents. To overcome this, you have to index chinese characters as single
> tokens (this will increase recall, but decrease precision).
>
> Hope this will help,
> Samir
&g
Hi Guys,
I have made tests with the CJKAnalyzer and the results show something
that seems very strange to me. First I have to say that I do not
understand any of the CJK languages.
What I do is the following: I write some text in English and translate it
using an on-line tool, which gives me the
10x Hoss for the answer. It is good news that this topic is very rare
and clients do not complain about this. I hope our clients will also not
complain :)
Looking strictly at this, I think this leads to incorrect behavior in
indexing applications, but as there are no unsatisfied clients may
Hi Guys,
We have implemented per-field setting of Analyzers, based on the
language that is used for the corresponding field. Example: the field FileName
is in English, the field Content in Chinese. This we do by creating our own
class that implements Analyzer and wraps two analyzers, StandardAnalyzer
and CJK
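For reference, Lucene already ships a wrapper for exactly this pattern. A sketch using PerFieldAnalyzerWrapper (2.x-era API; CJKAnalyzer lives in contrib, and the field names are taken from the example above):

```java
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

class PerFieldSketch {
  static PerFieldAnalyzerWrapper build() {
    // StandardAnalyzer is the default; the Chinese "Content" field
    // gets CJKAnalyzer instead.
    PerFieldAnalyzerWrapper wrapper =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    wrapper.addAnalyzer("Content", new CJKAnalyzer());
    return wrapper;
  }
}
```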
Hi Guys,
There is something in Lucene that disturbs me. My question is about
sorting. In queries, collator objects are used to sort the
results (in the class FieldSortedHitQueue), but in the indexing process
they are not used. As I know, all the terms are ordered during the
indexi
Hi Guys,
Can anyone who tests the Analyzers give me some CJK test resources, or
advise me where to obtain them?
Best Regards,
Ivan
Ivan Vasilev wrote:
Hi Guys,
We just implemented multi-language support in our application. We
tested it with some files whose content is copy/pasted from
Hi Guys,
We just implemented multi-language support in our application. We tested
it with some files whose content is copy/pasted from some Chinese sites,
and everything seems to work correctly, but we need to test it more
thoroughly.
Any suggestions on where to get some testing resources and
job/Lucene-Nightly/javadoc/org/apache/lucene/document/NumberTools.html>
I'm curious if those utility methods solve the same problem you're
working on.
Erik
On Sep 13, 2007, at 1:19 PM, Ivan Vasilev wrote:
Hi All,
I have made some changes in my Lucene source, so that va
Hi All,
I have made some changes in my Lucene source, so that values of numeric
fields are treated as numbers and not as Strings. After testing,
everything seems to work correctly, but I would still like to know your
opinion about this.
So my approach is the following:
1. As during the inde
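The NumberTools utility Erik points to earlier in the thread can be sketched as follows (old contrib-era Lucene API; a sketch, not a claim about Ivan's patch): it encodes a long so that lexicographic term order matches numeric order, which makes ordinary term range queries work on numbers.

```java
import org.apache.lucene.document.NumberTools;

class NumberToolsSketch {
  static String encode(long value) {
    // Encoded strings sort lexicographically in numeric order.
    return NumberTools.longToString(value);
  }

  static long decode(String encoded) {
    return NumberTools.stringToLong(encoded);
  }
}
```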
ite beefy to me - Intel core duo
with
500M given to the application.
Regards,
Artem
On 4/23/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote:
Hi All,
THANK YOU FOR YOUR HELP :)
I put this problem in the forum, but I had no chance to work on it last
week, unfortunately...
So now I tested the Artem
Hi All,
THANK YOU FOR YOUR HELP :)
I put this problem in the forum, but I had no chance to work on it last
week, unfortunately...
So now I tested Artem's patch, but the results show:
1) speed is very slow compared with usage without the patch
2) There are no very big differences in memory usage
Hi All,
I have the following problem - we have OutOfMemoryException when
searching on the indexes that are of size 20 - 40 GB and contain 10 - 15
million docs.
When we make searches we perform a query that matches all the results, but we
DO NOT fetch all the results - we fetch 100 of them. We also
Hi All,
I have the following problem:
I have to implement range search for fields that contain numbers. For
example, the field size that contains a file size. The problem is that the
numbers are not kept in strings with strict length. There are field
values like this: "32", "421", "1201". So when
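A minimal workaround sketch, assuming a known maximum width: left-pad the values with zeros at index time, so that lexicographic order equals numeric order ("421" sorts after "1201" as a raw string, but "000000000421" sorts before "000000001201"):

```java
public class NumberPad {
  static final int WIDTH = 12;  // assumed maximum number of digits

  static String pad(long n) {
    // Left-pad with zeros to a fixed width.
    return String.format("%0" + WIDTH + "d", n);
  }

  public static void main(String[] args) {
    System.out.println(pad(32));                              // 000000000032
    System.out.println("421".compareTo("1201") > 0);          // true: wrong order as raw strings
    System.out.println(pad(421).compareTo(pad(1201)) < 0);    // true: correct order when padded
  }
}
```

Index the padded string in the range field and pad the query bounds the same way; this is the classic pre-NumericField technique.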
ator) then the same
Notepad can not read it (unlike Wordpad or other programs) :). The
second in Bulgarian means “here is a big bug”.
Best Regards,
Ivan Vasilev
Erick Erickson wrote:
I know this has been discussed several times, but sure don't remember the
answers. Search the mail archive
Hi All,
Our application that uses Lucene for indexing will be used to index
documents each of which contains parts written in different
languages. For example, some document could contain English, Chinese and
Brazilian text. So how do we index such a document? Is there some best
practice to do
een 1 hour and 3 hours? or 1 day and two weeks? If you can get it
built
in a night, I'd do it the simple way.
How long did it take to create the index originally anyway?
Best
Erick
On 1/4/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote:
Hi All,
I want to update some documents in e
Hi All,
I want to update some documents in existing indexes by adding a new field to
each of their documents. The documents contained in the indexes have some
fields that are indexed and NOT stored. The new field that will be added
will contain some metadata and will be Stored and not indexe