Best way for paging with TopDocs class?

2009-04-16 Thread Ivan Vasilev
Hi All, As the Hits class is deprecated in current Lucene and is expected to be removed in Lucene 3.0, we decided to change our code to use the TopDocs class. Our app provides paging, and now we are wondering what the best way is to do it with TopDocs. I can see only this possibility: 1.

Re: Best way for paging with TopDocs class?

2009-04-16 Thread Ivan Vasilev
OK Guys, thanks for your help. I really think that paging without caching will be best in our case. I think in most cases users find results on the first page. When not, I think they would not go through more than 2-3 more pages, or will just narrow the search by adding more filter
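
Paging without caching, as described above, usually means re-running the search for every page, asking for just enough top hits to cover that page, and showing only the last slice. A minimal sketch of the arithmetic in plain Java follows. The class and method names (PageWindow, hitsNeeded, slice) are hypothetical, and the list of integers stands in for the ranked document IDs that a TopDocs result would carry; none of this is Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): page arithmetic over a ranked hit list. */
final class PageWindow {

    /** How many top hits must be requested so that the zero-based page is covered. */
    static int hitsNeeded(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        return (page + 1) * pageSize;
    }

    /** Returns the hits of one zero-based page, or an empty list if the page is past the end. */
    static <T> List<T> slice(List<T> rankedHits, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        int from = Math.min(page * pageSize, rankedHits.size());
        int to = Math.min(from + pageSize, rankedHits.size());
        return new ArrayList<>(rankedHits.subList(from, to));
    }

    public static void main(String[] args) {
        // Stand-in for the ranked doc IDs of a search result.
        List<Integer> rankedDocIds = Arrays.asList(7, 3, 9, 1, 4);
        System.out.println("request top " + hitsNeeded(1, 2) + " hits");
        System.out.println("page 1: " + slice(rankedDocIds, 1, 2));
    }
}
```

The cost of this approach grows with the page number, because page N requires collecting (N + 1) * pageSize hits again. That fits the assumption in the message above that users rarely go beyond the first few pages.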

Re: Best way for paging with TopDocs class?

2009-04-17 Thread Ivan Vasilev
: Why don't you extend HitCollector and put all the logic you need into it? Ivan Vasilev-2 wrote: Hi All, As the Hits class is deprecated in current Lucene and is expected to be removed in Lucene 3.0, we decided to change our code to use the TopDocs class. Our app provides paging an

SpanQuery wildcards?

2009-04-23 Thread Ivan Vasilev
Hi Guys, Does anybody know if there is a way to use wildcards in SpanQuery? My idea is, for example, instead of the query - content:"expansive computer"~10 - to use the query - content:"exp* comp*"~10, so that the results of the first query are a subset of those of the second one. I tried with parsing the above w

Re: SpanQuery wildcards?

2009-04-24 Thread Ivan Vasilev
Thanks Guys for the answers! Steven, I tried with the ".*" instead of "*" but it did not work as desired. The ".*" does not replace any symbol(s) in the query. I tested with different Analyzers. Depending on the Analyzer, the ".*" is either omitted or treated just as normal symbols. Mark, your clas

Faster way for faceting?

2009-08-24 Thread Ivan Vasilev
Hi All, We use faceting in our app, but it is very slow for the indexes that our clients use. First I will say what I mean by faceting: for each term of a certain field, obtaining 1. the number of docs that contain it, 2. the total number of occurrences of the term in the index. No
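
The two per-term numbers named above (documents containing the term, and total occurrences across the index) can be illustrated on a tiny in-memory collection. This is only a sketch in plain Java under the assumption that each document is already a list of terms for one field; the class TermCounts and its fields are hypothetical names, and nothing here uses or imitates the Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration (not Lucene API): per-term document and occurrence counts. */
final class TermCounts {
    /** term -> number of documents that contain it at least once */
    final Map<String, Integer> docFreq = new TreeMap<>();
    /** term -> total number of occurrences over all documents */
    final Map<String, Integer> totalFreq = new TreeMap<>();

    /** Counts both numbers in one pass; each inner list is the terms of one document. */
    static TermCounts count(List<List<String>> docs) {
        TermCounts counts = new TermCounts();
        for (List<String> doc : docs) {
            Set<String> seenInThisDoc = new HashSet<>();
            for (String term : doc) {
                counts.totalFreq.merge(term, 1, Integer::sum);
                // Only the first occurrence within a document raises the document count.
                if (seenInThisDoc.add(term)) {
                    counts.docFreq.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        TermCounts counts = count(Arrays.asList(
                Arrays.asList("pdf", "doc", "pdf"),
                Arrays.asList("pdf", "txt")));
        System.out.println("docFreq=" + counts.docFreq);
        System.out.println("totalFreq=" + counts.totalFreq);
    }
}
```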

Re: Faster way for faceting?

2009-08-25 Thread Ivan Vasilev
mDocs termDocs = this.reader.termDocs(term); int count = 0; while(termDocs.next()){ count += termDocs.freq(); } simon On Mon, Aug 24, 2009 at 6:14 PM, Ivan Vasilev wrote: Hi All, We use faceting in our app, but it is very slow for the indexes that our clients use. First I will say w

is Lucene 3.0 coming soon?

2009-10-16 Thread Ivan Vasilev
Hi Lucene Guys, I would like to know your planned date for releasing Lucene 3.0. I am asking because, looking at the changes in Lucene 2.9 (especially the changes in backward compatibility), I guess that it will be difficult for us to adapt our app to Lucene 2.9. I see in your Jira there are not many

Re: is Lucene 3.0 coming soon?

2009-10-16 Thread Ivan Vasilev
OK, thanks guys! Grant Ingersoll wrote: On Oct 16, 2009, at 6:05 AM, Uwe Schindler wrote: I would recommend adapting your app to 2.9 and enabling deprecation warnings. As soon as all deprecation warnings disappear during compile, you are able to just go to 3.0 (just drop in jars when available)

Chinese test resources wanted

2007-10-16 Thread Ivan Vasilev
Hi Guys, We just implemented multi-language support in our application. We tested it with some files whose content is copy/pasted from some Chinese sites and everything seems to work correctly, but we need to test it more thoroughly. Any suggestions on where to get some testing resources and

Re: Chinese test resources wanted

2007-10-18 Thread Ivan Vasilev
Hi Guys, Can anyone who tests the Analyzers give me some CJK test resources or advise me where to obtain them? Best Regards, Ivan Ivan Vasilev wrote: Hi Guys, We just implemented multi-language support in our application. We tested it with some files whose content is copy/pasted from

Is there bug in Range searches?

2007-10-21 Thread Ivan Vasilev
Hi Guys, There is something in Lucene that bothers me. My question is about sorting. In queries, collator objects are used to sort the results (in the class FieldSortedHitQueue). But in the indexing process they are not used. As far as I know, all the terms are ordered during the indexi

How to change Collators per field when querying?

2007-10-21 Thread Ivan Vasilev
Hi Guys, We have implemented per-field setting of Analyzers, based on the language that is used for the corresponding field. Example: field FileName is in English, field Content in Chinese. We do this by creating our own class that implements Analyzer and wraps two analyzers, StandardAnalyzer and CJK

Re: Is there bug in Range searches?

2007-10-22 Thread Ivan Vasilev
10x Hoss for the answer. It is good news that this topic is very rare and clients do not complain about this. I hope our clients will also not complain :) Looking strictly at this, I think it leads to incorrect behavior in indexing applications, but as there are no unsatisfied clients may

Is there bug in CJKAnalyzer?

2007-10-22 Thread Ivan Vasilev
Hi Guys, I have made tests with the CJKAnalyzer and the results show something that seems very strange to me. First I have to say that I do not understand any of the CJK languages. What I do is the following: I write some text in English and translate it using an on-line tool, which gives me the

Re: Is there bug in CJKAnalyzer?

2007-10-23 Thread Ivan Vasilev
overlapping bigrams: AB BC CD. > Thus issuing a query containing one Chinese sign will not retrieve any > documents. To overcome this, you have to index Chinese characters as single > tokens (this will increase recall, but decrease precision). > > Hope this will help, > Samir >
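
The overlapping-bigram behaviour quoted above (AB BC CD) can be sketched in plain Java to show why a one-character query term finds nothing among the indexed tokens. The class Bigrams and its method are hypothetical and are not the CJKAnalyzer implementation. In particular, this sketch emits nothing for input shorter than two characters, which is an assumption made only to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of overlapping bigram tokens; not the CJKAnalyzer code. */
final class Bigrams {

    /** Splits text into overlapping pairs of code points: "ABCD" gives AB, BC, CD. */
    static List<String> overlapping(String text) {
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        // Assumption of this sketch: input shorter than two characters yields no tokens.
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = overlapping("ABCD");
        System.out.println("indexed tokens: " + indexed);
        // A single-character query term is not among the bigram tokens.
        System.out.println("contains B: " + indexed.contains("B"));
        System.out.println("contains BC: " + indexed.contains("BC"));
    }
}
```

Indexing each character as its own token, as the quoted reply suggests, would make the single-character lookup succeed, at the cost of matching many more documents.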

Re: Is there bug in CJKAnalyzer?

2007-10-24 Thread Ivan Vasilev
Thanks once again :) Best Regards, Ivan Steven Rowe wrote: > Hi Ivan, > > Ivan Vasilev wrote: > >> But how to understand the meaning of this: “To overcome this, you >> have to index chinese characters as single tokens (this will increase >> recall, but decrease pre

When is expected Lucene 2.3 to be released?

2007-12-07 Thread Ivan Vasilev
Hi Lucene Guys, Can you say approximately when Lucene 2.3 will be released? We have some customizations in the Lucene source code and we will have to port them to the 2.3 release, so it is important for us to know approximately when this will happen in order to make our plans. Tha

Question about Lucene 2.3. file formats?

2008-01-22 Thread Ivan Vasilev
Hi Lucene Guys, As I see on the file formats page of the Lucene web site, version 2.3 will have some changes in file formats that are very important for us. First I will say what we do and then ask my questions. We distribute the index across some machines. The implementation is made so that

Re: Question about Lucene 2.3. file formats?

2008-01-23 Thread Ivan Vasilev
able to use our tools for splitting index. The only thing that we will have to do is to add (-1) in position of DocStoreOffset in segments_N file. Thanks, Ivan Michael McCandless wrote: Ivan Vasilev wrote: Hi Lucene Guys, As I see in the Lucene web site in file formats page the version 2.3

Lucene File Formats web page

2008-02-01 Thread Ivan Vasilev
Hi Guys, In the File Formats web page (http://lucene.apache.org/java/2_3_0/fileformats.html) there is a section describing the Segments File, where we read: Segments --> Format, Version, NameCounter, ... ... Format is -1 as of Lucene 1.4 and -3 (SemgentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2

Re: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-21 Thread Ivan Vasilev
Hi Uwe, Could you tell which Analyzer you used when you observed such a big indexing speedup? If you use StandardAnalyzer (which uses StandardTokenizer), maybe the reason lies in it. You can see the second-to-last report in the thread "Indexing Speed: 2.3 vs 2.2 (real world numbers)". According to the repor

Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-24 Thread Ivan Vasilev
ing in a doc the greatest bigram clusters covering the phrase token. Best Regards Uwe -Original Message----- From: Ivan Vasilev [mailto:[EMAIL PROTECTED] Sent: Friday, 21 March 2008 16:25 To: java-user@lucene.apache.org Subject: Re: feedback: Indexing speed improvement lucene 2.

Integrating Spell Checker contributed to Lucene

2008-03-25 Thread Ivan Vasilev
Hi Guys, Has anybody integrated the Spell Checker contributed to Lucene? I need advice on where to get a free dictionary file (one that contains all words in English) that could be used to create an instance of the PlainTextDictionary class. I currently use for my tests the corresponding files from Jazzy a

Re: Integrating Spell Checker contributed to Lucene

2008-03-26 Thread Ivan Vasilev
ackages out of it. If possible could you give a link from where to get these sources as they are? Best Regards, Ivan Mathieu Lecarme wrote: Ivan Vasilev wrote: Hi Guys, Has anybody integrated the Spell Checker contributed to Lucene. http://blog.garambrogne.net/index.php?post/2008/03/07/A

Re: Integrating Spell Checker contributed to Lucene

2008-03-26 Thread Ivan Vasilev
nk/src/java': 200 OK (https://admin.garambrogne.net) Mathieu Lecarme wrote: Ivan Vasilev wrote: Thanks Mathieu for your help! The contribution that you have made to Lucene by this patch seems to be great, but the hunspell dictionary is under LGPL which the lawyer of our company does not

Re: Integrating Spell Checker contributed to Lucene

2008-03-26 Thread Ivan Vasilev
That is it! Now I finally got it :) OK, I will use it only for test integration for now (if there is time for this :) ) and will wait for the third patch. Have a nice time :) Ivan Mathieu Lecarme wrote: Ivan Vasilev wrote: Thanks Mathieu, I tried to check out but without success. Anyway I can

Re: PhraseQuery little bug?

2008-04-03 Thread Ivan Vasilev
Of course, in our system I can use SpanNearQuery instead of PhraseQuery. My question is: are there known performance differences between the two classes? Ivan Vasilev wrote: Hi Guys, I make the following test – I create 2 files. File1.txt with content: “apple 2 3 4 pear” And File2.txt with

PhraseQuery little bug?

2008-04-03 Thread Ivan Vasilev
Hi Guys, I make the following test – I create 2 files. File1.txt with content: “apple 2 3 4 pear” And File2.txt with content: “pear 2 3 4 apple” I made the following searching tests: 1. Using Luke Search tab. 1.1. When searching for: content:"pear apple"~3 Then the File1.txt is returned. 1.2.

Re: PhraseQuery little bug?

2008-04-04 Thread Ivan Vasilev
ng words forward and considers it exclusive when counting backwards. Darren Govoni wrote: One interpretation of the query with ~5 is that your text has 5 words and ~5 would imply a word in any position can match. Could it be this? - Original Message - From: "Ivan Vasilev" <[EMA

Updating tag-indexes

2008-08-19 Thread Ivan Vasilev
Hi Lucene Guys, I have a question that is simple but is important for me. I did not find the answer in the javadoc so I am asking here. When adding Documents with the method IndexWriter.addDocument(doc), do the documents obtain Lucene IDs in the order that they are added to the IndexWriter? I

Re: Updating tag-indexes

2008-08-19 Thread Ivan Vasilev
). Also, this behavior isn't "promised" in the API, ie it could in theory (though I think it unlikely) change in a future release of Lucene. And remember when a merge completes (or, optimize), any deleted docs will "collapse down" all docIDs after them. Mike Ivan Vasile

Sorting with ParallelReader

2008-09-26 Thread Ivan Vasilev
Hi Guys, Does anybody know if it is possible for results to be sorted using the ParallelReader? Best Regards, Ivan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Sorting with ParallelReader

2008-09-26 Thread Ivan Vasilev
Regards, Ivan Ivan Vasilev wrote: Hi Guys, Does anybody know if it is possible for results to be sorted using the ParallelReader? Best Regards, Ivan

Compressing field content with Lucene 3.0

2009-12-28 Thread Ivan Vasilev
Hi Guys, Could you give me advice on how to handle, with Lucene 3.0, 2.4 indexes that contain compressed data? Our case is the following - we have code like this: Field.Store fieldStored = storedFieldsSet.contains(fieldName) ? (fieldValue.length() >= COMPRESS_THRESHOLD ? Field.Store.COMPRESS : Fi

Re: Compressing field content with Lucene 3.0

2009-12-29 Thread Ivan Vasilev
Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ivan Vasilev [mailto:ivasi...@sirma.bg] Sent: Monday, December 28, 2009 7:13 PM To: LUCENE MAIL LIST Subject: Compressing field content with Lucene 3.0 Hi Guys, Could you give me advice how to deal with Lucene

Re: Compressing field content with Lucene 3.0

2009-12-29 Thread Ivan Vasilev
i.de eMail: u...@thetaphi.de -Original Message----- From: Ivan Vasilev [mailto:ivasi...@sirma.bg] Sent: Tuesday, December 29, 2009 11:50 AM To: java-user@lucene.apache.org Subject: Re: Compressing field content with Lucene 3.0 10x Uwe for your answer, It is good news that data compr

Re: Compressing field content with Lucene 3.0

2009-12-29 Thread Ivan Vasilev
fields get automatically decompressed. But there is nothing to do from your side! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ivan Vasilev [mailto:ivasi...@sirma.bg] Sent: Tuesday, December 29, 2009

NumericField exact match

2010-02-26 Thread Ivan Vasilev
Hi Guys, Is it possible to make exact searches on fields that are of type NumericField, and if yes, how? In the LIA book part 2 I found only information about Range searches on such fields and how to Sort them. Example - I have a field "size" that can take integers as values. I want to get docs t

Re: NumericField exact match

2010-02-26 Thread Ivan Vasilev
es to a non-scored TermQuery. If you already changed QueryParser, you can also override the method for exactMatches (newTermQuery). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ivan Vasilev [mailto:ivasi..

How to avoid sharing docStore files?

2010-05-12 Thread Ivan Vasilev
Hi Guys, Can anybody tell me how to avoid sharing of docStore files (term vectors & stored fields)? I mean to avoid creation of cfx files. This is important for us because we support some operations like splitting index, updating index fields (via running optimization that has some differenc

Re: How to avoid sharing docStore files?

2010-05-12 Thread Ivan Vasilev
much as it did before. Can you explain in more detail what you are doing w/ Lucene that requires the doc stores to not be shared? EG for splitting an index, there is the multi-pass index splitter (in contrib/misc). Mike On Wed, May 12, 2010 at 5:33 AM, Ivan Vasilev wrote: Hi Guys, Can

Re: How to avoid sharing docStore files?

2010-05-12 Thread Ivan Vasilev
That's fine Andrzej :) doing split in just one pass really matters for big indexes. Hope we will use it in our application. Thanks, Ivan Andrzej Bialecki wrote: On 2010-05-12 14:29, Ivan Vasilev wrote: Hi Michael, Thanks for your answer. What we do now: 1. Splitting indexes. We do it not

detect Lucene version

2010-10-08 Thread Ivan Vasilev
Hi Guys, Is there a way to detect the org.apache.lucene.util.Version of an index, having an IndexReader or just an FSDirectory? I know I can open the segments file and read the proper bytes according to the rules for creating it, but is there a smarter way to do this without using RandomAccessFile or something lik

Re: detect Lucene version

2010-10-08 Thread Ivan Vasilev
to-level index information (see above). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message----- From: Ivan Vasilev [mailto:ivasi...@sirma.bg] Sent: Friday, October 08, 2010 8:35 AM To: LUCENE MAIL LIST Subject: detect

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Ivan Vasilev
On 18.1.2011 г. 23:04, Grant Ingersoll wrote: [x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors th

Lucene index update

2007-01-04 Thread Ivan Vasilev
Hi All, I want to update some documents in existing indexes by adding a new field to each of their documents. The documents contained in the indexes have some fields that are indexed and NOT stored. The new field that will be added will contain some metadata and will be Stored and not indexe

Re: Lucene index update

2007-01-05 Thread Ivan Vasilev
een 1 hour and 3 hours? or 1 day and two weeks? If you can get it built in a night, I'd do it the simple way. How long did it take to create the index originally anyway? Best Erick On 1/4/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote: Hi All, I want to update some documents in e

Multy Language documents indexing

2007-02-22 Thread Ivan Vasilev
Hi All, Our application that uses Lucene for indexing will be used to index documents, each of which contains parts written in different languages. For example, some document could contain English, Chinese and Brazilian text. So how should such a document be indexed? Is there some best practice to do

Re: Multy Language documents indexing

2007-02-23 Thread Ivan Vasilev
ator) then the same Notepad can not read it (unlike Wordpad or other programs) :). The second in Bulgarian means “here is a big bug”. Best Regards, Ivan Vasilev Erick Erickson wrote: I know this has been discussed several times, but sure don't remember the answers. Search the mail archive

Range search in numeric fields

2007-04-03 Thread Ivan Vasilev
Hi All, I have the following problem: I have to implement range search for fields that contain numbers. For example, the field size that contains file size. The problem is that the numbers are not kept in strings of fixed length. There are field values like this: "32", "421", "1201". So when
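
The difficulty with values such as "32", "421" and "1201" is that string comparison does not follow numeric order when the lengths differ. One common workaround, shown here only as a sketch and not necessarily what this thread settled on, is to left-pad every number with zeros to a fixed width before indexing and before building range bounds. The class PaddedNumbers, its method and the width of 10 are hypothetical choices for illustration, and the sketch handles non-negative values only.

```java
import java.util.Locale;

/** Hypothetical helper: fixed-width encoding so that string order equals numeric order. */
final class PaddedNumbers {

    /** Left-pads a non-negative number with zeros to exactly 'width' digits. */
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String padded = String.format(Locale.ROOT, "%0" + width + "d", value);
        if (padded.length() > width) {
            throw new IllegalArgumentException("value does not fit into " + width + " digits");
        }
        return padded;
    }

    public static void main(String[] args) {
        // Unpadded strings: "1201" sorts before "421" although 1201 > 421.
        System.out.println("1201".compareTo("421") < 0);
        // Padded strings compare in numeric order.
        System.out.println(pad(421, 10).compareTo(pad(1201, 10)) < 0);
        System.out.println(pad(32, 10));
    }
}
```

The same padding has to be applied to the lower and upper bounds of a range query, otherwise the bounds and the indexed terms are compared in different encodings.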

Out of memory exception for big indexes

2007-04-06 Thread Ivan Vasilev
Hi All, I have the following problem - we get an OutOfMemoryException when searching on indexes that are 20 - 40 GB in size and contain 10 - 15 million docs. When we make searches we perform a query that matches all the results, but we DO NOT fetch all the results - we fetch 100 of them. We also

Re: Out of memory exception for big indexes

2007-04-23 Thread Ivan Vasilev
Hi All, THANK YOU FOR YOUR HELP :) I put this problem in the forum but I had no chance to work on it last week, unfortunately... So now I tested Artem's patch but the results show: 1) speed is very slow compared with the usage without the patch 2) There are not very big differences in memory usage

Re: Out of memory exception for big indexes

2007-04-25 Thread Ivan Vasilev
ite beefy to me - Intel core duo with 500M given to the application. Regards, Artem On 4/23/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote: Hi All, THANK YOU FOR YOUR HELP :) I put this problem in the forum but I had no chance to work on it last week, unfortunately... So now I tested Artem's

Treating values of numeric fields as numbers

2007-09-13 Thread Ivan Vasilev
Hi All, I have made some changes in my Lucene source, so that values of numeric fields are treated as numbers rather than as Strings. After testing, everything seems to work correctly, but I still would like to know your opinion about this. So my approach is the following: 1. As during the inde

Re: Treating values of numeric fields as numbers

2007-09-14 Thread Ivan Vasilev
job/Lucene-Nightly/javadoc/org/apache/lucene/document/NumberTools.html> I'm curious if those utility methods solve the same problem you're working on. Erik On Sep 13, 2007, at 1:19 PM, Ivan Vasilev wrote: Hi All, I have made some changes in my Lucene source, so that va

Bug fix to contrib/.../IndexSplitter

2011-06-09 Thread Ivan Vasilev
Hi Guys, I would like to fix a class in contrib/misc/src/java/org/apache/lucene/index called IndexSplitter. It has a bug - when it splits the segments into a separate index, the segment descriptor file contains wrong data - the number (the name) of the next segment to generate is 0. Although it can not

Is there some class to iterate on document's term positions in Lucene 4.0.0?

2012-10-25 Thread Ivan Vasilev
available? Cheers, Ivan Vasilev

Re: Is there some class to iterate on document's term positions in Lucene 4.0.0?

2012-10-25 Thread Ivan Vasilev
the answer to your second question. -- Ian. On Thu, Oct 25, 2012 at 2:50 PM, Ivan Vasilev wrote: Hi Guys, In previous versions of Lucene there was a class TermPositions that could be obtained from IndexReader. Is there something that replaces it in Lucene 4.0.0? Also is there some documentation t

Exception on creation IndexWriterConfig with Version.LUCENE_40

2012-10-26 Thread Ivan Vasilev
perfield What other Lucene packages do I need to include to avoid the Exception? I prefer adding source code instead of jar(s). Cheers, Ivan Vasilev

Re: Exception on creation IndexWriterConfig with Version.LUCENE_40

2012-10-26 Thread Ivan Vasilev
Thanks Robert On 26.10.2012 г. 18:49, Robert Muir wrote: On Fri, Oct 26, 2012 at 11:47 AM, Ivan Vasilev wrote: if you want to not use jars, then it's not enough to add the /src/java directories. you also need /src/resources directories in the classpath

Term Positions added to one document forward

2012-10-29 Thread Ivan Vasilev
incremented from 1 to 4, and after each increment payloadAttr.setPayload(..) is invoked, but strangely when reading DocsAndPositionsEnum we see those payloads (1 to 4) actually belong to doc #1. Do I make some mistake with invoking the setPayload(..) method or is it a bug? Cheers, Ivan Vasil

Re: Term Positions added to one document forward

2012-10-30 Thread Ivan Vasilev
Thanks Simon! On 29.10.2012 г. 21:38, Simon Willnauer wrote: you should call currDocsAndPositions.nextPosition() before you call currDocsAndPositions.getPayload() payloads are per positions so you need to advance the pos first! simon On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote: Hi

IntField vs (IntDocValuesField + StoredField)

2012-10-31 Thread Ivan Vasilev
Hi Guys, Is there some advantage in speed or index size to use this: IntDocValuesField fld = new IntDocValuesField("fldName", 1); StoredField fld = new StoredField("fldName", 1); instead of this: IntField fld = new IntField("fld", 1, Field.Store.YES); Searching, sorting and retrieving data fr

Re: IntField vs (IntDocValuesField + StoredField)

2012-10-31 Thread Ivan Vasilev
Thanks Mike. On 31.10.2012 г. 15:52, Michael McCandless wrote: The big advantage of IntField is you can do NumericRangeQuery/Filter on the field. Mike McCandless http://blog.mikemccandless.com On Wed, Oct 31, 2012 at 9:42 AM, Ivan Vasilev wrote: Hy Guys, Is there some advantage in speed

Re: IntField vs (IntDocValuesField + StoredField)

2012-10-31 Thread Ivan Vasilev
mericRangeQuery/Filter on the field. Mike McCandless http://blog.mikemccandless.com On Wed, Oct 31, 2012 at 9:42 AM, Ivan Vasilev wrote: Hy Guys, Is there some advantage in speed or index size to use this: IntDocValuesField fld = new IntDocValuesField("fldName", 1); StoredField

writer.tryDeleteDocument(..) does not delete document

2012-10-31 Thread Ivan Vasilev
Hi Guys, As suggested in the thread "Lucene 4.0 delete by ID" from 29 Oct, I use the writer.tryDeleteDocument(..) method instead of reader.delete(docID), but for some reason it does not work. My code is: IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE

Suggestion for extending class DocumentStoredFieldVisitor

2012-11-01 Thread Ivan Vasilev
Hy Guys, I intend to extend DocumentStoredFieldVisitor class like this: class DocumentStoredNonRepeatableFieldVisitor extends DocumentStoredFieldVisitor { @Override public Status needsField(FieldInfo fieldInfo) throws IOException { return fieldsToAdd == null || fieldsToAdd

Re: Suggestion for extending class DocumentStoredFieldVisitor

2012-11-01 Thread Ivan Vasilev
On 01.11.2012 г. 15:09, Michael McCandless wrote: On Thu, Nov 1, 2012 at 6:11 AM, Ivan Vasilev wrote: Hy Guys, I intend to extend DocumentStoredFieldVisitor class like this: class DocumentStoredNonRepeatableFieldVisitor extends DocumentStoredFieldVisitor { @Override public