Re: Custom Query Implementation

2024-12-02 Thread Patrick Zhai
Hi, have you tried to encode the sparse vector yourself using a BinaryDocValuesField? One way I can think of is to encode it as (size, index_array, value_array) per doc. Intuitively, I feel like this should be more efficient than one dimension per field if your dimension is high enough. Patrick On
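
The (size, index_array, value_array) layout mentioned above could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical, the choice of 4-byte ints and floats is an assumption, and the step of wrapping the resulting bytes in a BytesRef for a BinaryDocValuesField is left out.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: packs one document's sparse vector
// as [size][index_0 .. index_{n-1}][value_0 .. value_{n-1}].
final class SparseVectorCodec {

    /** Encodes parallel index/value arrays into a single byte array. */
    static byte[] encode(int[] indices, float[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        int n = indices.length;
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + n * (Integer.BYTES + Float.BYTES));
        buf.putInt(n);                           // size
        for (int index : indices) buf.putInt(index);   // index_array
        for (float value : values) buf.putFloat(value); // value_array
        return buf.array();
    }

    /** Reads back the index array. */
    static int[] decodeIndices(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) indices[i] = buf.getInt();
        return indices;
    }

    /** Reads back the value array, skipping the size header and the indices. */
    static float[] decodeValues(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        buf.position(Integer.BYTES + n * Integer.BYTES);
        float[] values = new float[n];
        for (int i = 0; i < n; i++) values[i] = buf.getFloat();
        return values;
    }
}
```

With this layout each document costs 4 + 8n bytes for n non-zero dimensions, which is where the saving over one field per dimension would come from when the vector is high-dimensional but sparse.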

Re: How to find RAM/disk usage of each vector field

2024-11-05 Thread Patrick Zhai
I wouldn't call this a good way, but as a last resort you can parse the metadata files yourself, as they are not so hard to parse (yet); the logic is in Lucene99HnswVectorsFormat.java and Lucene99FlatVectorsFormat.java. The risk, for sure, is that whenever the format is changed the parsing logic will ne

[ANNOUNCE] Apache Lucene 9.8.0 released

2023-09-28 Thread Patrick Zhai
The Lucene PMC is pleased to announce the release of Apache Lucene 9.8.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest-n

Re: Question about index segment search order

2023-05-04 Thread Patrick Zhai
Thanks, > Wei > > On Thu, May 4, 2023 at 3:33 AM Michael Sokolov wrote: > > > There is no meaning to the sequence. The segments are created > concurrently > > by many threads and the merge process will merge them without regards to > > any ordering. > >

Re: Question about index segment search order

2023-05-03 Thread Patrick Zhai
For that part I'm not entirely sure, if other folks know it please chime in :) On Wed, May 3, 2023 at 8:48 AM Wei wrote: > Thanks Patrick! In the default case when no LeafSorter is provided, are the > segments traversed in the order of creation time, i.e. the oldest segment > is

Re: Question about index segment search order

2023-05-02 Thread Patrick Zhai
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java#L75> when you're opening the IndexReader Patrick On Tue, May 2, 2023 at 5:28 PM Wei wrote: > Hello, > > We have a index that has multiple segments generated with continuous >

Re: Question about searcherManager applyAllDeletes parameter and maybeRefresh method

2023-03-03 Thread Patrick Zhai
knows more can chime in, but in the unit test since you're just deleting one doc, it's quite possible that IndexWriter will apply the delete right away regardless of what you have passed in. Hope that helps Patrick On Thu, Mar 2, 2023 at 3:50 PM Ningshan Li wrote: > Hi Patrick,

Re: Question about searcherManager applyAllDeletes parameter and maybeRefresh method

2023-03-02 Thread Patrick Zhai
s://github.com/apache/lucene-solr/blob/branch_7_4/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java#L288> ) So basically the applyAllDeletes you passed into SearcherManager will affect every call to the maybeRefresh. Best Patrick On Thu, Mar 2, 2023 at 3:03 PM Ningshan

Re: Is there a way to customize segment names?

2022-12-30 Thread Patrick Zhai
No, you can't control them. And we must not open up anything to try to > support this. > > On Fri, Dec 16, 2022 at 7:28 PM Patrick Zhai wrote: > > > > Hi Mike, Robert > > > > Thanks for replying, the system is almost like what Mike has described: > one wri

Re: Is there a way to customize segment names?

2022-12-16 Thread Patrick Zhai
at playing with filenames can become quite troublesome, but still, even out of my own curiosity, I want to understand whether we're able to control the segment names in some way? Best Patrick On Fri, Dec 16, 2022 at 6:36 AM Michael Sokolov wrote: > +1 trying to coordinate multiple writer

Re: Is there a way to customize segment names?

2022-12-15 Thread Patrick Zhai
replicator/nrt module has not provided a solution on when the primary node (main indexer) is down, how would we recover with a back up indexer? Thanks Patrick On Thu, Dec 15, 2022 at 7:16 PM Robert Muir wrote: > This multiple-writer isn't going to work and customizing names won't >

Is there a way to customize segment names?

2022-12-15 Thread Patrick Zhai
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java#L218>, and seems all we can do right now is to reload the whole index and that could be potentially a high cost. Sorry for the long email and thank you in advance for any replies! Best Patrick

Re: Multi-Value query test

2022-06-23 Thread Patrick Bernardina
Let me clarify: Example query: "(author:Patrick author:Michael) && type:pdf" Example result: 2 items: Doc1 with authors "Patrick, Adalberto" and Doc2 with authors "Patrick, Michael, Elias" I want to show the 2 items, but when I show the authors, I only wa

Multi-Value Query Test

2022-06-23 Thread Patrick Bernardina
How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all documents of some specific authors. The authors field contains multi-value sorted set. When showing the result, I want to show only the name of the authors specified

Multi-Value query test

2022-06-23 Thread Patrick Bernardina
How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all documents of some specific authors. The authors field contains multi-value sorted set. When showing the result, I want to show only the name of the authors specified

Re: ContainingIntervalsSource alternative

2021-06-02 Thread Patrick Zhai
Hi Elbek, Maybe go with ContainedByIntervalsSource? ContainingIntervalsSource is actually the big source filtered by the small source, and ContainedByIntervalsSource is the opposite, so it should give the expected behavior? Best Patrick elbek kamoliddinov wrote on Wed, Jun 2, 2021 at 2:55 PM: > Hello every

Re: Multiple merge-runs from same set of segments

2021-05-27 Thread Patrick Zhai
Sorry for the delayed response. As for caching termDict data across threads, I am not aware of any existing Lucene mechanism that could do that (and it might be tricky since it is across threads), but it may be worth trying to see whether we can get some extra speed based on that! Patrick Ravikumar

Re: Multiple merge-runs from same set of segments

2021-05-24 Thread Patrick Zhai
log.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Best Patrick Ravikumar Govindarajan wrote on Mon, May 24, 2021 at 9:54 AM: > Thanks Michael! > > This was just what I was looking for!!. Just a couple of questions. > > >- When we call addIndexes(IndexReader...),

RE: [EXTERNAL] Re: Multivalued DocValuesField

2016-10-31 Thread Fielder, Todd Patrick
I don't believe I have it working, but have made progress I believe the issue was I was using a SortField() instead of a SortedNumericSortField() -Todd -Original Message- From: Fielder, Todd Patrick [mailto:tpfi...@sandia.gov] Sent: Monday, October 31, 2016 1:46 PM To: java

RE: [EXTERNAL] Re: Multivalued DocValuesField

2016-10-31 Thread Fielder, Todd Patrick
if that’s in Lucene 5.0, though, you may need to upgrade to something more recent. Alan Woodward www.flax.co.uk > On 31 Oct 2016, at 15:34, Fielder, Todd Patrick wrote: > > Hello, > > I have a question about Multivalued DocValuesFields...I am using > Lucene 5.0 > >

RE: [EXTERNAL] Re: Multivalued DocValuesField

2016-10-31 Thread Fielder, Todd Patrick
use a SortedNumericDocValuesField, which allows for multiple numeric values to be stored per-document. I’m not sure if that’s in Lucene 5.0, though, you may need to upgrade to something more recent. Alan Woodward www.flax.co.uk > On 31 Oct 2016, at 15:34, Fielder, Todd Patrick wrote: > > Hello, >

Multivalued DocValuesField

2016-10-31 Thread Fielder, Todd Patrick
Hello, I have a question about Multivalued DocValuesFields...I am using Lucene 5.0 I am indexing an object that contains an Array of Sub-objects. Those sub-objects have a Long value that I need to index with fieldStore=true. That works just fine. I also want to sort that field and so I am att

range query highlighting

2015-12-23 Thread Fielder, Todd Patrick
I have a NumericRangeQuery and a TermQuery that I am combining into a Boolean query. I would then like to pass the Boolean query to the highlighter to highlight both the range and term hits. Currently, only the terms are being highlighted. Any help on how to get the range values to highlight

RE: [EXTERNAL] Re: ignore a match in a query

2015-07-23 Thread Fielder, Todd Patrick
e a match in a query Maybe you can use the phrase search like: NOT "\"Record type\"" On 7/23/15, 12:53 PM, "Fielder, Todd Patrick" wrote: >Hi, >I'm wondering if there is a way to ignore a match in a query? For >example, I have two strings

ignore a match in a query

2015-07-23 Thread Fielder, Todd Patrick
Hi, I'm wondering if there is a way to ignore a match in a query? For example, I have two strings 1) "Record type: record" 2) "Record type: cd" I do not want the text "record type" to match, so searching for the text "record" should return string 1 and not string 2. I can't say "NO

multi valued facets

2015-06-04 Thread Fielder, Todd Patrick
I am trying to add a facet for which each document can have multiple values, but am receiving the following exception: dimension "Role Name" is not multiValued, but it appears more than once in this document How do I create a MultiValued Facet? Thanks in advance

lucene hanging when calling writer.deleteDocuments

2015-05-11 Thread Fielder, Todd Patrick
Hello, I have a call to writer.deleteDocuments(term); that hangs if the document is not in the index. It works fine if the document is in the index. Is this the expected behavior? If so, is there a better method to call if I don't know if the term is in the index? Thanks -Todd

IndexFormatTooOldException

2015-05-04 Thread Patrick Herber
search stops working and I get this IndexFormatTooOldException. Do you have an idea what could be the cause of this problem? Thanks for your help! Patrick

highlighter/fragmenter question

2015-04-30 Thread Fielder, Todd Patrick
Hello, I'm not sure if this is the correct approach, so please let me know if there is a better way to accomplish the following task I am attempting to search an entire database for a keyword. To do this, I indexed all the data fields into a single "content" field with a delimiter between each

RE: [EXTERNAL] Re: general question

2015-04-02 Thread Fielder, Todd Patrick
their >> positions aren't necessarily reliable. >> >> Should we be suggesting a different approach to Todd's question? >> >> --Terry >> >> >> On Mon, Mar 30, 2015 at 6:08 PM, Fielder, Todd Patrick >> >> wrote: >> >

RE: [EXTERNAL] Re: general question

2015-03-30 Thread Fielder, Todd Patrick
ich fields those hits had matched. Mike McCandless http://blog.mikemccandless.com On Mon, Mar 30, 2015 at 1:07 PM, Fielder, Todd Patrick wrote: > Hello, > > I'm new to Lucene and am looking for advice. I'm wanting to search the > entire DB (or almost the entire DB) for

general question

2015-03-30 Thread Fielder, Todd Patrick
Hello, I'm new to Lucene and am looking for advice. I'm wanting to search the entire DB (or almost the entire DB) for a keyword. The users also want to know which field the string occurred in. I can think of two ways to do this, but neither are ideal and I'm looking for suggestions: 1)

lucene eclipseLink integration

2015-03-05 Thread Fielder, Todd Patrick
Hello, I'm using eclipseLink 2.3 and am wondering if there are any libraries that integrate eclipseLink and lucene to provide automatic index updates similar to HibernateSearch? Which versions of Lucene do they support? Thanks -Todd

RE: How to configure lucene 4.x to read 3.x index files

2014-09-23 Thread Patrick Mi
Hi Robert/Uwe, I have tried v4.8 and v4.9 - not working either. V4.7.0, V4.7.1, v4.7.2 are good. Regards, Patrick -Original Message- From: Patrick Mi [mailto:patrick...@touchpoint.co.nz] Sent: Wednesday, 24 September 2014 12:24 p.m. To: 'java-user@lucene.apache.org' Subject:

RE: How to configure lucene 4.x to read 3.x index files

2014-09-23 Thread Patrick Mi
not v3. Also I have tried an earlier version v4.7 as Uwe suggested and V4.7 version works on the V3 index that V4.10 failed to open. Regards, Patrick -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Tuesday, 23 September 2014 11:52 p.m. To: java-user Subject: Re: How

How to configure lucene 4.x to read 3.x index files

2014-09-22 Thread Patrick Mi
could point out the right direction. Regards, Patrick - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Are Okapi BM25 scores normalized into 0 and 1 ?

2011-04-29 Thread Patrick Diviacco
Can anybody provide me some information about it ? Even a small clue, I'm kinda stuck on this and the owner of the libraries do not answer emails. Thanks On 28 April 2011 13:49, Patrick Diviacco wrote: > Is Okapi BM25 (its implementation in Lucene: > nlp.uned.es/~jperezi/

Are Okapi BM25 scores normalized into 0 and 1 ?

2011-04-28 Thread Patrick Diviacco
Is Okapi BM25 (its implementation in Lucene: nlp.uned.es/~jperezi/Lucene-BM25) returning back normalized query scores (in between 0 and 1) ? According to Okapi formula the final score should be normalized. Could you give some information about that ? thanks

Re: termFreqVector is always null ?

2011-04-21 Thread Patrick Diviacco
Never mind, I've solved it by indexing the fields with Field.TermVector.YES: doc.add(new Field("tags", "foo bar", Store.NO, Index.ANALYZED, Field.TermVector.YES)); On 21 April 2011 10:57, Patrick Diviacco wrote: > Hi, > > for any document, the te

termFreqVector is always null ?

2011-04-21 Thread Patrick Diviacco
Hi, for any document, the termFreqVector is always null. I'm sure the documents are in the collection and the field exist. So where is the problem ? for (int i = 0; i < reader.numDocs(); i++){ TermFreqVector tfv = reader.getTermFreqVector(i, "tags"); thanks

Re: Lucene: Indexsearcher: java.lang.UnsupportedOperationException

2011-04-19 Thread Patrick Diviacco
ack-trace? > Also, the query.toString() > > -- > Anshum Gupta > http://ai-cafe.blogspot.com > > > On Tue, Apr 19, 2011 at 7:40 PM, Patrick Diviacco < > patrick.divia...@gmail.com> wrote: > > > I get the following error message: > java.lang.UnsupportedOper

Lucene: Indexsearcher: java.lang.UnsupportedOperationException

2011-04-19 Thread Patrick Diviacco
I get the following error message: java.lang.UnsupportedOperationException with Lucene search method: topDocs = searcher.search(booleanQuery, null, 100); I'm using an old version of Lucene: Lucene 2.4.1 (I cannot upgrade!) Can you help me to understand why I get such error ? thanks This is the c

Re: java.lang.IncompatibleClassChangeError with BM25BooleanQuery

2011-04-19 Thread Patrick Diviacco
I've also tried to use older Lucene versions such as: Lucene 3.1 and Lucene 2.9.4 with no luck. Thanks On 19 April 2011 14:48, Patrick Diviacco wrote: > Hi, I get this error: > > Exception in thread "main" java.lang.IncompatibleClassChangeError:

java.lang.IncompatibleClassChangeError with BM25BooleanQuery

2011-04-19 Thread Patrick Diviacco
Hi, I get this error: Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632) at java.lang.ClassLoader.defineClass(ClassLoader.java:616) at java.securit

Question about passing tags to BM25BooleanQuery

2011-04-19 Thread Patrick Diviacco
I'm using BM25 Okapi Query form here: http://nlp.uned.es/~jperezi/Lucene-BM25/ I've a quick question. I've a list of tags: "tag1 tag2 tag3" and I'm currently passing them to the query in this way: BM25BooleanQuery okapiQuery = new BM25BooleanQuery("tag1 tag2 tag3", "tags", new WhitespaceAnalyzer(

Indexing 1 doc only ?

2011-04-08 Thread Patrick Diviacco
Is there a way to update only 1 doc in the index rather than index the entire collection everytime there is a change ? Given a specific field (I use as ID) of my indexed doc, how can I select it and update its other fields ? thanks

Re: indexing data without writing to disk ?

2011-04-04 Thread Patrick Diviacco
Ok, I've now seen the RAMDirectory class and I'm using it together with the IndexWriter... it should be ok now, thanks On 4 April 2011 13:10, Patrick Diviacco wrote: > ok Thanks, > > When I use IndexWriter, I call addDocument method to add a new instance to > the index.

how to delete a RAMDirectory from memory

2011-04-04 Thread Patrick Diviacco
Since I need to overwrite an old ramDirectory file and I don't want memory leaks, I have the following code lines to close first the existing RAMDirectory and create a new one. INDEX_DIR.close(); INDEX_DIR = new RAMDirectory(); However, I get the following exception. Should I remove close() line

Re: indexing data without writing to disk ?

2011-04-04 Thread Patrick Diviacco
RAMDirectory. The clue is in the name ... > > > > > > -- > > Ian. > > > > > > On Fri, Apr 1, 2011 at 11:08 AM, Patrick Diviacco > > wrote: > > > Is there a way to index data into memory without writing to disk in > > Lucene

indexing data without writing to disk ?

2011-04-01 Thread Patrick Diviacco
Is there a way to index data into memory without writing to disk in Lucene ? This is my current code storing it on disk writer = new IndexWriter(FSDirectory.open(index_dir), new IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_40, new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCEN

Re: Filter to retrieve random documents without specific terms ?

2011-03-29 Thread Patrick Diviacco
> >> Plan B. > >> > >> Reverse your MUST NOT search to get a list of docids that you don't > >> want, then loop round Random.nextInt(indexreader.numDocs()), selecting > >> those that are not deleted (!indexreader.isDeleted(docid)) and are not >
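
The "Plan B" loop quoted above can be sketched without any Lucene classes, assuming the unwanted doc ids have already been collected into a set and that deletion checks are available as a predicate. The class name, method name, and the attempt cap are all hypothetical; in the thread's setting, numDocs and isDeleted would come from the IndexReader.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical helper illustrating the sampling loop; not a Lucene API.
final class RandomDocSampler {

    /**
     * Draws random doc ids in [0, numDocs) and keeps those that are neither
     * excluded nor deleted, until `wanted` distinct ids have been collected.
     * The attempt cap guards against looping forever when too few docs qualify,
     * so the result may contain fewer than `wanted` ids.
     */
    static Set<Integer> sample(int numDocs, Set<Integer> excluded,
                               IntPredicate isDeleted, int wanted, Random random) {
        Set<Integer> picked = new LinkedHashSet<>();
        if (numDocs <= 0 || wanted <= 0) {
            return picked;
        }
        long maxAttempts = 20L * numDocs + 1000; // arbitrary cap, an assumption
        for (long attempt = 0; attempt < maxAttempts && picked.size() < wanted; attempt++) {
            int docId = random.nextInt(numDocs);
            if (!excluded.contains(docId) && !isDeleted.test(docId)) {
                picked.add(docId); // the set silently ignores repeats
            }
        }
        return picked;
    }
}
```

Rejection sampling like this is cheap when most documents qualify; if the excluded set covers most of the index, listing the eligible ids and shuffling them would probably be the better choice.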

Re: Filter to retrieve random documents without specific terms ?

2011-03-29 Thread Patrick Diviacco
probably better. > > > -- > Ian. > > > On Tue, Mar 29, 2011 at 8:00 PM, Patrick Diviacco > wrote: > > Ok I've solved the first part of the problem. I'm now selecting all > > documents that do not contain a given term with a BooleanFilter > >

Re: Filter to retrieve random documents without specific terms ?

2011-03-29 Thread Patrick Diviacco
2011 20:40, Patrick Diviacco wrote: > Is there a Filter to get a limited number of random collection docs from > the index which DO NOT contain a specific term ? > > i.e. term="pizza" > > I want to run the query against 10 random documents of the collection that > do not contain the term "pizza". > > thanks >

Filter to retrieve random documents without specific terms ?

2011-03-29 Thread Patrick Diviacco
Is there a Filter to get a limited number of random collection docs from the index which DO NOT contain a specific term ? i.e. term="pizza" I want to run the query against 10 random documents of the collection that do not contain the term "pizza". thanks

Re: cannot find org.apache.lucene.search.TermsFilter

2011-03-29 Thread Patrick Diviacco
Nevermind, I've compiled it using ant. solved thanks On 29 March 2011 17:41, Patrick Diviacco wrote: > Ok, the svn repository I can only find the source files. Should I build the > jar by myself or is there a packaged jar to download ? > > thanks > > > On 29 March

Re: cannot find org.apache.lucene.search.TermsFilter

2011-03-29 Thread Patrick Diviacco
think it is contrib-queries, so should be lucene-queries.jar). > > Uwe > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Patrick Diviacco [mailto:patrick.di

Re: cannot find org.apache.lucene.search.TermsFilter

2011-03-29 Thread Patrick Diviacco
packaged > the nightly build and aren't inadvertently getting older jars? > > Best > Erick > > On Tue, Mar 29, 2011 at 7:21 AM, Patrick Diviacco > wrote: > > I've downloaded the nightly build of Lucene (TRUNK) and I'm referring to > the > > following

cannot find org.apache.lucene.search.TermsFilter

2011-03-29 Thread Patrick Diviacco
I've downloaded the nightly build of Lucene (TRUNK) and I'm referring to the following documentation: https://hudson.apache.org/hudson/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/all/index.html But I get: cannot find symbol symbol : class TermsFilter location: package org.apache.lucene.search

Can I run a query against few specific docs of the collection only ?

2011-03-29 Thread Patrick Diviacco
hi, Can I run a query against few specific docs of the collection only ? Can I filter the built collection according to documents fields content ? For example I would like to query over documents having field2 = "abc". thanks

Re: should I import the XML file into a mysql dataset ?

2011-03-29 Thread Patrick Diviacco
> But do the figuring out first - there is little point in speeding up > the bit that is already quick. > > > -- > Ian. > > > On Tue, Mar 29, 2011 at 10:22 AM, Patrick Diviacco > wrote: > > hi, > > > > I performing multiple queries (stored in a 100MB X

Re: should I import the XML file into a mysql dataset ?

2011-03-29 Thread Patrick Diviacco
My machine is Intel Dual Duo Core with 4GB ram.. is there something wrong here ? On 29 March 2011 11:22, Patrick Diviacco wrote: > hi, > > I performing multiple queries (stored in a 100MB XML file) against a > collection (indexed with lucene, and it was stored before in a 100M

should I import the XML file into a mysql dataset ?

2011-03-29 Thread Patrick Diviacco
hi, I'm performing multiple queries (stored in a 100MB XML file) against a collection (indexed with lucene, and it was stored before in a 100MB XML file). The process seems pretty long on my machine (more than 2 hours), so I was wondering if importing the 100MB queries XML file into a mysql dataset

Re: comparing lucene scores across queries

2011-03-29 Thread Patrick Diviacco
hey Uwe, so from your last answer, I understand I'm done.. no need to do anything, I can already compare the queries. However there is actually a misunderstanding: my booleanqueries have variable number of boolean clauses because the fields are fixed but the terms per field are not. So, for exampl

Re: comparing lucene scores across queries

2011-03-29 Thread Patrick Diviacco
normalizes all queries. But you are saying I should manually normalize them somehow ? It is not clear thanks Patrick > querynorm shouldn't be a problem (since your booleanqueries all have the > same structure, and don't use query boosts ... i assume) but field norm > might

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco
n change the Similarity to only have the cosine > similarity left over - if you only want to use that one. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Mess

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco
> > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > > Sent: Monday, March 28, 2

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco
On 28 March 2011 10:11, Uwe Schindler wrote: > >> Hi Patrick, >> >> You can disable the coord factor in the constructor of BooleanQuery. >> >> Uwe >> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco
Cool, so just to be sure, if I disable the coord factor I can finally compare my BooleanQuery results ? On 28 March 2011 10:11, Uwe Schindler wrote: > Hi Patrick, > > You can disable the coord factor in the constructor of BooleanQuery. > > Uwe > > - > Uwe Schindle

Re: comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco
scoring: > > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Simila > rity.html > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > Fro

comparing lucene scores across queries

2011-03-28 Thread Patrick Diviacco
Hi, sorry I've already asked few days ago, but I got no reply and I really need some help on this.. I'm running several queries against a doc collection. The queries are documents of the collection itself, I need to measure how similar is each document to the rest of the collection. Now, Lucene

Re: file formats: MacRoman and UTF-8...

2011-03-28 Thread Patrick Diviacco
")); > > Unfortunately, you cannot give a charset to FileWriter itself. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Patrick Diviacco [ma

Re: file formats: MacRoman and UTF-8...

2011-03-28 Thread Patrick Diviacco
hich java app are you using? > > paul > > > On 28 March 2011 at 09:03, Patrick Diviacco wrote: > > > When I run my Lucene app and parse an XML file I get the following error > > due to some characters such as "é" in the text file. > > > > If I s

file formats: MacRoman and UTF-8...

2011-03-28 Thread Patrick Diviacco
When I run my Lucene app and parse an XML file, I get the following error due to some characters such as "é" in the text file. If I save the text file as UTF-8 with my text editor I don't have this issue, but when I create it with a Java app, it is saved as MacRoman. How can I specify a differ
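
One way to avoid depending on the platform default charset (MacRoman on the poster's machine) is to wrap the output stream in an OutputStreamWriter with an explicit charset, which matches the direction the reply in this thread points to. The sketch below uses only the JDK; the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper: writes text as UTF-8 regardless of the platform default.
final class Utf8FileWriting {

    static void writeUtf8(Path path, String text) throws IOException {
        // The charset is passed explicitly, so the JVM's default encoding is never consulted.
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path.toFile()), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }
}
```

The same idea applies on the reading side: an InputStreamReader constructed with StandardCharsets.UTF_8 decodes the file the same way on every machine.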

Re: get the cosine similarity measure as output results ?

2011-03-26 Thread Patrick Diviacco
from the collection: I compare 1 doc from the Collection against all other docs. I need some more info about this... thanks On 26 March 2011 15:57, Patrick Diviacco wrote: > I'm performing several queries and I get scores per each document which I > have been told being not compara

get the cosine similarity measure as output results ?

2011-03-26 Thread Patrick Diviacco
I'm performing several queries and I get scores per each document which I have been told being not comparable across queries. For example, if I get score: 8.234234 for a specific document from a query A, I cannot compare such score with the document score: 3.342432 of the query B. However I need
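
For reference, the cosine measure asked about here can be computed directly from two term-frequency maps, independent of how Lucene scores a query. This is a generic textbook sketch, not Lucene's scoring formula; the class name is hypothetical and plain raw term counts are assumed as weights.

```java
import java.util.Map;

// Hypothetical helper: cosine similarity between two bags of words.
final class CosineSimilarity {

    /** Returns dot(a, b) / (|a| * |b|), or 0.0 when either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            double weight = entry.getValue();
            normA += weight * weight;
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += weight * other; // only shared terms contribute to the dot product
            }
        }
        for (int weight : b.values()) {
            normB += (double) weight * weight;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Because the result is always between 0 and 1 for non-negative weights, values computed this way can be compared across different query documents, which raw Lucene scores do not guarantee.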

Re: how to get all documents in the results ?

2011-03-23 Thread Patrick Diviacco
't be 0. > -- > Anshum Gupta > http://ai-cafe.blogspot.com > > > On Wed, Mar 23, 2011 at 1:38 PM, Patrick Diviacco < > patrick.divia...@gmail.com> wrote: > > > yeah it is clear. However I don't just want all documents, I still want > to > > per

Re: how to get all documents in the results ?

2011-03-23 Thread Patrick Diviacco
736393 84079911 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false On 23 March 2011 09:01, Anshum wrote: > Hi Patrick, > You *don't* need to add a MatchAllDocs query to anything. If you just want > all docs, just pass it to the searcher.search functio

Re: Lucene, Luke: unknown format version: -12

2011-03-23 Thread Patrick Diviacco
ot supported by the used Luke version. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -----Original Message- > > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > > Sen

Re: how to get all documents in the results ?

2011-03-23 Thread Patrick Diviacco
419924 All-text score:0.0018638512 I was expecting the score to be 0, instead. thanks On 23 March 2011 08:44, Patrick Diviacco wrote: > The issue with > > > My confusion about MatchAllDocsQuery is that I cannot specify which terms > in which fields to search with it. I'm p

Re: how to get all documents in the results ?

2011-03-23 Thread Patrick Diviacco
ou may have a completely different option that you > haven't read which someone could advice if they know the exact intent. > > Hope this helps. > > -- > Anshum Gupta > http://ai-cafe.blogspot.com > > > On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco < > patr

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-23 Thread Patrick Diviacco
s:f14 tags:usm tags:canonef50mmf14 tags:canonef50mmf14usm I can see the tags field repeated multiple times, so it seems to me correctly parsed... correct ? On 23 March 2011 07:50, Patrick Diviacco wrote: > Your answer is quite clear, but my question is a bit more specific: > as you s

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-22 Thread Patrick Diviacco
d is probably preferred. > > Best > Erick > > On Tue, Mar 22, 2011 at 3:41 AM, Patrick Diviacco > wrote: > > OK, so I'm currently doing this: > > > > booleanQuery.add(new > QueryParser(org.apache.lucene.util.Version.LUCENE_40, > > "tags"

Re: Results: get per field scores ?

2011-03-22 Thread Patrick Diviacco
t > you only cared about this for debugging. What is the use-case > for having it on all the time? > > Best > Erick > > On Tue, Mar 22, 2011 at 12:40 PM, Patrick Diviacco > wrote: > > I've been told search explain should be used for debugging only becau

Re: Results: get per field scores ?

2011-03-22 Thread Patrick Diviacco
I've been told search explain should be used for debugging only because it slows down a lot computations. Is it true ? On 22 March 2011 14:29, Erick Erickson wrote: > Try Searcher.explain. > > Best > Erick > > On Tue, Mar 22, 2011 at 4:34 AM, Patrick Diviacco > wr

Re: Building a query of single terms...

2011-03-22 Thread Patrick Diviacco
the queries after they're assembled. I believe you'll > find that the difference is that the PhraseQuery would find text like > "Term1 Term2 Term3" but not text like "Term1 some stuff Term2 more > stuff Term3" whereas BooleanQuery would. > > Best > Eri
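
The distinction described in this reply can be modelled on plain token lists. The sketch below is a toy model only: it assumes an exact phrase with no slop and a Boolean query made solely of SHOULD clauses, and it does not use or reproduce Lucene's actual matching code. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Toy model of the two matching behaviours; not Lucene code.
final class MatchSemantics {

    /** Phrase-style match: the terms must appear consecutively and in order. */
    static boolean phraseMatches(List<String> docTokens, List<String> terms) {
        return Collections.indexOfSubList(docTokens, terms) >= 0;
    }

    /** SHOULD-style match: at least one of the terms appears anywhere. */
    static boolean anyTermMatches(List<String> docTokens, List<String> terms) {
        for (String term : terms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this model, a document tokenized as "Term1 some stuff Term2 more stuff Term3" fails the phrase check for (Term1, Term2, Term3) but passes the any-term check, which is the behaviour the reply attributes to PhraseQuery and BooleanQuery respectively.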

Re: how to get all documents in the results ?

2011-03-22 Thread Patrick Diviacco
ll' documents or only docs matching your query? > 2. if its about fetching all docs, why not use the matchalldocs query? > 3. did you try using a collector instead of topdocs? > > -- > Anshum Gupta > http://ai-cafe.blogspot.com > > > On Tue, Mar 22, 2011 at 4:46

Re: how to get all documents in the results ?

2011-03-22 Thread Patrick Diviacco
I don't think the link you suggested can help, but maybe I'm wrong. Also, the parameter MAX_HITS is not useful, it just limit the results, it doesn't add the not relevant docs. On 22 March 2011 12:10, Anshum wrote: > Hi Patrick, > You may have a look at this, perhaps th

Results: get per field scores ?

2011-03-22 Thread Patrick Diviacco
Is there a way to display Lucene scores per field instead of the global one ? Both my query and my docs have 3 fields. I would like to see the scores for each field in the results. Can I ? Or should I run the query 3 times for each single field ? thanks

how to get all documents in the results ?

2011-03-22 Thread Patrick Diviacco
I'm using the following code because I want to see the entire collection in my query results: //adding wildcards-term to see all results rest = new TermQuery(new Term("*","*")); booleanQuery.add(rest, BooleanClause.Occur.SHOULD); But it doesn't work, I only see the relevant docs and not all the o

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-22 Thread Patrick Diviacco
OK, so I'm currently doing this: booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "tags", new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40)).parse(phrase[i]); , BooleanClause.Occur.SHOULD); I just want to add single terms to my booleanQuery. if I pass a q

Building a query of single terms...

2011-03-21 Thread Patrick Diviacco
I'm new to Lucene and I would like to know what's the difference (if there is any) between PhraseQuery.add(Term1) PhraseQuery.add(Term2) PhraseQuery.add(Term3) and term1 = new TermQuery(new Term(...)); booleanQuery.add(term1, BooleanClause.Occur.SHOULD); term2 = new TermQuery(new Term(...)); bo

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Patrick Diviacco
One more thing: It is actually not clear to me how to use PhraseQuery... I thought I can just pass a phrase to it, but I see only add(Term) method... should I parse the string by myself to single terms ? On 21 March 2011 18:05, Patrick Diviacco wrote: > >> If description field is

Re: Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Patrick Diviacco
> > > If description field is tokenized/analyzed during indexing you need to use > PhraseQuery. > Uhm yeah I'm using a WhitespaceAnalyzer. This is the code using for indexing: writer = new IndexWriter(FSDirectory.open(INDEX_DIR), new IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_40, new

How to normalize Lucene scores... (over all queries)

2011-03-21 Thread Patrick Diviacco
I'm combining several scores for my queries performed with Lucene and other software. My issue is that I have lucene scores + other scores (not related to Lucene) for each query result. The other scores are all normalized between 1 and 0. I need to normalize Lucene scores (over all queries) beca

Am I correctly parsing the strings ? Terms or Phrases ?

2011-03-21 Thread Patrick Diviacco
I'm new to Lucene. If I use description = new TermQuery(new Term("description", "my string")); I ask Lucene to consider "my string" as unique word, right ? I actually need to consider each word, should I use PhraseQuery instead ? Or is it correct ? thanks

Re: Lucene 4.0 and WhitespaceAnalyzer

2011-03-13 Thread Patrick Diviacco
:07, Simon Willnauer wrote: > Why do you want to replace the WhitespaceAnalyzer? I don't really > understand what you are up to. > > simon > > On Fri, Mar 4, 2011 at 3:21 PM, Patrick Diviacco > wrote: > > What's the best way to replace WhitespaceAnalyzer in this li

Re: Lucene nightly build: similarity score per field

2011-03-05 Thread Patrick Diviacco
Nevermind, I've finally solved. I just now need to figure out how to retrieve the scores per fields in my results. I need to know how much similar each field is. I know I can use explain() but it slows down computations... thanks On 4 March 2011 21:21, Patrick Diviacco wrote: > ok tha

Re: Lucene nightly build: similarity score per field

2011-03-04 Thread Patrick Diviacco
20:39, Robert Muir wrote: > On Fri, Mar 4, 2011 at 2:12 PM, Patrick Diviacco > wrote: > > hey Robert, > > > > I know there is the documentation, I'm sorry I've confused setSimilarity > > with setSimilarityProvider. > > > > However, my questio

Re: Lucene nightly build: similarity score per field

2011-03-04 Thread Patrick Diviacco
ass implementing the SimilarityProvider and then implement the get method ? Also, inside the get method should I check the passed string field and return different custom similarities classes ? thanks Patrick On 4 March 2011 19:57, Robert Muir wrote: > On Fri, Mar 4, 2011 at 1:18 PM, Patrick Di

Re: Lucene nightly build: similarity score per field

2011-03-04 Thread Patrick Diviacco
) thanks On 3 March 2011 16:34, Robert Muir wrote: > On Thu, Mar 3, 2011 at 10:25 AM, Patrick Diviacco > wrote: > > I've downloaded Lucene nightly build because I need to customize the > > similarity *per field*. > > > > However I don't see the
