Error Doc id doesn't match the query in vector searches

2024-10-17 Thread Moll, Dr. Andreas
Hi, we are currently testing Solr 9.7 and are seeing an error that we did not see with Solr 9.6.1. We think the problem may lie in the underlying Lucene code base: ERROR o.a.s.h.RequestHandlerBase Server exception => at org.apache.lucene.search.TopFieldCollector.populateScores(To

hnsw parameters for vector search

2024-01-30 Thread Moll, Dr. Andreas
Hi, the HNSW documentation for the Lucene HnswGraph and the Solr vector search is not very verbose, especially with regard to the parameters hnswMaxConn and hnswBeamWidth. I find it hard to come up with sensible values for these parameters by reading the paper from 2018. Does anyone have experie

Re:How to limit SimpleCollector at N documents?

2017-08-17 Thread dr
I used to do the same thing. My approach was also to throw an exception to jump out. What does "then the search moves on to the next leaf" mean? At 2017-08-18 03:46:02, "Tod Olson" wrote: Hi everyone, I'm modifying an existing application, which uses a Lucene SimpleCollector to return document ids and some
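
A minimal sketch of the "throw an exception to jump out" approach described in this message. It does not use Lucene's API: `LimitedCollector`, `LimitReachedException`, and `collectFirst` are hypothetical names standing in for a per-hit callback and the loop that drives it, and the plain `int[]` of matches is an assumption made only so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyStopDemo {

    /** Unchecked signal used only to abandon collection once the limit is reached. */
    static final class LimitReachedException extends RuntimeException {
        LimitReachedException() {
            // Disable suppression and stack-trace capture: this is control flow, not a failure.
            super("limit reached", null, false, false);
        }
    }

    /** Hypothetical stand-in for a per-hit callback; NOT Lucene's SimpleCollector. */
    static final class LimitedCollector {
        private final int limit;
        private final List<Integer> docIds = new ArrayList<>();

        LimitedCollector(int limit) {
            this.limit = limit;
        }

        /** Records a document id, or throws once the limit has already been filled. */
        void collect(int docId) {
            if (docIds.size() >= limit) {
                throw new LimitReachedException();
            }
            docIds.add(docId);
        }

        List<Integer> docIds() {
            return docIds;
        }
    }

    /** Feeds matches to the collector and stops at the first LimitReachedException. */
    static List<Integer> collectFirst(int[] matches, int limit) {
        LimitedCollector collector = new LimitedCollector(limit);
        try {
            for (int docId : matches) {
                collector.collect(docId);
            }
        } catch (LimitReachedException expected) {
            // Expected: the caller asked for at most `limit` documents.
        }
        return collector.docIds();
    }

    public static void main(String[] args) {
        // prints [4, 8, 15]
        System.out.println(collectFirst(new int[] {4, 8, 15, 16, 23, 42}, 3));
    }
}
```

The exception is caught by the code that started the iteration, so everything gathered before the limit is still available to the caller.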

Re:Re: Some questions about StandardTokenizer and UNICODE Regular Expressions

2016-06-16 Thread dr
Thank you so much, Steve. Your reply is very helpful. At 2016-06-16 23:01:18, "Steve Rowe" wrote: >Hi dr, > >Unicode’s character property model is described here: ><http://unicode.org/reports/tr23/>. > >Wikipedia has a description of Unic

Some questions about StandardTokenizer and UNICODE Regular Expressions

2016-06-16 Thread dr
Hi guys, currently I'm looking into the rules of StandardTokenizer, but I have run into some problems. As the docs say, StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Also it is generated by JFlex, a lexer/sc

Re:some doubts about nodeblock in *.tim

2016-02-01 Thread dr
Is anybody familiar with this issue? Help.. At 2016-01-26 18:05:12, "dr" wrote: >Dear all, >Currently, I'm studying the format of *.tip and *.tim. I found that >*.tim stores not only suffix information, but also information >of the c

some doubts about nodeblock in *.tim

2016-01-26 Thread dr
Dear all, currently I'm studying the format of *.tip and *.tim. I found that *.tim stores not only suffix information, but also information about the positions of the current prefix's child nodes. Take the terms 'ab1,abc1,abc2' for example: for the prefix 'ab', *.tim stores the suffi

Re:Re:Re: IO probleam in forceMerge(1)

2015-08-21 Thread dr
I said earlier that "when i did some copy operation on the machine the speed may become normal."; that seems not to be correct. But when forceMerge hangs on IO, if I run the command "echo 3 > /proc/sys/vm/drop_caches", the IO becomes normal again. At 2015-08-21 11:59:30, "dr"

Re:Re: IO probleam in forceMerge(1)

2015-08-20 Thread dr
, I guess it won't need too much memory to merge them. At 2015-08-20 22:00:27, "Michael McCandless" wrote: >I would first try upgrading your JVM: 1.7.0_05 is ancient. > >Mike McCandless > >http://blog.mikemccandless.com > > >On Thu, Aug 20, 2015 at 7:49 AM,

IO probleam in forceMerge(1)

2015-08-20 Thread dr
Hi all, currently I have run into a problem with forceMerge(1). During force merging, some of my machines spent too much time (10-20 hours for an index size of 15GB). Using commands like top, iostat, and jstack, I found that the average CPU and disk usage is too low, nearly zero. And the size of the inde

Re: Indexing TREC GOV2 data in Lucene

2012-04-12 Thread Dr. Hany Azzam
Hi, I am not sure if there's something in the contrib for GOV2 but it really depends on what you want to parse. If you are just interested in full-text search then it should be similar to parsing a regular document while being conscious of the TREC-specific delimiters. It's something like . Howeve

Table Defn and/or ER Diagram of Segment files

2011-12-16 Thread Dr. Ray Hoare
Is there an entity-relationship diagram of the segment files and/or Berkeley DB tables (with table definitions)? I'm trying to understand the segment files of Lucene and know that a Berkeley DB can be used to store the directory, but I can't locate any ER diagram or table definitions for the DB. Thanks Ray --

Re: How to make mutually exclusive lists of results

2008-06-22 Thread Dr. Fish
object. Iterating > over a large result set with a Hits object can be very inefficient because > the query re-executes every 100 or so. Think about a HitCollector > instead. > > Best > Erick. > > > > On Sun, Jun 22, 2008 at 2:59 PM, Dr. Fish &

RE: How to make mutually exclusive lists of results

2008-06-22 Thread Dr. Fish
Ah, I think I got it: hit.getDocument().getField("CityID").stringValue() seems to be what I wanted. Thanks! Dr. Fish wrote: > > I tried this first, and I was having trouble iterating over them. > > > If I do something like this > > hit.getDocument().getField

RE: How to make mutually exclusive lists of results

2008-06-22 Thread Dr. Fish
How do I just get the value "countryID" out of the document? Steven A Rowe wrote: > > Hi Dr. Fish, > > You could make just a single query with the broadest query possible - e.g. > > bacon AND country:"united states" > > and then iterate o

How to make mutually exclusive lists of results

2008-06-22 Thread Dr. Fish
Hi, I am currently using Lucene to index documents. I index 4 fields: the body of the document, the city it is related to, the state it is related to, and the country it is related to. I have a Java web application where the user types in some search text, and it searches the body of the docume