Thank you, Ryan, for pushing on this, being persistent, and getting the vote
out.
On Tue, Sep 8, 2020 at 5:55 PM Ryan Ernst wrote:
> This vote is now closed. The results are as follows:
>
> Binding Results
> A1: 12 (55%)
> D: 6 (27%)
> A2: 4 (18%)
>
> All Results
> A1: 16 (55%)
> D: 7 (
February 2014, Apache Lucene™ 4.7 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.7
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text
October 2013, Apache Lucene™ 4.6 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.6
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text
the reason why you can't omit it today is that $num_position ==
$term_frequency, i.e. we need to store it anyway. Yet, I kind of agree
that this is an impl detail, so we could in theory return 1 as the TF
from the DocsAndPosEnum, but this would break our APIs as well since
DocsAndPositionsEnum require
One thing I wonder: could you just publish your benchmark code?
simon
On Thu, Aug 1, 2013 at 7:45 PM, Michael McCandless
wrote:
> On Wed, Jul 31, 2013 at 7:17 PM, Zhang, Lisheng
> wrote:
>>
>> Hi Mike,
>>
>> I retested and results are the same:
>>
>> 1/ I did not use sort (so FieldCache sh
hey,
can you share your benchmark and/or tell us a little more about what your
data looks like and how you analyze it? There might be analysis
changes that contribute to that.
simon
On Sun, Jul 14, 2013 at 7:56 PM, cischmidt77 wrote:
> I use Lucene/MemoryIndex for a large number of quer
Well, IndexSearcher doesn't have a constructor that accepts a string;
maybe you should pass in an IndexReader instead?
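Something along these lines should work (untested sketch, Lucene 4.x classes; the index path is just an example):

Directory dir = FSDirectory.open(new File("/path/to/index")); // example path
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);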
simon
On Fri, May 17, 2013 at 3:11 PM, fifi wrote:
> please,how I can solve this error?
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.lucene.search.Ind
This seems like a bug caused by the fact that we moved the CFS
building into DWPT. Can you open an issue for this?
simon
On Wed, May 15, 2013 at 5:50 PM, Sergiusz Urbaniak
wrote:
> Hi all,
>
> We have an obvious deadlock between a "MaybeRefreshIndexJob" thread
> calling ReferenceManager.mayb
there is also elasticsearch (elasticsearch.org), built on top of Lucene,
which might feel more natural if you come from Mongo
simon
On Wed, May 15, 2013 at 11:38 AM, Rider Carrion Cleger
wrote:
> Thanks you Hendrik,
> I'm new with Apache Lucene, the problem that arises is like starting with
> lucen
May 2013, Apache Lucene™ 4.3 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.3
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires full-text searc
hey there,
I think your English is perfectly fine! Given the info you provided
it's very hard to answer your question... I can't look into
org.wltea.analyzer.core.AnalyzeContext.fillBuffer(AnalyzeContext.java:124),
but apparently there is a NullPointerException happening there. Maybe you can
track that down.
hey,
first, please don't crosspost! Second, can you provide more info, like
the part where you index the data? Maybe something that is
self-contained?
simon
On Mon, Apr 8, 2013 at 1:16 AM, vempap wrote:
> Hi,
>
> I've the following snippet code where I'm trying to extract weighted span
> term
Can you provide some information about how much RAM you are setting on the
IndexWriterConfig?
Also, how many threads are you using for indexing?
simon
On Mon, Apr 1, 2013 at 2:21 PM, Arun Kumar K wrote:
> Hi Adrien,
>
> I have seen memory usage using linux command top for RES memory & i have
> used
You can do Filter#getDocIdSet(reader, acceptedDocs).bits().
Yet, this method might return null if the filter cannot be
represented as bits, or for other reasons like performance.
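Roughly like this (untested sketch against the Lucene 4.x Filter API; "filter" and "context" stand in for your own filter and per-segment AtomicReaderContext):

DocIdSet set = filter.getDocIdSet(context, context.reader().getLiveDocs());
Bits bits = (set == null) ? null : set.bits(); // both calls may return null
if (bits == null) {
  // fall back to set.iterator() and walk the DocIdSetIterator instead of random access
}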
simon
On Tue, Mar 26, 2013 at 10:37 AM, Ramprakash Ramamoorthy
wrote:
> Team,
>
> We are migrating from 2.3
ginal document.
> ____
> From: Simon Willnauer [simon.willna...@gmail.com]
> Sent: Monday, March 25, 2013 4:07 AM
> To: java-user@lucene.apache.org
> Subject: Re: Compression and Highlighter
>
> On Mon, Mar 25, 2013 at 8:13 AM, Bushman, Lamont wro
it
> describes my entire professional life
>
> "Bobby Tables" is another (http://xkcd.com/327/).
>
> There, I've done my bit to stop productivity today!
>
> Erick
>
>
> On Mon, Mar 25, 2013 at 2:08 PM, Simon Willnauer
> wrote:
>>
>> ad
r
> the record: https://issues.apache.org/jira/browse/LUCENE-4878
>
> Adam
>
> -Original Message-
> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
> Sent: Sunday, March 24, 2013 9:28 AM
> To: java-user@lucene.apache.org
> Subject: Re: Assert / NPE using Mult
On Mon, Mar 25, 2013 at 8:13 AM, Bushman, Lamont wrote:
> I have a project where I need to index documents using Lucene 4.1.0. One
> of the fields for the stored Document is the actual text from the
> document(.pdf, .docx, etc.) I want to be able to highlight text from the
> documents in
On Mon, Mar 25, 2013 at 4:16 AM, Steve Rowe wrote:
> The wiki at http://wiki.apache.org/lucene-java/ has come under attack by
> spammers more frequently of late, so the PMC has decided to lock it down in
> an attempt to reduce the work involved in tracking and removing spam.
>
> From now on, onl
Alex, did you try to get it working with a single term, like adding
"the foobar" and then drawing suggestions for "the foo"?
simon
On Sun, Mar 24, 2013 at 8:51 PM, Alexander Reelsen wrote:
> Hey there,
>
> I am trying to get up some working example with the AnalyzingSuggester and
> stopwords - l
Hey,
this is in fact a bug in the MultiFieldQueryParser; can you please open a
ticket for this in our bug tracker?
MultiFieldQueryParser should override getRegexpQuery but it doesn't.
simon
On Sun, Mar 24, 2013 at 3:57 PM, Adam Rauch wrote:
> I'm using MultiFieldQueryParser to parse search queri
t;> > -Original Message-
>> > From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> > Sent: Friday, March 22, 2013 9:41 PM
>> > To: java-user@lucene.apache.org; simon.willna...@gmail.com
>> > Subject: Re: Field.Index deprecation ?
>&g
On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless
wrote:
> We badly need Lucene in Action 3rd edition!
go mike go!!!
;)
>
> The easiest approach is to use one of the new XXXField classes under
> oal.document, eg StringField for your example.
>
> If none of the existing XXXFields "fit", you can
On Fri, Mar 22, 2013 at 2:00 PM, Pablo Guerrero wrote:
> Hi all,
>
> I'm evaluating using Lucene for some data that would not be stored anywhere
> else, and I'm concerned about reliabilty. Having a database storing the
> data in addition to Lucene would be a problem, and I want to know if Lucene
>
can you send this to d...@lucene.apache.org?
simon
On Fri, Mar 22, 2013 at 7:52 PM, Ravikumar Govindarajan
wrote:
> Most of us, writing custom codec use segment-name as a handle and push data
> to a different storage
>
> Would it be possible to get a hook in the codec APIs, when obsolete segment
All statistics in Lucene are per field, and so is document frequency.
simon
On Fri, Mar 22, 2013 at 10:48 AM, Nicole Lacoste wrote:
> Hi
>
> I am trying to figure out if the document-frequency of a term used in
> calculating the score. Is it per field? Or is independent of the field?
>
> Thanks
>
>
ormance
>
> Please forceMerge only one time not every time (only to clean up your index)!
> If you are doing a reindex already, just fix your close logic as discussed
> before.
>
>
>
> Scott Smith schrieb:
>
>>Unfortunately, this is a production system which I can't touch
The BitSet basically counts how many documents have one or more values
in this field. Some docs might not have values in this field.
state.segmentInfo.getDocCount() is the # of docs in this segment but
we are flushing a single field here. We pass down the cardinality
here since
we keep the statist
On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith wrote:
> " Do you always close IndexWriter after adding few documents and when
> closing, disable "wait for merge"? In that case, all merges are interrupted
> and the merge policy never has a chance to merge at all (because you are
> opening and clo
Can you tell us a little more about how you use Lucene: how do you
index, do you use NRT or do you open an IndexReader for every request,
do you maybe use a custom merge policy or something like this, any
special IndexWriter settings?
On Fri, Mar 15, 2013 at 11:15 PM, Scott Smith wrote:
> We have a
On Thu, Mar 7, 2013 at 6:44 PM, Michael McCandless
wrote:
> This sounds reasonable (500 M docs / 50 GB index), though you'll need
> to test resulting search perf for what you want to do with it.
>
> To reduce merging time, maximize your IndexWriter RAM buffer
> (setRAMBufferSizeMB). You could als
On Thu, Mar 7, 2013 at 7:06 PM, Jan Stette wrote:
> Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size
> and segments-per-tier settings and see what that does.
>
> The time spent merging seems to be so great though, that I'm wondering if
> I'm actually better off doing the
phew! thanks for clarifying
simon
On Tue, Feb 19, 2013 at 11:19 PM, Paul Taylor wrote:
> On 19/02/2013 20:56, Paul Taylor wrote:
>>
>>
>> Strange test failure after converting code from Lucene 3.6 to Lucene 4.1
>>
>> public void testIndexPuid() throws Exception {
>>
>> addReleaseOne();
>
, Eric Charles wrote:
> Hi,
> Why not having the IS#close() calling the wrapped IR#close() ?
>
> I would be happier having to only deal with the Searcher once created and
> forget it wraps a Reader: I create a Searcher, I close it.
>
> Thx, Eric
>
>
> On 18/02/201
On Thu, Feb 14, 2013 at 11:42 AM, VIGNESH S wrote:
> Hi,
>
> I have two questions
>
> 1.How to Get the enumeration of Terms Ending with a given word
> I saw we can get enumerations of word starting at a given word by
> Indexreader.terms(term())) method
unless you want to iterate all terms and che
On Mon, Feb 18, 2013 at 7:32 PM, saisantoshi wrote:
> I understand from the JIRA ticket(Lucene-3640) that the IndexSearcher.close()
> is no-op operation but not very clear on why it is a no-op? Could someone
> shed some light on this? We were using this method in the older versions and
> is it saf
On Fri, Jan 25, 2013 at 3:29 PM, saisantoshi wrote:
> Thanks a lot. If we want to wrap TopScoreDocCollector into
> PositiveScoresOnlyCollector. Can we do that?
> I need only positive scores and I dont think topscore collector can handle
> by itself right?
>
I guess so! But how do you get neg. sco
Directory directory = ...
final SegmentInfos sis = new SegmentInfos();
sis.read(directory); // reads the latest segments_N in the directory
Map<String,String> commitUserData = sis.getUserData();
simon
On Fri, Jan 25, 2013 at 2:32 AM, wgggfiy wrote:
> hello, but there is no getCommitUserData in IndexReader,
> how can I get the userdata ??
> thx
>
>
>
> -
hey,
you don't need to set the IndexReader in the constructor. An
AtomicReader is passed in for each segment via
Collector#setNextReader(AtomicReaderContext).
If you want to use a given collector and extend it with some custom
code in collect, I would write a delegating Collector along these lines:
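(Untested sketch against the Lucene 4.x Collector API; the class name is just a placeholder.)

import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class MyDelegatingCollector extends Collector {
  private final Collector delegate; // e.g. a TopScoreDocCollector

  public MyDelegatingCollector(Collector delegate) {
    this.delegate = delegate;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    delegate.setScorer(scorer);
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    // the per-segment reader arrives here, no need to pass one into the constructor
    delegate.setNextReader(context);
  }

  @Override
  public void collect(int doc) throws IOException {
    // your custom per-document code goes here
    delegate.collect(doc);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return delegate.acceptsDocsOutOfOrder();
  }
}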
Hi Bernd,
On Thu, Jan 24, 2013 at 9:30 AM, Bernd Müller wrote:
> Hello,
>
> In the lucene 4.1 release, there was introduced a compression for
> stored fields as described here:
> https://issues.apache.org/jira/browse/LUCENE-4226
Yeah, that is correct, it's the new default. If you use Lucene 4.1 t
hey,
do you wanna open a JIRA issue for this and attach your code? This
might help others too, and if the shit hits the fan it's good to have
something in the Lucene jar that can bring some data back.
simon
On Fri, Jan 18, 2013 at 6:37 PM, Michał Brzezicki wrote:
> in lucene (*.fdt). Code is avail
te both:
>>
>> 1. Oracle's JavaDoc Style Guide:
>> http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#throwstag
>> 2. Joshua Bloch'
Hey,
On Fri, Nov 2, 2012 at 2:20 PM, Michael-O <1983-01...@gmx.net> wrote:
> Hi,
>
> why does virtually every method (exaggerating) throw an IOE? I know there
> might be a failure in the underlying IO (corrupt files, passing checked exc
> up, etc) but
>
> 1. Almost none of the has a JavaDoc on i
hey scott,
this is intentional, see the javadoc, step 2:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html
simon
On Fri, Nov 2, 2012 at 2:07 AM, Scott Smith wrote:
> I was doing some tokenizer/filter analysis attempting to fix a bug I have in
> highlighting und
hey michael,
On Thu, Nov 1, 2012 at 11:30 PM, Michael-O <1983-01...@gmx.net> wrote:
> Thanks for the quick response. Any chance this could be clearer in the
> JavaDoc of this class?
Sure thing. Do you wanna open an issue / create a patch? I am happy to
commit it.
simon
>
>> Call it when you kno
hey scott,
On Mon, Oct 29, 2012 at 11:56 PM, Scott Smith wrote:
> Converting some code to lucene 4.0, it appears that we can no longer set
> whether we want to store norms or termvectors using the "sugared" Field
> classes (e.g., StringField() and TextField). I gather the defaults are to
> st
You should call currDocsAndPositions.nextPosition() before you call
currDocsAndPositions.getPayload(). Payloads are per position, so you
need to advance the position first!
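Something like this (untested sketch against the Lucene 4.0 DocsAndPositionsEnum API):

while (currDocsAndPositions.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
  int freq = currDocsAndPositions.freq();
  for (int i = 0; i < freq; i++) {
    currDocsAndPositions.nextPosition();                   // advance to the next position first
    BytesRef payload = currDocsAndPositions.getPayload();  // may be null if no payload was indexed
  }
}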
simon
On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote:
> Hi Guys,
>
> I use the following code to index documents and set Pa
hey there,
in Lucene 4 you can override the termStatistics / collectionStatistics
used for scoring in the IndexSearcher. You can take multiple fields
into account here in order to use them for scoring. Here is the javadoc
link:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/IndexSe
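A rough, untested sketch of what that could look like; the extra field name "title2" is made up, and you should double-check the exact method signatures in your Lucene 4.x version:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermStatistics;

public class MultiFieldStatsSearcher extends IndexSearcher {
  public MultiFieldStatsSearcher(IndexReader reader) {
    super(reader);
  }

  @Override
  public TermStatistics termStatistics(Term term, TermContext context) throws IOException {
    TermStatistics base = super.termStatistics(term, context);
    // fold the doc frequency of the same text in a second field into the stats
    long otherDf = getIndexReader().docFreq(new Term("title2", term.bytes()));
    return new TermStatistics(base.term(), base.docFreq() + otherDf, base.totalTermFreq());
  }
}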
hey.
On Sun, Oct 14, 2012 at 1:51 PM, emmanuel Gosse
wrote:
>>
>> Hi,
>
>
>
>> How could i take into account in a query the fact that the searched words
>> could be more precise in a document field than an other.
>>
>
> example :
> 2 documents :
> doc1 : title : taxi
> doc2 : title : taxi driver
quick answer, Lucene only operates on strings (from a high level
perspective)
simon
On Fri, Sep 21, 2012 at 11:54 AM, 惠达 王 wrote:
> hi all:
> I want to know that why transform numeric to string?
>
> public static int longToPrefixCoded(final long val, final int shift,
> final BytesRef bytes)
hey harald,
On Mon, Aug 6, 2012 at 1:22 PM, Harald Kirsch wrote:
> Hi,
>
> in my application I have to write tons of small documents to the index, but
> with a twist. Many of the documents are actually aggregations of pieces of
> information that appear in a data stream, usually close together, b
hey,
On Mon, Aug 6, 2012 at 11:34 AM, Li Li wrote:
> hi everyone,
> in lucene 4.0 alpha, I found the DocValues are available and gave
> it a try. I am following the slides in
> http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
> I have g
ms I saw.
>>
>> So docIds can definitively change under the hood?
>>
>> Harald.
>>
>>
>> Am 03.08.2012 17:24, schrieb Simon Willnauer:
>>>
>>> hey harald,
>>>
>>> if you use a possibly different searcher (reader) than you us
hey harald,
if you use a possibly different searcher (reader) than you used for
the search, you will run into problems with the doc IDs since they
might change during the request. I suggest you use SearcherManager
or NRTManager and carry the searcher reference along when you collect
the stored val
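The usual SearcherManager pattern looks roughly like this (untested; constructor arguments differ slightly between 3.5/3.6/4.x, and "writer" and "query" are placeholders):

SearcherManager mgr = new SearcherManager(writer, true, null); // null = default SearcherFactory
// later, per request:
IndexSearcher searcher = mgr.acquire();
try {
  TopDocs hits = searcher.search(query, 10);
  for (ScoreDoc sd : hits.scoreDocs) {
    Document doc = searcher.doc(sd.doc); // load stored fields with the same searcher
  }
} finally {
  mgr.release(searcher); // don't touch the searcher after this point
}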
On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
wrote:
> Hi,
>
> I understand that generally speaking you should use the same analyzer on
> querying as was used on indexing. In my code I am using the SnowballAnalyzer
> on index creation. However, on the query side I am building up a complex
> Bo
On Thu, Aug 2, 2012 at 7:53 AM, roz dev wrote:
> Thanks Robert for these inputs.
>
> Since we do not really Snowball analyzer for this field, we would not use
> it for now. If this still does not address our issue, we would tweak thread
> pool as per eks dev suggestion - I am bit hesitant to do th
On Mon, Jul 23, 2012 at 7:00 PM, snehal.chennuru wrote:
> Thanks for the heads up Ian. I know it is highly discouraged. But, like I
> said, it is a legacy application and it is very hard to go back and re-do
> it.
You really shouldn't do that! If you use Lucene as a primary key
generator, why don'
hey SimonM :)
On Mon, Jul 23, 2012 at 6:37 PM, Simon McDuff wrote:
>
> Hello, (LUCENE 4.0.0-ALPHA)
>
> We are using the DocValues features (very nice).
cool!
>
> We are using FixedBytesRef.
>
> In that specific case, we were wondering why does it flush at the end (when
> we commit) ?
the reas
)
>>
>>
>> On Fri, Jul 20, 2012 at 2:29 AM, Simon McDuff wrote:
>> >
>> > Thank you Simon Willnauer!
>> >
>> > With your explanation, we`ve decided to control the flushing by spawning
>> > another thread. So the thread is available
hey simon ;)
On Fri, Jul 20, 2012 at 2:29 AM, Simon McDuff wrote:
>
> Thank you Simon Willnauer!
>
> With your explanation, we`ve decided to control the flushing by spawning
> another thread. So the thread is available to still ingest ! :-) (correct me
> if I'm wrong)W
hey,
On Thu, Jul 19, 2012 at 7:41 PM, Simon McDuff wrote:
>
> Thank you for your answer!
>
> I read all your blogs! It is always interesting!
for details see:
http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/
and
http://www.searchworkings.org/blog/-/blog
On Wed, Jul 18, 2012 at 9:05 PM, Tim Eck wrote:
> Rum is an essential ingredient in all software systems :-)
Absolutely! :)
simon
>
> -Original Message-
> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
> Sent: Wednesday, July 18, 2012 11:49 AM
> To: java-user
1. use mmap directory (see the sketch below)
2. buy rum
3. get an SSD
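For (1), a minimal untested sketch (the index path is just an example):

Directory dir = new MMapDirectory(new File("/path/to/index"));
// or let Lucene pick the best implementation (usually mmap on 64-bit JVMs):
Directory dir2 = FSDirectory.open(new File("/path/to/index"));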
simon :)
On Wed, Jul 18, 2012 at 8:36 PM, Vitaly Funstein wrote:
> You do not want to store 30 G of data in the JVM heap, no matter what
> library does this.
>
> On Wed, Jul 18, 2012 at 10:44 AM, Paul Jakubik wrote:
>> If only 30GB, go with RAM and
t 2200 different queries (well, some are repeated
>> twice or thrice), and includes search time and doc loading (reading the two
>> fields I mentioned). The queries are all straight boolean conjunctions, and
>> yes, I am dropping the first few queries when calculating ave
hey there,
On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby
wrote:
> Hi, I have the following situation:
>
> I have two pretty large indices. One consists of about 1 billion documents
> (takes ~6GB on disk) and the other has about 2 billion documents (~10GB on
> disk). The documents are very sho
You can safely reuse a single analyzer across threads. The Analyzer
class maintains ThreadLocal storage for TokenStreams internally so you
can just create the analyzer once and use it throughout your
application.
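For example (untested 3.x-style sketch; the version constant and analyzer choice are just examples):

private static final Analyzer ANALYZER = new StandardAnalyzer(Version.LUCENE_36);
// any indexing or search thread can then do:
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_36, ANALYZER);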
simon
On Thu, Jul 12, 2012 at 10:13 PM, Dave Seltzer wrote:
> I have one more quest
eleteDocument(int docId) in IndexWriter.
>> It seems like it would be easy to add as DocumentsWriter already has a
>> deletedDocID. I can file a jira and submit a patch if this is something
> that you
>> guys would accept.
>>
>> Sean
>>
>> On Thu, Jul 12,
it to another machine, which keeps the index forever. Before
>> we
>>> upload the index, we forceMerge(1) on it, and gather some stats about the
>>> index like max,min serial id, total documents. While calculating max and
>> min
>>> serial id, if we see a duplicate serial
Can you tell us more about your index side of things? Are you using
positions in the index, since I see PhraseQuery in your code?
Where are you passing the text you are searching for to the
BrasilianAnalyzer? I don't see it in your code. You need to process your
text at search time too to get results.
On Thu, Jul 12, 2012 at 3:09 AM, Sean Bridges wrote:
> Is it possible to delete by docId in lucene 4? I can delete by docid
> in lucene 3 using IndexReader.deleteDocument(int docId), but that
> method is gone in lucene 4, and IndexWriter only allows deleting by
> Term or Query.
that is correct.
are you closing your underlying IndexReaders properly?
simon
On Wed, Jul 11, 2012 at 5:04 AM, Yang wrote:
> I'm running 8 index searchers java processes on a 8-core node.
> They all read from the same lucene index on local hard drive.
>
>
> the index contains about 20million docs, each doc is
]} arrays.
* This class is optimized for small memory-resident indexes.
* It also has bad concurrency on multithreaded environments.
simon
On Sat, Jul 7, 2012 at 1:29 PM, Simon Willnauer
wrote:
> On Fri, Jul 6, 2012 at 9:28 PM, Leon Rosenberg
> wrote:
>> Hello,
>>
>> we ha
On Fri, Jul 6, 2012 at 9:28 PM, Leon Rosenberg wrote:
> Hello,
>
> we have a small internet shop which uses lucene for product search.
> With increasing traffic we have continuos problem with literaly
> hundreds of threads being BLOCKED in lucene code:
>
> here is an example taken with jstack on p
see definitions:
http://lucene.apache.org/core/3_6_0/fileformats.html#Definitions
simon
On Wed, Jun 27, 2012 at 6:08 PM, Simon Willnauer
wrote:
> a term in this context is a (field,text) tuple - does this make sense?
> simon
>
> On Wed, Jun 27, 2012 at 11:40 AM, wangjing wr
a term in this context is a (field,text) tuple - does this make sense?
simon
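For example, these are two distinct terms even though the text is the same:

Term t1 = new Term("title", "lucene");
Term t2 = new Term("body", "lucene");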
On Wed, Jun 27, 2012 at 11:40 AM, wangjing wrote:
> http://lucene.apache.org/core/3_6_0/fileformats.html#Frequencies
>
> The .frq file contains the lists of documents which contain each term,
> along with the frequency o
see http://lucene.apache.org/core/3_6_0/fileformats.html#field_index
for file format documentation.
simon
On Mon, Jun 25, 2012 at 5:28 AM, wangjing wrote:
> .fdx file contains, for each document, a pointer to its field data.
>
> BUT fdx is contains pointer to WHAT? it's a pointer of field data
On Sat, May 26, 2012 at 2:59 AM, Yang wrote:
> I tested with more threads / processes. indeed this is completely
> cpu-bound, since running 1 thread gives the same latency as 4 threads (my
> box has 4 cores)
>
>
> given this, is there any way to simplify the scoring computation (i'm only
> using l
hey,
On Fri, May 25, 2012 at 2:45 PM, Nikolay Zamosenchuk
wrote:
> Hi everyone. We are using IndexReader.deleteDocument(Term) method to
> delete documents, since it returns the number of deleted documents.
> This is used to be sure that some docs were removed. We must know for
> sure if documents
We removed almost all Serializable support from Lucene since it was causing
many problems and wasn't complete either. Users should serialize
classes / logic themselves or use higher-level impls that deal with that
already.
simon
On Mon, May 21, 2012 at 1:05 PM, Lars Gjengedal
wrote:
> Hi
>
> I have not bee
On Fri, May 11, 2012 at 7:56 AM, Jong Kim wrote:
> When I update a document in Lucene (i.e., re-indexing), I have to delete
> the existing document, and create a new one. My understanding is that this
> assigns a new doc ID for the newly created document. If that is the case,
> is it true that the
Hey, do you get multiple files per segment or multiple files per index?
The compound file system writes a .cfs file (and a .cfe file in trunk)
per segment. So if you are seeing multiple .cfs files, Lucene is
actually doing what you want. If there are files like .fdt/.fdx or
.tii/.tis then the segment is
Hey Stuart,
Lucene solely relies on the FS cache with some exceptions for the
term-dictionary and FieldCache which is pulled entirely into memory.
FieldCache is not used to retrieve stored fields though; it's rather an
uninverted view (docID -> value) of an indexed (inverted) field. So
basically wha
One major thing that changed from 3.0.3 to 3.5 is that we use
TieredMergePolicy by default. Can you try to use the same merge policy
on both 3.0.3 and 3.5 and report back? I.e. LogByteSizeMergePolicy or
whatever you are using...
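E.g. on 3.5 it would look roughly like this (untested; on 3.0.3 you'd set it via IndexWriter#setMergePolicy instead, and "analyzer"/"dir" are placeholders):

IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_35, analyzer);
cfg.setMergePolicy(new LogByteSizeMergePolicy());
IndexWriter writer = new IndexWriter(dir, cfg);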
simon
On Thu, Feb 9, 2012 at 5:28 AM, Vitaly Funstein wrote:
> Hello,
are you closing the NRTManager while other threads are still accessing the
SearcherManager?
simon
On Wed, Feb 8, 2012 at 1:48 PM, Cheng wrote:
> I use it exactly the same way. So there must be other reason causing the
> problem.
>
> On Wed, Feb 8, 2012 at 8:21 PM, Ian Lea wrote:
>
>> Releasing a se
of the url, so that the url would
> determine which index was to be loaded by the dataimport command.
seems like you should look at solr's multicore feature:
http://wiki.apache.org/solr/CoreAdmin
simon
>
> F
>
> -Original Message-
> From: Simon Willnauer [mailt
Hey,
On Wed, Jan 25, 2012 at 11:01 PM, Cheng wrote:
> Hi,
>
> I am using multiple writer instances in a web service. Some instances are
> busy all the time, while some aren't. I wonder how to configure the writer
> to dissolve itself after a certain time of idling, say 30 seconds.
what do you me
hey Frank,
can you elaborate on what you mean by different doc types? Are you
referring to an entity, i.e. a table per entity to speak in SQL terms?
In general you should get better responses for Solr-related questions
on solr-u...@lucene.apache.org
simon
On Wed, Jan 25, 2012 at 10:49 PM, Frank DeRos
I think the question is more related to the reopen thread. This class
directly extends Thread and instead of calling Thread#start() directly
you can simply pass it to the Executor since it implements Runnable -
is that what you are asking for?
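E.g. (plain java.util.concurrent; "reopenThread" stands in for your reopen-thread instance):

ExecutorService pool = Executors.newSingleThreadExecutor();
pool.execute(reopenThread); // instead of reopenThread.start()
// and on shutdown:
pool.shutdown();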
simon
On Sun, Jan 15, 2012 at 7:53 PM, Michael McCandl
ittee Chairs:
* Isabel Drost (Nokia & Apache Mahout)
* Jan Lehnardt (CouchBase & Apache CouchDB)
* Simon Willnauer (SearchWorkings & Apache Lucene)
* Grant Ingersoll (Lucid Imagination & Apache Lucene)
* Owen O’Malley (Yahoo Inc. & Apache Hadoop)
* Jim Webber (Neo Tec
Folks,
I just committed LUCENE-3628 [1] which cuts over Norms to DocValues.
This is an index file format change and if you are using trunk you
need to reindex before updating.
happy indexing :)
simon
[1] https://issues.apache.org/jira/browse/LUCENE-3628
-
hey peter,
On Wed, Jan 4, 2012 at 12:52 AM, Peter K wrote:
> Thanks Simon for you answer!
>
>> as far as I can see you are comparing apples and pears.
>
> When excluding the waiting time I also get the slight but reproducable
> difference**. The times for waitForGeneration are nearly the same
> (
Hey,
On Wed, Jan 4, 2012 at 1:15 PM, Hany Azzam wrote:
> Hi,
>
> I am experimenting with the Lucene trunk (aka 4.0), especially with the new
> IndexDocValues feature. I am trying to store some query-independent
> statistics such as PageRank, etc. One stat that I am trying to store is the
> sum
hey Peter,
as far as I can see you are comparing apples and pears. Your
comparison is waiting for merges to finish, and if you are using
multiple threads Lucene 4.0 will flush more segments to disk than 3.5,
so what you are seeing is likely a merge that is still trying to merge
small segments. can y
hey charlie,
there are a couple of wrong assumptions in your last email mostly
related to merging. mergefactor = 10 doesn't mean that you are ending
up with one file neither is it related to files. Yet, my first guess
is that you are using CompoundFileSystem (CFS) so each segment
corresponds to a
I think you are confusing something here. BDB can be used as a
"Directory" implementation but a Directory is a simple "blob" store.
BDB only stores binary BLOBs, each of which corresponds to a file. AFAIK we
dropped the BDB support entirely a couple of releases ago.
In Lucene you can think of one large table
On Mon, Dec 19, 2011 at 9:04 PM, Simon Willnauer
wrote:
> On Mon, Dec 19, 2011 at 5:03 PM, Peter Karich wrote:
>> Hi Uwe,
>>
>> thanks for the talk suggestion(s)*.
>>
>> I was using it for faster term lookups of a long 'id'. How would this be
>> d
On Mon, Dec 19, 2011 at 5:03 PM, Peter Karich wrote:
> Hi Uwe,
>
> thanks for the talk suggestion(s)*.
>
> I was using it for faster term lookups of a long 'id'. How would this be
> done with 4.0? Before I did it via Term:
>
> new Term(fieldName, NumericUtils.longToPrefixCoded(longValue));
>
> How
On Thu, Dec 15, 2011 at 6:33 PM, Mike O'Leary wrote:
> We have a large set of documents that we would like to index with a
> customized stopword list. We have run tests by indexing a random set of about
> 10% of the documents, and we'd like to generate a list of the terms in that
> smaller set
1 20 761.95
> 262.49 63,139,256 91,881,472
>
> The performance is slightly better than the one using StandardAnalyzer, but
> this is still much worse than the performance with 2.4.1.
>
> Sean
>
> -Original Message-
> From: Simon Willnauer [