date:20140603

Re: How to approach indexing source code?

2014-06-03 Thread Aditya

Hi Johan, How do you want to search? What is your search requirement? You need to index according to that. You could check DuckDuckGo or GitHub code search. The easiest approach would be to have a parser that reads each source file and indexes it as a single document. When you search, you will
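
Aditya's "one source file, one document" approach can be sketched as a tiny in-memory inverted index. This is a hypothetical plain-Java model, not Lucene code: `FileLevelIndex` and its tokenizer (which splits on anything other than letters, digits, `_` and `'`, so Haskell names such as foldl' survive) are illustrative assumptions. With Lucene, each file would instead become one Document with its text in an analyzed field.

```java
import java.util.*;

/** Hypothetical sketch: each source file is one "document"; an inverted index maps
 *  every identifier-like token to the set of files that contain it. */
class FileLevelIndex {
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    /** Assumed tokenizer: split on anything that is not a letter, digit, '_' or '\''. */
    void addFile(String fileName, String source) {
        for (String token : source.split("[^A-Za-z0-9_']+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(fileName);
        }
    }

    /** Files containing every query term (AND semantics). */
    SortedSet<String> search(String... terms) {
        SortedSet<String> result = null;
        for (String term : terms) {
            SortedSet<String> files = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(files);
            else result.retainAll(files);
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

The trade-off Aditya implies is that results are whole files: a hit tells you which file mentions a name, not where or in what role.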

Re: How to approach indexing source code?

2014-06-03 Thread Jack Krupansky

The first question for any search app should always be: How do you intend to query the data? That will in large part determine how you should index the data. IOW, how do you intend to use the data? Be specific. Provide some sample queries and then work backwards to how the data needs to be in

How to approach indexing source code?

2014-06-03 Thread Johan Tibell

Hi, I'd like to index (Haskell) source code. I've run the source code through a compiler (GHC) to get rich information about each token (its type, fully qualified name, etc.) that I want to index (and later use when ranking). I'm wondering how to approach indexing source code. I can see two possib
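
If each token carries GHC-derived metadata, one option is to index every token occurrence separately under several field:value keys, so a query can target, say, a fully qualified name rather than a bare identifier. The sketch below is a hypothetical plain-Java model: `TokenIndex`, `Occurrence` and the field names `name`, `qname` and `type` are illustrative, and the metadata is assumed to come from the compiler. In Lucene terms this roughly corresponds to one field per kind of metadata.

```java
import java.util.*;

/** Hypothetical sketch: index every token occurrence under field:value keys so queries can
 *  target compiler-derived metadata such as a fully qualified name or a type. */
class TokenIndex {
    /** One occurrence of a token in a source file. */
    static final class Occurrence {
        final String file; final int line; final int column;
        Occurrence(String file, int line, int column) { this.file = file; this.line = line; this.column = column; }
        @Override public String toString() { return file + ":" + line + ":" + column; }
    }

    private final Map<String, List<Occurrence>> postings = new HashMap<>();

    /** Index one occurrence under each of its fields, e.g. "qname" -> "Data.Map.insert".
     *  Field values are assumed to be supplied by the compiler (GHC in the thread). */
    void addToken(Occurrence occ, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            postings.computeIfAbsent(f.getKey() + ":" + f.getValue(), k -> new ArrayList<>()).add(occ);
        }
    }

    /** All occurrences whose given field has exactly the given value. */
    List<Occurrence> find(String field, String value) {
        return postings.getOrDefault(field + ":" + value, Collections.emptyList());
    }
}
```

Keeping line and column per occurrence is what lets results point at a specific use site, which the file-per-document approach cannot do.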

RE: search performance

2014-06-03 Thread Toke Eskildsen

Jamie [ja...@mailarchiva.com] wrote:
> It would be nice if, in future, the Lucene API could provide a
> searchAfter that takes a position (int).

It would not really help with large result sets. At least not with the current underlying implementations. This is tied into your current performance pr
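
Toke's point can be seen in a small model. To serve hits at position p, a top-N collector has to retain p + n entries. A cursor (the last hit of the previous page, which is what IndexSearcher.searchAfter takes as a ScoreDoc) lets each request retain only n. The classes below are a hypothetical plain-Java model, not Lucene code; ordering by score descending, then doc id ascending, is an assumption of the model.

```java
import java.util.*;

/** Hypothetical model of deep paging; hits are ordered by score descending, ties by doc id ascending. */
class PagingModel {
    static final class Hit {
        final int doc; final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Ranking order: better hits compare as smaller. */
    static final Comparator<Hit> RANK =
        Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc);

    /** Cursor-based paging: the n best hits ranking strictly after `after` (null = first page).
     *  The heap never holds more than n entries, however deep the page is. */
    static List<Hit> searchAfter(List<Hit> allHits, Hit after, int n) {
        // Reversed order puts the worst retained hit at the head, so it is evicted first.
        PriorityQueue<Hit> heap = new PriorityQueue<>(RANK.reversed());
        for (Hit h : allHits) {
            if (after != null && RANK.compare(h, after) <= 0) continue; // shown on an earlier page
            heap.add(h);
            if (heap.size() > n) heap.poll(); // drop the worst
        }
        List<Hit> page = new ArrayList<>(heap);
        page.sort(RANK);
        return page;
    }

    /** Position-based paging: to return hits [offset, offset + n) the heap must hold offset + n entries. */
    static List<Hit> searchAtPosition(List<Hit> allHits, int offset, int n) {
        List<Hit> top = searchAfter(allHits, null, offset + n);
        return top.subList(Math.min(offset, top.size()), top.size());
    }
}
```

Both variants still visit every matching hit; only the size of the retained heap differs, which is why a position-based searchAfter would not make deep pages cheap.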

Re: search performance

2014-06-03 Thread Jamie

Thanks Jon, I'll investigate your idea further. It would be nice if, in future, the Lucene API could provide a searchAfter that takes a position (int). Regards Jamie On 2014/06/03, 3:24 PM, Jon Stewart wrote: With regards to pagination, is there a way for you to cache the IndexSearcher, Que

Re: search performance

2014-06-03 Thread Jon Stewart

With regards to pagination, is there a way for you to cache the IndexSearcher, Query, and TopDocs between user pagination requests (a lot of webapp frameworks have object caching mechanisms)? If so, you may have luck with code like this: void ensureTopDocs(final int rank) throws IOException {
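
Jon's code is truncated in this listing, so the following is not his implementation. It is a hypothetical sketch of the same idea: keep the query's top hits cached between page requests and re-run the search with a larger n only when a page falls beyond what is cached. `CachedResults` and `searchTopN` are illustrative names; `searchTopN` stands in for a call such as searcher.search(query, n) and is assumed to return at most n hits in rank order. Doubling n is an assumed growth policy.

```java
import java.util.*;
import java.util.function.IntFunction;

/** Hypothetical sketch of caching top hits between page requests. */
class CachedResults<T> {
    private final IntFunction<List<T>> searchTopN; // stand-in for searcher.search(query, n)
    private List<T> cached = Collections.emptyList();
    private int cachedN = 0;
    int searchesRun = 0; // instrumentation for the example

    CachedResults(IntFunction<List<T>> searchTopN) { this.searchTopN = searchTopN; }

    /** Make sure hits up to 0-based `rank` are cached, re-running the search with a
     *  doubled n only when the cache is too shallow and more hits may exist. */
    private void ensureTopDocs(int rank) {
        if (rank < cached.size()) return;                    // already cached
        if (cachedN > 0 && cached.size() < cachedN) return;  // last search returned every hit there is
        int n = Math.max(cachedN, 16);
        while (n <= rank) n *= 2;
        cached = searchTopN.apply(n);
        cachedN = n;
        searchesRun++;
    }

    /** Hits [offset, offset + size), clipped to the total number of hits. */
    List<T> page(int offset, int size) {
        ensureTopDocs(offset + size - 1);
        int from = Math.min(offset, cached.size());
        int to = Math.min(offset + size, cached.size());
        return cached.subList(from, to);
    }
}
```

Sequential paging then costs a logarithmic number of searches rather than one per page, at the price of holding the cached hits in memory for the session.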

Re: search performance

2014-06-03 Thread Jamie

Robert, thanks. I've already done something similar. Results on my test platform are encouraging. On 2014/06/03, 2:41 PM, Robert Muir wrote: Reopening for every search is not a good idea. This will have an extremely high cost (not as high as what you are doing with "paging", but still not good).

Re: search performance

2014-06-03 Thread Robert Muir

Reopening for every search is not a good idea. This will have an extremely high cost (not as high as what you are doing with "paging", but still not good). Instead, consider making it near-realtime by doing this every second or so. Look at SearcherManager for code that helps you do this. O
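
Robert's suggestion amounts to refreshing on a timer rather than per search. Lucene's SearcherManager (with maybeRefresh) is the real tool for this; the following is only a plain-Java sketch of the throttling idea. `ThrottledRefresher`, `openLatest` and the injectable clock are hypothetical names. The sketch deliberately omits the reference counting SearcherManager uses so that an old reader is not closed while a search still holds it.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: refresh at most once per interval instead of reopening on every search.
 *  `openLatest` stands in for opening or refreshing a reader; `clockMillis` is injectable for testing. */
class ThrottledRefresher<T> {
    private final Supplier<T> openLatest;
    private final LongSupplier clockMillis;
    private final long intervalMillis;
    private T current;
    private long lastRefresh;

    ThrottledRefresher(Supplier<T> openLatest, LongSupplier clockMillis, long intervalMillis) {
        this.openLatest = openLatest;
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
        this.current = openLatest.get();
        this.lastRefresh = clockMillis.getAsLong();
    }

    /** Every search calls this; the expensive open happens at most once per interval. */
    synchronized T acquire() {
        long now = clockMillis.getAsLong();
        if (now - lastRefresh >= intervalMillis) {
            current = openLatest.get();
            lastRefresh = now;
        }
        return current;
    }
}
```

In a real service the refresh would more likely run on a background thread (for example a ScheduledExecutorService), so no search request pays the reopen latency.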

Re: search performance

2014-06-03 Thread Jamie

Robert, FYI: I've modified the code to use the experimental function: DirectoryReader dirReader = DirectoryReader.openIfChanged(cachedDirectoryReader, writer, true); In this case, the IndexReader won't be opened on each search unless absolutely necessary. Regards Jamie On 2014/06
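
DirectoryReader.openIfChanged returns null when nothing has changed, so the caller keeps its current reader; when it returns a new reader, the caller swaps it in and closes the old one. The sketch below models that contract with hypothetical plain-Java stand-ins (`ReopenModel`, `Index`, `Snapshot`, `openIfChanged`, `refresh`) rather than Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the openIfChanged idiom: "open" returns null when nothing changed. */
class ReopenModel {
    /** Stand-in for an index: documents plus a generation bumped on each change. */
    static final class Index {
        private final List<String> docs = new ArrayList<>();
        private long generation = 0;
        synchronized void add(String doc) { docs.add(doc); generation++; }
    }

    /** Stand-in for a point-in-time reader. */
    static final class Snapshot implements AutoCloseable {
        final long generation; final List<String> docs; boolean closed = false;
        Snapshot(long generation, List<String> docs) { this.generation = generation; this.docs = docs; }
        @Override public void close() { closed = true; }
    }

    static Snapshot open(Index index) {
        synchronized (index) { return new Snapshot(index.generation, new ArrayList<>(index.docs)); }
    }

    /** A new snapshot, or null if `old` is still current (mirroring openIfChanged's contract). */
    static Snapshot openIfChanged(Snapshot old, Index index) {
        synchronized (index) {
            return index.generation == old.generation ? null : open(index);
        }
    }

    /** Caller-side idiom: keep the cached snapshot unless a newer one exists. */
    static Snapshot refresh(Snapshot cached, Index index) {
        Snapshot fresh = openIfChanged(cached, index);
        if (fresh == null) return cached; // unchanged: no reopen cost
        cached.close();                   // in real code, only once no search still uses it
        return fresh;
    }
}
```

Even with this idiom, calling it on every search still checks for changes each time; combining it with a time-based refresh is what keeps reopen cost off the search path.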

Re: search performance

2014-06-03 Thread Jamie

Robert, hmm. Why did Mike go to all the trouble of implementing NRT search if we are not supposed to be using it? The user simply wants the latest result set. To me, this doesn't appear out of scope for the Lucene project. Jamie On 2014/06/03, 1:17 PM, Robert Muir wrote: No, you are

Re: search performance

2014-06-03 Thread Robert Muir

No, you are incorrect. The point of a search engine is to return the top-N most relevant. If you insist you need to open an IndexReader on every single search, and then return huge amounts of docs, maybe you should use a database instead. On Tue, Jun 3, 2014 at 6:42 AM, Jamie wrote: > Vitaly / Rob
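
The "top-N most relevant" model is what keeps memory bounded: every matching document is scored and counted, but only the N best are retained. A hypothetical plain-Java sketch (`TopNCollector` and `ScoredDoc` are illustrative names, not Lucene classes, and ties are assumed to be broken by lower doc id first):

```java
import java.util.*;

/** Hypothetical top-N collector: counts every match but keeps only the N best in a bounded heap. */
class TopNCollector {
    static final class ScoredDoc {
        final int doc; final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Best first: higher score, then lower doc id. */
    private static final Comparator<ScoredDoc> BEST_FIRST =
        Comparator.comparingDouble((ScoredDoc d) -> -d.score).thenComparingInt(d -> d.doc);

    private final int n;
    private final PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(BEST_FIRST.reversed()); // worst at head
    private int totalHits = 0;

    TopNCollector(int n) { this.n = n; }

    void collect(int doc, float score) {
        totalHits++;
        ScoredDoc candidate = new ScoredDoc(doc, score);
        if (heap.size() < n) {
            heap.add(candidate);
        } else if (n > 0 && BEST_FIRST.compare(candidate, heap.peek()) < 0) {
            heap.poll();          // evict the current worst
            heap.add(candidate);  // only beats-the-worst candidates enter a full heap
        }
    }

    int totalHits() { return totalHits; }

    /** Retained hits, best first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST);
        return out;
    }
}
```

Returning "huge amounts of docs" means making N huge, so the heap, and every comparison against it, grows with it.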

Re: search performance

2014-06-03 Thread Jamie

Vitaly / Robert, I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes. Unless I am mistaken, the Lucene library's pagination mechanism assumes that you will cache the ScoreDocs for the entire result set. This is not practical when you have a result set that e

Re: Reader reopen

2014-06-03 Thread Michael McCandless

Sure, just use DirectoryReader.openIfChanged. Mike McCandless http://blog.mikemccandless.com On Tue, Jun 3, 2014 at 6:36 AM, Gergő Törcsvári wrote: > Hello, > > If I have an AtomicReader, and an IndexSearcher can I reopen the index to > get the new documents? > Like there: > http://lucene.apac

Fwd: Reader reopen

2014-06-03 Thread Gergő Törcsvári

Hello, If I have an AtomicReader and an IndexSearcher, can I reopen the index to get the new documents? Like here: http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/index/IndexReader.html#reopen%28%29 Is there any workaround? Thanks, Gergő P.S.: I accidentally sent this to general li

Re: search performance

2014-06-03 Thread Vitaly Funstein

Jamie, What if you were to forget for a moment the whole pagination idea, and always capped your search at 1000 results for testing purposes only? This is just to try and pinpoint the bottleneck here; if, regardless of the query parameters, the search latency stays roughly the same and well below

Re: search performance

2014-06-03 Thread Robert Muir

Check and make sure you are not opening an indexreader for every search. Be sure you don't do that. On Mon, Jun 2, 2014 at 2:51 AM, Jamie wrote: > Greetings > > Despite following all the recommended optimizations (as described at > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in so

Re: search performance

2014-06-03 Thread Jamie

Vitaly See below: On 2014/06/03, 12:09 PM, Vitaly Funstein wrote: A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is

Re: search performance

2014-06-03 Thread Vitaly Funstein

A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is at best futile, and at worst quite detrimental to responsiveness and o

Re: search performance

2014-06-03 Thread Jamie

FYI: We are also using a multireader to search over multiple index readers. Search under a million documents yields good response times. When you get into the 60M territory, search slows to a crawl. On 2014/06/03, 11:47 AM, Jamie wrote: Sure... see below: --
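
For background on the MultiReader remark: a composite reader presents its sub-readers as a single doc-id space by giving each one a starting offset (a doc base). The sketch below models that mapping in plain Java; `CompositeDocIds` is a hypothetical name and this is not Lucene's implementation. It only illustrates that the combined reader behaves like one large index, which is consistent with latency growing with the combined document count.

```java
import java.util.Arrays;

/** Hypothetical sketch: sub-reader i owns global doc ids [docBase[i], docBase[i] + maxDoc[i]). */
class CompositeDocIds {
    private final int[] docBase;
    private final int maxDoc;

    CompositeDocIds(int... subMaxDocs) {
        docBase = new int[subMaxDocs.length];
        int base = 0;
        for (int i = 0; i < subMaxDocs.length; i++) {
            docBase[i] = base;       // first global id owned by sub-reader i
            base += subMaxDocs[i];
        }
        maxDoc = base;
    }

    int maxDoc() { return maxDoc; }

    int toGlobal(int subIndex, int localDoc) { return docBase[subIndex] + localDoc; }

    /** Index of the sub-reader owning `globalDoc`: the last i with docBase[i] <= globalDoc. */
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc) throw new IllegalArgumentException("doc out of range");
        int pos = Arrays.binarySearch(docBase, globalDoc);
        if (pos >= 0) {
            // Empty sub-readers share a docBase with the next one; move to the last match.
            while (pos + 1 < docBase.length && docBase[pos + 1] == globalDoc) pos++;
            return pos;
        }
        return -pos - 2; // insertion point minus one
    }
}
```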

Re: search performance

2014-06-03 Thread Jamie

Sure... see below: protected void search(Query query, Filter queryFilter, Sort sort) throws BlobSearchException { try { logger.debug("start search {searchquery='" + getSearchQuery() + "',query='"+query.toString()+"',filterQuery='"+queryFilter+"',sort='"+sort

Re: search performance

2014-06-03 Thread Rob Audenaerde

Hi Jamie, What is included in the 5 minutes? Just the call to the searcher, searcher.search(...)? Can you show a bit more of the code you use? On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote: > Vitaly > > Thanks for the contribution. Unfortunately, we cannot use Lucene's > pagination function

Re: search performance

2014-06-03 Thread Jamie

Vitaly Thanks for the contribution. Unfortunately, we cannot use Lucene's pagination function, because in reality the user can skip pages to start the search at any point, not just from the end of the previous search. Even the first search (without any pagination), with a max of 1000 hits, tak

Re: search performance

2014-06-03 Thread Vitaly Funstein

Something doesn't quite add up.

> TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max,true,
> false, false, true);
>
> We use pagination, so only returning 1000 documents or so at a time.
>

You say you are using pagination, yet the API you are using to create your collector isn't

Re: search performance

2014-06-03 Thread Jamie

Toke Thanks for the contact. See below: On 2014/06/03, 9:17 AM, Toke Eskildsen wrote: On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different sy

Re: search performance

2014-06-03 Thread Toke Eskildsen

On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote:
> Unfortunately, in this instance, it is a live production system, so we
> cannot conduct experiments. The number is definitely accurate.
>
> We have many different systems with a similar load that observe the same
> performance issue. To my knowle

25 matches