Re: Field Normalisation in Query across two indexes

2009-02-26 Thread Dino Korah
Would anyone please help me with this? Thanks. On 23/02/2009, Dino Korah wrote: > > Guys, > > I have a question on normalisation. > I am using a prehistoric version of lucene; 2.0.0 > > Context: http://markmail.org/message/z5lcz2htjvqsscam > > I have these two scenar

Field Normalisation in Query across two indexes

2009-02-23 Thread Dino Korah
Guys, I have a question on normalisation. I am using a prehistoric version of Lucene: 2.0.0. Context: http://markmail.org/message/z5lcz2htjvqsscam I have these two scenarios with indexes. One: 2 indexes; one with documents that have a field "field_one" instantiated as TOKENIZED and then setOmitNorm
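Since the snippet is cut off, here is a minimal sketch of the two field set-ups the question appears to compare; the field name and text are placeholders, and the calls assume the Lucene 2.0-era Field API mentioned in the thread.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class NormsSetupSketch {
        // Scenario one: tokenized field with norms switched off afterwards.
        public static Document withoutNorms(String text) {
            Document doc = new Document();
            Field f = new Field("field_one", text, Field.Store.NO, Field.Index.TOKENIZED);
            f.setOmitNorms(true);   // length/boost normalisation not written for this field
            doc.add(f);
            return doc;
        }

        // Scenario two: the same field left with norms enabled (the default).
        public static Document withNorms(String text) {
            Document doc = new Document();
            doc.add(new Field("field_one", text, Field.Store.NO, Field.Index.TOKENIZED));
            return doc;
        }
    }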

Re: Index time Document Boosting and Query Time Sorts

2008-09-26 Thread Dino Korah
Cheers All. 2008/9/24 Karl Wettin <[EMAIL PROTECTED]> > > On 24 Sep 2008, at 12:40, Grant Ingersoll wrote: > > One side note based on your example, below: Index time boosting does not >> have much granularity (only 255 values); in other words, there is a loss of >> precision. Thus, you >> want to m

Re: IndexSearcher.search

2008-09-24 Thread Dino Korah
Thanks Chris. It kind of makes sense to have control over what we do with the API, but for first-time users it will be vital to have classes that help smooth their learning curve. By the looks of it, even advanced users use Hits and are porting to TopDocs only because of the deprecation.

Index time Document Boosting and Query Time Sorts

2008-09-24 Thread Dino Korah
Hi all, Could you please help me understand how that works? If I boost documents at index time based on some kind of criteria, and if I am to sort on a different criterion at query time, how will the result be affected by the boosting? So if I am to index a bunch of text files in a folder struct
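A hedged sketch of the situation being asked about, using Lucene 2.x-era APIs: an index-time document boost (folded into the norms) next to an explicit query-time Sort on another field, which orders results by that field alone and leaves the boost to influence only relevance scoring. Field names and the boost criterion are illustrative.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopDocs;

    public class BoostVsSortSketch {
        // Index time: boost a document according to some criterion (e.g. folder depth).
        public static void addBoosted(IndexWriter writer, String body, String path, float boost)
                throws Exception {
            Document doc = new Document();
            doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
            doc.add(new Field("path", path, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.setBoost(boost);   // folded into the norms; coarse, single-byte precision
            writer.addDocument(doc);
        }

        // Query time: an explicit Sort on "path" orders results by that field alone;
        // the index-time boost only matters when sorting by relevance.
        public static TopDocs searchSortedByPath(IndexSearcher searcher, Query q) throws Exception {
            return searcher.search(q, null, 10, new Sort("path"));
        }
    }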

RE: Multi Field search without Multifieldqueryparser

2008-09-23 Thread Dino Korah
Just an idea... a long-winded one. I'm not sure either! Pardon me if I am directing you in the wrong direction. If you add a lucene doc like below into your main index - Doc 1 - Field1: rainy today Field2: rainy yesterday Field3: weather forecast for tomorrow - Doc 2 - Field1: rainy tomorrow Fiel

RE: Multi Field search without Multifieldqueryparser

2008-09-22 Thread Dino Korah
I would think, with the current capabilities of lucene, denormalisation is the solution. Create an extra indexed but not stored field called "searchable-mash" which will hold the values from all fields, with added words to connect the data, like "Male named George Bush whose occupation is President o
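A minimal sketch of that denormalisation idea, assuming hypothetical field names: keep the individual fields as they are and add one indexed-but-not-stored catch-all field holding all the text, so a plain single-field query can match any of them.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class CatchAllFieldSketch {
        // Keep the individual fields, plus one indexed-but-not-stored "searchable-mash"
        // field containing all the text with connecting words.
        public static Document build(String name, String occupation, String gender) {
            Document doc = new Document();
            doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("occupation", occupation, Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("gender", gender, Field.Store.YES, Field.Index.UN_TOKENIZED));

            String mash = gender + " named " + name + " whose occupation is " + occupation;
            doc.add(new Field("searchable-mash", mash, Field.Store.NO, Field.Index.TOKENIZED));
            return doc;
        }
    }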

Re: IndexSearcher.search

2008-09-19 Thread Dino Korah
I really think Hits is definitely a nice utility class when it comes to GUI results presentation. If we are to use our own class for this purpose, it wouldn't be much different from Hits. It's a shame that we are dropping it! 2008/9/19 Daniel Noll <[EMAIL PROTECTED]> > Chris Hostetter wrote: > >

Search all Related Documents

2008-09-18 Thread Dino Korah
Hi All, Scenario: I have 100 documents in an index and these documents fall into 10 mutually exclusive sets; within each set one of them is the main document. Now I am to search on the index and group the results by the 10 mutually exclusive sets, and I have to display the result with field

Re: TopDocCollector & Paging

2008-09-17 Thread Dino Korah
Thanks Grant. Please see my comments/responses below. 2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]> > > On Sep 17, 2008, at 4:39 PM, Dino Korah wrote: > > I know in applications where we search for words or phrases and expect >> the >> result sorted by relevan

Re: TopDocCollector & Paging

2008-09-17 Thread Dino Korah
I know in applications where we search for words or phrases and expect the result sorted by relevance, TopDocCollector would work like a dream. But what about scenarios where the result needs to be sorted chronologically or by some kind of metadata? A very common application would be email applic

TopDocCollector & Paging

2008-09-17 Thread Dino Korah
Hello All, Has anyone tried this? My UI has a requirement to show the total number of results and then show results in pages. How do I do that with TopDocCollector, without having to run search() twice: once to get the total number of hits and then again to get the page being displayed? (lik
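One possible shape of the answer, assuming a Lucene version that ships TopDocCollector: collect enough hits to cover the requested page in a single search() call, then read the total and the page slice from the collector. The page arithmetic and the stored "subject" field are illustrative.

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocCollector;
    import org.apache.lucene.search.TopDocs;

    public class PagingSketch {
        // One search() call: collect enough hits to cover the requested page,
        // then report the total separately from the page slice.
        public static void showPage(IndexSearcher searcher, Query q, int page, int pageSize)
                throws Exception {
            TopDocCollector collector = new TopDocCollector((page + 1) * pageSize);
            searcher.search(q, collector);

            int total = collector.getTotalHits();      // overall number of matches
            TopDocs topDocs = collector.topDocs();
            ScoreDoc[] hits = topDocs.scoreDocs;

            System.out.println("Total hits: " + total);
            for (int i = page * pageSize; i < hits.length; i++) {
                System.out.println(searcher.doc(hits[i].doc).get("subject"));
            }
        }
    }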

search with Filter

2008-09-15 Thread Dino Korah
Hi All, I am trying to use a Filter to see if I can get a bit more performance out of my application, which searches over a 100-million-document Lucene index. All my documents have two fields over which I will have to scope my searches. One is a date-time field (MMDDHHMMSS) and a user-i
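A sketch of one way to scope such a query with Lucene 2.x-era classes: the user id is added as a required clause and the date window is applied as a RangeFilter so it does not affect scoring. Field names follow the question; the date strings are assumed to use the same format as the indexed values.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class ScopedSearchSketch {
        // Scope a free-text query to one user and a date window.
        public static TopDocs scopedSearch(IndexSearcher searcher, Query userQuery,
                                           String userId, String from, String to) throws Exception {
            BooleanQuery q = new BooleanQuery();
            q.add(userQuery, BooleanClause.Occur.MUST);
            q.add(new TermQuery(new Term("user-id", userId)), BooleanClause.Occur.MUST);

            // Date window as a filter: restricts the result set without influencing scores.
            Filter dateWindow = new RangeFilter("datetime", from, to, true, true);
            return searcher.search(q, dateWindow, 20);
        }
    }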

HitCollector - Remote-ability

2008-09-11 Thread Dino Korah
Hi All, I vaguely remember discussions on the remote-ability of HitCollector-based search() in Lucene. As far as I remember, it is not possible if I use HitCollectors. In Lucene 3, we are doing away with a lot of search() variants, including the ones that return Hits. I would like to know which one o

Searcher - search() & Hits Deprecation

2008-09-11 Thread Dino Korah
Hi All, In my project I use Hits from Searcher.search() for my query results. If I am to move to Lucene 3's ways, I will have to use TopDocs, I presume. It'll be great if someone could guide me with some sort of skeleton code. Also, is it possible to cache the results like I do with Hits? Anoth
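A hedged skeleton of the TopDocs style being asked about, with the old Hits-style loop kept in comments for comparison; the stored "subject" field is a placeholder. Note that, unlike Hits, nothing is cached for you, so stored fields are loaded each time doc() is called.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class TopDocsSkeleton {
        // Hits-style code:  Hits hits = searcher.search(q);
        //                   for (int i = 0; i < hits.length(); i++) { Document d = hits.doc(i); ... }
        // TopDocs-style equivalent, asking for the top n results up front:
        public static void printTop(IndexSearcher searcher, Query q, int n) throws Exception {
            TopDocs topDocs = searcher.search(q, null, n);
            System.out.println("Total matches: " + topDocs.totalHits);
            ScoreDoc[] hits = topDocs.scoreDocs;
            for (int i = 0; i < hits.length; i++) {
                Document d = searcher.doc(hits[i].doc);   // no built-in caching, unlike Hits
                System.out.println(hits[i].score + "  " + d.get("subject"));
            }
        }
    }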

Analyzer at Query time

2008-08-28 Thread Dino Korah
Hi All, If I completely avoid the query parser and use BooleanQuery along with TermQuery, RangeQuery, PrefixQuery, PhraseQuery, etc., do the search words still go through the Analyzer before the real search is done? Many thanks, Dino
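For reference, a small sketch of what hand-built queries look like: no Analyzer runs over the strings passed to Term, so they have to be normalised by hand (here simply lower-cased) to match what the index-time analyzer produced. The field name and the normalisation step are assumptions.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class HandBuiltQuerySketch {
        // No analyzer runs over these strings, so lower-case them (and otherwise mimic
        // the index-time analysis) before wrapping them in Term objects.
        public static Query build(String word, String prefix) {
            BooleanQuery q = new BooleanQuery();
            q.add(new TermQuery(new Term("body", word.toLowerCase())), BooleanClause.Occur.MUST);
            q.add(new PrefixQuery(new Term("body", prefix.toLowerCase())), BooleanClause.Occur.SHOULD);
            return q;
        }
    }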

FW: Case Sensitivity

2008-08-28 Thread Dino Korah
Otis Gospodnetic wrote: > Dino, you lost me half-way through your email :( > > NO_NORMS does not mean the field is not tokenized. > UN_TOKENIZED does mean the field is not tokenized. > > > Otis-- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > >

RE: Case Sensitivity

2008-08-27 Thread Dino Korah
Dino, you lost me half-way through your email :( > > NO_NORMS does not mean the field is not tokenized. > UN_TOKENIZED does mean the field is not tokenized. > > > Otis-- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message -

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
later set a few with setOmitNorms(true) (the index writer is plain StandardAnalyzer based)? A per-field analyzer at query time?! Many thanks, Dino -Original Message----- From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 26 August 2008 12:12 To: 'java-user@lucene.apache.org' Subject

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
page for some hints: <http://wiki.apache.org/lucene-java/AnalysisParalysis> On 08/19/2008 at 8:57 AM, Dino Korah wrote: > From the discussion here what I could understand was, if I am using > StandardAnalyzer on TOKENIZED fields, for both Indexing and Querying, > I shouldn't have a

RE: Case Sensitivity

2008-08-22 Thread Dino Korah
(result); return result; } } On Wed, Aug 20, 2008 at 10:21 AM, Dino Korah <[EMAIL PROTECTED]> wrote: > Hi Steve, > > Thanks a lot for that. > > I have a question on TokenStreams and email addresses, but I will post them > on a separate thread. > > Many than

EmailAddressAnalyzer & TokenStreams

2008-08-20 Thread Dino Korah
Hi guys, If I am to tokenize an email address like "John Smith" < [EMAIL PROTECTED]> into [ [EMAIL PROTECTED] [John] [Smith] [J.Smith] [london.gb.world.net] [gb.world.net] [world.net] [world] [net] Is i
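One hedged way to get a token set like that without writing a full TokenStream: expand the address into a whitespace-separated string and index it in a whitespace-analyzed field. The helper below is purely illustrative, only covers the address itself (not the display name), and splits the local part on dots rather than keeping it whole.

    public class EmailExpandSketch {
        // Expand "j.smith@london.gb.world.net" into: the full address, the local-part
        // pieces, every domain suffix (london.gb.world.net, gb.world.net, world.net, net)
        // and the individual labels; index the result in a whitespace-analyzed field.
        public static String expand(String address) {
            StringBuffer out = new StringBuffer(address);
            String[] halves = address.split("@");
            if (halves.length == 2) {
                String[] localParts = halves[0].split("\\.");
                for (int i = 0; i < localParts.length; i++) {
                    out.append(' ').append(localParts[i]);
                }
                String[] labels = halves[1].split("\\.");
                for (int i = 0; i < labels.length; i++) {
                    StringBuffer suffix = new StringBuffer();
                    for (int j = i; j < labels.length; j++) {
                        if (j > i) suffix.append('.');
                        suffix.append(labels[j]);
                    }
                    out.append(' ').append(suffix.toString());   // domain suffix
                    out.append(' ').append(labels[i]);           // individual label
                }
            }
            return out.toString();
        }
    }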

RE: Case Sensitivity

2008-08-20 Thread Dino Korah
Also, have a look at the AnalysisParalysis wiki page for some hints: <http://wiki.apache.org/lucene-java/AnalysisParalysis> On 08/19/2008 at 8:57 AM, Dino Korah wrote: > From the discussion here what I could understand was, if I am using > StandardAnalyzer on TOKENIZED fields, for both Indexing

RE: Case Sensitivity

2008-08-19 Thread Dino Korah
Hi Guys, From the discussion here, what I could understand was: if I am using StandardAnalyzer on TOKENIZED fields, for both Indexing and Querying, I shouldn't have any problems with case. But if I have any UN_TOKENIZED fields, there will be problems if I do not case-normalize them myself before a
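A small sketch of that manual normalisation, with a hypothetical "doc-id" field: lower-case the UN_TOKENIZED value both when adding the field and when building the Term for the query, since no analyzer touches either side.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class UntokenizedCaseSketch {
        // Index time: the UN_TOKENIZED value bypasses the analyzer, so normalise it here.
        public static void addId(Document doc, String id) {
            doc.add(new Field("doc-id", id.toLowerCase(), Field.Store.YES, Field.Index.UN_TOKENIZED));
        }

        // Query time: apply the same normalisation before building the TermQuery.
        public static Query idQuery(String id) {
            return new TermQuery(new Term("doc-id", id.toLowerCase()));
        }
    }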

RE: Case Sensitivity

2008-08-13 Thread Dino Korah
Also, I would like to highlight the version of Lucene I am using; it is 2.0.0. _ From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 13 August 2008 17:10 To: 'java-user@lucene.apache.org' Subject: Case Sensitivity Hi All, Once I index a bunch of documents with a StandardAnalyz

Case Sensitivity

2008-08-13 Thread Dino Korah
Hi All, Once I index a bunch of documents with a StandardAnalyzer (and if reindexing the documents is not worth the effort), is there a way to search on the index without case sensitivity? I do not use any sophisticated Analyzer that makes use of LowerCaseTokenizer.

RE: Transaction semantics in Document addition

2008-05-19 Thread Dino Korah
java-user@lucene.apache.org Sent: Monday, May 19, 2008 4:02:52 AM Subject: Re: Transaction semantics in Document addition Dino Korah wrote: > Hi All, > > I am dealing with a situation where a document could possibly have > multiple attachments to it, and they are all added to the index under

Transaction semantics in Document addition

2008-05-19 Thread Dino Korah
Hi All, I am dealing with a situation where a document could possibly have multiple attachments to it, and they are all added to the index under a document-id (not a Lucene doc-id). Now if one of the attachments fails to get indexed due to the failure of a subsystem like the text extraction module, I
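Lucene 2.0 has no per-document rollback, so one common pattern (sketched below, with text extraction left as a placeholder) is to do all the failure-prone work for every attachment before touching the IndexWriter, and only then add the documents; if extraction fails, nothing has been written yet.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class AllOrNothingAddSketch {
        // Extract text from every attachment first; if any extraction throws, nothing
        // has been written.  Only when all Documents are ready are they added.
        // extractToDocument() is a placeholder for the application's own extraction code.
        public static void addMessage(IndexWriter writer, String documentId, List attachments)
                throws Exception {
            List docs = new ArrayList();
            for (int i = 0; i < attachments.size(); i++) {
                docs.add(extractToDocument(documentId, attachments.get(i)));  // may fail: index untouched
            }
            for (int i = 0; i < docs.size(); i++) {
                writer.addDocument((Document) docs.get(i));
            }
        }

        private static Document extractToDocument(String documentId, Object attachment) {
            Document doc = new Document();
            // ... run text extraction here and populate fields, including the document-id ...
            return doc;
        }
    }

If addDocument itself fails part-way through the second loop, the partially added set still has to be cleaned up, for example by deleting on the document-id term.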

RE: Indexing and Searching from within a single Document

2008-04-08 Thread Dino Korah
Lucene is a library to index data; it's up to you to drive it the way you want. Think about the search result: how would your user like to see the information that the search page has brought up? Do they want to know the page number, or is it the section number, or could it even be the sentence number? Dependi

Index Reader Writer - 2 JVMs

2007-10-24 Thread Dino Korah
Hi All, I have a scenario where there are two processes (2 JVMs) accessing the same index. One of them does the indexing as documents arrive into the system and the second one serves search queries. Both processes are running on the same machine. Is there a need to do some kind of lockin
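With one writer JVM and one searcher JVM on the same directory, Lucene's own write lock protects the writer; the searcher mainly needs to reopen its IndexReader to see newly committed documents. A sketch of such a reopen check, with the path and the timing of the check left to the application:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public class ReopeningSearcherSketch {
        private IndexReader reader;
        private IndexSearcher searcher;
        private final String indexPath;

        public ReopeningSearcherSketch(String indexPath) throws Exception {
            this.indexPath = indexPath;
            this.reader = IndexReader.open(indexPath);
            this.searcher = new IndexSearcher(reader);
        }

        // Call before searching (or on a timer): if the writer JVM has committed new
        // segments, open a fresh reader so the new documents become visible.
        public synchronized IndexSearcher current() throws Exception {
            if (!reader.isCurrent()) {
                IndexReader newReader = IndexReader.open(indexPath);
                searcher.close();
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }
            return searcher;
        }
    }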

Norm - please light it up for me

2007-10-19 Thread Dino Korah
Hi, Could someone help me understand normalization factors for a field? Also, please tell me in what situations I should omit normalization factors when adding a document. Many thanks. Dino Korah

RE: mixing analyzer

2007-10-01 Thread Dino Korah
handle things for you. But this may not fit your problem space ideally.... Best Erick On 10/1/07, Dino Korah <[EMAIL PROTECTED]> wrote: > > Hi, > > > > I am working on a lucene email indexing system which potentially can get > documents in various languages. Currently I

mixing analyzer

2007-10-01 Thread Dino Korah
Hi, I am working on a Lucene email indexing system which can potentially get documents in various languages. Currently I am using StandardAnalyzer, which works for English but not for many of the other languages. One of the requirements for the search interface is that they have to search witho
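If the language can be decided per field, PerFieldAnalyzerWrapper lets a single writer (and query parser) mix analyzers. A sketch assuming a hypothetical "body_fr" field and the contrib analyzers jar for the non-English analyzer:

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.fr.FrenchAnalyzer;            // from the contrib analyzers jar
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class PerFieldAnalyzerSketch {
        // Default to StandardAnalyzer, but analyse the hypothetical "body_fr" field
        // with a French analyzer.  The same wrapper should be handed to the QueryParser
        // so indexing and querying stay consistent.
        public static IndexWriter openWriter(String indexPath) throws Exception {
            PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("body_fr", new FrenchAnalyzer());
            return new IndexWriter(indexPath, analyzer, false);     // false: append to existing index
        }
    }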

RE: Multiple Indices vs Single Index

2007-09-21 Thread Dino Korah
In a similar scenario, if we had more than 40 such groupings, over which the hits (not Lucene hits, but hits from user searches) are not evenly distributed, would it be good to have an LRU caching mechanism for searcher objects? Is there any sort of grey area that should be avoided in doing that?

Lucene multiple indexes

2007-09-20 Thread Dino Korah
Hi People, I was trying to get Lucene to work for a mail indexing solution. Scenario: Traffic into the index method is on average 250 mails and their attachments per minute. This volume has made me think of a solution that will split the index on the domain name of the owner of the message. S
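If the index is split per domain, a search that has to cover several domains can be run through MultiSearcher over the per-domain indexes; a sketch with an illustrative directory layout:

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TopDocs;

    public class PerDomainSearchSketch {
        // Each domain has its own index directory under indexRoot; searching one domain
        // opens just that index, searching across domains combines them with MultiSearcher.
        public static TopDocs searchDomains(String indexRoot, String[] domains, Query q)
                throws Exception {
            Searchable[] searchers = new Searchable[domains.length];
            for (int i = 0; i < domains.length; i++) {
                searchers[i] = new IndexSearcher(indexRoot + "/" + domains[i]);
            }
            MultiSearcher multi = new MultiSearcher(searchers);
            return multi.search(q, null, 25);
        }
    }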

Re: index optimisation - disk fill-up

2007-03-10 Thread Dino Korah
Cheers Michael. On 10/03/07, Michael McCandless <[EMAIL PROTECTED]> wrote: "Dino Korah" <[EMAIL PROTECTED]> wrote: > I understand Lucene requires double the size of the index to be > available > free on the disk on which the index is being optimised. But if i

index optimisation - disk fill-up

2007-03-10 Thread Dino Korah
Hi All, I understand Lucene requires double the size of the index to be available free on the disk on which the index is being optimised. But if the disk gets filled up during optimisation, what will happen to the index, theoretically? Is there an effective way of avoiding this? Many T