Multi Field AND Search

2009-06-18 Thread saurabhs_iitk
Hi, I have indexed 8 fields with different boosts. I have been given a searchString which consists of words and phrases. I want to do an AND search of that searchString on four fields and show the results based on boost. For me, searchString should occur completely in one of the fields and then the b
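The matching rule asked about above (every word and phrase must occur within a single field, across several boosted fields) can be illustrated with a small in-memory sketch. This is not Lucene code and uses no Lucene API. The names `field_matches` and `search`, the whitespace tokenization, and the choice to score a document by summing the boosts of its matching fields are all assumptions made for illustration, since the original message is cut off before it describes the ranking.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple analyzer)."""
    return text.lower().split()


def field_matches(field_text, terms, phrases):
    """True only if ALL terms and ALL phrases occur in this one field."""
    tokens = tokenize(field_text)
    token_set = set(tokens)
    # Pad with spaces so phrases only match on whole-token boundaries.
    padded = " " + " ".join(tokens) + " "
    has_terms = all(t.lower() in token_set for t in terms)
    has_phrases = all(
        " " + " ".join(tokenize(p)) + " " in padded for p in phrases
    )
    return has_terms and has_phrases


def search(docs, boosts, terms, phrases):
    """Return (doc_id, score) pairs, best first.

    docs:   {doc_id: {field_name: text}}
    boosts: {field_name: positive boost}; only these fields are searched.
    Score is the sum of boosts of fields that fully match (an assumption).
    """
    results = []
    for doc_id, fields in docs.items():
        score = sum(
            boost
            for field, boost in boosts.items()
            if field_matches(fields.get(field, ""), terms, phrases)
        )
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

A document whose terms are split across two fields is rejected, because the AND is evaluated per field and the fields are then combined as alternatives.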

[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1692: Attachment: LUCENE-1692.txt patch with the two one-line fixes: 1. fix offsets for thai analyzer so

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-18 Thread Earwin Burrfoot
Runtime change. Hard to imagine people relying on failing document() call. On Fri, Jun 19, 2009 at 00:29, Shai Erera wrote: > I've made the changes to SegmentMerger and want to make the following > changes to IndexReader.document(): (1) don't call ensureOpen() and (2) don't > check isDeleted. > >

[jira] Issue Comment Edited: (LUCENE-1466) CharFilter - normalize characters before tokenizer

2009-06-18 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721588#action_12721588 ] Koji Sekiguchi edited comment on LUCENE-1466 at 6/18/09 7:04 PM: ---

[jira] Updated: (LUCENE-1466) CharFilter - normalize characters before tokenizer

2009-06-18 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-1466: --- Attachment: LUCENE-1466.patch updated patch attached. - sync trunk (smart chinese analyzer(L

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-06-18 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721586#action_12721586 ] Jason Rutherglen commented on LUCENE-1539: -- I think it would be convenient to all

[jira] Updated: (LUCENE-1313) Near Realtime Search

2009-06-18 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1313: - Attachment: LUCENE-1313.patch * TestThreadedOptimize passes, LogMergePolicy now filters

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-18 Thread Jason Rutherglen
Hmm... So the list at the bottom of this page looks accurate? http://www.gnu.org/software/hello/manual/gnulib/posix_005ffadvise.html If it is, then posix_fadvise works on Linux only? Perhaps madvise will be better then (judging by the much smaller unsupported list)? It seems to run on most platf

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721512#action_12721512 ] Robert Muir commented on LUCENE-1692: - later tonight i can workup a patch to address t

RE: Tests fail to compile on JDK 1.4?

2009-06-18 Thread Chris Hostetter
: We had some discussions about it, the easiest is, to set the bootclasspath : in the task to an older rt.jar during compilation. Because this : needs updates for e.g. Hudson (rt.jar missing) we said, that the one, who : releases the final version should simply check this before on the : compilat

[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1692: Attachment: LUCENE-1692.txt patch with new testcase demonstrating the chinese behavior. > Contrib

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721504#action_12721504 ] Robert Muir commented on LUCENE-1692: - ok got it, the IDEOGRAPHIC FULL STOP is being

[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1692: Attachment: LUCENE-1692.txt michael: here is an updated patch. i removed that chinese test, there

[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1692: Attachment: example.jpg Having trouble figuring this one out > Contrib analyzers need tests > ---

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721475#action_12721475 ] Robert Muir commented on LUCENE-1692: - probably, fixed it and testing with ant now. il

Re: Fuzzy search change

2009-06-18 Thread eks dev
What would be the difference/benefit compared to the standard Lucene SpellChecker? If I am not wrong: - Lucene SpellChecker uses a standard Lucene index as storage for tokens instead of QDBM... meaning a full inverted index with arbitrary N-gram length, with tf/idf/norms... not only HashMap -

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721469#action_12721469 ] Michael McCandless commented on LUCENE-1692: Probably eclipse isn't running wi

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721463#action_12721463 ] Robert Muir commented on LUCENE-1692: - michael, i guess junit from my eclipse != junit

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721466#action_12721466 ] Robert Muir commented on LUCENE-1692: - michael: yes the stems/words.txt for stems.txt

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721462#action_12721462 ] Michael McCandless commented on LUCENE-1692: Me too :) Robert can you cons up

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721461#action_12721461 ] Mark Miller commented on LUCENE-1692: - heh - +1 on fixing them all. Including reclaim

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-18 Thread Shai Erera
I've made the changes to SegmentMerger and want to make the following changes to IndexReader.document(): (1) don't call ensureOpen() and (2) don't check isDeleted. Question is - can I make these changes on the current impls, or do I need to deprecate and come up w/ a new name? Here a new name is n

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721460#action_12721460 ] Michael McCandless commented on LUCENE-1692: I'm seeing this test failure: {co

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721457#action_12721457 ] Robert Muir commented on LUCENE-1692: - Michael, I think it would be nice to fix the Th

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721451#action_12721451 ] Michael McCandless commented on LUCENE-1692: bq. michael: I think I'm done h

Re: Lucene 2.9 Again

2009-06-18 Thread Michael McCandless
On Thu, Jun 18, 2009 at 3:07 PM, Jason Rutherglen wrote: >> I pretty much find any excuse to go and write stuff in Python > > There's Scala... I've only read about it so far but it does look good. Mike

[jira] Resolved: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker

2009-06-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1595. - Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [New]) Thanks Shai, I just

[jira] Commented: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one

2009-06-18 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721427#action_12721427 ] Hoss Man commented on LUCENE-1646: -- As a general rule, code catching an exception and thr

[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721426#action_12721426 ] Uwe Schindler commented on LUCENE-1693: --- I only tested performance with the lucene b

Re: Lucene 2.9 Again

2009-06-18 Thread Jason Rutherglen
> I pretty much find any excuse to go and write stuff in Python There's Scala... On Thu, Jun 18, 2009 at 2:37 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Wed, Jun 17, 2009 at 4:13 PM, Mark Miller wrote: > > Michael Busch wrote: > >> > >> Everyone who is unhappy with the relea

[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721418#action_12721418 ] Robert Muir commented on LUCENE-1692: - michael: I think I'm done here. if you consi

[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

2009-06-18 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1692: Attachment: LUCENE-1692.txt added tests for czech. added additional tests for smartchineseanalyzer

[jira] Commented: (LUCENE-1677) Remove GCJ IndexReader specializations

2009-06-18 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721410#action_12721410 ] Hoss Man commented on LUCENE-1677: -- {quote} I did ask: http://www.mail-archive.com/java-

[jira] Updated: (LUCENE-1700) LogMergePolicy.findMergesToExpungeDeletes need to get deletes from the SegmentReader

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1700: --- Attachment: LUCENE-1700.patch Attached patch. I added a test case showing it, then

[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721326#action_12721326 ] Michael McCandless commented on LUCENE-1673: Latest patch looks good Uwe! We

[jira] Commented: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker

2009-06-18 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721320#action_12721320 ] Shai Erera commented on LUCENE-1595: bq. they won't end up coming after you, they will

[jira] Commented: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker

2009-06-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721311#action_12721311 ] Mark Miller commented on LUCENE-1595: - bq. I added readContentSource.alg just for that

[jira] Updated: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-18 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1630: --- Attachment: LUCENE-1630.patch Implemented the latest comments, except for TopScorer > Mating Collec

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-18 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721285#action_12721285 ] Shai Erera commented on LUCENE-1630: bq. CustomScorer's nextDoc uses advance on its su

[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721281#action_12721281 ] Grant Ingersoll commented on LUCENE-1693: - {quote} By the way, I tested Solr's tok

Re: Fuzzy search change

2009-06-18 Thread Michael McCandless
This would make an awesome addition to Lucene! This is similar to how Lucene's spellchecker identifies candidates, if I understand it right. Would you be able to port it to java? Mike On Thu, Jun 18, 2009 at 7:12 AM, Varun Dhussa wrote: > Hi, > > I wrote on this a long time ago, but haven't fol

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721276#action_12721276 ] Michael McCandless commented on LUCENE-1630: bq. Can you please next time giv

[jira] Updated: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1693: -- Attachment: LUCENE-1693.patch Again an update: Unified the reusable tokens in the TokenWrappe

[jira] Updated: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1693: -- Attachment: (was: LUCENE-1693.patch) > AttributeSource/TokenStream API improvements >

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-18 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721208#action_12721208 ] Shai Erera commented on LUCENE-1630: bq. I think CustomScoreQuery.scorer should actual

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-18 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721204#action_12721204 ] Shai Erera commented on LUCENE-1630: bq. QyertWeight -> QueryWeight I'll fix. Can you

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721196#action_12721196 ] Michael McCandless commented on LUCENE-1630: * I wonder if we should have a

[jira] Assigned: (LUCENE-1700) LogMergePolicy.findMergesToExpungeDeletes need to get deletes from the SegmentReader

2009-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1700: -- Assignee: Michael McCandless > LogMergePolicy.findMergesToExpungeDeletes need

Fuzzy search change

2009-06-18 Thread Varun Dhussa
Hi, I wrote on this a long time ago, but haven't followed it up. I just finished a C++ implementation of a spell check module in my software. I borrowed the idea from Xapian. It is to use a trigram index to filter results, and then use Edit Distance on the filtered set. Would such a solution
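The approach described above (use a trigram index to narrow the candidate set, then run edit distance only on the survivors) can be sketched as follows. This is a minimal illustration in Python, not the poster's C++ module, Xapian's code, or Lucene's SpellChecker. The function names, the `$` padding character, and the thresholds `min_shared` and `max_distance` are assumptions chosen for the example.

```python
from collections import defaultdict


def trigrams(word):
    """Set of character trigrams, padded so word edges form trigrams too."""
    padded = "$$" + word.lower() + "$$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(words):
    """Map each trigram to the set of dictionary words containing it."""
    index = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest(index, query, min_shared=2, max_distance=2):
    """Filter by shared trigrams first, then rank by edit distance."""
    shared = defaultdict(int)
    for gram in trigrams(query):
        for word in index.get(gram, ()):
            shared[word] += 1
    candidates = [w for w, n in shared.items() if n >= min_shared]
    scored = [(w, edit_distance(query.lower(), w.lower())) for w in candidates]
    return sorted(
        [(w, d) for w, d in scored if d <= max_distance],
        key=lambda pair: (pair[1], pair[0]),
    )
```

The trigram step is a cheap filter: the quadratic-time edit distance is computed only for words that already share several trigrams with the query, rather than for the whole dictionary.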

[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721158#action_12721158 ] Uwe Schindler commented on LUCENE-1693: --- By the way, I tested Solr's token streams a

[jira] Updated: (LUCENE-1673) Move TrieRange to core

2009-06-18 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1673: -- Attachment: LUCENE-1673.patch Final patch version with updated javadocs. I will commit in a da

[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721142#action_12721142 ] Uwe Schindler commented on LUCENE-1693: --- OK, we can merge our patches then! At the m

Re: Lucene 2.9 Again

2009-06-18 Thread Michael McCandless
On Wed, Jun 17, 2009 at 4:13 PM, Mark Miller wrote: > Michael Busch wrote: >> >> Everyone who is unhappy with the release TODO's, go back in your mail >> archive to the 2.2 release and check how many tedious little changes we made >> to improve the release quality. And besides the maven stuff, ther

Re: Synchronizing Lucene indexes across 2 application servers

2009-06-18 Thread Michael McCandless
Could you re-ask this on java-user instead? Thanks! (java-dev is for discussing how to make changes to Lucene; java-user is for discussing usage of Lucene). Mike On Thu, Jun 18, 2009 at 2:13 AM, mitu2009 wrote: > > I've a web application which uses Lucene for search functionality. Lucene > sear

[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-06-18 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721135#action_12721135 ] Michael Busch commented on LUCENE-1693: --- {quote} For backwards-compatibility we sho

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-18 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721116#action_12721116 ] Simon Willnauer commented on LUCENE-1696: - I will be around and fix / adjust it if