Re: sort lucene results

2009-02-25 Thread Chris Hostetter
: but i need the result by the word place in the sentence like this: : : "bbb text 4...". , "text 2 bbb text " , "text 1 ok ok ok bbb" .. 1) SpanFirstQuery should work: it scores higher the closer the nested query is to the start; just use a really high limit. If you are only dealing with

[jira] Updated: (LUCENE-1548) LevenshteinDistance code normalization is incorrect

2009-02-25 Thread Thomas Morton (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Morton updated LUCENE-1548: -- Attachment: LUCENE-1548.patch Fixes issue (changes min to max in distance computation) and cor
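The change noted above (min replaced by max in the distance computation) can be illustrated with a minimal sketch. This is not the code from the attached patch: the class and method names are hypothetical, and the formula 1 - distance / length is an assumption about how the similarity is normalized. Because the edit distance between two strings can never exceed the length of the longer one, dividing by the maximum length keeps the result in [0, 1], while dividing by the minimum length can produce negative values.

```java
// Hypothetical illustration only; not the Lucene class and not the LUCENE-1548 patch.
final class NormalizedLevenshtein {

    // Classic two-row dynamic-programming edit distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) costs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" costs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / max(len(a), len(b)).
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f; // two empty strings are treated as identical
        }
        return 1.0f - (float) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(similarity("a", "abcd"));
    }
}
```

With strings of very different lengths, such as "a" and "abcd", the distance is 3; normalizing by the shorter length (1) would give 1 - 3 = -2, whereas normalizing by the longer length (4) stays within range.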

[jira] Created: (LUCENE-1548) LevenshteinDistance code normalization is incorrect

2009-02-25 Thread Thomas Morton (JIRA)
LevenshteinDistance code normalization is incorrect --- Key: LUCENE-1548 URL: https://issues.apache.org/jira/browse/LUCENE-1548 Project: Lucene - Java Issue Type: Bug Components: cont

Re: Integrating Language Models into Lucene

2009-02-25 Thread Earwin Burrfoot
Have you looked at MG4J (http://mg4j.dsi.unimi.it/)? Last time I did, it looked like the opposite of Lucene: nice and up-to-date algorithmics, but hard to apply to complex real-world tasks. On Thu, Feb 26, 2009 at 04:21, Koren Krupko wrote: > > Hello Lucene Developers! > > My name is Koren Krupko

[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1516: - Attachment: LUCENE-1516.patch - commitMergedDeletes uses the docIdMap from commitM

Integrating Language Models into Lucene

2009-02-25 Thread Koren Krupko
Hello Lucene Developers! My name is Koren Krupko. I'm quite new to Lucene but I do have experience in research in the fields of information retrieval. After reviewing Lucene's capabilities I understand that one of its major strengths is its scalability (as opposed to other frameworks such as Lemu

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676815#action_12676815 ] Jason Rutherglen commented on LUCENE-1516: -- Found the doc ID map in SegmentMerger

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676811#action_12676811 ] Jason Rutherglen commented on LUCENE-1516: -- {quote} I think we should change comm

Re: Use of Unicode data in Lucene

2009-02-25 Thread Robert Muir
Ken, Just my opinion here... I work with a lot of multilingual data with Lucene. I can't imagine many serious real-world applications doing things such as search that wouldn't need ICU for something anyway... even if it's not the Lucene piece requiring it... I hope this doesn't discourage you from

[jira] Resolved: (LUCENE-1398) Add ReverseStringFilter

2009-02-25 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved LUCENE-1398. -- Resolution: Fixed Fix Version/s: 2.9 Thanks, I just committed this. > Add ReverseStrin

Use of Unicode data in Lucene

2009-02-25 Thread Ken Krugler
Hi all, I've started working on something similar to https://issues.apache.org/jira/browse/LUCENE-1343, which is about creating a better (more universal) normalizer for words that "look the same". I'd like to avoid the dependency on ICU4J, which (I think) would otherwise prevent the code fr

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676754#action_12676754 ] Michael McCandless commented on LUCENE-1500: bq. So to be consistent, where el

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676749#action_12676749 ] Michael McCandless commented on LUCENE-1516: Looks good!: * The SegmentRead

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676745#action_12676745 ] Mark Harwood commented on LUCENE-1500: -- So to be consistent, where else in Lucene mig

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676728#action_12676728 ] Michael McCandless commented on LUCENE-1500: The thing is, since it's an unch

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676714#action_12676714 ] Hoss Man commented on LUCENE-1500: -- bq. Perhaps we can turn this around and ask "under wh

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676713#action_12676713 ] Jason Rutherglen commented on LUCENE-1516: -- We'll need a method to check the Segm

[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1516: - Attachment: LUCENE-1516.patch - Revised to trunk - Still an incRef issue. TestIndexWrite

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676682#action_12676682 ] Koji Sekiguchi commented on LUCENE-1500: Can you post your document and schema so

RE: sort lucene results

2009-02-25 Thread Uwe Schindler
Go to the analyzer package description; there is an example of a TokenFilter there. Just step into your analyzer's TokenStream and implement a TokenFilter for it. The method next() is called for each token by o.a.l.d.Field on indexing. In documentation of http://lucene.apache.org/java/2_4_0/api/org/apac

RE: sort lucene results

2009-02-25 Thread shb
I set the index field like this: Field nameField = null; while (rs.next()) { String name = rs.getString("name"); nameField = new Field("name", name.trim(), Field.Store.YES, Field.Index.TOKENIZED); doc.add(nameField);

Re: LIA2 on l.a.o/java OK?

2009-02-25 Thread Grant Ingersoll
+1 On Feb 23, 2009, at 13:24, Chris Hostetter wrote: : I'm OK with LIA2 on the front page - as Erik suggests it does help lend : credibility to a project. +1 to more visibility to books focused on lucene on "official" www site pages (not just the wiki) +1 to prominent display via

RE: sort lucene results

2009-02-25 Thread Uwe Schindler
With a custom Tokenizer/Analyzer you could boost the words (tokens) during indexing by their position, e.g. the first word gets factor 100, the second 99, and so on. As sorting is by relevance, hits where the word is nearer the beginning get a higher ranking because of the boost. - Uwe Schindler H.-H.-Meie
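The position-based boosting idea in this message can be sketched in plain Java, without any Lucene classes. This is only an illustration under stated assumptions: the class and method names are hypothetical, tokens are split on whitespace, and the factor simply decreases by one per position with a floor of 1. In a real index the factor would have to be attached to tokens by a custom analysis chain, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of position-based boosting; not Lucene API.
final class PositionBoost {

    // First token gets 100, second 99, and so on, never dropping below 1.
    static int factor(int position) {
        return Math.max(1, 100 - position);
    }

    // Factor of the first occurrence of the word, or 0 if it is absent.
    static int boostFor(String sentence, String word) {
        String[] tokens = sentence.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(word)) {
                return factor(i);
            }
        }
        return 0;
    }

    // Returns a copy of the sentences ordered by descending boost.
    static List<String> rank(List<String> sentences, String word) {
        List<String> sorted = new ArrayList<>(sentences);
        sorted.sort(Comparator.comparingInt((String s) -> boostFor(s, word)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList(
                "text ok ok ok bbb", "text 2 bbb text ", "bbb text 4...");
        for (String hit : rank(hits, "bbb")) {
            System.out.println(boostFor(hit, "bbb") + "  " + hit);
        }
    }
}
```

Applied to the three example sentences from the original question, the sentence that starts with "bbb" receives the largest factor and is listed first, and the sentence that ends with it is listed last.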

sort lucene results

2009-02-25 Thread shb
Hi, I need help. I need to search by word in sentences with Lucene. For example, for the word "bbb" I got the right results, all the sentences: "text ok ok ok bbb" , "text 2 bbb text " , "bbb text 4...". But I need the results ordered by the word's place in the sentence, like this: "bbb text 4...". , "

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676652#action_12676652 ] Michael McCandless commented on LUCENE-1516: Jason could you rebase the patch

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676648#action_12676648 ] Peter Wolanin commented on LUCENE-1500: --- Yes - this patch is not a fix - but a work-

[jira] Updated: (LUCENE-1398) Add ReverseStringFilter

2009-02-25 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-1398: --- Attachment: LUCENE-1398.patch {quote} I don't know how others feel, but I'd personally like

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676633#action_12676633 ] Mark Harwood commented on LUCENE-1500: -- Hmmm. I'm not so sure that this "defensive co

[jira] Assigned: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1500: -- Assignee: Michael McCandless > Highlighter throws StringIndexOutOfBoundsExcept

[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1500: --- Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Fix Version/

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676611#action_12676611 ] Michael McCandless commented on LUCENE-1500: I think the defensive coding (the