[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2009-01-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668777#action_12668777 ] Uwe Schindler commented on LUCENE-1478: --- After reading your comment several times

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668797#action_12668797 ] Michael McCandless commented on LUCENE-1476: bq. Presumably you spliced the

[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668799#action_12668799 ] Michael McCandless commented on LUCENE-1478: Yonik, why was the failure so

[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668802#action_12668802 ] Michael McCandless commented on LUCENE-1478: bq. Write a FloatParser that maps

[jira] Updated: (LUCENE-1506) Adding FilteredDocIdSet and FilteredDocIdSetIterator

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1506: --- Attachment: LUCENE-1506.patch Thanks John! I made a few tweaks (downgraded to Java

[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2009-01-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668812#action_12668812 ] Uwe Schindler commented on LUCENE-1478: --- bq. Uwe, would that result in a memory

[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2009-01-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668815#action_12668815 ] Uwe Schindler commented on LUCENE-1478: --- By the way: The Cache of FieldCache

[jira] Updated: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1476: --- Attachment: hacked-deliterator.patch Alas I had a bug in my original test (my

Re: Realtime Search

2009-01-30 Thread Michael McCandless
Jason Rutherglen jason.rutherg...@gmail.com wrote: We'd also need to ensure when a merge kicks off, the SegmentReaders used by the merging are not newly reopened but also borrowed from The IW merge code currently opens the SegmentReader with a 4096 buffer size (different than the 1024

[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2009-01-30 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668889#action_12668889 ] Yonik Seeley commented on LUCENE-1478: -- Apologies, I meant to post in LUCENE-1483

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668910#action_12668910 ] Michael McCandless commented on LUCENE-1483: One immediate workaround would be

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-30 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668930#action_12668930 ] Jason Rutherglen commented on LUCENE-1314: -- Cool, cheers Mike!

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-30 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668941#action_12668941 ] Jason Rutherglen commented on LUCENE-1314: -- I'm thinking of implementing a follow

BloomFilter-s with Lucene

2009-01-30 Thread Andrzej Bialecki
Hi all, I've been using BloomFilters for various tasks, and I can't shake the feeling that they could be of some use in Lucene internals, to speed up various membership tests, especially if we look for 100% correct negatives, and we can accept a small rate of false positives. For example,
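The property described above (negatives are always correct, positives may occasionally be false) can be illustrated with a minimal sketch. This is not code from the thread or from Lucene: the class name SimpleBloomFilter, its parameters, and the way the second hash is derived from String.hashCode() are all assumptions chosen for brevity.

```java
import java.util.BitSet;

// Hypothetical illustration only; not part of Lucene.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from two cheap hashes (double hashing).
    // The second hash is an assumption; a real filter would use a stronger one.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // force it odd so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false means "definitely never added"; true means "possibly added".
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("lucene");
        System.out.println(filter.mightContain("lucene"));
        System.out.println(filter.mightContain("some other term"));
    }
}
```

A key that was added always tests positive, so a negative answer can be trusted and the expensive exact lookup skipped; how often an unrelated key tests positive depends on the number of bits and hashes relative to the number of keys.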

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668946#action_12668946 ] Marvin Humphrey commented on LUCENE-1476: - Actually I used your entire patch on

[jira] Created: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-01-30 Thread David Bowen (JIRA)
File based spellcheck with doc frequencies supplied --- Key: LUCENE-1532 URL: https://issues.apache.org/jira/browse/LUCENE-1532 Project: Lucene - Java Issue Type: New Feature

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668954#action_12668954 ] Jason Rutherglen commented on LUCENE-1476: -- Maybe we should close this issue

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668971#action_12668971 ] Jason Rutherglen commented on LUCENE-1476: -- {quote} Just run sortBench2.py in

Re: BloomFilter-s with Lucene

2009-01-30 Thread markharw00d
Andrzej Bialecki wrote: Funny, I was having vague thoughts about this today too, having been concerned about some of the big arrays that can end up in a typical Lucene app. Aside from providing space-efficient lookups, another application for BloomFilters is in similarity measures e.g. ANDing

Re: BloomFilter-s with Lucene

2009-01-30 Thread Andrzej Bialecki
markharw00d wrote: Andrzej Bialecki wrote: Funny, I was having vague thoughts about this today too, having been concerned about some of the big arrays that can end up in a typical Lucene app. Aside from providing space-efficient lookups, another application for BloomFilters is in similarity

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-01-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669018#action_12669018 ] Eks Dev commented on LUCENE-1532: - bq. so it can suggest a very obscure word rather than a

Re: BloomFilter-s with Lucene

2009-01-30 Thread pdecrem
Well. I used 2 Broder similarity measures, and it works well. You obviously need to pick the right size bf's. Navendu Jain has a paper called "Using Bloom Filters to Refine Web Search Results", which I think is relevant here. It talks about how to remove near-duplicate search results using bf's.

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669024#action_12669024 ] Michael McCandless commented on LUCENE-1476: {quote} Thanks for running all

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669025#action_12669025 ] Michael McCandless commented on LUCENE-1476: {quote} This seems like something

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent-plain form mapping). Big number of accented characters (this causes big switch statement) that appear seldom in corpus (big majority being not accented). If negative test, you do just simple

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669026#action_12669026 ] Michael McCandless commented on LUCENE-1476: bq. We need more performance data

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-01-30 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669029#action_12669029 ] Mark Miller commented on LUCENE-1532: - Our spellchecking def needs improvement. I

Re: BloomFilter-s with Lucene

2009-01-30 Thread Andi Vajda
On Fri, 30 Jan 2009, eks dev wrote: I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent-plain form mapping). Big number of accented characters (this causes big switch statement) that appear seldom in corpus (big majority being not accented).

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
Maybe we should close this issue with a won't-fix and start a new one for filtered deletions? A few thoughts, without looking at the code, just thinking aloud :) What we are talking about here is an inverted filter: Lucene uses Filter as a pass filter (a set bit defines a document that should
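The distinction between a pass filter (set bit = keep) and a deletions bit set (set bit = drop) can be sketched with plain java.util.BitSet. This is only an illustration under assumptions: the class and method names are hypothetical, and it does not use Lucene's Filter, DocIdSet or BitVector classes.

```java
import java.util.BitSet;

// Hypothetical illustration only; plain BitSets stand in for Lucene's structures.
final class DeletedDocsAsFilter {

    // Invert a deletions bit set (set bit = deleted) into a pass filter
    // (set bit = live) over doc IDs 0..maxDoc-1.
    static BitSet toPassFilter(BitSet deleted, int maxDoc) {
        BitSet live = (BitSet) deleted.clone();
        live.flip(0, maxDoc);
        return live;
    }

    // Pass-filter semantics: keep only hits whose bit is set in the filter.
    static BitSet applyPassFilter(BitSet hits, BitSet pass) {
        BitSet result = (BitSet) hits.clone();
        result.and(pass);
        return result;
    }

    // Inverted-filter semantics: drop hits whose bit is set in the deletions.
    static BitSet applyDeletions(BitSet hits, BitSet deleted) {
        BitSet result = (BitSet) hits.clone();
        result.andNot(deleted);
        return result;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(5);
        BitSet hits = new BitSet();
        for (int doc : new int[] {1, 2, 3, 5, 7}) {
            hits.set(doc);
        }
        System.out.println(applyDeletions(hits, deleted));
        System.out.println(applyPassFilter(hits, toPassFilter(deleted, 8)));
    }
}
```

Both paths select the same documents; the inverted form avoids materializing the flipped bit set, which matters when deletions are sparse.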

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Paul Elschot
On Friday 30 January 2009 23:24:42 eks dev wrote: ... This is conceptually almost equal (fully equal, when Paul gets Filters as boolean clauses done) to having separate, single valued field indexed isDeleted {true, false} where each Query gets implicitly transformed to OriginalQuery

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
Unfortunately this code is not mine, but it is rather simple to try: int bloom_filter = 0; for (char accent : accents) { bloom_filter = bloom_filter | (1 << (accent & 0x1F)); } The rest is easy. This works well for 10-20 chars per bloom_filter, depends on
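The one-word filter above can be fleshed out into a small runnable sketch. The class name AccentBloom, the two-entry accent table and the normalize method are assumptions made for illustration; only the mask construction (one bit chosen by the low five bits of the character) comes from the message.

```java
// Hypothetical illustration only; the accent table is a tiny stand-in.
final class AccentBloom {

    // Build a 32-bit mask with one bit per character, selected by the
    // low five bits of its code unit, as in the snippet above.
    static int build(char[] accents) {
        int bloomFilter = 0;
        for (char accent : accents) {
            bloomFilter |= 1 << (accent & 0x1F);
        }
        return bloomFilter;
    }

    // false means the character is definitely not in the accent set.
    static boolean mightContain(int bloomFilter, char c) {
        return (bloomFilter & (1 << (c & 0x1F))) != 0;
    }

    // Skip the switch entirely when the cheap test is negative.
    static char normalize(int bloomFilter, char c) {
        if (!mightContain(bloomFilter, c)) {
            return c; // definite negative: no mapping needed
        }
        switch (c) {
            case '\u00e9': return 'e'; // e with acute accent
            case '\u00fc': return 'u'; // u with diaeresis
            default: return c;         // false positive from the mask
        }
    }

    public static void main(String[] args) {
        int mask = build(new char[] {'\u00e9', '\u00fc'});
        StringBuilder out = new StringBuilder();
        for (char c : "caf\u00e9 m\u00fcsli".toCharArray()) {
            out.append(normalize(mask, c));
        }
        System.out.println(out);
    }
}
```

With only 32 bits, unrelated characters can share a bit with an accented one (here 'i' collides with the e-acute entry), so a positive test must still fall through to the exact switch; the gain comes from the common unaccented characters that miss the mask.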

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
indeed :) From: Paul Elschot paul.elsc...@xs4all.nl To: java-dev@lucene.apache.org Sent: Friday, 30 January, 2009 23:37:08 Subject: Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs On Friday 30

Re: Realtime Search

2009-01-30 Thread Jason Rutherglen
deletes made through reader (by docID) are immediately visible, but through writer are buffered until a flush or reopen? This is what I was thinking, IW buffers deletes, IR does not. Making IW.deletes visible immediately by applying them to the IR makes sense as well. What should be the

[jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-30 Thread Jason Rutherglen (JIRA)
Deleted documents as a Filter or top level Query Key: LUCENE-1533 URL: https://issues.apache.org/jira/browse/LUCENE-1533 Project: Lucene - Java Issue Type: Improvement Components:

[jira] Commented: (LUCENE-1506) Adding FilteredDocIdSet and FilteredDocIdSetIterator

2009-01-30 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669105#action_12669105 ] John Wang commented on LUCENE-1506: --- Thanks Michael! Adding FilteredDocIdSet and

Build failed in Hudson: Lucene-trunk #723

2009-01-30 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/723/changes Changes: [uschindler] fix javadocs [uschindler] Add some extra check for validity of c'tor parameters in TrieRangeFilter [mikemccand] LUCENE-1314: add IndexReader.clone(boolean readOnly) and reopen(boolean readOnly)

Sorting lucene search results

2009-01-30 Thread mitu2009
Hi, I'm using the following code to execute a search query in Lucene.Net: var collector = new GroupingHitCollector(searcher.GetIndexReader()); searcher.Search(myQuery, collector); resultsCount = collector.Hits.Count; How do I sort these search results based on a field? I need to use collector

Re: Sorting lucene search results

2009-01-30 Thread Anshum
Hi Mitu, Could we have usage/implementation-based questions on the user forum? It would help keep things segregated :). About your problem though, I wouldn't know about the .NET port. You could (in Java Lucene) use: public TopFieldDocCollector(IndexReader reader, Sort sort, int numHits) i.e.: