JDBC access to a Lucene index

2009-10-16 Thread Jukka Zitting
Hi, Some while ago I implemented a simple JDBC to JCR bridge [1] that allows one to query a JCR repository from any JDBC client, most notably various reporting tools. Now I'm wondering if something similar already exists for a normal Lucene index. Something that would treat your entire index as

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766481#action_12766481 ] Michael McCandless commented on LUCENE-1458: OK thanks for addressing the new

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766482#action_12766482 ] Michael McCandless commented on LUCENE-1458: bq. you wanna remove them commit

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread Michael McCandless
Thanks John; I'll have a look. Mike On Fri, Oct 16, 2009 at 12:57 AM, John Wang john.w...@gmail.com wrote: Hi Michael:     I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as a more general case. I think keeping the old api for ScoreDocComparator and SortComparatorSource

Re: search trough single pdf document - return page number

2009-10-16 Thread IvanDrago
Hey! I did it! Eric and Robert, you helped a lot. Thanks! I didn't use LucenePDFDocument. I created a new document for every page in a PDF document and added page number info for every page. PDDocument pddDocument = PDDocument.load(f); PDFTextStripper textStripper = new

Re: search trough single pdf document - return page number

2009-10-16 Thread Erick Erickson
Glad things are progressing. The only problem here will be proximity queries that span pages. Say the last word on page 10 is "salmon" and the first word on page 11 is "fishing". With your index structured this way, a proximity search for "salmon fishing" won't match. If that's not a concern, then
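The page-boundary caveat above can be illustrated with a small, library-free sketch. It does not use Lucene; the class `PageProximitySketch`, its method names, and the simplified "term b follows term a within maxGap positions" rule are all assumptions made for illustration and are not Lucene's actual slop semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models pages as token lists, not a Lucene index.
public class PageProximitySketch {

    /** True if term a is followed by term b within maxGap positions of one token list. */
    static boolean near(List<String> tokens, String a, String b, int maxGap) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) continue;
            int end = Math.min(tokens.size() - 1, i + maxGap);
            for (int j = i + 1; j <= end; j++) {
                if (tokens.get(j).equals(b)) return true;
            }
        }
        return false;
    }

    /** Per-page indexing: each page is matched on its own, so positions never cross pages. */
    static boolean matchesAnyPage(List<List<String>> pages, String a, String b, int maxGap) {
        for (List<String> page : pages) {
            if (near(page, a, b, maxGap)) return true;
        }
        return false;
    }

    /** Whole-document indexing: all pages form one continuous token stream. */
    static boolean matchesWholeDocument(List<List<String>> pages, String a, String b, int maxGap) {
        List<String> all = new ArrayList<>();
        for (List<String> page : pages) all.addAll(page);
        return near(all, a, b, maxGap);
    }

    public static void main(String[] args) {
        // "salmon" ends one page and "fishing" starts the next.
        List<List<String>> pages = Arrays.asList(
            Arrays.asList("we", "went", "salmon"),
            Arrays.asList("fishing", "at", "dawn"));
        System.out.println(matchesAnyPage(pages, "salmon", "fishing", 1));       // false
        System.out.println(matchesWholeDocument(pages, "salmon", "fishing", 1)); // true
    }
}
```

The per-page check misses the pair because each page carries its own positions, which is the trade-off being described for one-document-per-page indexing.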

Re: search trough single pdf document - return page number

2009-10-16 Thread IvanDrago
Proximity queries that span pages are not a concern in my case. I asked another question at the bottom of my last post. Could you comment on that if you have some ideas? Erick Erickson wrote: Glad things are progressing. The only problem here will be proximity queries that span pages. Say,

Re: search trough single pdf document - return page number

2009-10-16 Thread Erick Erickson
Well, you have to add another field to each document identifying the PDF it came from. From there, restricting to that doc just becomes adding an AND clause. Of course how you specify these is an exercise left to the reader <G>. Erick On Fri, Oct 16, 2009 at 8:01 AM, IvanDrago idrag...@gmail.com

Re: search trough single pdf document - return page number

2009-10-16 Thread IvanDrago
Yes, I thought of that too, but I didn't know if I could search only the documents in the index that have a specific field name. After some research I found a way to do that: String q = "title:ant"; Query query = parser.parse(q); title:ant - Contains the term ant in the title field Regards, Ivan
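The approach worked out in this thread (one entry per page, plus a field naming the source PDF, combined with an AND restriction) might look roughly like the following. This is a library-free sketch, not Lucene code: the class `PerPageIndexSketch`, the `pdf`/`page`/`text` fields, and the file names are hypothetical, and matching is a plain case-insensitive word lookup rather than an analyzed Lucene query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: an in-memory stand-in for a per-page index.
public class PerPageIndexSketch {

    /** One indexed unit: a single page, tagged with the PDF it came from. */
    static final class PageDoc {
        final String pdf;
        final int page;
        final String text;

        PageDoc(String pdf, int page, String text) {
            this.pdf = pdf;
            this.page = page;
            this.text = text;
        }
    }

    private final List<PageDoc> docs = new ArrayList<>();

    /** Adds one entry per page; pages are numbered from 1. */
    void addPdf(String pdf, List<String> pageTexts) {
        for (int i = 0; i < pageTexts.size(); i++) {
            docs.add(new PageDoc(pdf, i + 1, pageTexts.get(i)));
        }
    }

    /** Page numbers where both clauses hold: the pdf field equals pdf AND the text contains term. */
    List<Integer> pagesContaining(String pdf, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (PageDoc d : docs) {
            boolean samePdf = d.pdf.equals(pdf);
            boolean hasTerm = Arrays.asList(
                d.text.toLowerCase(Locale.ROOT).split("\\W+")).contains(needle);
            if (samePdf && hasTerm) hits.add(d.page);
        }
        return hits;
    }

    public static void main(String[] args) {
        PerPageIndexSketch index = new PerPageIndexSketch();
        index.addPdf("rivers.pdf", Arrays.asList(
            "Salmon run upstream.", "Trout prefer cold water.", "Salmon fishing season."));
        index.addPdf("lakes.pdf", Arrays.asList("Salmon are rare here."));
        System.out.println(index.pagesContaining("rivers.pdf", "salmon")); // prints [1, 3]
    }
}
```

In a real Lucene index the same restriction would presumably be written as a field clause in the query string, in the spirit of the title:ant example above, with the actual field names chosen at indexing time.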

[jira] Updated: (LUCENE-1984) DisjunctionMaxQuery - Type safety

2009-10-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1984: -- Attachment: LUCENE-1984.patch Small updates in Patch (also implemented Iterable). I also

[jira] Resolved: (LUCENE-1984) DisjunctionMaxQuery - Type safety

2009-10-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1984. --- Resolution: Fixed Committed revision: 825881 Thanks Kay Kay! DisjunctionMaxQuery - Type

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766562#action_12766562 ] Mark Miller commented on LUCENE-1458: - just committed an initial stab at pulsing cache

[jira] Closed: (LUCENE-1984) DisjunctionMaxQuery - Type safety

2009-10-16 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay closed LUCENE-1984. --- Thanks Uwe. The revised patch looks good as well, with better code readability. DisjunctionMaxQuery - Type

[jira] Created: (LUCENE-1985) DisjunctionMaxQuery - Iterator code to for ( A a : container ) construct

2009-10-16 Thread Kay Kay (JIRA)
DisjunctionMaxQuery - Iterator code to for ( A a : container ) construct --- Key: LUCENE-1985 URL: https://issues.apache.org/jira/browse/LUCENE-1985 Project: Lucene - Java

[jira] Created: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery

2009-10-16 Thread Peter Keegan (JIRA)
NPE in NearSpansUnordered from PayloadNearQuery --- Key: LUCENE-1986 URL: https://issues.apache.org/jira/browse/LUCENE-1986 Project: Lucene - Java Issue Type: Bug Components: Search

[jira] Updated: (LUCENE-1985) DisjunctionMaxQuery - Iterator code to for ( A a : container ) construct

2009-10-16 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1985: Attachment: LUCENE-1985.patch DisjunctionMaxQuery - Iterator code to for ( A a : container )

[jira] Updated: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery

2009-10-16 Thread Peter Keegan (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1986: - Attachment: TestPayloadNearQuery1.java Unit test that causes NPE NPE in NearSpansUnordered

[jira] Commented: (LUCENE-1984) DisjunctionMaxQuery - Type safety

2009-10-16 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766587#action_12766587 ] Kay Kay commented on LUCENE-1984: - As a related patch - LUCENE-1985 added to improve

[jira] Updated: (LUCENE-1124) short circuit FuzzyQuery.rewrite when input token length is small compared to minSimilarity

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1124: --- Attachment: LUCENE-1124.patch Attach patch (based on 2.9) showing the bug, along

[jira] Reopened: (LUCENE-1124) short circuit FuzzyQuery.rewrite when input token length is small compared to minSimilarity

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-1124: This fix breaks the case when the exact term is present in the index. short circuit

[jira] Resolved: (LUCENE-1985) DisjunctionMaxQuery - Iterator code to for ( A a : container ) construct

2009-10-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1985. --- Resolution: Fixed Fix Version/s: 3.0 Assignee: Uwe Schindler Committed

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread John Wang
Mike, just a clarification on my first perf report email. The first section, numHits is incorrectly labeled, it should be 20 instead of 50. Sorry about the possible confusion. Thanks -John On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless luc...@mikemccandless.com wrote: Thanks John; I'll

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread Michael McCandless
Oh, no problem... Mike On Fri, Oct 16, 2009 at 12:33 PM, John Wang john.w...@gmail.com wrote: Mike, just a clarification on my first perf report email. The first section, numHits is incorrectly labeled, it should be 20 instead of 50. Sorry about the possible confusion. Thanks -John On

ant build-contrib fails on trunk?

2009-10-16 Thread Michael McCandless
When I run ant build-contrib on current trunk, I hit this: compile-core: [javac] Compiling 1 source file to /lucene/tmp2/build/contrib/instantiated/classes/java [javac]

[jira] Updated: (LUCENE-1124) short circuit FuzzyQuery.rewrite when input token length is small compared to minSimilarity

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1124: --- Fix Version/s: (was: 2.9) 3.0 2.9.1

RE: ant build-contrib fails on trunk?

2009-10-16 Thread Uwe Schindler
I'll fix it; this is because of generics and compareTo(). I'll revert the change. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Friday, October 16,

Re: ant build-contrib fails on trunk?

2009-10-16 Thread Robert Muir
yes, not just you On Fri, Oct 16, 2009 at 1:00 PM, Michael McCandless luc...@mikemccandless.com wrote: When I run ant build-contrib on current trunk, I hit this: compile-core: [javac] Compiling 1 source file to /lucene/tmp2/build/contrib/instantiated/classes/java [javac]

RE: ant build-contrib fails on trunk?

2009-10-16 Thread Uwe Schindler
It was not the generics change; it was a bug in the comparator. One getTerm() call was missing. I'll add it. The compiler found the error because, with generics, the signature no longer matched (in 1.4 it was just Object without a generics hint, now it's Object and Term, but InstantiatedTerm does

Re: ant build-contrib fails on trunk?

2009-10-16 Thread Michael McCandless
OK thanks! Mike On Fri, Oct 16, 2009 at 1:09 PM, Uwe Schindler u...@thetaphi.de wrote: I'll fix, this is because of generics and compareTo(). I revert the change. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original

[jira] Resolved: (LUCENE-1124) short circuit FuzzyQuery.rewrite when input token length is small compared to minSimilarity

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1124. Resolution: Fixed short circuit FuzzyQuery.rewrite when input token length is

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-16 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch * DisjunctionMaxQuery.java - some of

[jira] Commented: (LUCENE-1985) DisjunctionMaxQuery - Iterator code to for ( A a : container ) construct

2009-10-16 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766652#action_12766652 ] Kay Kay commented on LUCENE-1985: - Thanks Uwe. Added another patch to LUCENE-1257 to get

[jira] Commented: (LUCENE-1976) isCurrent() and getVersion() on an NRT reader are broken

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766654#action_12766654 ] Michael McCandless commented on LUCENE-1976: I plan to back-port this to

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766657#action_12766657 ] Uwe Schindler commented on LUCENE-1257: --- Committed revision: 826035 Port to Java5

Re: JDBC access to a Lucene index

2009-10-16 Thread Grant Ingersoll
I'm not aware of any, but you might get more mileage asking on java-user. On Oct 16, 2009, at 3:54 AM, Jukka Zitting wrote: Hi, Some while ago I implemented a simple JDBC to JCR bridge [1] that allows one to query a JCR repository from any JDBC client, most notably various reporting tools.

[jira] Resolved: (LUCENE-1976) isCurrent() and getVersion() on an NRT reader are broken

2009-10-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1976. Resolution: Fixed Fix Version/s: (was: 3.1) 3.0

[jira] Created: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-16 Thread Uwe Schindler (JIRA)
Remove rest of analysis deprecations (Token, CharacterCache) Key: LUCENE-1987 URL: https://issues.apache.org/jira/browse/LUCENE-1987 Project: Lucene - Java Issue Type: Task

[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1987: -- Attachment: LUCENE-1987.patch Patch with the first three points. The three deprecated

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766832#action_12766832 ] Mark Miller commented on LUCENE-1458: - Almost got an initial rough stab at the sep