Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Michael McCandless
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D. wrote:
>>> Is there any (preliminary) code checked in somewhere that I can look at,
>>> that would help me understand the practical issues that would need to be
>>> addressed?
>>
>> Maybe we can make this more concrete: what new attribute are

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Michael McCandless
On Wed, Dec 12, 2012 at 9:08 PM, lukai wrote:
> Do we have any plan to decouple the index process?
>
> Lucene was designed for search, but according to the questions people ask in
> this thread, it sometimes goes beyond search functionality. Like we might want to
> customize our scoring function based on payl

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Carsten Schnober
On 13.12.2012 12:27, Michael McCandless wrote:
>> For example:
>> - part of speech of a token.
>> - syntactic parse subtree (over a span).
>> - semantically normalized phrase (to canonical text or ontological code).
>> - semantic group (of a span).
>> - coreference link.
>
> So for example

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Glen Newton
>Unfortunately, Lucene doesn't properly index spans (it records the start position but not the end position), so that limits what kind of matching you can do at search time. If this could be fixed (i.e. indexing the _end_ of a span) I think all the things that I want to do, and the things that can

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Wu, Stephen T., Ph.D.
That would be really nice. Full standoff annotations open a lot of doors. If we had them, though, I'm not sure exactly which of Mike's methods you'd use? I thought payloads were completely token-based and could not be attached to spans regardless. And the SynonymFilter is really to mimic the beh

Boolean and SpanQuery: different results

2012-12-13 Thread Carsten Schnober
Hi, I'm following Grant's advice on how to combine BooleanQuery and SpanQuery (http://mail-archives.apache.org/mod_mbox/lucene-java-user/201003.mbox/%3c08c90e81-1c33-487a-9e7d-2f05b2779...@apache.org%3E). The strategy is to perform a BooleanQuery, get the document ID set and perform a SpanQuery re

Re: Boolean and SpanQuery: different results

2012-12-13 Thread Jack Krupansky
Can you provide some examples of terms that don't work and the index token stream they fail on? Make sure that the Analyzer you are using doesn't do any magic on the indexed terms - your query term is unanalyzed. Maybe multiple, but distinct, index terms are analyzing to the same, but unexpect

Re: Boolean and SpanQuery: different results

2012-12-13 Thread Carsten Schnober
On 13.12.2012 18:00, Jack Krupansky wrote:
> Can you provide some examples of terms that don't work and the index
> token stream they fail on?
The index I'm testing with is German Wikipedia and I've been testing with different (arbitrarily chosen) terms. I'm listing some results, the first numbe

Using GeohashPrefixTree for map clustering

2012-12-13 Thread Neil Ireson
Hi, I would like to display up to several million lat/lng points on a map. To make this possible, my intention is to plot fewer than 1000 clusters of points by dividing the world into a grid tree, and I'm looking into using GeohashPrefixTree to do this. I am imagining that I
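The grid idea in this message can be sketched without any Lucene code. The following is a minimal illustration only: it uses a plain equal-angle grid rather than GeohashPrefixTree, and the function name `grid_clusters` and the `levels` parameter are hypothetical, not part of any library.

```python
from collections import defaultdict


def grid_clusters(points, levels):
    """Group (lat, lng) points into cells of a 2**levels x 2**levels grid.

    Returns {(row, col): (mean_lat, mean_lng, count)}.
    This is a plain quantization grid, not a geohash (assumption for illustration).
    """
    cells = 2 ** levels
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lng in points:
        # Map latitude [-90, 90] and longitude [-180, 180] onto grid indices,
        # clamping the upper edge into the last cell.
        row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
        col = min(int((lng + 180.0) / 360.0 * cells), cells - 1)
        acc = sums[(row, col)]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    # One cluster marker per non-empty cell: centroid plus point count.
    return {cell: (s[0] / s[2], s[1] / s[2], s[2]) for cell, s in sums.items()}
```

Raising `levels` until the number of non-empty cells approaches the display budget (fewer than 1000 here) would give one marker per cell; whether that matches how GeohashPrefixTree subdivides space is not established by this thread.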

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Lance Norskog
Parts-of-speech is available now, in the indexer. LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an Apache project for natural-language processing. Some parts are in Solr that could be in Lucene. https://issues

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Glen Newton
It is not clear this is exactly what is needed/being discussed. From the issue: "We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position." This adds it to a token, not a span. 'same position' does no

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Lance Norskog
I should not have added that note. The OpenNLP patch gives a concrete example of adding an annotation to text.
On 12/13/2012 01:54 PM, Glen Newton wrote:
It is not clear this is exactly what is needed/being discussed. From the issue: "We are also planning a Tokenizer/TokenFilter that can put

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Glen Newton
Cool! Sounds great! :-)
Any pointers to a (Lucene) example that attaches a payload to a start..end span that is more than one token?
thanks,
-Glen
On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog wrote:
> I should not have added that note. The Opennlp patch gives a concrete
> example of adding a

Re: CheckIndex ArrayIndexOutOfBounds error for merged index

2012-12-13 Thread Tom Burton-West
Hi Robert, Our sysadmins installed a later Java version (info below), and I redid the merge and then ran CheckIndex, both using Java 7. Same error (appended below). I suppose I could try merging 2 indexes, run CheckIndex, and if it's OK merge 3 indexes, etc., up to 12 to find the point where the proble

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread SUJIT PAL
Hi Glen, I don't believe you can attach a single payload to multiple tokens. What I did for a similar requirement was to combine the tokens into a single "_"-delimited token and attach the payload to it. For example: The Big Bad Wolf huffed and puffed and blew the house of the Three Li
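The token-merging workaround described in this message can be sketched roughly as follows. This is plain Python over a list of strings, not Lucene's TokenFilter or payload API; the function name `merge_spans`, the span tuple layout, and the payload label are all hypothetical.

```python
def merge_spans(tokens, spans):
    """Collapse each annotated span into one "_"-joined token carrying a payload.

    tokens: list of str.
    spans:  list of (start, end_exclusive, payload); assumed non-overlapping
            and non-empty.
    Returns a list of (term, payload_or_None) pairs.
    """
    by_start = {start: (end, payload) for start, end, payload in spans}
    out = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, payload = by_start[i]
            # The whole span becomes a single token, so one payload covers it.
            out.append(("_".join(tokens[i:end]), payload))
            i = end
        else:
            out.append((tokens[i], None))
            i += 1
    return out


# Usage: annotate tokens 1..3 with an illustrative label.
merged = merge_spans("The Big Bad Wolf huffed".split(), [(1, 4, "character")])
```

One consequence of this design is that the individual words inside a merged span are no longer searchable on their own unless the original tokens are also emitted.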

Re: Using GeohashPrefixTree for map clustering

2012-12-13 Thread sgruhier
Hi, Maybe I'm out of your scope. I'm the founder of maptimize; our new version can handle millions of points. If you are interested, look at http://v3.maptimize.com and our demo http://onemilliontweetmap.com/.