On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D. wrote:
>>> Is there any (preliminary) code checked in somewhere that I can look at,
>>> that would help me understand the practical issues that would need to be
>>> addressed?
>>
>> Maybe we can make this more concrete: what new attribute are
On Wed, Dec 12, 2012 at 9:08 PM, lukai wrote:
> Do we have any plan to decouple the index process?
>
> Lucene was designed for search, but according to the questions people ask in this
> thread, it sometimes goes beyond search functionality. For example, we might want to
> customize our scoring function based on payloads
On 13.12.2012 12:27, Michael McCandless wrote:
>> For example:
>> - part of speech of a token.
>> - syntactic parse subtree (over a span).
>> - semantically normalized phrase (to canonical text or ontological code).
>> - semantic group (of a span).
>> - coreference link.
>
> So for example
Unfortunately, Lucene doesn't properly index spans (it records the
start position but not the end position), so that limits what kind of
matching you can do at search time.
If this could be fixed (i.e. indexing the _end_ of a span), I think all
the things that I want to do, and the things that can
That would be really nice. Full standoff annotations open a lot of doors.
If we had them, though, I'm not sure exactly which of Mike's methods you'd
use? I thought payloads were completely token-based and could not be
attached to spans regardless. And the SynonymFilter is really to mimic the
behavior
Hi,
I'm following Grant's advice on how to combine BooleanQuery and
SpanQuery
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/201003.mbox/%3c08c90e81-1c33-487a-9e7d-2f05b2779...@apache.org%3E).
The strategy is to perform a BooleanQuery, get the document ID set and
perform a SpanQuery re
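The two-phase strategy in Grant's advice can be sketched abstractly: run the cheap boolean filter over all documents first, then run the more expensive span check only on the surviving candidates. A toy illustration with plain Java predicates standing in for the real Lucene queries (the predicates and integer doc IDs here are invented for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseMatch {
    // Phase 1: cheap boolean filter. Phase 2: expensive span check,
    // evaluated only for docs that survived phase 1.
    public static List<Integer> match(int maxDoc,
                                      IntPredicate booleanMatch,
                                      IntPredicate spanMatch) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (booleanMatch.test(doc) && spanMatch.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy corpus of 10 docs: the "BooleanQuery" matches even IDs,
        // the "SpanQuery" matches multiples of 3.
        List<Integer> hits = match(10, d -> d % 2 == 0, d -> d % 3 == 0);
        System.out.println(hits); // [0, 6]
    }
}
```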
Can you provide some examples of terms that don't work and the index token
stream they fail on?
Make sure that the Analyzer you are using doesn't do any magic on the
indexed terms - your query term is unanalyzed. Maybe multiple, but distinct,
index terms are analyzing to the same, but unexpected
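A toy illustration of Jack's point, with a stand-in "analyzer" that just lowercases (real analyzers also stem, fold, etc.): two distinct surface forms collapse to one indexed term, so an unanalyzed query for the original surface form finds nothing.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AnalyzerMismatch {
    // Stand-in for an index-time analyzer; here it only lowercases.
    static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> surfaceForms = List.of("Wikipedia", "WIKIPEDIA");
        List<String> indexedTerms = surfaceForms.stream()
                .map(AnalyzerMismatch::analyze)
                .collect(Collectors.toList());

        // Both distinct inputs became the same indexed term...
        System.out.println(indexedTerms);                       // [wikipedia, wikipedia]
        // ...so the raw, unanalyzed query term misses entirely.
        System.out.println(indexedTerms.contains("Wikipedia")); // false
    }
}
```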
On 13.12.2012 18:00, Jack Krupansky wrote:
> Can you provide some examples of terms that don't work and the index
> token stream they fail on?
The index I'm testing with is the German Wikipedia, and I've been testing
with different (arbitrarily chosen) terms. I'm listing some results; the
first number
Hi,
I would like to be able to display up to multiple millions of lat/lng
points on a map. To make this possible, my intention is to plot fewer
than 1000 clusters of points by dividing the world into a grid tree,
and I'm looking into using GeohashPrefixTree to do this.
I am imagining that I
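The grid idea can be sketched without Lucene: bucket each point into a fixed-resolution cell (a crude stand-in for truncating a geohash to a prefix) and count points per cell; each non-empty cell becomes one cluster marker on the map. The cell-key scheme below is invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class GridCluster {
    // Bucket a point into a cellDeg x cellDeg grid cell,
    // a crude stand-in for a geohash prefix.
    static String cellKey(double lat, double lng, double cellDeg) {
        long row = (long) Math.floor(lat / cellDeg);
        long col = (long) Math.floor(lng / cellDeg);
        return row + ":" + col;
    }

    public static void main(String[] args) {
        // Two nearby points (Berlin) and one distant point (Munich).
        double[][] points = { {52.52, 13.40}, {52.53, 13.41}, {48.14, 11.58} };
        Map<String, Integer> clusters = new HashMap<>();
        for (double[] p : points) {
            clusters.merge(cellKey(p[0], p[1], 1.0), 1, Integer::sum);
        }
        // The two Berlin points share a cell; Munich gets its own.
        System.out.println(clusters.size()); // 2
    }
}
```

In practice the cell size would shrink as the user zooms in, which is what the prefix length of a geohash tree gives you.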
Part-of-speech tagging is available now, in the indexer.
LUCENE-2899 adds OpenNLP to the Lucene & Solr codebase. It does
part-of-speech tagging, chunking, and Named Entity Recognition. OpenNLP is an
Apache project for natural-language processing.
Some parts are in Solr that could be in Lucene.
https://issues
It is not clear this is exactly what is needed/being discussed.
From the issue:
"We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position."
This adds it to a token, not a span. 'same position' does not
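At the token level, what the issue text describes amounts to pairing each token with a POS tag and carrying the tag as payload bytes. A minimal pure-Java sketch; the tags and UTF-8 encoding are illustrative, not the LUCENE-2899 patch's actual format:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PosPayloads {
    // Encode a POS tag as payload bytes for one token.
    static byte[] tagPayload(String posTag) {
        return posTag.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the tag back out of the payload bytes.
    static String readTag(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // One tag per token -- this attaches the annotation to a token,
        // not to a multi-token span, which is exactly the limitation noted above.
        Map<String, byte[]> tokenPayloads = new LinkedHashMap<>();
        tokenPayloads.put("quick", tagPayload("JJ"));
        tokenPayloads.put("fox", tagPayload("NN"));
        System.out.println(readTag(tokenPayloads.get("fox"))); // NN
    }
}
```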
I should not have added that note. The Opennlp patch gives a concrete
example of adding an annotation to text.
On 12/13/2012 01:54 PM, Glen Newton wrote:
> It is not clear this is exactly what is needed/being discussed.
> From the issue:
> "We are also planning a Tokenizer/TokenFilter that can put
Cool! Sounds great! :-)
Any pointers to a (Lucene) example that attaches a payload to a
start..end span that is more than one token?
thanks,
-Glen
On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog wrote:
> I should not have added that note. The Opennlp patch gives a concrete
> example of adding a
Hi Robert,
Our sysadmins installed a later Java version (info below) and I redid the
merge and then ran CheckIndex, both using Java 7. Same error (appended
below).
I suppose I could try merging 2 indexes, run CheckIndex, and if it's OK merge
3 indexes, etc., up to 12, to find the point where the problem
Hi Glen,
I don't believe you can attach a single payload to multiple tokens. What I did
for a similar requirement was to combine the tokens into a single "_"-delimited
token and attach the payload to it. For example:
The Big Bad Wolf huffed and puffed and blew the house of the Three Li
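Lance's workaround in plain Java: join the multi-token phrase into one underscore-delimited token, so the single payload attached to that token effectively covers the whole phrase.

```java
import java.util.List;

public class PhraseToken {
    // Collapse a multi-token phrase into one "_"-delimited token so a
    // single payload can be attached to the whole phrase.
    static String combine(List<String> tokens) {
        return String.join("_", tokens);
    }

    public static void main(String[] args) {
        String token = combine(List.of("Big", "Bad", "Wolf"));
        System.out.println(token); // Big_Bad_Wolf
    }
}
```

The trade-off is that the individual words no longer match term queries; typically the combined token is emitted at the same position as the originals so both forms are searchable.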
Hi,
Maybe I'm out of your scope. I'm the founder of maptimize; our new version
can handle millions of points. If you are interested, look at
http://v3.maptimize.com and our demo http://onemilliontweetmap.com/.