tor(null);
}
}
However, I found that the terms the uidIter iterates over are no longer in
alphabetical order, which breaks the algorithm. Is there any way to
work around this?
Thank you!
--
View this message in context:
http://lucene.472066.n3.nabble.com/Incremental-In
On Thu, Dec 20, 2012 at 3:54 PM, Wu, Stephen T., Ph.D.
wrote:
>> If you stuff the end of the span into the payload you'd have to create
>> a custom variant of PhraseQuery to properly match based on the end
>> span.
>
> How different is this from the functionality already available through
> SpanQ
> If you stuff the end of the span into the payload you'd have to create
> a custom variant of PhraseQuery to properly match based on the end
> span.
How different is this from the functionality already available through
SpanQuery?
stephen
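
To make the "end of the span in the payload" idea concrete, here is a minimal plain-Java sketch. It does not use Lucene's payload API; the class and method names are hypothetical, and the exclusive-end convention is an assumption made for illustration. It only shows how an end position could be packed into payload bytes and how a custom phrase-style matcher might consult it.

```java
import java.nio.ByteBuffer;

// Plain-Java sketch; not Lucene's payload API. All names here are hypothetical.
class SpanEndPayload {

    // Encode the (exclusive) end position of a span as a 4-byte big-endian payload.
    static byte[] encode(int endPosition) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(endPosition).array();
    }

    // Decode the end position back out of the payload bytes.
    static int decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    // A custom phrase-style matcher would check that the next term starts
    // where the previous span ends, instead of at startPosition + 1.
    static boolean follows(byte[] previousPayload, int nextStartPosition) {
        return decode(previousPayload) == nextStartPosition;
    }

    public static void main(String[] args) {
        byte[] payload = encode(7);               // span starts at 3, ends before 7
        System.out.println(decode(payload));      // 7
        System.out.println(follows(payload, 7));  // true
    }
}
```

The point of the sketch is only that the matching logic has to read the payload, which is why a stock PhraseQuery would not be enough.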
--
Am 18.12.2012 12:36, schrieb Michael McCandless:
> On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober
> wrote:
>> This is a relatively easy example, but how would deal with e.g.
>> annotations that include multiple tokens (as in spans), such as chunks,
>> or relations between tokens (and token sp
On Thu, Dec 13, 2012 at 10:09 AM, Glen Newton wrote:
>>Unfortunately, Lucene doesn't properly index
> spans (it records the start position but not the end position), so
> that limits what kind of matching you can do at search time.
>
> If this could be fixed (i.e. indexing the _end_ of a span) I t
On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober
wrote:
> Am 13.12.2012 12:27, schrieb Michael McCandless:
>
>>> For example:
>>> - part of speech of a token.
>>> - syntactic parse subtree (over a span).
>>> - semantically normalized phrase (to canonical text or ontological code).
>>> - seman
Hi Glen,
I don't believe you can attach a single payload to multiple tokens. What I did
for a similar requirement was to combine the tokens into a single "_" delimited
single token and attached the payload to it. For example:
The Big Bad Wolf huffed and puffed and blew the house of the Three Li
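
A minimal plain-Java sketch of the workaround described above, joining a multi-word span into one "_"-delimited token that carries a single payload. This is not a Lucene TokenFilter; the class name, the merge helper, and the "CHARACTER" label are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the workaround; not a Lucene TokenFilter.
// SpanToken, merge and the payload label are hypothetical names.
class SpanToken {
    final String text;     // the combined, "_"-delimited token text
    final String payload;  // the single payload attached to the whole span

    SpanToken(String text, String payload) {
        this.text = text;
        this.payload = payload;
    }

    // Join the words of a multi-token span with "_" and attach one payload.
    static SpanToken merge(List<String> words, String payload) {
        return new SpanToken(String.join("_", words), payload);
    }

    public static void main(String[] args) {
        SpanToken token = merge(Arrays.asList("Big", "Bad", "Wolf"), "CHARACTER");
        System.out.println(token.text + "|" + token.payload); // Big_Bad_Wolf|CHARACTER
    }
}
```

The trade-off is that the individual words are no longer searchable through this token unless they are also indexed separately.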
Cool! Sounds great! :-)
Any pointers to a (Lucene) example that attaches a payload to a
start..end span that is more than one token?
thanks,
-Glen
On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog wrote:
> I should not have added that note. The Opennlp patch gives a concrete
> example of adding a
I should not have added that note. The Opennlp patch gives a concrete
example of adding an annotation to text.
On 12/13/2012 01:54 PM, Glen Newton wrote:
It is not clear this is exactly what is needed/being discussed.
From the issue:
"We are also planning a Tokenizer/TokenFilter that can put
It is not clear this is exactly what is needed/being discussed.
From the issue:
"We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position."
This adds it to a token, not a span. 'same position' does no
Parts-of-speech is available now, in the indexer.
LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does
parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an
Apache project for natural-language processing.
Some parts are in Solr that could be in Lucene.
https://issues
That would be really nice. Full standoff annotations open a lot of doors.
If we had them, though, I'm not sure exactly which of Mike's methods you'd
use? I thought payloads were completely token-based and could not be
attached to spans regardless. And the SynonymFilter is really to mimic the
beh
>Unfortunately, Lucene doesn't properly index
spans (it records the start position but not the end position), so
that limits what kind of matching you can do at search time.
If this could be fixed (i.e. indexing the _end_ of a span) I think all
the things that I want to do, and the things that can
Am 13.12.2012 12:27, schrieb Michael McCandless:
>> For example:
>> - part of speech of a token.
>> - syntactic parse subtree (over a span).
>> - semantically normalized phrase (to canonical text or ontological code).
>> - semantic group (of a span).
>> - coreference link.
>
> So for example
On Wed, Dec 12, 2012 at 9:08 PM, lukai wrote:
> Do we have any plan to decouple the index process?
>
> Lucene was designed for search, but judging by the questions people ask in this
> thread, it sometimes goes beyond search functionality. Like we might want to
> customize our scoring function based on payl
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D.
wrote:
>>> Is there any (preliminary) code checked in somewhere that I can look at,
>>> that would help me understand the practical issues that would need to be
>>> addressed?
>>
>> Maybe we can make this more concrete: what new attribute are
Do we have any plan to decouple the index process?
Lucene was designed for search, but judging by the questions people ask in this
thread, it sometimes goes beyond search functionality. Like we might want to
customize our scoring function based on payload. Sometimes I don't need to
store TF/IDF information.
Thx very much!
LingPipe and GATE are very useful, and new to me,
but would it be too big a task to implement a custom posting item like
class TestPostingItem
{
    int termId;
    long startOffset;
    long endOffset;
    float score;
    int segId;
    long timeStamp;
} ?
-
---
+10
These are the kind of things you can do in GATE[1] using annotations[2].
A VERY useful feature.
-Glen
[1]http://gate.ac.uk
[2]http://gate.ac.uk/wiki/jape-repository/annotations.html
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D.
wrote:
>>> Is there any (preliminary) code checked in
>> Is there any (preliminary) code checked in somewhere that I can look at,
>> that would help me understand the practical issues that would need to be
>> addressed?
>
> Maybe we can make this more concrete: what new attribute are you
> needing to record in the postings and access at search time?
On Fri, Nov 30, 2012 at 12:25 PM, Wu, Stephen T., Ph.D.
wrote:
> Is there any (preliminary) code checked in somewhere that I can look at,
> that would help me understand the practical issues that would need to be
> addressed?
>
> If I understand you correctly, it's a little different from what's h
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Nov 27, 2012 at 3:37 PM, Wu, Stephen T., Ph.D.
> wrote:
>> Following up on a previous question...
>> What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to
>>
e output of a
standard Lucene analyzer.
-- Jack Krupansky
-Original Message-
From: Johannes.Lichtenberger
Sent: Friday, November 30, 2012 10:15 AM
To: java-user@lucene.apache.org
Cc: Michael McCandless
Subject: Re: What is "flexible indexing" in Lucene 4.0 if it's not the
On 11/28/2012 01:11 AM, Michael McCandless wrote:
Flexible indexing is the ability to make your own codec, which
controls the reading and writing of all index parts (postings, stored
fields, term vectors, deleted docs, etc.).
So for example if you want to store some postings as a bit set instead
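
As a rough illustration of what "postings as a bit set" means, here is a plain-Java sketch using java.util.BitSet. It is not a Lucene codec or PostingsFormat; the class and method names are hypothetical. Each set bit stands for a document ID that contains the term, so intersecting two terms becomes a bitwise AND.

```java
import java.util.BitSet;

// Plain-Java sketch; not a Lucene codec. Names are hypothetical.
class BitSetPostings {

    // Build a bit set in which bit i is set when document i contains the term.
    static BitSet fromDocIds(int[] docIds) {
        BitSet bits = new BitSet();
        for (int docId : docIds) {
            bits.set(docId);
        }
        return bits;
    }

    // Documents containing both terms: a bitwise AND of the two postings.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet lucene = fromDocIds(new int[] {1, 4, 9});
        BitSet codec = fromDocIds(new int[] {4, 9, 12});
        System.out.println(intersect(lucene, codec)); // {4, 9}
    }
}
```

A bit set of this kind tends to pay off for terms that occur in a large share of the documents; for rare terms a plain list of document IDs is usually smaller.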
:
> Following up on a previous question...
> What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to
> easily make new postings formats/codecs -- but a response below says that
> would be "tricky"?
>
> stephen
>
>
> On 11/27/12 11:48 A
Following up on a previous question...
What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to
easily make new postings formats/codecs -- but a response below says that
would be "tricky"?
stephen
On 11/27/12 11:48 AM, "David Causse" wrote:
w all methods. I really do not want to change the project's
> >> source code. Well... this is not important for this list!
> >>
> >> If anyone has another idea about how to implement semantic indexing in
> >> Lucene, I would be grateful!
> >>
> >> [
nly by API, because the Javadoc
>> does not show all methods. I really do not want to change the project's
>> source code. Well... this is not important for this list!
>>
>> If anyone has another idea about how to implement semantic indexing in
>> Lucene, I would
> If anyone has another idea about how to implement semantic indexing in
> Lucene, I would be grateful!
>
> []s,
> --
> Diego
>
>
> On Mon, May 23, 2011 at 21:30, Yiannis Gkoufas wrote:
>
>> It's not my blog! :D
>> I used some of the ideas in that
se the project programmatically
> (only by command line).
>
> I've seen your blog, but I haven't found any post about semantic indexing in
> Lucene. Can you point that for me, please?
>
> Thanks,
> --
> Diego
>
>
> On Mon, May 23, 2011 at 21:17, Yiannis
the project's
source code. Well... this is not important for this list!
If anyone has another idea about how to implement semantic indexing in
Lucene, I would be grateful!
[]s,
--
Diego
On Mon, May 23, 2011 at 21:30, Yiannis Gkoufas wrote:
> It's not my blog! :D
> I used some
> I've seen your blog, but I haven't found any post about semantic indexing
> in
> Lucene. Can you point that for me, please?
>
> Thanks,
> --
> Diego
>
>
> On Mon, May 23, 2011 at 21:17, Yiannis Gkoufas
> wrote:
>
> > Hi Diego,
> >
> > Are you re
I've seen your blog, but I haven't found any post about semantic indexing in
Lucene. Can you point that for me, please?
Thanks,
--
Diego
On Mon, May 23, 2011 at 21:17, Yiannis Gkoufas wrote:
> Hi Diego,
>
> Are you referring to that project-->
> http://code.google.com/p
ectors).
>
> I've read old posts and some people said that Semantic Vectors plays well
> with Lucene. However, I noticed that its classes are used only by command
> line (through the main method) instead of by API.
>
> So, I'd like to know if anyone can suggest any other appr
used only by command
line (through the main method) instead of by API.
So, I'd like to know if anyone can suggest any other approach so that I
could use semantic indexing in Lucene.
Thanks,
Diego
Hi, thanks again for reporting this.
I created an issue here: http://issues.apache.org/jira/browse/LUCENE-2001
On Wed, Oct 21, 2009 at 2:05 AM, parag dave wrote:
> While using the Lucene WordNet package, we found that the Syns2Index
> program
> indexes the Synsets wrongly. For example, looking u
thanks, this sounds like a bug, I'll play with this today.
On Wed, Oct 21, 2009 at 2:05 AM, parag dave wrote:
> While using the Lucene WordNet package, we found that the Syns2Index
> program
> indexes the Synsets wrongly. For example, looking up the synsets for the
> word "king", we get:
>
> java
While using the Lucene WordNet package, we found that the Syns2Index program
indexes the Synsets wrongly. For example, looking up the synsets for the
word "king", we get:
java SynLookup wnindex king
baron
magnate
mogul
power
queen
rex
scrofula
struma
tycoon
Here, "scrofula" and "struma" are extra
to input a set of stop
> words to Lucene while doing this.
>
> --
> View this message in context:
> http://www.nabble.com/Preserving-dots-of-an-acronym-while-indexing-in-Lucene-tp24554342p24554342.html
> Sent from the Lucene
Hi,
If I want Lucene to preserve the dots of acronyms (for example U.K., U.S.A., etc.),
which analyzer do I need to use, and how? I also want to supply a set of stop
words to Lucene while doing this.
--
View this message in context:
http://www.nabble.com/Preserving-dots-of-an-acronym-while-indexing-in
possible to rename the Field's name inside Lucene Document.
I know it's not possible to change the value of the Document's Field, but
can we change the field's name?
Any Ideas...
I am totally petrified of googling.
--
View this message in context:
http://www.nabble.com/Inrease-the-perform
com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11687272
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
the field but will it be
> possible to change the name of the field or we have to control
> externally..?
>
> Please shed some light on these things:
> Your help is highly anticipated
>
> --
> View this message in context:
> http://www.nabble.com/Inrease-the-per
I am using Lucene in .NET, i.e. dotLucene.
So I believe I have to do all of this on my own.
Or am I missing something?
--
View this message in context:
http://www.nabble.com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11685997
Sent from the Lucene - Java Users mailing list
hing.?
> Also i might need to change the name of Field of Document
> indexed in lucene,
> will it be possible.?
> I know its not possible to change the value of the field but
> will it be
> possible to change the name of the field or we have to
> control externally..?
>
>
we have to control externally..?
Please shed some light on these things:
Your help is highly anticipated
--
View this message in context:
http://www.nabble.com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11682360
Sent from the Lucene - Java Users mailing list archive at
Search the archive, read the FAQ (see link in my signature).
On Mar 9, 2007, at 7:20 AM, ashwin kumar wrote:
Hi all, I have tried indexing .txt files using Lucene and it's working fine.
Now I want to index .doc, .pdf, .xls and .ppt files with Lucene.
Can someone help me do that?
thanks
regards
ashwin
Hi all, I have tried indexing .txt files using Lucene and it's working fine.
Now I want to index .doc, .pdf, .xls and .ppt files with Lucene.
Can someone help me do that?
thanks
regards
ashwin
go to http://briefcase.yahoo.com/pickupartistmistry
click on login
enter user pickupartistmistry
password: chotachetan
the document should be there
-tom
Bill Taylor wrote:
When I went there, I got a message that there were no shared folders
in the brief case.
It never gave me an opportunity t
When I went there, I got a message that there were no shared folders in
the brief case.
It never gave me an opportunity to enter the password.
Thanks.
Bill Taylor
On Oct 12, 2006, at 6:34 AM, sachin wrote:
Hello,
I have received a lot of personal emails asking me to share the "Lucene
Investigation"
docu
did someone delete the shared doc ?
[EMAIL PROTECTED] wrote:
Hello,
I have received a lot of personal emails asking me to share the "Lucene Investigation"
document. It is not possible to reply to each of the emails, so I am putting
this document inside my briefcase. Anyone interested, please go to the following
sit
Hello,
I have received a lot of personal emails asking me to share the "Lucene Investigation"
document. It is not possible to reply to each of the emails, so I am putting
this document inside my briefcase. Anyone interested, please go to the following
site and get the document.
http://briefcase.yahoo.com/pickupartistm
Sent: Tuesday, October 03, 2006 2:15 PM
To: lucene-net-commits@incubator.apache.org;
lucene-net-dev@incubator.apache.org; lucene-net-user@incubator.apache.org
Cc: lucene-dev@jakarta.apache.org; lucene-user@jakarta.apache.org
Subject: Indexing In Lucene
Hi,
Can you tell me how indexing
'to' and 'cc'. I agree it's
annoying.
>
> > > -Original Message-
> > > From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED]
> > > Sent: dinsdag 3 oktober 2006 10:47
> > > Subject: Indexing In Lucene
> > >
> > >
> >
filter worked : I put a "To contains ..."
>
> > -Original Message-
> > From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED]
> > Sent: dinsdag 3 oktober 2006 10:47
> > Subject: Indexing In Lucene
> >
> >
> >
> >
> >
>
Hi,
Can anyone tell me how indexing takes place in Lucene (in depth)? I
will be thankful if anyone can help me.
Thanks & Regards,
Akil Ajani
Cognizant Technology Solutions India Pvt. Ltd.
Plot # 26, Rajiv Gandhi Infotech Park, MIDC
Hinjewadi, Pune 411057
Tel: (91) (20) 40201100 e
I don't know what you're doing but the to: header is empty in your email
which is really annoying (since I rely on the to: to sort my mail)
> -Original Message-
> From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED]
> Sent: dinsdag 3 oktober 2006 10:47
> Subje
Hi,
Can you tell me how indexing takes place in Lucene (in depth)? If a
document has 1n indices, which algorithm does it use, and which
information retrieval model does it use?
Thanks & Regards,
Akil Ajani
Cognizant Technology Solutions India Pvt. Ltd.
Plot # 26, Rajiv Gandhi Infotech Park, MID
Hi Mike,
Yes you are right, when we run the optimize(), it creates one large
segment file and makes the searching faster. But the issue is our index
keeps growing every minute as we download documents and add them to the index, so
we cannot call optimize so often. The indexing seemed to be fine till w
Hello Harini,
When you are finished indexing the documents are you running the
optimize() method on the IndexWriter before closing it? This should
reduce the number of segments and make searching faster. Just a
thought.
--Mike
On 5/22/06, Harini Raghavan <[EMAIL PROTECTED]> wrote:
Hi All,
Hi All,
We have recently upgraded from lucene 1.4.3 to lucene 1.9.1 version.
After the upgrade, we are facing some issues:
1. Indexing seems to be behaving differently. There were more than 300
segment files(.cfs) in the index and the IndexSearcher is taking forever
to refresh the index. Have t