Incremental Indexing in Lucene 4.7

2014-03-24 Thread Yuan
tor(null); } } However, I found that the terms the uidIter iterates over are no longer in alphabetical order, which breaks the algorithm. Is there any way to work around this? Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/Incremental-In

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-21 Thread Michael McCandless
On Thu, Dec 20, 2012 at 3:54 PM, Wu, Stephen T., Ph.D. wrote: >> If you stuff the end of the span into the payload you'd have to create >> a custom variant of PhraseQuery to properly match based on the end >> span. > > How different is this from the functionality already available through > SpanQ

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-20 Thread Wu, Stephen T., Ph.D.
> If you stuff the end of the span into the payload you'd have to create > a custom variant of PhraseQuery to properly match based on the end > span. How different is this from the functionality already available through SpanQuery? stephen --
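A minimal sketch of the "stuff the end of the span into the payload" idea quoted above, in plain Java with no Lucene dependency. The class name SpanEndPayload and its methods are hypothetical, and a plain byte[] stands in for the payload that Lucene would carry as a BytesRef through PayloadAttribute. It only shows the encoding step; matching on the decoded end position at search time would still need custom query code, as the thread notes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: stores a span's end position in a 4-byte payload.
class SpanEndPayload {
    // Encode the end position as 4 big-endian bytes.
    public static byte[] encode(int endPosition) {
        return ByteBuffer.allocate(4).putInt(endPosition).array();
    }

    // Read the end position back out of the payload bytes.
    public static int decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    public static void main(String[] args) {
        int start = 3;  // position the index already records
        int end = 7;    // position the index does not record by itself
        byte[] payload = encode(end);
        System.out.println("span start=" + start + " end=" + decode(payload));
    }
}
```

Four bytes per posting is the simplest choice; a variable-length encoding of (end - start) would usually be smaller, at the cost of a slightly more involved decoder.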

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-18 Thread Carsten Schnober
On 18.12.2012 12:36, Michael McCandless wrote: > On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober > wrote: >> This is a relatively easy example, but how would you deal with e.g. >> annotations that include multiple tokens (as in spans), such as chunks, >> or relations between tokens (and token sp

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-18 Thread Michael McCandless
On Thu, Dec 13, 2012 at 10:09 AM, Glen Newton wrote: >>Unfortunately, Lucene doesn't properly index > spans (it records the start position but not the end position), so > that limits what kind of matching you can do at search time. > > If this could be fixed (i.e. indexing the _end_ of a span) I t

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-18 Thread Michael McCandless
On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober wrote: > On 13.12.2012 12:27, Michael McCandless wrote: > >>> For example: >>> - part of speech of a token. >>> - syntactic parse subtree (over a span). >>> - semantically normalized phrase (to canonical text or ontological code). >>> - seman

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread SUJIT PAL
Hi Glen, I don't believe you can attach a single payload to multiple tokens. What I did for a similar requirement was to combine the tokens into a single "_" delimited single token and attached the payload to it. For example: The Big Bad Wolf huffed and puffed and blew the house of the Three Li
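The token-joining workaround described in the message above can be sketched in plain Java, without Lucene. SpanToken and its join method are hypothetical names, and the "CHARACTER" label is an invented annotation used only for illustration; in Lucene the joined term and payload would be emitted from a TokenFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one joined token plus the payload attached to it.
class SpanToken {
    public final String term;     // joined text, e.g. "Big_Bad_Wolf"
    public final byte[] payload;  // annotation carried by the single joined token

    SpanToken(String term, byte[] payload) {
        this.term = term;
        this.payload = payload;
    }

    // Collapse the tokens of a span into one "_"-delimited token and
    // attach the annotation to it as UTF-8 payload bytes.
    public static SpanToken join(List<String> tokens, String annotation) {
        String term = String.join("_", tokens);
        return new SpanToken(term, annotation.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        SpanToken t = join(Arrays.asList("Big", "Bad", "Wolf"), "CHARACTER");
        System.out.println(t.term + " -> " + new String(t.payload, StandardCharsets.UTF_8));
    }
}
```

The trade-off of this approach is that the joined token no longer matches queries on its individual words unless those are indexed as well.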

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Glen Newton
Cool! Sounds great! :-) Any pointers to a (Lucene) example that attaches a payload to a start..end span that is more than one token? thanks, -Glen On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog wrote: > I should not have added that note. The Opennlp patch gives a concrete > example of adding a

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Lance Norskog
I should not have added that note. The Opennlp patch gives a concrete example of adding an annotation to text. On 12/13/2012 01:54 PM, Glen Newton wrote: It is not clear this is exactly what is needed/being discussed. From the issue: "We are also planning a Tokenizer/TokenFilter that can put

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Glen Newton
It is not clear this is exactly what is needed/being discussed. From the issue: "We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position." This adds it to a token, not a span. 'same position' does no

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Lance Norskog
Parts-of-speech is available now, in the indexer. LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an Apache project for natural-language processing. Some parts are in Solr that could be in Lucene. https://issues

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Wu, Stephen T., Ph.D.
That would be really nice. Full standoff annotations open a lot of doors. If we had them, though, I'm not sure exactly which of Mike's methods you'd use? I thought payloads were completely token-based and could not be attached to spans regardless. And the SynonymFilter is really to mimic the beh

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Glen Newton
>Unfortunately, Lucene doesn't properly index spans (it records the start position but not the end position), so that limits what kind of matching you can do at search time. If this could be fixed (i.e. indexing the _end_ of a span) I think all the things that I want to do, and the things that can

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Carsten Schnober
On 13.12.2012 12:27, Michael McCandless wrote: >> For example: >> - part of speech of a token. >> - syntactic parse subtree (over a span). >> - semantically normalized phrase (to canonical text or ontological code). >> - semantic group (of a span). >> - coreference link. > > So for example

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Michael McCandless
On Wed, Dec 12, 2012 at 9:08 PM, lukai wrote: > Do we have any plan to decouple the index process? > > Lucene was designed for search, but according to the questions people ask in the > thread, it sometimes goes beyond search functionality. Like we might want to > customize our scoring function based on payl

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Michael McCandless
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D. wrote: >>> Is there any (preliminary) code checked in somewhere that I can look at, >>> that would help me understand the practical issues that would need to be >>> addressed? >> >> Maybe we can make this more concrete: what new attribute are

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread lukai
Do we have any plan to decouple the index process? Lucene was designed for search, but according to the questions people ask in the thread, it sometimes goes beyond search functionality. Like we might want to customize our scoring function based on payload. Sometimes I don't need to store TF/IDF information.

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread wgggfiy
Thanks very much! LingPipe and GATE are very useful, and new to me, but are they too heavyweight for implementing a custom posting item like class TestPostingItem { int termId; long startOffset; long endOffset; float score; int segId; long timeStamp; }?

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread Glen Newton
+10 These are the kind of things you can do in GATE[1] using annotations[2]. A VERY useful feature. -Glen [1]http://gate.ac.uk [2]http://gate.ac.uk/wiki/jape-repository/annotations.html On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D. wrote: >>> Is there any (preliminary) code checked in

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread Wu, Stephen T., Ph.D.
>> Is there any (preliminary) code checked in somewhere that I can look at, >> that would help me understand the practical issues that would need to be >> addressed? > > Maybe we can make this more concrete: what new attribute are you > needing to record in the postings and access at search time?

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-30 Thread Michael McCandless
On Fri, Nov 30, 2012 at 12:25 PM, Wu, Stephen T., Ph.D. wrote: > Is there any (preliminary) code checked in somewhere that I can look at, > that would help me understand the practical issues that would need to be > addressed? > > If I understand you correctly, it's a little different from what's h

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-30 Thread Wu, Stephen T., Ph.D.
> Mike McCandless > > http://blog.mikemccandless.com > > On Tue, Nov 27, 2012 at 3:37 PM, Wu, Stephen T., Ph.D. > wrote: >> Following up on a previous question... >> What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to >>

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-30 Thread Jack Krupansky
e output of a standard Lucene analyzer. -- Jack Krupansky -Original Message- From: Johannes.Lichtenberger Sent: Friday, November 30, 2012 10:15 AM To: java-user@lucene.apache.org Cc: Michael McCandless Subject: Re: What is "flexible indexing" in Lucene 4.0 if it's not the

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-30 Thread Johannes.Lichtenberger
On 11/28/2012 01:11 AM, Michael McCandless wrote: Flexible indexing is the ability to make your own codec, which controls the reading and writing of all index parts (postings, stored fields, term vectors, deleted docs, etc.). So for example if you want to store some postings as a bit set instead
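To make the bit-set example in the message above concrete, here is a plain-Java sketch of the two representations a codec might choose between for one term's postings: a list of doc IDs and a bit set with one bit per document. It does not use Lucene's codec API (the org.apache.lucene.codecs.Codec and PostingsFormat classes in 4.x); BitSetPostings and its methods are hypothetical names for illustration only.

```java
import java.util.BitSet;

// Hypothetical illustration of storing a term's postings as a bit set.
class BitSetPostings {
    // Build a bit set in which bit d is set when document d contains the term.
    public static BitSet fromDocIds(int[] docIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : docIds) {
            bits.set(docId);
        }
        return bits;
    }

    // Recover the doc IDs in increasing order, as a postings iterator would return them.
    public static int[] toDocIds(BitSet bits) {
        return bits.stream().toArray();
    }

    public static void main(String[] args) {
        BitSet bits = fromDocIds(new int[] {5, 1, 9}, 16);
        for (int docId : toDocIds(bits)) {
            System.out.println("doc " + docId);
        }
    }
}
```

A bit set costs roughly maxDoc/8 bytes regardless of how many documents match, so it tends to pay off only for terms that occur in a large fraction of the documents.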

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-27 Thread Michael McCandless
: > Following up on a previous question... > What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to > easily make new postings formats/codecs -- but a response below says that > would be "tricky"? > > stephen > > > On 11/27/12 11:48 A

What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-27 Thread Wu, Stephen T., Ph.D.
Following up on a previous question... What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to easily make new postings formats/codecs -- but a response below says that would be "tricky"? stephen On 11/27/12 11:48 AM, "David Causse" wrote:

Re: Semantic indexing in Lucene

2011-05-24 Thread Diego Cavalcanti
w all methods. I really do not want to change the project's > >> source code. Well... this is not important for this list! > >> > >> If anyone has another idea about how to implement semantic indexing in > >> Lucene, I would be grateful! > >> > >> [

Re: Semantic indexing in Lucene

2011-05-24 Thread Danica Damljanovic
nly by API, because the Javadoc >> does not show all methods. I really do not want to change the project's >> source code. Well... this is not important for this list! >> >> If anyone has another idea about how to implement semantic indexing in >> Lucene, I would

Re: Semantic indexing in Lucene

2011-05-23 Thread Paul Libbrecht
> If anyone has another idea about how to implement semantic indexing in > Lucene, I would be grateful! > > []s, > -- > Diego > > > On Mon, May 23, 2011 at 21:30, Yiannis Gkoufas wrote: > >> It's not my blog! :D >> I used some of the ideas in that

Re: Semantic indexing in Lucene

2011-05-23 Thread Sujit Pal
se the project programmatically > (only by command line). > > I've seen your blog, but I haven't found any post about semantic indexing in > Lucene. Can you point that for me, please? > > Thanks, > -- > Diego > > > On Mon, May 23, 2011 at 21:17, Yiannis

Re: Semantic indexing in Lucene

2011-05-23 Thread Diego Cavalcanti
the project's source code. Well... this is not important for this list! If anyone has another idea about how to implement semantic indexing in Lucene, I would be grateful! []s, -- Diego On Mon, May 23, 2011 at 21:30, Yiannis Gkoufas wrote: > It's not my blog! :D > I used some

Re: Semantic indexing in Lucene

2011-05-23 Thread Yiannis Gkoufas
I've seen your blog, but I haven't found any post about semantic indexing > in > Lucene. Can you point that for me, please? > > Thanks, > -- > Diego > > > On Mon, May 23, 2011 at 21:17, Yiannis Gkoufas > wrote: > > > Hi Diego, > > > > Are you re

Re: Semantic indexing in Lucene

2011-05-23 Thread Diego Cavalcanti
I've seen your blog, but I haven't found any post about semantic indexing in Lucene. Can you point that for me, please? Thanks, -- Diego On Mon, May 23, 2011 at 21:17, Yiannis Gkoufas wrote: > Hi Diego, > > Are you referring to that project--> > http://code.google.com/p

Re: Semantic indexing in Lucene

2011-05-23 Thread Yiannis Gkoufas
ectors). > > I've read old posts and some people said that Semantic Vectors plays well > with Lucene. However, I noticed that its classes are used only by command > line (through method main) instead of by API. > > So, I'd like to know if anyone can suggest any other appr

Semantic indexing in Lucene

2011-05-23 Thread Diego Cavalcanti
used only by command line (through method main) instead of by API. So, I'd like to know if anyone can suggest any other approach so that I could use semantic indexing in Lucene. Thanks, Diego

Re: Parsing Error while indexing in Lucene WordNet package

2009-10-21 Thread Robert Muir
Hi, thanks again for reporting this. I created an issue here: http://issues.apache.org/jira/browse/LUCENE-2001 On Wed, Oct 21, 2009 at 2:05 AM, parag dave wrote: > While using the Lucene WordNet package, we found that the Syns2Index > program > indexes the Synsets wrongly. For example, looking u

Re: Parsing Error while indexing in Lucene WordNet package

2009-10-21 Thread Robert Muir
thanks, this sounds like a bug, I'll play with this today. On Wed, Oct 21, 2009 at 2:05 AM, parag dave wrote: > While using the Lucene WordNet package, we found that the Syns2Index > program > indexes the Synsets wrongly. For example, looking up the synsets for the > word "king", we get: > > java

Parsing Error while indexing in Lucene WordNet package

2009-10-20 Thread parag dave
While using the Lucene WordNet package, we found that the Syns2Index program indexes the Synsets wrongly. For example, looking up the synsets for the word "king", we get: java SynLookup wnindex king baron magnate mogul power queen rex scrofula struma tycoon Here, "scrofula" and "struma" are extra

Re: Preserving dots of an acronym while indexing in Lucene

2009-07-18 Thread Shai Erera
to input a set of stop > words to Lucene while doing this. > > -- > View this message in context: > http://www.nabble.com/Preserving-dots-of-an-acronym-while-indexing-in-Lucene-tp24554342p24554342.html > Sent from the Lucene

Preserving dots of an acronym while indexing in Lucene

2009-07-18 Thread mitu2009
Hi, If I want Lucene to preserve the dots of acronyms (example: U.K, U.S.A. etc.), which analyzer do I need to use, and how? I also want to input a set of stop words to Lucene while doing this. -- View this message in context: http://www.nabble.com/Preserving-dots-of-an-acronym-while-indexing-in

Re: Inrease the performance of Indexing in Lucene

2007-07-19 Thread Erick Erickson
possible to rename the Field's name inside a Lucene Document. I know it's not possible to change the value of the Document's Field, but can we change the field's name? Any ideas... I am totally petrified of googling. -- View this message in context: http://www.nabble.com/Inrease-the-perform

Re: Inrease the performance of Indexing in Lucene

2007-07-19 Thread miztaken
com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11687272 Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Inrease the performance of Indexing in Lucene

2007-07-19 Thread Michael McCandless
the field but will it be > possible to change the name of the field or we have to control > externally..? > > Please shed some light on these things: > Your help is highly anticipated > > -- > View this message in context: > http://www.nabble.com/Inrease-the-per

RE: Inrease the performance of Indexing in Lucene

2007-07-19 Thread miztaken
I am using Lucene in .NET, i.e. dotlucene. So I believe I have to do this all on my own. Or am I missing something? -- View this message in context: http://www.nabble.com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11685997 Sent from the Lucene - Java Users mailing list

RE: Inrease the performance of Indexing in Lucene

2007-07-19 Thread Ard Schrijvers
hing.? > Also i might need to change the name of Field of Document > indexed in lucene, > will it be possible.? > I know its not possible to change the value of the field but > will it be > possible to change the name of the field or we have to > control externally..? >

Inrease the performance of Indexing in Lucene

2007-07-18 Thread miztaken
we have to control externally..? Please shed some light on these things: Your help is highly anticipated -- View this message in context: http://www.nabble.com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11682360 Sent from the Lucene - Java Users mailing list archive at

Re: pdf,.doc,.xls,.ppt indexing in lucene

2007-03-09 Thread Grant Ingersoll
Search the archive, read the FAQ (see link in my signature). On Mar 9, 2007, at 7:20 AM, ashwin kumar wrote: Hi all, I have tried indexing .txt files using Lucene and it's working fine. Now I want to index .doc, .pdf, .xls, and .ppt files with Lucene. Can someone help with that? Thanks, regards, ashwin

pdf,.doc,.xls,.ppt indexing in lucene

2007-03-09 Thread ashwin kumar
Hi all, I have tried indexing .txt files using Lucene and it's working fine. Now I want to index .doc, .pdf, .xls, and .ppt files with Lucene. Can someone help with that? Thanks, regards, ashwin

Re: Document on Indexing in Lucene

2006-10-12 Thread Tom Bouctou
go to http://briefcase.yahoo.com/pickupartistmistry click on login enter user pickupartistmistry password: chotachetan the document should be there -tom Bill Taylor wrote: When I went there, I got a message that there were no shared folders in the brief case. It never gave me an opportunity t

Re: Document on Indexing in Lucene

2006-10-12 Thread Bill Taylor
When I went there, I got a message that there were no shared folders in the briefcase. It never gave me an opportunity to enter the password. Thanks. Bill Taylor On Oct 12, 2006, at 6:34 AM, sachin wrote: Hello, I have got a lot of personal emails asking me to share the "Lucene Investigation" docu

Re: Document on Indexing in Lucene

2006-10-12 Thread Prasenjit Mukherjee
Did someone delete the shared doc? [EMAIL PROTECTED] wrote: Hello, I have got a lot of personal emails asking me to share the "Lucene Investigation" document. It is not possible to reply to each of the emails. So I am putting this document inside my briefcase. Anyone interested, please go to the following sit

Document on Indexing in Lucene

2006-10-12 Thread sachin
Hello, I have got a lot of personal emails asking me to share the "Lucene Investigation" document. It is not possible to reply to each of the emails. So I am putting this document inside my briefcase. Anyone interested, please go to the following site and get the document. http://briefcase.yahoo.com/pickupartistm

RE: Indexing In Lucene

2006-10-04 Thread sachin
Sent: Tuesday, October 03, 2006 2:15 PM To: lucene-net-commits@incubator.apache.org; lucene-net-dev@incubator.apache.org; lucene-net-user@incubator.apache.org Cc: lucene-dev@jakarta.apache.org; lucene-user@jakarta.apache.org Subject: Indexing In Lucene Hi, Can you tell me how indexing

Re: Indexing In Lucene

2006-10-03 Thread Nicolas Lalevée
'to' and 'cc'. I agree it's annoying. > > > > -Original Message- > > > From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED] > > > Sent: Tuesday, 3 October 2006 10:47 > > > Subject: Indexing In Lucene > > > > > > >

Re: Indexing In Lucene

2006-10-03 Thread Nicolas Lalevée
filter worked: I put a "To contains ..." > > > -Original Message- > > From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED] > > Sent: Tuesday, 3 October 2006 10:47 > > Subject: Indexing In Lucene > > > > > > > > > > >

Indexing In Lucene

2006-10-03 Thread Ajani, Akil (Cognizant)
Hi, Can anyone tell me in depth how indexing takes place in Lucene? I would be thankful if anyone could help me. Thanks & Regards, Akil Ajani Cognizant Technology Solutions India Pvt. Ltd. Plot # 26, Rajiv Gandhi Infotech Park, MIDC Hinjewadi, Pune 411057 Tel: (91) (20) 40201100 e

RE: Indexing In Lucene

2006-10-03 Thread W.H. van Atteveldt
I don't know what you're doing but the to: header is empty in your email which is really annoying (since I rely on the to: to sort my mail) > -Original Message- > From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED] > Sent: Tuesday, 3 October 2006 10:47 > Subje

Indexing In Lucene

2006-10-03 Thread Ajani, Akil (Cognizant)
Hi, Can you tell me in depth how indexing takes place in Lucene? If a document has 1n indices, which algorithm does it use, and which information retrieval model does it use... Thanks & Regards, Akil Ajani Cognizant Technology Solutions India Pvt. Ltd. Plot # 26, Rajiv Gandhi Infotech Park, MID

Indexing In Lucene

2006-10-03 Thread Ajani, Akil (Cognizant)
Hi, Can you tell me in depth how indexing takes place in Lucene? If a document has 1n indices, which algorithm does it use, and which information retrieval model does it use... Thanks & Regards, Akil Ajani Cognizant Technology Solutions India Pvt. Ltd. Plot # 26, Rajiv Gandhi Infotech Park, MID

Re: indexing in lucene 1.9.1

2006-05-22 Thread Harini Raghavan
Hi Mike, Yes, you are right: when we run optimize(), it creates one large segment file and makes searching faster. But the issue is that our index keeps growing every minute as we download documents and add them to the index, so we cannot call optimize that often. The indexing seemed to be fine till w

Re: indexing in lucene 1.9.1

2006-05-22 Thread Mike Richmond
Hello Harini, When you are finished indexing the documents are you running the optimize() method on the IndexWriter before closing it? This should reduce the number of segments and make searching faster. Just a thought. --Mike On 5/22/06, Harini Raghavan <[EMAIL PROTECTED]> wrote: Hi All,

indexing in lucene 1.9.1

2006-05-21 Thread Harini Raghavan
Hi All, We have recently upgraded from lucene 1.4.3 to lucene 1.9.1 version. After the upgrade, we are facing some issues: 1. Indexing seems to be behaving differently. There were more than 300 segment files(.cfs) in the index and the IndexSearcher is taking forever to refresh the index. Have t