Re: Using Payloads

2009-04-23 Thread liat oren
Dear Murat, I saw your question and wondered how you implemented these changes. The requirements below are the same ones I am trying to code now. Did you modify the source code itself, or did you only use Lucene's jar and override code? I would very much appreciate if you could give me a short

Re: semi-infinite loop during merging

2009-04-23 Thread Michael McCandless
On Tue, Apr 21, 2009 at 6:40 PM, Christiaan Fluit wrote: > I may be on to something already. > > I just looked at the commitMerge code and was surprised to see that the > commitMerge message that is almost at the beginning wasn't printed. Then I > saw the "if (hitOOM) return false;" part that tak

Re: How to search special characters in Lucene

2009-04-23 Thread Erick Erickson
OK, this is a much different problem than you were originally asking about, effectively "how to index/search mixed language documents". This topic has been discussed multiple times on the user list, I think your first step should be to search the archive. I *was* going to find the old searchable m

SpanQuery wildcards?

2009-04-23 Thread Ivan Vasilev
Hi guys, does anybody know if there is a way to use wildcards in SpanQuery? My idea is, for example, instead of the query - content:"expansive computer"~10 - to use the query - content:"exp* comp*"~10, so that the results of the first query are a subset of those of the second one. I tried with parsing the above w

RE: SpanQuery wildcards?

2009-04-23 Thread Steven A Rowe
Hi Ivan, SpanRegexQuery should work - just use ".*" instead of "*". - Steve > -Original Message- > From: Ivan Vasilev [mailto:ivasi...@sirma.bg] > Sent: Thursday, April 23, 2009 11:42 AM > To: LUCENE MAIL LIST > Subject: SpanQuery wildcards? > > Hy Guys, > > Does anybody knows if there i
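
Editor's sketch of the ".*" instead of "*" advice above: a small helper that rewrites a wildcard term into a regular expression string using only the JDK. The class and method names (WildcardToRegex, toRegex) are hypothetical and not part of Lucene, and treating "?" as a single-character wildcard is an assumption. Passing the resulting strings to SpanRegexQuery is not shown here, since that needs Lucene's contrib regex jar.

```java
class WildcardToRegex {
    // Regex metacharacters that must be escaped so they stay literal.
    private static final String META = "\\.[]{}()+^$|";

    /** Rewrites a wildcard pattern ('*' = any run of characters, '?' = one character) as a regex. */
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");            // the substitution suggested in the thread
            } else if (c == '?') {
                regex.append('.');             // assumption: '?' means exactly one character
            } else if (META.indexOf(c) >= 0) {
                regex.append('\\').append(c);  // keep other metacharacters literal
            } else {
                regex.append(c);
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("exp*"));                       // prints exp.*
        System.out.println("expansive".matches(toRegex("exp*")));  // prints true
    }
}
```

Escaping the remaining metacharacters matters because a term such as "a.b*" would otherwise match more than the literal dot intended.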

Re: SpanQuery wildcards?

2009-04-23 Thread mark harwood
Related: https://issues.apache.org/jira/browse/LUCENE-1486 - Original Message From: Steven A Rowe To: "java-user@lucene.apache.org" Sent: Thursday, 23 April, 2009 16:54:08 Subject: RE: SpanQuery wildcards? Hi Ivan, SpanRegexQuery should work - just use ".*" instead of "*". - Steve

Re: Why is CustomScoreQuery limited to ValueSourceQuery type?

2009-04-23 Thread Steven Bethard
On 4/22/2009 2:26 PM, Doron Cohen wrote: > Steve, I added a patch in https://issues.apache.org/jira/browse/LUCENE-1608, > > which allows to wrap any query in a value source, and then create a value > source query out of it. > Let us know how this works for you... Thanks! I'll try this out in the

Error: there are more terms than documents...

2009-04-23 Thread Bill.Chesky
Hello, I'm getting a strange error when I make a Lucene (2.2.0) query w/ the following call: java.lang.RuntimeException: there are more terms than documents in field "objectId", but it's impossible to sort on tokenized fields at org.apache.lucene.search.FieldCacheImpl$10.createValue(

RE: Error: there are more terms than documents...

2009-04-23 Thread Bill.Chesky
Sorry for that terrible formatting. Let me try again. == Hello, I'm getting a strange error when I make a Lucene (2.2.0) query: java.lang.RuntimeException: there are more terms than documents in field "objectId", but it's impossible to sort

Re: Error: there are more terms than documents...

2009-04-23 Thread Doron Cohen
On Thu, Apr 23, 2009 at 10:39 PM, wrote: > I'm getting a strange error when I make a Lucene (2.2.0) query: > > java.lang.RuntimeException: there are more terms than documents in field > "objectId", but it's impossible to sort on tokenized fields > Is it possible that, for at least one document,

Change boost of documents / single fields / external scoring ?

2009-04-23 Thread Marcus Herou
Hi. Confusing subject eh ? Trying to become a little clearer in a few sentences. We have a Solr/Lucene index where each document is a Blog Entry. We have just implemented the PageRank algorithm for Blogs and are about to add a column to the index called score and perhaps adjust the document boost

Re: Change boost of documents / single fields / external scoring ?

2009-04-23 Thread Marcus Herou
Could an ExternalFileField help me ? http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html On Thu, Apr 23, 2009 at 10:01 PM, Marcus Herou wrote: > Hi. > > Confusing subject eh ? Trying to become a little clearer in a few > sentences. > > We have a Solr/Lucene index where

RE: Error: there are more terms than documents...

2009-04-23 Thread Bill.Chesky
Doron, thanks for the reply. > Is it possible that, for at least one document, multiple "objectId" fields > were created? > This would also create this problem. I read that online as well. I don't think so. We do have an update process that updates the index. During the update process we have

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Hi. I think we are doing similar things, at least I am trying to implement document boosting with pagerank. I am having trouble working out how to apply the scoring to specific docs without actually reindexing them. I feel something should be done at query time which looks at external data, but I do not know howto impl

RE: Error: there are more terms than documents...

2009-04-23 Thread Bill.Chesky
I figured it out. We are using Hibernate Search and in my ORM class I am doing the following: @Field(index=Index.TOKENIZED,store=Store.YES) protected String objectId; So when I persisted a new object to our database I was inadvertently creating a document in the Lucene index with the tokenized a
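
Editor's sketch of why a tokenized ID field can trip the "more terms than documents" check: when an analyzer splits each value into several tokens, the field ends up with more distinct terms than there are documents, so there is no single sort key per document. The code uses only the JDK; the ID format ("obj-1001-a") is invented for illustration, and the split on non-alphanumeric characters is a crude stand-in for a real Lucene analyzer, not the behavior of any specific one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TokenizedSortCheck {
    /** Counts distinct terms for a field, indexed either as one term per value or split into tokens. */
    static int distinctTerms(List<String> values, boolean tokenized) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            if (tokenized) {
                // Crude analyzer stand-in: lowercase, then split on anything that is not a letter or digit.
                for (String token : value.toLowerCase().split("[^a-z0-9]+")) {
                    if (!token.isEmpty()) {
                        terms.add(token);
                    }
                }
            } else {
                terms.add(value); // untokenized: the whole value is a single term
            }
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Hypothetical object IDs, one per document.
        List<String> ids = Arrays.asList("obj-1001-a", "obj-1002-b");
        System.out.println(distinctTerms(ids, false)); // prints 2
        System.out.println(distinctTerms(ids, true));  // prints 5
    }
}
```

With two documents, the untokenized field has two terms, while the tokenized one has five (obj, 1001, 1002, a, b), which is the situation the error message describes.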

Re: exponential boosts

2009-04-23 Thread Doron Cohen
> > I think we are doing similar things, at least I am trying to implement > document boosting with pagerank. Having issues of howto appky the scoring > of > specific docs without actually reindex them. I feel something should be > done > at query time which looks at external data but do not know h

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Yes I have considered it for 30 minutes :) How does one apply that in the real world? If the only thing I get access to is the actual docId, would it not be really expensive to get the Document itself from the index and later use some field in it as external lookup in some optimized structure for t

Re: exponential boosts

2009-04-23 Thread Marcus Herou
But perhaps one could use a FieldCache somehow ? /M On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou wrote: > Yes I have considered it for 30 minutes :) > > How do one apply that in the real world ? > > If the only thing I get access to is the actual docId would it not be > really expensive to get

Re: exponential boosts

2009-04-23 Thread Steven Bethard
On 4/23/2009 1:58 PM, Doron Cohen wrote: >> I think we are doing similar things, at least I am trying to implement >> document boosting with pagerank. Having issues of howto appky the scoring >> of >> specific docs without actually reindex them. I feel something should be >> done >> at query time w

Re: exponential boosts

2009-04-23 Thread Steven Bethard
On 4/23/2009 2:08 PM, Marcus Herou wrote: > But perhaps one could use a FieldCache somehow ? Some code snippets that may help. I add the PageRank value as a field of the documents I index with Lucene like this: Document document = new Document(); double pageRank = this.pageRanks.getCount(
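
Editor's sketch of the query-time idea discussed in this thread: keep one PageRank value per document in an array indexed by docId (roughly what a field cache provides) and combine it with the text relevance score when ranking. This is plain JDK code, not Lucene's API; the class and method names are hypothetical, and multiplying the two scores is an assumption made for illustration, not necessarily what the truncated snippets above do.

```java
import java.util.Arrays;

class PageRankBoostSketch {
    /** Combines a query score with a per-document PageRank value (assumption: simple product). */
    static float customScore(int docId, float queryScore, float[] pageRankByDoc) {
        return queryScore * pageRankByDoc[docId];
    }

    /** Returns doc IDs ordered by descending combined score. */
    static int[] rank(float[] queryScores, float[] pageRankByDoc) {
        Integer[] ids = new Integer[queryScores.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Sort by combined score, highest first; the array lookup avoids loading stored documents.
        Arrays.sort(ids, (a, b) -> Float.compare(
                customScore(b, queryScores[b], pageRankByDoc),
                customScore(a, queryScores[a], pageRankByDoc)));
        int[] ordered = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            ordered[i] = ids[i];
        }
        return ordered;
    }

    public static void main(String[] args) {
        float[] queryScores = {1.0f, 0.9f, 0.5f}; // hypothetical text relevance per docId
        float[] pageRanks = {0.1f, 0.5f, 0.4f};   // hypothetical PageRank per docId
        System.out.println(Arrays.toString(rank(queryScores, pageRanks))); // prints [1, 2, 0]
    }
}
```

Because the lookup is a plain array access by docId, the per-hit cost stays small, which addresses the concern about fetching each Document during scoring.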

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Thanks! (I started my reply and then saw that you added code snippets) I think we are narrowing down the problem to the updating issue of the PageRank score. So what you basically are saying is that: 1. You have an index which contains data that is more or less static (no updates) or you have an

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Never mind about how to open the ParallelReader stuff (I am an idiot): RTFM: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/ParallelReader.html But the rest is of course interesting :) /M On Thu, Apr 23, 2009 at 11:42 PM, Marcus Herou wrote: > Thanks! (I started my reply and then

Re: exponential boosts

2009-04-23 Thread Steven Bethard
On 4/23/2009 2:42 PM, Marcus Herou wrote: > So what you basically are saying is that: > > 1. You have an index which contains data that is more or less static (no > updates) or you have another update interval than the PR interval. > 2. A PR index which is rebuilt (from scratch ?) every X days/wee

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Thank you Steve, now it's implementation time... I'll be back :) /M On Fri, Apr 24, 2009 at 3:13 AM, Steven Bethard wrote: > On 4/23/2009 2:42 PM, Marcus Herou wrote: > > So what you basically are saying is that: > > > > 1. You have an index which contains data that is more or less static (no >