Re: sorting on dynamic fields - good, bad, neither?

2007-11-05 Thread Chris Hostetter
: Each element of the cached array is a ... what? The ID of the elements of the array are the values, the indexes into the array are the document IDs ... essentially it's an inverted-inverted-index. : document? (I'll be happy to answer this myself by reading the source : code, but I'm not quite
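
The array described in this reply can be pictured with a small sketch. This is only an illustration of the idea (one slot per document ID, holding that document's field value), not Lucene's actual FieldCache code; the function names and the choice to sort documents without a value first are assumptions made for the example.

```python
from typing import Dict, List, Optional


def build_sort_cache(field_values: Dict[int, str], max_doc: int) -> List[Optional[str]]:
    """Build an array indexed by document ID whose entries are the field values.

    The array has one slot for every document in the index, even for
    documents that have no value for the field (those slots stay None).
    """
    cache: List[Optional[str]] = [None] * max_doc
    for doc_id, value in field_values.items():
        cache[doc_id] = value
    return cache


def sort_docs(doc_ids: List[int], cache: List[Optional[str]]) -> List[int]:
    """Sort document IDs by their cached value (assumption: missing values first)."""
    return sorted(doc_ids, key=lambda d: (cache[d] is not None, cache[d] or ""))


# Four documents in the index, only two of which have a value for the field.
cache = build_sort_cache({0: "b", 2: "a"}, max_doc=4)
ordered = sort_docs([0, 1, 2, 3], cache)
```

Note that the array length depends on the number of documents, not on how many of them carry a value, which is why each additional sort field costs memory across the whole index.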

Re: solr instances for different content?

2007-11-05 Thread Chris Hostetter
: I don't think that will solve the relevance issues, given that the IDF : (described at : http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html) : is per document, not per field. In the end, though, it may be negligible. well .. yes, but

Re: Tomcat JNDI Settings

2007-11-05 Thread Chris Hostetter
: : : : SEVERE: Exception starting filter SolrRequestFilter class : java.lang.NoClassDefFoundError: Could not initialize class : org.apache.solr.core.SolrConfig this may be a variant of SOLR-337 ... are you sure /var/lib/tomcat5/solr/home exists? does it contain a ./conf directory? does

Re: Score of exact matches

2007-11-05 Thread Walter Underwood
This is fairly straightforward and works well with the DisMax handler. Index the text into three different fields with three different sets of analyzers. Use something like this in the request handler: 0.01 exact^16 noaccent^4 stemmed exact^16 no
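
A rough sketch of how the field boosts in this reply could be passed as request parameters. The field names `exact`, `noaccent`, and `stemmed` and their boosts come from the message; the `qt=dismax` handler name and the helper function are assumptions for illustration, and the `0.01` value is left out because the snippet does not show which parameter it belongs to.

```python
from urllib.parse import urlencode


def dismax_params(user_query, boosts):
    """Build request parameters with a boosted query-field list (qf).

    `boosts` is a list of (field, boost) pairs; a boost of 1 is written
    as the bare field name.
    """
    qf = " ".join(f"{field}^{boost}" if boost != 1 else field
                  for field, boost in boosts)
    # "qt": "dismax" is an assumed handler name, not taken from the message.
    return {"q": user_query, "qt": "dismax", "qf": qf}


params = dismax_params("accountant", [("exact", 16), ("noaccent", 4), ("stemmed", 1)])
query_string = urlencode(params)  # ready to append to a select URL
```

With these boosts, a match in the `exact` field contributes far more to the score than a match that only appears after stemming.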

Re: Score of exact matches

2007-11-05 Thread Mike Klaas
On 5-Nov-07, at 9:05 PM, Papalagi Pakeha wrote: Hi all, I use Solr 1.2 on a job advertising site. I started from the default setup that runs all documents and queries through EnglishPorterFilterFactory. As a result for example an ad with "accounts" in its title is matched when someone runs a qu

Re: "value boosts"? (boosting a multiValued field's data)

2007-11-05 Thread Yonik Seeley
On 11/6/07, evol__ <[EMAIL PROTECTED]> wrote: > Hi. Is the "expansion" method described in the following year old post still > the best available way to do this? > http://www.nabble.com/newbie-Q-regarding-schema-configuration-tf1814271.html#a4956602 > > The way I understand it, indexing these >

"value boosts"? (boosting a multiValued field's data)

2007-11-05 Thread evol__
Hi. Is the "expansion" method described in the following year old post still the best available way to do this? http://www.nabble.com/newbie-Q-regarding-schema-configuration-tf1814271.html#a4956602 The way I understand it, indexing these First val Less important value would just make the

Score of exact matches

2007-11-05 Thread Papalagi Pakeha
Hi all, I use Solr 1.2 on a job advertising site. I started from the default setup that runs all documents and queries through EnglishPorterFilterFactory. As a result for example an ad with "accounts" in its title is matched when someone runs a query for "accountant" because both are stemmed to th

Re: specify index location

2007-11-05 Thread Yonik Seeley
On 11/5/07, evol__ <[EMAIL PROTECTED]> wrote: > Just a remark: > > Might be a good idea to change this to ./data/index to reflect the location > that is expected in there. ./data is the generic solr data directory "index" stores the main index under the data directory. -Yonik

Re: specify index location

2007-11-05 Thread evol__
Thanks Grant. Solrconfig.xml was the first place I looked into, but somehow missed it *scratches head*... Just a remark: Might be a good idea to change this to ./data/index to reflect the location that is expected in there. D. Grant Ingersoll-6 wrote: > > Have a look at the solrconfig.xml

Re: how to use PHP AND PHPS?

2007-11-05 Thread James liu
first: i m sure i enable php and phps in my solrconfig.xml two: i can't get answer. *phps: *http://localhost:8080/solr1/select/?q=2&version=2.2&start=0&rows=10&indent=on&wt=phps '; $a = file_get_contents($url); echo 'before unserialize'; var_dump($a); $a = unserialize($a); echo 'after unserial

Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > As for the first issues. The number of different phrase queries have > performance issues I found so far are about 10. If these are normal phrase queries (no slop), a good solution might be to simply index and query these phrases as a single t

RE: Phrase Query Performance Question and score threshold

2007-11-05 Thread Haishan Chen
> Date: Mon, 5 Nov 2007 14:55:21 -0500> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: Phrase Query Performance Question > and score threshold> > On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote:> > > If I limit the documents returned based on a score threshold (fi

Re: sorting on dynamic fields - good, bad, neither?

2007-11-05 Thread Mike Klaas
On 5-Nov-07, at 2:22 PM, Charles Hornberger wrote: On 11/5/07, Charles Hornberger <[EMAIL PROTECTED]> wrote: Also, it seems a bit inefficient to bother allocating an array containing an entry for each document when only some small percentage of the documents actually contain values for the fiel

Re: sorting on dynamic fields - good, bad, neither?

2007-11-05 Thread Charles Hornberger
On 11/5/07, Charles Hornberger <[EMAIL PROTECTED]> wrote: > Also, it seems a bit inefficient to bother allocating an array > containing an entry for each document when only some small percentage > of the documents actually contain values for the field. Would it be > worth investigating whether you

Re: Can you parse the contents of a field to populate other fields?

2007-11-05 Thread Yonik Seeley
On 11/5/07, Kristen Roth <[EMAIL PROTECTED]> wrote: > I'm wondering if this is possible... I am trying to model a hierarchy of > facets, and have a field in my xml (Category) that is structured like this: > facet1::facet2::facet3... At index time, I would like to split this > field on the :: to popul
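
One possible approach to the split described in the question is to do it on the client before the document is sent for indexing. The sketch below is an assumption-laden illustration and not necessarily what the reply goes on to recommend; the `category_level_N` field names are hypothetical.

```python
def split_category(category, sep="::", prefix="category_level_"):
    """Split a 'facet1::facet2::facet3' value into one field per level.

    Field names (category_level_1, category_level_2, ...) are hypothetical;
    empty segments are dropped.
    """
    parts = [part.strip() for part in category.split(sep) if part.strip()]
    return {f"{prefix}{level}": part for level, part in enumerate(parts, start=1)}


fields = split_category("facet1::facet2::facet3")
```

The resulting dictionary can be merged into the document's other fields before it is posted to the index.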

Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > If I limit the documents returned based on a score threshold (filter by > score) will it be able to improve query performance? No. Taking a different approach can really speed up queries though. To figure out what approach you should take, we

RE: Phrase Query Performance Question and score threshold

2007-11-05 Thread Haishan Chen
Hoss, If I limit the documents returned based on a score threshold (filter by score) will it be able to improve query performance? My intuition is it won't be able to because you will still have to calculate the score and then compare to the threshold. I know it may not be meaningful to do s
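
The intuition in this message can be shown with a toy example: filtering by a score threshold happens after scoring, so every candidate document is still scored. This is a simplified sketch with a made-up scoring function, not Solr's query execution.

```python
def filter_by_score(docs, score_fn, threshold):
    """Keep documents scoring at or above the threshold.

    Returns the kept (doc, score) pairs and the number of documents scored,
    to show that the threshold does not reduce the scoring work.
    """
    kept = []
    scored = 0
    for doc in docs:
        score = score_fn(doc)  # the score must be computed before comparing
        scored += 1
        if score >= threshold:
            kept.append((doc, score))
    return kept, scored


# Toy scoring function (string length) standing in for a real relevance score.
kept, scored = filter_by_score(["a", "bb", "ccc"], len, threshold=2)
```

Fewer documents come back, but the number of scoring calls equals the number of candidates.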

Tomcat JNDI Settings

2007-11-05 Thread Wayne Graham
I'm attempting to set up multiple instances of Solr using JNDI (taken from http://wiki.apache.org/solr/SolrTomcat). I created a new solr.xml file in $CATALINA_HOME/conf/Catalina/localhost with: The box is running tomcat 5.5.23, so the solr.war file is out of the webapps path and the solr/ho

Re: sorting on dynamic fields - good, bad, neither?

2007-11-05 Thread Charles Hornberger
On 10/31/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: > the biggest factor to worry about is the number of "sources" ... the key > to understanding the performance risks is to understand that: > 1) no matter how many documents do or don't have a value for a given > field, when you sort on that

Re: solr instances for different content?

2007-11-05 Thread Grant Ingersoll
I don't think that will solve the relevance issues, given that the IDF (described at http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html) is per document, not per field. In the end, though, it may be negligible. Can you test it out

Re: customer request handler doesn't envok the query tokenization chain

2007-11-05 Thread Yonik Seeley
On 11/5/07, Yu-Hui Jin <[EMAIL PROTECTED]> wrote: > Just curious, does the default operator ( "AND" or "OR") specify the > relationship between a field/value component or between the tokens of the > same field/value component? between any clauses in a boolean query. > e.g. for a query like this
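
The answer above (the default operator applies between any clauses of a boolean query) can be illustrated with a toy matcher. This is a simplified sketch, not the Lucene query parser; the document structure and function name are assumptions for the example.

```python
def matches(doc, clauses, default_operator="OR"):
    """Evaluate (field, value) clauses against a document.

    With "AND" every clause must match; with "OR" any one clause suffices.
    """
    results = [value in doc.get(field, []) for field, value in clauses]
    return all(results) if default_operator == "AND" else any(results)


doc = {"field1": ["abc"], "field2": ["def"]}
clauses = [("field1", "abc"), ("field2", "xyz")]  # i.e. field1:abc field2:xyz

matches_or = matches(doc, clauses, "OR")
matches_and = matches(doc, clauses, "AND")
```

Here the document matches under OR because `field1:abc` is satisfied, but not under AND because `field2:xyz` is not.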

Re: how to use PHP AND PHPS?

2007-11-05 Thread Stu Hood
Did you enable the PHP serialized response writer in your solrconfig.xml? It is not enabled by default. Thanks, Stu -Original Message- From: James liu <[EMAIL PROTECTED]> Sent: Monday, November 5, 2007 9:03am To: solr-user@lucene.apache.org Subject: Re: how to use PHP AND PHPS? i know

Re: solr instances for different content?

2007-11-05 Thread Tim Archambault
Good points Grant. I'm envisioning my front end working so that a user would never be able to search across all the verticals at once. EVERY query would inject "vertical:jobs" or "vertical:news" or "vertical:Autos", etc.. etc... This may detrimentally affect my faceted results sets so I'll have t

Re: solr instances for different content?

2007-11-05 Thread Grant Ingersoll
One reason to consider separate indexes is in terms of relevance. Do you want content from classifieds affecting the rankings of your news searches? May not be an issue for you depending on your term distributions, but might be something to consider. As you suspect, though, having mult

Re: solr instances for different content?

2007-11-05 Thread Yonik Seeley
500K is definitely doable with good hardware, but it also depends on what your queries look like, how many fields you are faceting on, etc... look at your performance now to try and judge how much headroom you have. -Yonik On 11/5/07, Tim Archambault <[EMAIL PROTECTED]> wrote: > Typical newspape

solr instances for different content?

2007-11-05 Thread Tim Archambault
Typical newspaper site with: news, jobs, homes, autos, classifieds, community-generated content, guesstimate of .5 million documents. Do I really need to create a different solr index for each vertical? How inefficient is it to add a few additional fields for each content type? Thinking of having a

Re: how to use PHP AND PHPS?

2007-11-05 Thread James liu
i know it...but u try it,,u will find similar question. On 11/5/07, Robert Young <[EMAIL PROTECTED]> wrote: > > I would imagine you have to unserialize > > On 11/5/07, James liu <[EMAIL PROTECTED]> wrote: > > i find they all return string > > > > > $url = ' > > > http://localhost:8080/solr/s

Re: how to use PHP AND PHPS?

2007-11-05 Thread Dave Lewis
On Nov 5, 2007, at 8:56 AM, Robert Young wrote: I would imagine you have to unserialize On 11/5/07, James liu <[EMAIL PROTECTED]> wrote: i find they all return string http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on&wt=php '; var_dump(file_get_contents(

Re: how to use PHP AND PHPS?

2007-11-05 Thread Robert Young
I would imagine you have to unserialize On 11/5/07, James liu <[EMAIL PROTECTED]> wrote: > i find they all return string > >$url = ' > http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on&wt=php > '; > var_dump(file_get_contents($url); > ?> > > > -- > regards >

how to use PHP AND PHPS?

2007-11-05 Thread James liu
i find they all return string http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on&wt=php '; var_dump(file_get_contents($url); ?> -- regards jl

where to hook in to SOLR to read field-label from functionquery

2007-11-05 Thread Britske
My question sounds strange I know, but I'll try to explain: Say I have a custom functionquery MinFloatFunction which takes as its arguments an array of valuesources. MinFloatFunction(ValueSource[] sources) In my case all these valuesources are the values of a collection of fields. What I need

Re: customer request handler doesn't envok the query tokenization chain

2007-11-05 Thread Yu-Hui Jin
Thanks, Yonik. Just curious, does the default operator ( "AND" or "OR") specify the relationship between a field/value component or between the tokens of the same field/value component? e.g. for a query like this: field1:abc field2:xyz does the operator connect field1:abc and field2:xyz ,