Re: stemming the index

2010-07-08 Thread Jaran Nilsen
Although I have not tested it myself yet, the Lucene-Hunspell project might be worth a look: http://code.google.com/p/lucene-hunspell/ Jaran On Wed, Jul 7, 2010 at 10:15 PM, sarfaraz masood sarfarazmasood2...@yahoo.com wrote: Thanx Erick :-) --- On Thu, 8/7/10, Erick Erickson

Faceting unknown fields

2010-07-08 Thread Mickael Magniez
Hello, I'm wondering if it's possible to index and facet unknown fields. Let me explain: I've got a set of 1M products (from computers to freezers), and each category of product has some attributes, so the number of attributes is pretty large (1000+). I've started to describe each attribute in my

Re: Faceting unknown fields

2010-07-08 Thread Rebecca Watson
hi, So, can I index and facet these fields, without describing them in my schema? I will first try with dynamic fields, but I'm not sure it's going to work. We do all our facet fields in this way, with just a general string field for single/multivalued fields: <!-- dynamic facet fields

Re: Faceting unknown fields

2010-07-08 Thread Mickael Magniez
Thanks, I'll test your solution shortly. Mickael

Spellcheck help

2010-07-08 Thread Marc Ghorayeb
Hello, I've been trying to get rid of a bug when using the spellcheck, but so far with no success :( When searching for a word that starts with a number, for example 3dsmax, I get the results that I want, BUT the spellcheck says it is not correctly spelled AND the collation gives me 33dsmax.

Score boosting

2010-07-08 Thread Chamnap Chhorn
Hi everyone, I have a requirement to meet, but I can't figure out how to do it. I hope someone can help me. Here is the requirement: A book has several keyphrases (available to use in searching). The author could buy the search result position with these keyphrases or simply add keyphrases

Distributed Indexing

2010-07-08 Thread Li Li
Are there any tools for distributed indexing? The page http://wiki.apache.org/solr/DistributedSearch refers to KattaIntegration and ZooKeeperIntegration, but they seem to be more concerned with error handling and replication. I need a dispatcher that dispatches different docs by uniqueKey (such

Re: How do I get the matched terms of my query?

2010-07-08 Thread osocurious2
If you want only documents that have both values, then make your query q=content:videos+AND+content:songs. If you want the more open query, but to be able to tell which docs have videos, which have songs and which have both... then I'm not sure. Using debugQuery=on might help with your
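As a rough illustration of the two query forms mentioned above, this Python sketch builds the select URL for a strict (AND) or open (OR) query and can append debugQuery=on. Only the q syntax and debugQuery=on come from the message; the base URL, the helper name build_select_url and its arguments are assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, terms, field="content", require_all=True, debug=False):
    """Build a Solr /select URL matching all terms (AND) or any term (OR).

    base_url, field and this helper itself are illustrative assumptions.
    """
    operator = " AND " if require_all else " OR "
    q = operator.join(f"{field}:{term}" for term in terms)
    params = {"q": q}
    if debug:
        # debugQuery=on asks Solr to include scoring explanations in the response.
        params["debugQuery"] = "on"
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", ["videos", "songs"]))
    print(build_select_url("http://localhost:8983/solr", ["videos", "songs"],
                           require_all=False, debug=True))
```

The encoded form differs from the hand-written q only in that the colon is percent-escaped, which Solr decodes back to the same query string.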

Re: Score boosting

2010-07-08 Thread osocurious2
Sounds like you want Payloads. I don't think you can guarantee a position, but you can boost relative to others. You can give one author/book a boost of 0 for the phrase Cooking, and another author/book a boost of .5 and yet another a boost of 1.0. For searches that include the phrase Cooking,

Filter multivalue fields from search result

2010-07-08 Thread Alex J. G. Burzyński
Hi, Is it possible to remove from search results the multivalued fields that don't pass the search criteria? My schema is defined as:
<!-- course_id -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<!-- course_name -->
<field name="name" type="string" indexed="true" stored="true"/>
<!--

solr connection question

2010-07-08 Thread ZAROGKIKAS,GIORGOS
Hi Solr users, I need to know how Solr manages connections when we make a request (select, update, commit). Is there any connection pooling, or an article where I can learn about its connection management? How can I log the connections to the Solr server in a file? I have set up Solr 1.4 with Tomcat. Thanks

Re: solr connection question

2010-07-08 Thread Sven Maurmann
Hi, Solr runs as a Web application. The requests you most probably mean are just HTTP requests to the underlying container. Internally each request is processed against the Lucene index, which is usually file-based. Therefore there are no connections like in a database application, where you

Re: solr connection question

2010-07-08 Thread Ruben Abad
Jorl, OK, I'll have to change my vacation request :( Rubén Abad rua...@gmail.com On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS g.zarogki...@multirama.gr wrote: Hi solr users I need to know how solr manages the connections when we make a request(select update commit) Is

RE: solr connection question

2010-07-08 Thread ZAROGKIKAS,GIORGOS
Yes, I mean HTTP requests. How can I log them? -Original Message- From: Sven Maurmann [mailto:sven.maurm...@kippdata.de] Sent: Thursday, July 08, 2010 3:56 PM To: solr-user@lucene.apache.org Subject: Re: solr connection question Hi, Solr runs as a Web application. The requests you most

Re: solr connection question

2010-07-08 Thread Alejandro Gonzalez
ok please don't forget it :) 2010/7/8 Ruben Abad rua...@gmail.com Jorl, OK, I'll have to change my vacation request :( Rubén Abad rua...@gmail.com On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS g.zarogki...@multirama.gr wrote: Hi solr users I need to know how solr

RE: Distributed Indexing

2010-07-08 Thread Yuval Feinstein
Li, as far as I know, you still have to do this part yourself. A possible way to shard is to number the shards from 0 to numShards-1, calculate hash(uniqueKey)%numShards for each document, and send the document to the resulting shard number. This number is consistent and sends documents
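The hash(uniqueKey)%numShards scheme described above could be sketched in Python roughly as follows. This is a minimal sketch, not an existing Solr tool: the function names shard_for and partition are hypothetical, and MD5 is an assumed choice of hash, used only because it is stable across processes (Python's built-in hash() is randomized per process for strings and would break the consistency the scheme relies on).

```python
import hashlib


def shard_for(unique_key: str, num_shards: int) -> int:
    """Return a shard number in [0, num_shards - 1] for a document key."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A stable digest guarantees the same key always maps to the same shard.
    digest = hashlib.md5(unique_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document keys by the shard each one should be sent to."""
    buckets = {n: [] for n in range(num_shards)}
    for doc_id in doc_ids:
        buckets[shard_for(doc_id, num_shards)].append(doc_id)
    return buckets


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10)]
    for shard, members in partition(ids, 3).items():
        print(shard, members)
```

Note that with plain modulo hashing, changing numShards remaps most keys, so the number of shards should be fixed up front or the documents reindexed when it changes.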

Determining matched tokens in original query

2010-07-08 Thread Mark Holland
Hi, I'm trying to find out which tokens in a user's query matched against each result. I've been trying to use the highlight component for this, however it doesn't quite fit the bill. I'm using edismax, with mm set to 50%, and I want to extract for each matching doc which tokens /didn't/ match

Realtime + Batch indexing

2010-07-08 Thread bbarani
Hi, Currently we are trying to achieve both realtime and batch indexing using SOLR. For batch indexing we have set up a master SOLR server which uses DIH and indexes the data. For the slave, we post the XML (real time) to the SOLR slave and add that to the existing SOLR document. Now my issue

DIH batch job

2010-07-08 Thread Sanjeev Kakar
Hi, We are trying to import data from the ORACLE database into Solr 1.4 for free text search and would like to provide a faceted search experience. There are files on the network which we are indexing as well. We are using the DIH for indexing the data from the database and have

Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Peter Spam
To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks! -Pete On Jul 7, 2010, at 5:33 PM, Peter Spam wrote: Hi, I have a text file broken apart by carriage returns, and I'd like to only return entire lines. So, I'm trying to use this:

Delta Import by ID

2010-07-08 Thread Frank A
I'm still having issues - my config looks like: <entity name="place" query="select DestID,DestinationName,Geo_Long,Geo_Lat,Address,City,State,Zip,PhoneNumber,cost from destinations" deltaQuery="select DestID from destinations where CreationDate

Indexing slowdowns

2010-07-08 Thread Mark Holland
Since I began using the 2010-05-18 nightly I've been experiencing indexing slowdowns that I didn't see with solr-1.4. Indexing slows down roughly every 7m records. I'm indexing about 28m in total. These records are batched into csv files of 1m rows, which are loaded with stream.file. Solr

Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Peter Spam
Thanks for the note, Koji. However, hl.fragsize=0 seems to return the entire document, rather than just one single line. Here's what I tried (what I previously had was commented out): regexv = ^.*$ thequery =

Re: Indexing slowdowns

2010-07-08 Thread Robert Muir
On Thu, Jul 8, 2010 at 7:44 PM, Mark Holland mark.holl...@zoopla.co.uk wrote: Can anyone suggest where I might start looking for answers? I have a yourkit snapshot if anyone would care to see it. Doesn't sound good. I'd like to see whatever data you can provide (I worry it might be something

Re: DIH batch job

2010-07-08 Thread Lance Norskog
There is no batch job scheduling in Solr. You will have to script this with your OS tools (probably the 'cron' program). Tika is integrated into the DataImportHandler in Solr 1.5. This gives you flexibility in indexing and is worth extra effort. On Thu, Jul 8, 2010 at 10:48 AM, Sanjeev Kakar

Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Koji Sekiguchi
(10/07/09 9:30), Peter Spam wrote: Thanks for the note, Koji. However, hl.fragsize=0 seems to return the entire document, rather than just one single line. Here's what I tried (what I previously had was commented out): regexv = ^.*$ thequery =

Re: Indexing slowdowns

2010-07-08 Thread Mark Miller
On 7/8/10 8:55 PM, Yonik Seeley wrote: Hmm, did the default number of background merge threads change sometime recently? I seem to recall so, but I can't find a reference to it. -Yonik http://www.lucidimagination.com It did change - from 3 to 1-3: maxThreadCount = Math.max(1, Math.min(3,

Re: Using symlinks to alias cores

2010-07-08 Thread Chris Hostetter
: However, the wiki recommends against using the ALIAS command in CoreAdmin in : a couple of places, and SOLR-1637 says it's been removed now anyway. correct, there were a lot of problems with how to cleanly/sanely deal with core operations on aliases -- the command may return at some future

Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Chris Hostetter
: If you can use the latest branch_3x or trunk, hl.fragListBuilder=single : is available that is for getting entire field contents with search terms : highlighted. To use it, set hl.useFastVectorHighlighter to true. He doesn't want the entire field -- his stored field values contain multi-line

Re: Realtime + Batch indexing

2010-07-08 Thread bbarani
Hi, Thanks a lot for your reply. As you suggested the best option is to have another core started up at same / different port and use shards for distributed search. I had also thought of another approach where I would be writing the real time data to both master and slave hence it will be

Re: Realtime + Batch indexing

2010-07-08 Thread Lance Norskog
No, this second part will not work. Lucene creates new index files independent of when and what you index. So copying files from one indexer to another will never work: the indexes will be out of sync. You don't have to change your UI to use distributed search. You can add a new requestHandler
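To illustrate what a distributed search request looks like from the client side, here is a small Python sketch that adds Solr's shards parameter to a select URL. The hosts, ports, query and the helper name build_distributed_url are assumptions for the example; the message's suggestion of configuring this in a requestHandler would let clients skip passing shards themselves.

```python
from urllib.parse import urlencode


def build_distributed_url(base_url, query, shards):
    """Build a /select URL that fans the query out to the listed shards.

    Each shard is written as host:port/path, without the http:// prefix.
    """
    params = {"q": query, "shards": ",".join(shards)}
    # Keep '/', ':' and ',' unescaped so the shard list stays readable.
    return f"{base_url}/select?{urlencode(params, safe='/:,')}"


if __name__ == "__main__":
    print(build_distributed_url(
        "http://localhost:8983/solr",
        "ipod",
        ["localhost:8983/solr", "localhost:7574/solr"],  # hypothetical cores
    ))
```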

making rotating timestamped logs from solr output

2010-07-08 Thread Cam Bazz
Hello, I would like to log the solr console. Although solr logs requests in timestamped format, this only logs the requests, i.e. it does not log the number of hits for a given query, etc. Is there any easy way to do this other than resorting to methods for capturing solr output? I usually run solr on

Re: Realtime + Batch indexing

2010-07-08 Thread bbarani
Thanks a ton for your reply. Your suggestions have always helped me out :) Your inputs on configuring shards via SOLR config would help us a lot!!! One final question about replication: when I initiate replication I thought SOLR would delete the existing index in the slave and just transfer the