Re: Using IDF to find Collocations and SIPs?

2010-01-03 Thread Siddhartha Pahade
Please unsubscribe me. On 12/28/09, Subscriptions sub.scripti...@metaheuristica.com wrote: I am trying to write a query analyzer to pull: 1. Common phrases (also known as Collocations) within a query 2. Highly unusual phrases (also known as Statistically Improbable Phrases or
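
The quoted question asks how IDF could surface collocations and statistically improbable phrases (SIPs). Below is a minimal sketch, assuming a tiny in-memory corpus and the textbook formula idf = ln(N / df) applied to word bigrams instead of single terms. `PhraseIdf` is a hypothetical name, not a Solr or Lucene class. IDF only measures how rare a phrase is across documents; dedicated collocation statistics (for example pointwise mutual information) are a separate technique not shown here.

```java
import java.util.*;

/** Hypothetical helper: document frequency and IDF of word bigrams in a small corpus. */
final class PhraseIdf {
    /** Returns the distinct lowercase bigrams ("w1 w2") in one document. */
    static Set<String> bigrams(String doc) {
        String[] words = doc.toLowerCase().split("\\W+");
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i + 1 < words.length; i++) {
            if (!words[i].isEmpty() && !words[i + 1].isEmpty()) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    /** Textbook IDF: ln(N / df), where df counts the documents containing the bigram. */
    static Map<String, Double> idf(List<String> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String bg : bigrams(doc)) {
                df.merge(bg, 1, Integer::sum);
            }
        }
        int n = corpus.size();
        Map<String, Double> out = new HashMap<>();
        df.forEach((bg, count) -> out.put(bg, Math.log((double) n / count)));
        return out;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
            "new york hotels",
            "cheap new york flights",
            "new york pizza",
            "quantum chromodynamics lecture");
        Map<String, Double> scores = idf(corpus);
        // Low IDF: the bigram recurs across documents (collocation candidate).
        System.out.println(scores.get("new york"));
        // High IDF: the bigram is rare in this corpus (SIP candidate).
        System.out.println(scores.get("quantum chromodynamics"));
    }
}
```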

Re: solrJ and spell check queries

2010-01-03 Thread Sascha Szott
Hi, Jay Fisher wrote: I'm trying to find a way to formulate the following query in solrJ. This is the only way I can get the desired result but I can't figure out how to get solrJ to generate the same query string. It always generates a url that starts with select and I need it to start with
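
Sascha's answer is cut off in this archive, although Jay's reply confirms it worked. One mechanism that fits the question, assuming SolrJ 1.4's `QueryRequest` applies this rule: when the `qt` parameter (which `SolrQuery.setQueryType` sets) starts with "/", it is used as the request path instead of `/select`. This stdlib-only sketch models just that routing rule and builds a URL string without calling SolrJ; `SolrPathSketch`, the base URL and the `/spell` handler name are all hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical model of qt-based routing; builds a URL string and makes no request. */
final class SolrPathSketch {
    /** A qt value starting with "/" becomes the path; anything else falls back to /select. */
    static String pathFor(String qt) {
        return (qt != null && qt.startsWith("/")) ? qt : "/select";
    }

    /** Joins the base URL, the chosen path and the URL-encoded parameters (all kept). */
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + pathFor(params.get("qt")) + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "/spell"); // hypothetical spell-check handler path
        params.put("q", "solr");
        System.out.println(buildUrl("http://localhost:8983/solr", params));
    }
}
```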

Re: solrJ and spell check queries

2010-01-03 Thread Jay Fisher
Thank you. That did it. ~ Jay On Sun, Jan 3, 2010 at 7:21 AM, Sascha Szott sz...@zib.de wrote: Hi, Jay Fisher wrote: I'm trying to find a way to formulate the following query in solrJ. This is the only way I can get the desired result but I can't figure out how to get solrJ to

Re: SOLR: Replication

2010-01-03 Thread Yonik Seeley
On Sat, Jan 2, 2010 at 11:35 PM, Fuad Efendi f...@efendi.ca wrote: I tried... I set APR to improve performance... the server is slow during replication, but top shows only 1% I/O wait... it is probably environment-specific. So you're saying that stock Tomcat (non-native APR) was also 10 times slower?

Tokenizing problem with numbers in query

2010-01-03 Thread Bernd Brod
Hello, when searching for a string: asdf5qwerty solr will tokenize it to: asdf, 5, qwerty and display documents matching either string. How can I stop this behaviour and make it just search for plain asdf5qwerty? Thanks in advance. Bernd

RE: SOLR: Replication

2010-01-03 Thread Fuad Efendi
Thank you Yonik, excellent wiki! I'll try without APR; I believe it's an environmental issue. A 100 Mbps switched network should be about 10 times faster (current replication speed is 1 Mbytes/sec). -----Original Message----- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent:
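
Fuad's estimate can be checked with unit arithmetic: a 100 Mbps link carries at most 100 / 8 = 12.5 megabytes per second, roughly 12 times the 1 Mbytes/sec he reports, before TCP and HTTP overhead. A throwaway sketch of that conversion, assuming decimal megabytes; `LinkBudget` is a hypothetical name.

```java
/** Back-of-the-envelope check of the replication speed quoted above. */
final class LinkBudget {
    /** Converts a line rate in megabits per second to megabytes per second (8 bits per byte). */
    static double megabytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond / 8.0;
    }

    public static void main(String[] args) {
        double ceiling = megabytesPerSecond(100.0); // 100 Mbps switched link, no protocol overhead
        double observed = 1.0;                      // ~1 MB/s replication, as reported
        System.out.printf("ceiling %.1f MB/s, %.1fx the observed rate%n", ceiling, ceiling / observed);
    }
}
```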

Re: Tokenizing problem with numbers in query

2010-01-03 Thread Ahmet Arslan
when searching for a string: asdf5qwerty solr will tokenize it to: asdf, 5, qwerty and display documents matching either string. How can I stop this behaviour and make it just search for plain asdf5qwerty? What is the type of your field? If you have solr.WordDelimiterFilterFactory in
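
The split Bernd describes is what a word-delimiter filter does at letter/digit boundaries; Ahmet's reply is cut off before his suggestion. A minimal stdlib sketch of that single rule (not Lucene's `WordDelimiterFilter` code) shows how `asdf5qwerty` becomes three tokens; `NumericSplitSketch` is a hypothetical name. Removing the filter from the field type's analyzer, or configuring it not to split on numerics (check your version's documentation for a `splitOnNumerics` attribute), should keep the token whole.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a token wherever it switches between digits and non-digits. */
final class NumericSplitSketch {
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (current.length() > 0) {
                char prev = current.charAt(current.length() - 1);
                // Start a new part on a letter->digit or digit->letter boundary.
                if (Character.isDigit(prev) != Character.isDigit(c)) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("asdf5qwerty")); // [asdf, 5, qwerty]
    }
}
```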

Re: Tokenizing problem with numbers in query

2010-01-03 Thread Erick Erickson
This is an *extremely* useful page for figuring out what various tokenizers/filters are doing. The javadocs for the classes referenced can also provide some additional details http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Erick On Sun, Jan 3, 2010 at 11:26 AM, Bernd Brod

Re: SOLR: Replication

2010-01-03 Thread Peter Wolanin
Related to the difference between rsync and native Solr replication - we are seeing issues with Solr 1.4 where search queries that come in during a replication request hang for an excessive amount of time (up to hundreds of seconds for a result that normally takes ~50 ms). We are replicating pretty

Re: SOLR Performance Tuning: Pagination

2010-01-03 Thread Peter Wolanin
At the NOVA Apache Lucene/Solr Meetup last May, one of the speakers from Near Infinity (Aaron McCurry I think) mentioned that he had a patch for lucene that enabled unlimited depth memory-efficient paging. Is anyone in contact with him? -Peter On Thu, Dec 24, 2009 at 11:27 AM, Grant Ingersoll
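
For context on why deep pages are expensive: with conventional top-N collection, serving rows `start` to `start + rows` means keeping the best `start + rows` hits in a priority queue, so memory grows with page depth. A minimal sketch of that conventional approach over an array of made-up scores; it is not the unnamed patch mentioned above and not Lucene's collector, and `DeepPagingSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative top-N paging over raw scores; not Lucene's collector. */
final class DeepPagingSketch {
    /** Returns the scores ranked start .. start+rows-1, highest first. */
    static List<Double> page(double[] scores, int start, int rows) {
        int capacity = start + rows; // every hit up to the end of the page must be tracked
        if (capacity <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Double> heap = new PriorityQueue<>(capacity); // min-heap of the best hits so far
        for (double s : scores) {
            if (heap.size() < capacity) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the weakest retained hit
                heap.add(s);
            }
        }
        List<Double> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder());
        // Entries before `start` were kept only so the page could be ranked correctly.
        return new ArrayList<>(best.subList(Math.min(start, best.size()), best.size()));
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.1, 0.5, 0.7, 0.3};
        System.out.println(page(scores, 2, 2)); // [0.5, 0.3]
    }
}
```

The key line is `capacity = start + rows`: asking for a page 10,000 results deep forces the collector to retain every hit ranked above it.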

Re: Remove the deleted docs from the Solr Index

2010-01-03 Thread Ravi Gidwani
Lance: At times we don't have the freedom to make these database changes. Currently I am in this situation. Hence the requirement on the DIH. ~Ravi. On Sat, Jan 2, 2010 at 3:44 PM, Lance Norskog goks...@gmail.com wrote: The other option is to have a 'deleted' column in your table, and

Any way to modify result ranking using an integer field?

2010-01-03 Thread Andy
Is there any way to modify result ranking using an integer field? I have documents that have an integer field popularity. I want to rank results by a combination of normal fulltext search relevance and popularity. It's kinda like search in digg - result ranking is based on the search

Re: Any way to modify result ranking using an integer field?

2010-01-03 Thread Ahmet Arslan
Is there any way to modify result ranking using an integer field? I have documents that have an integer field popularity. I want to rank results by a combination of normal fulltext search relevance and popularity. It's kinda like search in digg - result ranking is based on the
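
Ahmet's reply is truncated, but later messages in this thread refer to `{!boost b=log(popularity)}`, which multiplies each document's relevance score by the value of the function. A sketch of that arithmetic with invented scores, assuming Solr's `log` function is base 10 (`ln` is the natural log); `BoostSketch` and its `Doc` record are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative re-ranking: text relevance multiplied by log10(popularity). */
final class BoostSketch {
    /** Hypothetical document with a precomputed relevance score and a popularity count. */
    record Doc(String id, double textScore, int popularity) {}

    /** Multiplicative boost, mirroring the idea behind {!boost b=log(popularity)}. */
    static double boosted(Doc d) {
        return d.textScore() * Math.log10(d.popularity());
    }

    /** Returns document ids ordered by boosted score, highest first. */
    static List<String> rank(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> boosted(d)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("A", 2.0, 10),    // 2.0 * log10(10)   = 2.0
            new Doc("B", 1.0, 1000),  // 1.0 * log10(1000) = 3.0
            new Doc("C", 3.0, 1));    // 3.0 * log10(1)    = 0.0
        System.out.println(rank(docs)); // [B, A, C]
    }
}
```

Document C shows the edge case: a popularity of 1 contributes log(1) = 0 and wipes out an otherwise strong text score, and in this Java sketch a popularity of 0 would yield negative infinity, so the raw field may need adjusting before it is used as a multiplier.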

Indexing the latest MS Office documents

2010-01-03 Thread Roland Villemoes
Hi All, Does anyone know how to index the latest MS Office documents, like .docx and .xlsx? From searching, it seems like Tika only supports the earlier formats .doc and .xls. Best regards, Roland Villemoes Tel: (+45) 22 69 59 62 E-Mail: mailto:r...@alpha-solutions.dk

Re: SOLR: Replication

2010-01-03 Thread Yonik Seeley
On Sun, Jan 3, 2010 at 2:55 PM, Peter Wolanin peter.wola...@acquia.com wrote: Related to the difference between rsync and native Solr replication - we are seeing issues with Solr 1.4 where search queries that come in during a replication request hang for excessive amount of time (up to 100's

Re: Indexing the latest MS Office documents

2010-01-03 Thread Mattmann, Chris A (388J)
Hi Roland, You probably want to send your email to tika-u...@lucene.apache.org. Best of luck! Cheers, Chris On 1/3/10 4:00 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi All, Anyone who knows how to index the latest MS office documents like .docx and .xlsx ? From searching

Rules engine and Solr

2010-01-03 Thread Avlesh Singh
I have a Solr (version 1.3) powered search server running in production. Search is keyword-driven and is supported using custom fields and tokenizers. I am planning to build a rules engine on top of search. The rules are database-driven and can't be stored inside Solr indexes. These rules would

Re: performance question

2010-01-03 Thread A. Steven Anderson
Sorting and index norms have space penalties. Sorting on a field creates an array of Java ints, one for every document in the index. Index norms (used for boosting documents and other things) create an array of bytes in the Lucene index files, one for every document in the index. If you sort

Re: performance question

2010-01-03 Thread Chris Hostetter
: If you sort on many of your dynamic fields your memory use will : explode, and the same with index norms and disk space. : Thanks for the info. In general, I knew sorting was expensive, but I didn't : realize that dynamic fields made it worse. dynamic fields don't make it worse ... the

Re: performance question

2010-01-03 Thread A. Steven Anderson
dynamic fields don't make it worse ... the number of actual field names you sort on makes it worse. If you sort on 100 fields, the cost is the same regardless of whether all 100 of those fields exist because of a single dynamicField/ declaration, or 100 distinct field/ declarations.
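
Chris's point follows from the per-document costs quoted earlier in this thread: roughly one 4-byte Java int per document for each field you sort on, and one norm byte per document for each field with norms. A back-of-the-envelope sketch with made-up inputs (10 million documents, 100 fields); actual FieldCache usage varies by field type, and `SortCostEstimate` is a hypothetical name.

```java
/** Rough size estimate based on the per-document costs quoted in the thread. */
final class SortCostEstimate {
    static final int BYTES_PER_SORT_ENTRY = 4; // one Java int per document per sorted field
    static final int BYTES_PER_NORM = 1;       // one byte per document per field with norms

    static long sortBytes(long numDocs, int sortedFields) {
        return numDocs * sortedFields * BYTES_PER_SORT_ENTRY;
    }

    static long normBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms * BYTES_PER_NORM;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L;
        // Same result whether the 100 names come from one dynamicField or 100 field declarations.
        System.out.println(sortBytes(docs, 100)); // 4000000000
        System.out.println(normBytes(docs, 100)); // 1000000000
    }
}
```

That is about 4 GB of heap for sorting alone; only the count of distinct field names sorted on enters the formula, not how those fields were declared.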

Re: Any way to modify result ranking using an integer field?

2010-01-03 Thread Andy
Thanks Ahmet. Do I need to do anything to enable BoostQParserPlugin in Solr, or is it already enabled? --- On Sun, 1/3/10, Ahmet Arslan iori...@yahoo.com wrote: From: Ahmet Arslan iori...@yahoo.com Subject: Re: Any way to modify result ranking using an integer field? To:

Search algorithm used in Solr

2010-01-03 Thread abhishes
Hello everyone, Is there an article which explains (on a high level) the algorithm of search in Solr? How does Solr search approach compare to the inverted index technique? Regards, Abhishek --Original Message-- From: Mattmann, Chris A (388J) To: solr-user@lucene.apache.org ReplyTo:
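
On the second question: Solr runs on Lucene, whose core data structure is an inverted index, so Solr's search is an application of that technique rather than an alternative to it. A toy sketch of the core idea, mapping each term to a sorted set of document IDs and answering an AND query by intersecting those sets; real Lucene adds scoring, compressed postings, positions and much more. `ToyInvertedIndex` is a hypothetical name.

```java
import java.util.*;

/** Toy inverted index: term -> sorted posting list of document ids. */
final class ToyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Tokenizes on non-word characters and records docId under each lowercase term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** AND query: documents containing every term (intersection of posting lists). */
    List<Integer> searchAll(String... terms) {
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> docs = postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Solr is built on Lucene");
        index.add(2, "Lucene is an inverted index library");
        index.add(3, "Solr adds HTTP and caching");
        System.out.println(index.searchAll("lucene", "solr")); // [1]
    }
}
```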

RE: Reverse sort facet query [SOLR-1672]

2010-01-03 Thread Chris Hostetter
: Yes, I thought about adding some 'new syntax', but I opted for a separate 'facet.sortorder' parameter, mainly because I'm not familiar enough with the codebase to know what effect this might have on backward compatibility. It would be easy enough to modify the patch I created to do
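
The request in SOLR-1672 is to list facet values with the smallest counts first. A minimal sketch of the ordering itself over in-memory counts, with ties broken by facet value so the result is deterministic. `facet.sortorder` is the parameter name proposed in the patch, not something this sketch models or a confirmed released feature, and `FacetOrderSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Orders facet values by count, ascending or descending; illustrative only. */
final class FacetOrderSketch {
    static List<String> byCount(Map<String, Integer> counts, boolean ascending) {
        Comparator<Map.Entry<String, Integer>> cmp = Map.Entry.comparingByValue();
        if (!ascending) {
            cmp = cmp.reversed(); // the usual "most frequent first" order
        }
        // Break ties on the facet value itself so the output is deterministic.
        Comparator<Map.Entry<String, Integer>> byKey = Map.Entry.comparingByKey();
        cmp = cmp.thenComparing(byKey);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(cmp);
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            values.add(e.getKey());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("red", 5);
        counts.put("blue", 1);
        counts.put("green", 3);
        System.out.println(byCount(counts, true));  // [blue, green, red]
        System.out.println(byCount(counts, false)); // [red, green, blue]
    }
}
```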

Re: Any way to modify result ranking using an integer field?

2010-01-03 Thread Andy
What I meant was: is there any way to make {!boost b=log(popularity)} the default query type, so that every query uses it? From: Andy angelf...@yahoo.com Subject: Re: Any way to modify result ranking using an integer field? To: solr-user@lucene.apache.org Date: Monday, January 4,