Re: Retrieve time of last optimize

2010-04-23 Thread Jon Baer
I don't think there is anything low level in Lucene that will specifically output anything like lastOptimized() to you, since it can be set up a few ways. Your best bet is probably adding a postOptimize hook and dumping it to log / file / monitor / etc, probably something like ... listener

Re: Solr full-import not working as expected

2010-04-23 Thread MitchK
Saratv, is there any unique ID (defined in your schema.xml) that may be duplicated? - Mitch saratv wrote: I am trying to use DIH (where database has around 93k rows..from different tables), and when I ran full import a few times, only 91k documents were indexed (not sure why and what

Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-23 Thread Michael McCandless
I don't know much about how Solr does its locking, so I'm guessing below: It looks like one thread is doing a commit, by closing the writer, and is likely holding a lock that prevents other (add/delete) ops from running? Probably this lock is held because the writer is in the process of being

Questions on autocommit and optimize operations

2010-04-23 Thread dipti khullar
Hi Solr Gurus We are thinking about optimizing our production master slave solr setup, just wanted to poll the group on following questions: 1. Currently we are using autocommit feature with setting of 50 docs and 5 mins. Now the requirement is to reduce this time. So we are analyzing the

Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
Hello, I configured a Solr server to be able to extract data from various documents, including pdfs. Unfortunately, the data extraction fails on several pdfs. I have read around here that this may be due to the old Tika library being used? I looked around and saw that the svn had a newer

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
Marc, got anything in your logs? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Marc Ghorayeb dekay...@hotmail.com To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 8:42:53

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
I'm launching it with the start.jar utility, and there doesn't seem to be anything weird inside the console when I upload a pdf. Is there a way to output the console to a log file? The only log file that gets updated is a log file in the logs directory, and it seems to only show the

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
Seems like I'm not the only one with this no extraction problem: http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html Apparently he tried the same thing, building from the trunk, and indexing a pdf, and no extraction occurred... Strange. Marc G. From: dekay...@hotmail.com To:

Comparing two queries

2010-04-23 Thread Villemos, Gert
We want to let a user register interest in information, based on a query he has defined himself. For example, he types in a query, presses a save button, provides his email, and the system will then email him a daily digest. As part of this, it would be nice to be able to

Multiple query searches in one request

2010-04-23 Thread phoey
Hi there, Is it possible to do a search more than once, where only the filter query changes? The response would be the three different search results. We want a page which shows a clustered view of 5 of each of the three types (images, news articles, editorial articles), ordered by their score. One
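
A rough client-side sketch of the idea in this question, for illustration only: the same user query is issued once per document type, and only the filter query (`fq`) changes. The field name `type`, the helper name `per_type_queries`, and the use of `rows=5` are assumptions made here, not details from the thread; a single-request solution would need something on the Solr side.

```python
from urllib.parse import urlencode


def per_type_queries(user_query, types, rows=5, type_field="type"):
    """Build one Solr query string per document type.

    Only the fq parameter differs between the generated query strings.
    `type_field` is a hypothetical schema field holding the document type.
    """
    queries = {}
    for doc_type in types:
        params = [
            ("q", user_query),                      # same main query every time
            ("fq", f"{type_field}:{doc_type}"),     # the only part that changes
            ("rows", rows),                         # top 5 hits per type
        ]
        queries[doc_type] = urlencode(params)
    return queries


if __name__ == "__main__":
    for doc_type, qs in per_type_queries("solr", ["images", "news", "editorial"]).items():
        print(doc_type, "->", qs)
```

Each query string would then be appended to the select handler URL of the Solr instance in question and the three responses shown side by side.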

Re: Comparing two queries

2010-04-23 Thread Otis Gospodnetic
Hello Gert, I think you'd have to apply custom heuristics that involve looking at the top N hits for each query and looking at the % overlap. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message
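
One possible reading of the "% overlap of the top N hits" heuristic, as a minimal sketch. It assumes each query has already been run and its ranked document IDs collected into a list; the function name `top_n_overlap` and the choice to normalise by the shorter of the two top-N sets are assumptions made here, not something stated in the thread.

```python
def top_n_overlap(hits_a, hits_b, n=10):
    """Return the fraction of top-n document IDs shared by two ranked hit lists.

    hits_a and hits_b are lists of document IDs in ranked order.
    The result is between 0.0 (no shared documents) and 1.0 (identical top-n sets).
    """
    top_a = set(hits_a[:n])
    top_b = set(hits_b[:n])
    if not top_a or not top_b:
        # One of the queries returned nothing, so there is nothing to compare.
        return 0.0
    shared = top_a & top_b
    # Normalise by the smaller set so a short result list is not penalised.
    return len(shared) / min(len(top_a), len(top_b))


if __name__ == "__main__":
    print(top_n_overlap(["d1", "d2", "d3", "d4"], ["d3", "d4", "d5", "d6"], n=4))
```

What threshold counts as "the queries overlap" is left open; that would have to be tuned against real saved queries.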

What hardware do I need ?

2010-04-23 Thread Xavier Schepler
Hi, I'm working with Solr 1.4. My schema has about 50 fields. I'm using full text search in short strings (~ 30-100 terms) and faceted search. My index will have 100 000 documents. The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete. Is a

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
Marc, These are your request logs. You want to look at your Solr logs. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Marc Ghorayeb dekay...@hotmail.com To:

Merging Solr Cores Urgent

2010-04-23 Thread abhatna...@vantage.com
Hi, I have a question about merging Solr cores. The Wiki documentation says that the merged core must exist prior to calling the merge command, so I created the merged core and pointed it to some data dir. However, even after merging the cores it still points to the old data dir. Shouldn't the merge

Re: Multiple query searches in one request

2010-04-23 Thread Otis Gospodnetic
Hi, Yes, a custom SearchComponent will do this. We've done stuff like this before and actually have this sort of functionality in some Sematext products - it works well if you don't mind writing and adding another SearchComponent to your chain. Otis Sematext :: http://sematext.com/

Re: What hardware do I need ?

2010-04-23 Thread Otis Gospodnetic
Xavier, 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other processes leave enough RAM for the OS to cache the

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
Yes, the only log I can actually get is the one in the command console from Windows and there are no errors there ... Here are the last lines when I upload a pdf to the update/extract url: Apr 23, 2010 5:47:03 PM org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init() done Apr 23, 2010

Re: What hardware do I need ?

2010-04-23 Thread Xavier Schepler
On 23/04/2010 17:08, Otis Gospodnetic wrote: Xavier, 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Paul Borgermans
On Fri, Apr 23, 2010 at 5:48 PM, Marc Ghorayeb dekay...@hotmail.com wrote: Yes, the only log I can actually get is the one in the command console from Windows and there are no errors there ... Here are the last lines when I upload a pdf to the update/extract url: snip I am pretty sure it is

SolrJ + BasicAuth

2010-04-23 Thread Jon Baer
Uggg I just got bitten hard by this on a Tomcat project ... https://issues.apache.org/jira/browse/SOLR-1238 Is there any way to get access to that RequestEntity w/o patching? Also are there security implications w/ using the repeatable payloads? Thanks. - Jon

Re: Comparing two queries

2010-04-23 Thread Erik Hatcher
Or, use facet.query to get the overlap. For example: ?q=query1&facet=on&facet.query=query2 You'll get the hit count from query #1 in the results, and the count overlapping with query #2 in the facet query response. Erik - http://www.lucidimagination.com On Apr 23, 2010, at 11:01 AM, Otis
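
The facet.query request suggested here can be assembled safely with URL encoding, which matters once the queries contain spaces or colons. This is only a sketch: the base URL `http://localhost:8983/solr/select` and the helper name `overlap_query_url` are assumptions made for illustration; only the `q`, `facet` and `facet.query` parameters come from the message.

```python
from urllib.parse import urlencode


def overlap_query_url(base_url, query1, query2):
    """Build a URL that runs query1 and counts how many of its hits also match query2.

    The hit count for query1 comes back in the main result; the overlapping
    count comes back under the facet query section of the response.
    """
    params = [
        ("q", query1),            # main query
        ("facet", "on"),          # enable faceting
        ("facet.query", query2),  # count hits of q that also match query2
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical Solr endpoint; adjust to the actual instance.
    print(overlap_query_url("http://localhost:8983/solr/select", "query1", "query2"))
```

Calling the helper a second time with the two queries swapped gives the overlap in the other direction.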

Solr does not honor facet.mincount and field.facet.mincount

2010-04-23 Thread Umesh_
Hi All, I am trying to restrict facets in the Solr response by setting facet.mincount = 1, which does not work, as the request and response below show: REQUEST:

Tomcat vs. WebSphere

2010-04-23 Thread Ken Lane (kenlane)
Does anyone know of any advantages/disadvantages to running SOLR on WebSphere versus Tomcat? Thanks, Ken

Re: Tomcat vs. WebSphere

2010-04-23 Thread Otis Gospodnetic
I've never used WebSphere, but I always got the impression that people have more issues with it than with simpler solutions. Personally, I would suggest Jetty. I've used it dozens of times and never had issues with it. It's small, simple, and fast. Otis Sematext :: http://sematext.com/

Re: What hardware do I need ?

2010-04-23 Thread Otis Gospodnetic
Xavier, 100-700 QPS is still high. I'm guessing your 1 box won't handle that without sweating a lot (read: slow queries). Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Xavier

Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-23 Thread Otis Gospodnetic
Chris, It looks like Mike already offered several solutions though I don't know what Solr does without looking at the code. But I'm curious: * how big is your index? and do you know how large the segments being merged are? * do you batch docs or do you make use of Streaming SolrServer?

Re: Collapse problem

2010-04-23 Thread Chris Hostetter
: basically, we are running query with field collapsing (Solr 1.4 with : patch 236). The response tells us that there are about 2700 documents : matching our query. However, I cannot get past the 431st document. : From this point on, the response will not contain any document. isn't that

Re: Tomcat vs. WebSphere

2010-04-23 Thread Deo, Shantanu
We have run SOLR in WebLogic without problems. The only change we see is some spurious extra logging info which we don't see in the case of Tomcat. Anyone have an idea of how to control that? Thanks Shantanu On 4/23/10 12:53 PM, Ken Lane (kenlane) kenl...@cisco.com wrote: Does anyone know of

RE: Comparing two queries

2010-04-23 Thread Villemos, Gert
I was thinking along these lines: 1. Retrieve the top result for one query. 2. Take the resulting document and evaluate the score that it would get in another query. 3. If the scores are similar, then the queries most likely overlap. I guess that if I had two simple query strings archive crash

Re: Comparing two queries

2010-04-23 Thread Otis Gospodnetic
Gert, In your second query example you used qf=. Did you mean fq=? If so, the answer is no - filter queries don't affect the score. I haven't tried your approach, but intuitively feel that looking at % overlap may work better. Otis Sematext :: http://sematext.com/ :: Solr -

RE: Comparing two queries

2010-04-23 Thread Villemos, Gert
Yes, your solution is much simpler, providing the result through a single query. I didn't understand it the first time I read it. I guess you would need to run it backwards as well to really evaluate the relevance, i.e. First q=query1&facet=on&facet.query=query2 Then

Re: Solr full-import not working as expected

2010-04-23 Thread MitchK
Unfortunately you haven't answered my question, saratv. The important question is why your DIH configuration did not import those rows. Without any schema information or configuration details of your DIH, no one will be able to help you. Just for the future: If something doesn't work,

Re: Solr does not honor facet.mincount and field.facet.mincount

2010-04-23 Thread Koji Sekiguchi
Umesh_ wrote: Hi All, I am trying to restrict facets in solr response, by setting facet.mincount = 1, which does not work as the request and response are shown below: REQUEST:

Boost function on *:*

2010-04-23 Thread Blargy
Is it possible to use a boost function across the whole index/empty search term? I'm guessing the next question that would be asked is "Why would you want to do that?" Well, we have a bunch of custom business metrics included in each document (a product). I would like to only show the best

mix cased search terms

2010-04-23 Thread Tuan Nguyen
Hello list, first time posting here. I am trying to find an answer to a strange search behaviour we're finding in our VuFind application. In order to eliminate any VuFind related variables, I have used the vanilla Solr example schema to try our problematic search. I posted this xml to the