Can one determine which results are "good enough" to alert users about?

2012-05-09 Thread Chris Harris
I'm trying to think through a Solr-based email alerting engine that would have the following properties: 1. Users can enter queries they want to be alerted on, and the syntax for alert queries should be the same syntax as my regular solr (dismax) queries. 1a. Corollary: Because of not just tf-idf

Re: Tomcat startup script

2010-06-08 Thread Chris Harris
For the record, I've been running one of our production Solr 1.4 installs under the Ubuntu 9.04 tomcat6 + OpenJDK package, and haven't run into difficulties yet. On Tue, Jun 8, 2010 at 8:00 AM, K Wong wrote: > Okay. I've been running multicore Solr 1.4 on Tomcat 5.5/OpenJDK 6 > straight out of t

Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Chris Harris
You are right that TikaEntityProcessor has a couple of other prereqs beyond stock Solr 1.4. I think the main point is that they're relatively minor. I've merged TikaEntityProcessor (and some prereqs) and its dependencies into my Solr 1.4 tree and it compiles fine, though I haven't yet tested that T

Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-26 Thread Chris Harris
use I've never encountered this problem before... > > Thanks, > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Chris Harris >> T

Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-22 Thread Chris Harris
I'm running Solr 1.4+ under Tomcat 6, with indexing and searching requests simultaneously hitting the same Solr machine. Sometimes Solr, Tomcat, and my (C#) indexing process conspire to render search inoperable. So far I've only noticed this while big segment merges (i.e. merges that take multiple

Function Queries Use Indexed Values, Not Stored Values, Right?

2010-04-21 Thread Chris Harris
I'm pretty sure that function queries always work off of indexed values, rather than stored values. So, for example, if you want to use a field in a function query, it needs to be indexed. I want to add this fact to the wiki, where it's not currently stated explicitly, but I wanted to first confirm t

Re: Handling missing date fields in a date-oriented function query

2010-04-16 Thread Chris Harris
that none of my legitimate date field values will evaluate to numeric value 0. Since I don't show the time component of dates to my users, I don't think this would cause any real trouble. It feels slightly unclean, though. On Thu, Apr 8, 2010 at 1:05 PM, Chris Harris wrote: > If anyon

Re: Handling missing date fields in a date-oriented function query

2010-04-08 Thread Chris Harris
If anyone is curious, I've created a patch that creates a variant of map that can be used in the way indicated below. See http://issues.apache.org/jira/browse/SOLR-1871 On Wed, Apr 7, 2010 at 3:41 PM, Chris Harris wrote: > Option 1. Use map > > The most obvious way to do this wo

Re: Handling missing date fields in a date-oriented function query

2010-04-08 Thread Chris Harris
che.org/solr/FunctionQuery#max which says that "max(x,c) returns the max of another function __and a constant__". I assume the decision to support only constants was made with performance in mind. > > On Wed, Apr 7, 2010 at 3:41 PM, Chris Harris wrote: >> I'm using fu

Handling missing date fields in a date-oriented function query

2010-04-07 Thread Chris Harris
I'm using function queries to boost more recent documents, using something like the recip(ms(NOW,mydatefield),3.16e-11,1,1) approach described on the wiki: http://wiki.apache.org/solr/FunctionQuery#Date_Boosting What I'd like to do is figure out the best way to tweak how documents with missi
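
As a rough illustration of the date-boost formula quoted in the message above, here is a minimal sketch in Python (not Solr code). It relies on the documented definition recip(x,m,a,b) = a/(m*x+b); the names `recip`, `date_boost`, and `MS_PER_YEAR` are illustrative only, and `age_ms` stands in for the value of ms(NOW,mydatefield).

```python
# Milliseconds in an average year; 1 / MS_PER_YEAR is roughly 3.16e-11,
# which is where the constant in the quoted formula comes from.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Python stand-in for Solr's recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost for a document whose date is age_ms milliseconds in the past."""
    return recip(age_ms, 3.16e-11, 1, 1)


if __name__ == "__main__":
    for years in (0, 1, 2, 10):
        print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

With these constants a brand-new document gets a boost of 1, and a one-year-old document gets roughly half that, so the boost decays smoothly with age.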

Re: error with multicore CREATE action

2009-11-23 Thread Chris Harris
Are there any use cases for CREATE where the instance directory *doesn't* yet exist? I ask because I've noticed that Solr will create an instance directory for me sometimes with the CREATE command. In particular, if I run something like http://solrhost/solr/admin/cores?action=CREATE&name=newcore&i

Making search results more stable as index is updated

2009-11-13 Thread Chris Harris
If documents are being added to and removed from an index (and commits are being issued) while a user is searching, then the experience of paging through search results using the obvious solr mechanism (&start=100&rows=10) may be disorienting for the user. For one example, by the time the user clic

Re: Index backup with new replication?

2009-09-29 Thread Chris Harris
The documentation could maybe be improved, but the basics of backup snapshots with the in-process (Java-based) replication handler actually seem pretty straightforward to me, now that I understand it: 1. You can make a snapshot whenever you want by hitting http://master_host:port/solr/replication?

FileNotFoundException in Java replication handler backups

2009-09-28 Thread Chris Harris
Thanks to Noble Paul, I think I now understand the Java replication handler's backup feature. It seems to work as expected on a toy index. When trying it out on a copy of my production index (300GB-ish), though, I'm getting FileNotFoundExceptions. These cancel the backup, and delete the snapshot.yy

Re: Use cases for ReplicationHandler's backup facility?

2009-09-28 Thread Chris Harris
2009/9/24 Noble Paul നോബിള്‍ नोब्ळ् : > Yes, the only reason to take a backup should be for restoration/archival > They should contain all the files required for the latest commit point. Ok, I think I get it now. I assumed "all the files required for the latest commit point" meant that the backup

Re: Use cases for ReplicationHandler's backup facility?

2009-09-24 Thread Chris Harris
2009/9/24 Noble Paul നോബിള്‍ नोब्ळ् : > On Fri, Sep 25, 2009 at 4:57 AM, Chris Harris wrote: >> The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication) >> has support for "backups", which can be triggered in one of two ways: >> >> 1. in respo

Use cases for ReplicationHandler's backup facility?

2009-09-24 Thread Chris Harris
The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication) has support for "backups", which can be triggered in one of two ways: 1. in response to startup/commit/optimize events (specified through the backupAfter tag specified in the handler's requestHandler tag in solrconfig.xml) 2. by

Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-21 Thread Chris Harris
Solr now starts up fine in Tomcat with both JMX and ReplicationHandler enabled if I construct my working copy as follows: download solr-r815830 reverse merge r815587 ("SOLR-1427: fixed registry MBean issue") apply 1427.afterlatch.patch I haven't tried applying 1427.afterlatch.patch to the svn hea

Re: copyfield at search time?

2009-09-18 Thread Chris Harris
If the reason you're copying from member_of to member_of_facet is because faceting isn't allowed on multi-valued fields, then that's no longer true. See https://issues.apache.org/jira/browse/SOLR-475 which is in the trunk and which will be available in the 1.4 release. If you're running an ea

Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-18 Thread Chris Harris
Forgot to answer this one. Yes, I do have a warming query to get the sort caches up to speed. I think it takes a while to run; my guess would be 30 seconds or so. 2009/9/18 Grant Ingersoll : > > Also, do you have warming queries setup? > > On Sep 17, 2009, at 12:46 PM, Chris Harris wr

Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-18 Thread Chris Harris
wrote: > >> Can you try the patch I just put up on >> https://issues.apache.org/jira/browse/SOLR-1427 and let me know if it works >> when JMX is enabled? >> >> Also, do you have warming queries setup? >> >> On Sep 17, 2009, at 12:46 PM, Chris Harris wrot

Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-17 Thread Chris Harris
execution thread in SolrCore.getSearcher() > > Interesting... I still haven't been able to reproduce a hang with either > jetty or tomcat. > I enabled replication and JMX... still nothing. > > -Yonik > http://www.lucidimagination.com > > > On Thu, Sep 17, 20

Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-17 Thread Chris Harris
k.server.JkMain start INFO: Jk running ID=0 time=0/32 config=null Sep 17, 2009 12:01:22 PM org.apache.catalina.startup.Catalina start INFO: Server startup in 2709 ms 2009/9/17 Chris Harris : > I found what looks like the same issue when I tried to install r815830 > under Tomcat. (It works ok wit

Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-17 Thread Chris Harris
I found what looks like the same issue when I tried to install r815830 under Tomcat. (It works ok with the normal Jetty example/start.jar.) I haven't checked the stack trace, but Tomcat would hang right after the message INFO: Adding debug component:org.apache.solr.handler.component.debugcompon..

Re: How to create a new index file automatically

2009-09-15 Thread Chris Harris
There are a few different ways to get data into Solr. XML is one way, and probably the most common. As far as Solr is concerned it doesn't matter whether you construct XML input by hand or write some kind of code to do it. Solr won't automatically create any files like the example .xml files for yo

Re: Is kill -9 safe or not?

2009-08-08 Thread Chris Harris
This should also be true of the various ways you might force quit in Windows, right? 2009/8/7 Yonik Seeley > Kill -9 will not corrupt your index, but you would lose any > uncommitted documents. > > -Yonik > http://www.lucidimagination.com > >

Picking Facet Fields by Frequency-in-Results

2009-08-03 Thread Chris Harris
One task when designing a facet-based UI is deciding which fields to facet on and display facets for. One possibility that I hope to explore is to determine which fields to facet on dynamically, based on the search results. In particular, I hypothesize that, for a somewhat heterogeneous index (hete

Using sfloat for price field in example schema.xml

2009-07-01 Thread Chris Harris
There seems to be a near-universal condemnation in best practices guides of using floating point types to store prices, and yet this is exactly what Solr's example schema.xml does. This leads to a couple of questions: 0. Is my above assessment just wrong? 1. Is there something unique about Solr

Re: Java OutOfmemory error during autowarming

2009-05-31 Thread Chris Harris
Solr offers no configuration for FieldCache, neither in solrconfig.xml nor anywhere else; rather, that cache gets populated automatically in the depths of Lucene when you do a sort (or also apparently, as Yonik says, when you use a field in a function query). From the wiki: 'Lucene has a low leve

Cleanly shutting down Solr/Jetty on Windows

2009-05-20 Thread Chris Harris
I'm running Solr with the default Jetty setup on Windows. If I start solr with "java -jar start.jar" from a command window, then I can cleanly shut down Solr/Jetty by hitting Control-C. In particular, this causes the shutdown hook to execute, which appears to be important. However, I don't especia

Solr failed to write/read index files

2009-05-01 Thread Chris Harris
About a week and a half into simultaneously growing and querying a new Solr index, the index has gotten corrupted, as reflected by the following IOExceptions: * java.io.IOException: Cannot overwrite: E:\solr-10009\solr\filingcore\data\index\_1kir.tis * java.io.FileNotFoundException: E:\solr-1000

Re: How to create a query directly (bypassing the query-parser)?

2009-03-31 Thread Chris Harris
2009/3/31 Erik Hatcher : > > On Mar 31, 2009, at 2:13 PM, Development Team wrote: >> >> On the Lucene query parser syntax page ( >> http://lucene.apache.org/java/2_4_0/queryparsersyntax.html) linked to from >> the Solr query syntax page, they mention: >> "If you are programmatically generating a qu

Re: Increasing the number of results

2009-03-31 Thread Chris Harris
If you're using the GET interface, add the "rows" parameter to your URL. See http://wiki.apache.org/solr/CommonQueryParameters See also the "start" parameter. 2009/3/31 Sajith Weerakoon : > Hello all, > > > > I am writing an application using solr and I am having a problem in > increasing the nu
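
A minimal sketch of the "rows" and "start" parameters mentioned in the message above, written in Python. The helper name `select_url` and the base URL in the usage line are assumptions made for illustration; only the parameter names come from the message.

```python
from urllib.parse import urlencode


def select_url(base, query, start=0, rows=10):
    """Build a GET URL that asks for `rows` results beginning at offset `start`."""
    params = {"q": query, "start": start, "rows": rows}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and path; substitute your own Solr select URL.
    print(select_url("http://localhost:8983/solr/select", "solr", start=20, rows=50))
```

Paging through results then amounts to keeping "rows" fixed and increasing "start" by that amount for each page.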

More Robust Search Timeouts (to Kill Zombie Queries)?

2009-03-27 Thread Chris Harris
I've noticed that some of my queries take so long (5 min+) that by the time they return, there is no longer any plausible use for the search results. I've started calling these zombie queries because, well, they should be dead, but they just won't die. Instead, they stick around, wasting my Solr bo

Re: Missing required field: id Using ExtractingRequestHandler

2009-03-19 Thread Chris Harris
Unless there's a regression in the ExtractingRequestHandler, this is probably happening because both A) you have an id field defined in your Solr schema file that's marked as a required field and B) you did not specify an ID parameter when you submitted your document to the handler. If you don'

Re: DataImportHandler Robustness For Imports That Take A Long Time

2009-03-13 Thread Chris Harris
000). and modify your query to take to > consider that entry so that subsequent imports will start from there. > > DIH does not write the last_index_time unless the import completes > successfully. > > On Tue, Mar 10, 2009 at 1:54 AM, Chris Harris wrote: >> I have a datas

DataImportHandler Robustness For Imports That Take A Long Time

2009-03-09 Thread Chris Harris
I have a dataset (7M-ish docs each of which is maybe 1-100K) that, with my current indexing process, takes a few days or maybe a week to put into Solr. I'm considering maybe switching to indexing with the DataImportHandler, but I'm concerned about the impact of this on indexing robustness: If I u

Latest on DataImportHandler and Tika?

2009-02-04 Thread Chris Harris
Back in November, Shalin and Grant were discussing integrating DataImportHandler and Tika. Shalin's estimation about the best way to do this was as follows: ** I think the best way would be a TikaEntityProcessor which knows how to handle documents. I guess a typical use-case would be FileListEnti

Re: Store limited text

2009-01-27 Thread Chris Harris
If you're using a Solr build post-r721758, then copyfield has a maxChars property you can take advantage of. I'm probably misremembering some of the exact names of these elements/attributes, but you can basically have this in your schema.xml: Then anything you store in field f will get copied

Re: Results not appearing

2009-01-24 Thread Chris Harris
Without you stopping Solr itself, a solr client can remove all the documents in an index by doing a delete-by-query with the query "*:*" (without quotes). For XML interface clients, see http://wiki.apache.org/solr/UpdateXmlMessage. Solrj would have another way to do it. You'll need to do a commit a
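
For the XML interface mentioned in the message above, the two update messages involved can be sketched as follows. This is only an illustration in Python of building the message bodies (a delete-by-query for "*:*" followed by a commit); the function names are made up here, and sending the messages to Solr's update handler is left out.

```python
import xml.etree.ElementTree as ET


def delete_by_query_message(query):
    """Return an XML update message that deletes every document matching `query`."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def commit_message():
    """Return the XML update message that asks Solr to commit."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


if __name__ == "__main__":
    print(delete_by_query_message("*:*"))
    print(commit_message())
```

The delete only becomes visible to searchers once the commit message has been processed.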

Re: Results not appearing

2009-01-24 Thread Chris Harris
I should clarify that I misspoke before; I thought you had indexed="true" on Message-Id and Date, whereas you had it on Message-Id and Content. It sounds like you figured this out and interpreted my reply in a useful way nonetheless, though. So that's good. The post tool should be a valid way to c

Re: Results not appearing

2009-01-23 Thread Chris Harris
These might be obvious, but: * I assume you did a Solr commit command after indexing, right? * If you are using the fieldtype definitions from the default schema.xml, then your "string" fields are not being analyzed, which means you should expect search results only if you enter the entire, exact

Re: Solr stemming -> preserve original words

2009-01-23 Thread Chris Harris
It seems like what's desired is not so much a stemmer as what you might call a "canonicalizer", which would translate each source word not into its "stem" but into its "most canonical form". Critically, the latter, by definition, is always a legitimate word, e.g. "run". What's more, it's always the

Re: prefetching question

2009-01-13 Thread Chris Harris
Maybe it's just me, but I'm not sure what you mean by "prefetching". (I don't even know if you're talking about an indexing-time activity or a query-time activity.) My guess is that you'll get a more helpful reply if you can make your question more specific. Cheers, Chris On Tue, Jan 13, 2009 at

Re: Restricting results based on user authentication

2009-01-12 Thread Chris Harris
On Mon, Jan 12, 2009 at 9:31 PM, Manupriya wrote: > > Thanks Chris, > > I agree with your approach. I also dont want to add anything at the > application level. I want authentication to be handled internally at the > Solr level itself. The application layer needs to be involved somehow, right, be

Highlighting Trouble With Bigram Shingle Index

2009-01-12 Thread Chris Harris
I'm running into some highlighting issues that appear to arise only when I'm using a bigram shingle (ShingleFilterFactory) analyzer. I started with a bigram-free situation along these lines:

Re: Restricting results based on user authentication

2009-01-12 Thread Chris Harris
Hi Manu, I haven't made a custom request handler in a while, but I want to clarify that, if you trust your application code, you don't actually need a custom request handler to do this sort of authentication filtering. At indexing time, you can add a "role" field to each object that you index, as

Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-11 Thread Chris Harris
I don't have time to verify this now, but the RichDocumentHandler does not have a separate contrib directory and I don't think the RichDocumentHandler patch makes a jar particular to the handler; instead, the java files get dumped in the main solr tree (java/org/apache/solr) , and therefore they ge

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Chris Harris
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png

Re: Pagination with Solr

2008-11-21 Thread Chris Harris
I don't usually use solrj, but I think you want to look into SolrQuery.setStart() and SolrQuery.setRows(). I believe the first is for indicating the number of the first result you want to get, and the second is for indicating how many results you want to get. If you want all the results, you're probably going to

Highlighting Oddities

2008-11-05 Thread Chris Harris
I'm testing out the default (gap) fragmenter with some simple, single-word queries on a patched 1.3.0 release populated with some real-world data. (I think the primary quirk in my setup is that I'm using ShingleFilterFactory to put word bigrams (aka shingles) into my index. I was worried that this

Choosing Which Branch To Use

2008-11-04 Thread Chris Harris
My current pre-production Solr install is a 1.3 pre-release build, and I think I'm going to update to a more recent version before an upcoming product release. Actually, "release" is probably a bit of an exaggeration; it's more of an alpha test, or perhaps a beta test. Anyway, the question is which

Re: date range query performance

2008-10-29 Thread Chris Harris
Do you need to search down to the minutes and seconds level? If searching by date provides sufficient granularity, for instance, you can normalize all the time-of-day portions of the timestamps to midnight while indexing. (So index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.) That
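
The midnight normalization suggested in the message above could be done at indexing time along these lines. This is a hedged Python sketch, assuming timezone-aware input timestamps; the function name `normalize_to_midnight` is illustrative.

```python
from datetime import datetime, timezone


def normalize_to_midnight(ts):
    """Return the UTC date of `ts` as a timestamp string with the time set to midnight.

    `ts` is assumed to be a timezone-aware datetime.
    """
    ts_utc = ts.astimezone(timezone.utc)
    return ts_utc.strftime("%Y-%m-%dT00:00:00Z")


if __name__ == "__main__":
    event = datetime(2008, 10, 1, 14, 37, 5, tzinfo=timezone.utc)
    print(normalize_to_midnight(event))
```

Every event on the same UTC day then maps to the same indexed value, which keeps the number of distinct terms in the date field small.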

Qsol (or surround or xmlqueryparser...) in Solr

2008-10-29 Thread Chris Harris
I was just looking at Mark Miller's Qsol parser for Lucene ( http://www.myhardshadow.com/qsol.php), and my users would really like to have a similar ability to combine proximity and boolean search in arbitrary, nested ways. The simplest use case I'm interested in is "phrase proximity", where you sa

Highlighting Unindexed Fields

2008-09-03 Thread Chris Harris
http://wiki.apache.org/solr/FieldOptionsByUseCase says that a field needs to be both stored and indexed for highlighting to work. Unless I'm very confused, though, I just tested and highlighting worked fine (on trunk) for a stored, *non-indexed* field. So is this info perhaps out of date? Assuming

"background merge hit exception"

2008-09-02 Thread Chris Harris
I've made some changes to my Solr setup, and now I'm getting the "background merge hit exception" pasted at the end of this message. The most notable changes I've made are: Update to r690989 (Lucene r688745) Change a few things in my schema. In particular, I was previously storing my main documen

Re: Best way to tell which Lucene revision a Solr build comes with?

2008-08-31 Thread Chris Harris
ecification Version: 2008-08-27_02-04-10 Lucene Implementation Version: 2008-08-27_02-04-10 ${svnversion} - 2008-08-27 02:13:53 I feel like I've previously seen ${svnversion} expanded into a concrete number on this page. On Sun, Aug 31, 2008 at 2:54 PM, Chris Harris <[EMAIL PROTEC

Best way to tell which Lucene revision a Solr build comes with?

2008-08-31 Thread Chris Harris
I'm getting to the point where building Solr involves: 1. Figuring out which Lucene revision the Solr build I downloaded was built against 2. Downloading/patching/building that revision of Lucene 3. Copying over the new Lucene jars into Solr's lib directory 4. Building Solr What's the best way to

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-28 Thread Chris Harris
inal "massive deletion" problem to > happen. I guess first try it without the infoStream change, since it's > possible the infoStream change prevented the issue from happening? > > Mike > > Chris Harris wrote: > >> I'll see about using a newer/

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-21 Thread Chris Harris
I'll see about using a newer/older JVM. In the meantime, according to the Solr admin page, which seems to get its info like so LucenePackage.class.getPackage().getImplementationVersion() what I've been testing is Lucene r652650. The Solr version is r654965, now modified of course to do some

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-21 Thread Chris Harris
up and post them, I'll dig through them > to see if they give any clues... > > Mike > > Chris Harris wrote: > >> Ok, I did what you suggested, giving each SolrIndexWriter its own >> "infoStream" log file, created in the init() method. The thing is, I >

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-20 Thread Chris Harris
IndexWriter's infoStream. > > I think you may need to modify the SolrIndexWriter.java sources, in the init > method, to add a call to setInfoStream(...). > > Can any Solr developers confirm this? > > Mike > > Chris Harris wrote: > >> I'm assuming that one w

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-18 Thread Chris Harris
I'm assuming that one way to do this would be to set the logging level to "FINEST" in the "logging" page in the solr admin tool, and then to make sure my logging.properties file is also set to record the FINEST logging level. Let me know if that won't enable the sort of debugging info you are talkin

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Chris Harris
On Sat, Aug 16, 2008 at 4:33 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Can you try Lucene's CheckIndex tool on it and report what it says? > > On Aug 15, 2008, at 1:35 PM, Chris Harris wrote: > >> I have an index (different from the ones mentioned yesterday) tha

Using Shingles to Increase Phrase Search Performance

2008-08-16 Thread Chris Harris
Mike Klaas suggested last month that I might be able to improve phrase search performance by indexing word bigrams, aka bigram shingles. I've been playing with this, and the initial results are very promising. (I may post some performance data later.) I wanted to describe my technique, which I'm no
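
To make the term "word bigrams" in the message above concrete, here is a small Python sketch that pairs each token with its successor. It illustrates the idea only and does not reproduce the behavior or options of Solr's ShingleFilterFactory; the function name is hypothetical.

```python
def bigram_shingles(tokens):
    """Return the word bigrams of a token list, each joined by a single space."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    print(bigram_shingles(["big", "slow", "phrase", "query"]))
```

A two-word phrase query can then be answered by looking up one bigram term instead of intersecting the position lists of two separate words.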

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Chris Harris
on within Solr alongside the normal analysis and indexing. Autocommit is: 10 180 > Can you try Lucene's CheckIndex tool on it and report what it says? Working on that now. It should take some time, though, due to the index size. > > On Aug 15,

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-15 Thread Chris Harris
ation, judging from people who reported this problem so far. (http://www.nabble.com/fnm-file-disappear-td1531775.html#a1531775) Again, a CFS index was indeed involved in my case, but my experience comes almost three years after Otis' message... On Fri, Aug 15, 2008 at 10:35 AM, Chris Harr

"Auto commit error" and java.io.FileNotFoundException

2008-08-15 Thread Chris Harris
I have an index (different from the ones mentioned yesterday) that was working fine with 3M docs or so, but when I added a bunch more docs, bringing it closer to 4M docs, the index seemed to get corrupted. In particular, now when I start Solr up, or when my indexing process tries to add a documen

Re: More files in index directory than expected

2008-08-14 Thread Chris Harris
On Thu, Aug 14, 2008 at 2:01 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Chris Harris <[EMAIL PROTECTED]> wrote: >> It's my understanding that if my mergeFactor is 10, then there >> shouldn't be more than 11 segments in my index directory (10 segment

More files in index directory than expected

2008-08-14 Thread Chris Harris
It's my understanding that if my mergeFactor is 10, then there shouldn't be more than 11 segments in my index directory (10 segments, plus an additional segment if a merge is in progress). It would seem to follow that there shouldn't be more than 11 fdt files, 11 tis files, etc.. However, I'm looki

Re: Vote on a new solr logo

2008-07-22 Thread Chris Harris
How about releasing the preliminary results so we can see if a run-off is in order! On Tue, Jul 22, 2008 at 6:37 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > My opinion: if its already a runaway, we might as well not prolong things. > If not though, we should probably give some time for any possib

Re: Big slowdown with phrase queries

2008-07-16 Thread Chris Harris
one as well. On Thu, Jul 3, 2008 at 3:04 PM, Chris Harris <[EMAIL PROTECTED]> wrote: > I was just running some performance tests against my Solr instance > (using the standard query language), and I discovered a shocking (at > least to me) speed difference between queries involving

Re: Synonyms list breaks solr

2008-07-13 Thread Chris Harris
Matt, If I understand you correctly, then the log you mention is what your servlet container / web server is logging, not what Solr is logging. Solr logging needs to be configured separately. See http://wiki.apache.org/solr/FAQ?highlight=(logging)#head-ffe035452f21ffdb4e4658c2f8f6553bd6ca If

Re: Big slowdown with phrase queries

2008-07-12 Thread Chris Harris
xed the same thing in dtSearch when in fact I've configured them with some wildly different settings.) Is this plausible? On Thu, Jul 3, 2008 at 5:30 PM, Mike Klaas <[EMAIL PROTECTED]> wrote: > > On 3-Jul-08, at 5:13 PM, Chris Harris wrote: > >>> That's pretty much

Re: estimating memory needed for solr instances...

2008-07-10 Thread Chris Harris
I didn't know what option was being referred to here, but I eventually figured it out. If anyone else was confused, the option is called useFilterForSortedQuery, you can set it via solrconfig.xml, and, at least according to Yonik in late 2006, you probably won't want to enable it even if you *do* s

Re: Big slowdown with phrase queries

2008-07-03 Thread Chris Harris
On Thu, Jul 3, 2008 at 4:35 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Thu, Jul 3, 2008 at 7:05 PM, Chris Harris <[EMAIL PROTECTED]> wrote: >> Ok, I only have one segment right now, so I've got one of each of these: >> >> .tis file: 730MB >> .

Re: Big slowdown with phrase queries

2008-07-03 Thread Chris Harris
u all's platforms) behave differently in this respect. Cheers, Chris > .tis and .frq is used to look up terms and what documents match those > terms. .prx files are used for the term positions in each document. On Thu, Jul 3, 2008 at 3:21 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >

Big slowdown with phrase queries

2008-07-03 Thread Chris Harris
I was just running some performance tests against my Solr instance (using the standard query language), and I discovered a shocking (at least to me) speed difference between queries involving phrase queries (i.e. stuff between quotation marks) and ones that don't. For instance, here are some log s

Can I add field compression without reindexing?

2008-06-24 Thread Chris Harris
I have an index that I eventually want to rebuild so I can set compressed=true on a couple of fields. It's not really practical to rebuild the whole thing right now, though. If I change my schema.xml to set compressed=true and then keep adding new data to the existing index, will this corrupt the i

Re: indexing pdf documents

2008-05-12 Thread Chris Harris
Solr does not have this support built in, but there's a patch for it: https://issues.apache.org/jira/browse/SOLR-284 On Mon, May 12, 2008 at 2:02 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > Hello, > > Before making a little program to extract the txt from my pdfs and feed it > into solr with xml,

Re: Unlimited number of return documents?

2008-05-08 Thread Chris Harris
This is just to satisfy my curiosity, but can you share what your use case is? On Thu, May 8, 2008 at 1:18 PM, Francisco Sanmartin <[EMAIL PROTECTED]> wrote: > What is the value to set to "rows" in solrconfig.xml in order not to have > any limitation about the number of returned documents? I've tr

Re: Caching of DataImportHandler's Status Page

2008-04-24 Thread Chris Harris
MAIL PROTECTED]> wrote: > Chris - what happens if you hit ctrl-R (or command-R on OSX)? That should > bypass the browser cache. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > - Original Message > > From: Chris Harris &l

Caching of DataImportHandler's Status Page

2008-04-24 Thread Chris Harris
I'm playing with the DataImportHandler, which so far seems pretty cool. (I've applied the latest patch from JIRA to a fresh download of trunk revision 651344. I'm using the basic Jetty setup in the example directory.) The thing that's bugging me is that while the handler's status page (http://local

Re: Highlighting Quoted Phrases

2008-03-26 Thread Chris Harris
On Tue, Mar 25, 2008 at 4:25 PM, Brian Whitman <[EMAIL PROTECTED]> wrote: > > On Mar 25, 2008, at 6:31 PM, Chris Harris wrote: > > > working pretty well, but my testers have > > discovered something they find borderline unacceptable. If they search > &g

Re: Are 1.2 and 1.3/trunk indexes compatible?

2008-03-26 Thread Chris Harris
can't go back from > there. > > so make sure to backup before trying, but it should go smoothly... > > ryan > > > Chris Harris wrote: > > What are the odds that I can plop an index created in Solr 1.2 into a > > Solr 1.3 and/or Solr trunk install and have

Are 1.2 and 1.3/trunk indexes compatible?

2008-03-26 Thread Chris Harris
What are the odds that I can plop an index created in Solr 1.2 into a Solr 1.3 and/or Solr trunk install and have things work correctly? This would be more convenient than reindexing, but I'm wondering how dangerous it is, and hence how much testing is required.

Highlighting Quoted Phrases

2008-03-25 Thread Chris Harris
I'm using the standard Solr query language and the normal highlighting parameters documented at http://wiki.apache.org/solr/HighlightingParameters. Snippet generation and highlighting is working pretty well, but my testers have discovered something they find borderline unacceptable. If they search

Re: Updating and Appending

2008-01-23 Thread Chris Harris
On Jan 23, 2008 9:04 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Jan 22, 2008 4:10 PM, Owens, Martin <[EMAIL PROTECTED]> wrote: > > We've got some memory constraint worries from using Java RMI, although I > > can see this problem could effect the xml requests too. The Java code > > doesn't s

Re: Out of heap space with simple updates

2008-01-23 Thread Chris Harris
I'm using java -Xms512M -Xmx1500M -jar start.jar which gives the Java VM a min heap of 512MB RAM and a max of 1500MB. I don't know if 1500MB is enough to fix your problem. I do know that when I try to increase it much beyond there using the standard Sun VM on Windows 2003 Server, Java refuses

Re: Transactions and Solr Was: Re: Delte by multiple id problem

2008-01-23 Thread Chris Harris
Suppose I wanted to use this log approach. Does anyone have suggestions about the best way to do it? The approach that first comes to mind is to store the log as a separate DB table, and to maintain that table using a DB trigger attached to the underlying source data table. This is clearly not the

Phrase-based (vs. Word-Based) Proximity Search

2007-11-12 Thread Chris Harris
I gather that the standard Solr query parser uses the same syntax for proximity searches as Lucene, and that Lucene syntax is described at http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches This syntax lets me look for terms that are within x words of each other. The

Re: dataset parameters suitable for lucene application

2007-10-02 Thread Chris Harris
Hi There, Would you mind if I pasted your data onto the wiki page at http://wiki.apache.org/solr/SolrPerformanceData I think it would be helpful to get some more numbers on that page, so people can help decide if Solr is the right application for them. Thanks, Chris Harris, new Solr user On 9

Re: dataset parameters suitable for lucene application

2007-09-26 Thread Chris Harris
By "maxed out" do you mean that Solr's performance became unacceptable beyond 8.8M records, or that you only had 8.8M records to index? If the former, can you share the particular symptoms? On 9/26/07, Charlie Jackson <[EMAIL PROTECTED]> wrote: > My experiences so far with this level of data have

Re: Logging in the example solr+jetty setup

2007-09-23 Thread Chris Harris
-in message to be printed to the console immediately before printing the message actually passed to log.info(), log.warning(), or whatever. On 9/23/07, Chris Harris <[EMAIL PROTECTED]> wrote: > Hi There, > > I'm new to solr, and so far I've been impressed. One thing I'

Logging in the example solr+jetty setup

2007-09-23 Thread Chris Harris
Hi There, I'm new to solr, and so far I've been impressed. One thing I'm curious about, as a newbie, is the source of some of the log messages that show up in the example solr+jetty setup, found in the 1.2 distribution's example directory. I'm seeing two kinds of log messages. First there are one