RE: Sorting performance

2008-10-20 Thread Lance Norskog
According to previous posters on this topic, sorting requires an array with an entry per document in the entire index. Each entry has 32 bits for the 'int' type, and 32 bits plus the field representation length for other types. Not knowing Lucene internals I have a hard time believing that it really has

RE: error with delta import

2008-10-16 Thread Lance Norskog
If you make a database view with the query, it is easy to examine the data you want to index. Then, your solr import query would just pull the view. The Solr setup file is much simpler this way. -Original Message- From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] Sent: Wednesda

RE: Multi-language solr1.3 what would you reckon?

2008-10-14 Thread Lance Norskog
The Distributed Search feature has nothing to do with the MultiCore feature. http://wiki.apache.org/solr/DistributedSearch Distributed Search is a "horizontal partition" on an index schema, meaning there are multiple indexes supplying different rec

RE: Solr has limit to number of returned results?

2008-10-10 Thread Lance Norskog
To select all, do star-colon-star *:* To select a negative clause do *:* AND -clause To select a wildcard, h* and h?* work fine. Star as the only character, or star or ? as the first character are not allowed. These blow up with "too many clauses": H*? and H*H and H*H*. And when they don't bl
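The wildcard rules in this message can be sketched as a small pre-check run before a term is sent to Solr. This is a hypothetical Python helper, not part of Solr or Lucene; it encodes only the two rules stated above (a lone star, or a leading star or question mark, is rejected) and does not try to predict the "too many clauses" cases.

```python
def is_allowed_wildcard_term(term: str) -> bool:
    """Return True if the term obeys the leading-wildcard rules from the post.

    Hypothetical helper: '*:*' is the special match-all query and is allowed;
    any other term may not be empty or start with '*' or '?'.
    """
    if term == "*:*":
        return True
    if not term:
        return False
    # A star alone, or a star/question mark in first position, is rejected.
    return term[0] not in "*?"
```

With this check, `h*` and `h?*` pass, while `*` and `?abc` are rejected before they reach the query parser.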

RE: Controlling Length of Text Snippets Before and After Highlighted Term

2008-10-03 Thread Lance Norskog
You could handle this problem with an XSL script on the output. It would scan for the highlighting markers and munge the text. I've done a few things with the XsltResponseWriter and I do not envy you this coding task :) but it is possible. http://wiki.apache.org/solr/XsltResponseWriter -Ori

RE: Problem restarting Solr after shutting it down.

2008-10-01 Thread Lance Norskog
We send it a normal kill, wait 30 seconds, then use a "kill -9". This means we tell it to shut down, give it thirty seconds to do whatever it wants to, then forcefully kill it. I'm not sure we have ever seen the first 'normal kill' work, but we do it anyway. Lance -Original Message- From:
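The kill-then-wait-then-kill-9 sequence described here can be sketched in Python for a Unix system. The function name, the polling interval, and the injectable `kill` and `sleep` parameters are assumptions made for illustration; the post only describes the idea, not a script.

```python
import os
import signal
import time


def stop_process(pid, grace_seconds=30.0, poll_interval=1.0,
                 kill=os.kill, sleep=time.sleep):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns "terminated" if the process went away on its own,
    or "killed" if SIGKILL had to be sent. Unix only.
    """
    kill(pid, signal.SIGTERM)  # the "normal kill"
    waited = 0.0
    while waited < grace_seconds:
        try:
            # Signal 0 sends nothing; it only checks that the pid exists.
            # Note: an exited child that was never reaped still "exists".
            kill(pid, 0)
        except ProcessLookupError:
            return "terminated"
        sleep(poll_interval)
        waited += poll_interval
    try:
        kill(pid, signal.SIGKILL)  # the "kill -9"
    except ProcessLookupError:
        return "terminated"  # it exited just before the forced kill
    return "killed"
```

Passing `kill` and `sleep` as parameters is only there so the logic can be exercised without a real process; in normal use the defaults apply.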

RE: Solr Using

2008-09-24 Thread Lance Norskog
Do these JSP pages compile under another servlet container? If the JSP pages have Java 1.5 or Java 1.6 syntax features, they will not compile under JBoss 4.0.2. The JBoss 4.0.2 JSP compiler only handles the Java 1.4 language. I ran into this problem moving from a new Tomcat to an older JBoss. -Origin

RE: Snappuller taking up CPU on master

2008-09-24 Thread Lance Norskog
rsync has an option to limit the transfer rate. You give a maximum bandwidth for it to use in the transfer. (Please do not post the same thing if you don't get a response.) -Original Message- From: rahul_k123 [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 24, 2008 10:57 AM To: solr
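The rsync option in question is `--bwlimit`. A minimal Python sketch of building such a command follows; the function name and the paths in the usage note are hypothetical, and how snappuller itself invokes rsync is not shown in this message.

```python
def build_rsync_command(source, destination, max_kbytes_per_sec):
    """Build an rsync argument list that caps the transfer rate.

    --bwlimit takes a rate in (roughly) kilobytes per second.
    """
    return [
        "rsync",
        "-a",  # archive mode: recurse and preserve file attributes
        f"--bwlimit={max_kbytes_per_sec}",
        source,
        destination,
    ]
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, for example with a hypothetical source of `master:/solr/data/snapshot/` and a local destination directory.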

RE: Some new SOLR features

2008-09-17 Thread Lance Norskog
My vote is for dynamically scanning a directory of configuration files. When a new one appears, or an existing file is touched, load it. When a configuration disappears, unload it. This model works very well for servlet containers. Lance -Original Message- From: [EMAIL PROTECTED] [mailto

RE: Searching for future or "null" dates

2008-09-16 Thread Lance Norskog
If the query starts with a negative clause Lucene returns nothing. endDate:[NOW TO *] OR -endDate:[* TO *] Might work -Original Message- From: Kolodziej Christian [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 16, 2008 12:01 AM To: solr-user@lucene.apache.org Subject: AW: Searching f

Solr 1.3 and Lucene 2.4 dev

2008-09-15 Thread Lance Norskog
Is it possible to run Solr 1.3 with Lucene 2.3.2, the last official release of Lucene? We're running into a problem with our very very large index and wonder if there is a bug in the development Lucene. Thanks, Lance Norskog

RE: Adding bias to Distributed search feature?

2008-09-15 Thread Lance Norskog
Distributed search feature? On Thu, Sep 11, 2008 at 10:31 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Is it possible to add a bias to the ordering in the distributed search > feature? That is, if the search finds the same content in two > different indexes, it always favors the do

Adding bias to Distributed search feature?

2008-09-11 Thread Lance Norskog
? Thanks, Lance Norskog

RE: AW: Cross-context-forward to solr-instance

2008-09-08 Thread Lance Norskog
You can give a default core set by adding a default parameter to the query in solrconfig.xml. This is hacky, but it gives you a set of cores instead of just one core. -Original Message- From: David Smiley @MITRE.org [mailto:[EMAIL PROTECTED] Sent: Monday, September 08, 2008 7:54 AM To: so

RE: question about + and - in field queries

2008-08-28 Thread Lance Norskog
This is somewhere in the mail archives. The AND/OR/NOT syntax is binary. The +/- syntax is ternary: (+one -two three) means: "must have one, cannot have two, things with three have a higher score". -Original Message- From: Lyman Hurd [mailto:[EMAIL PROTECTED] Sent: Thursday, August 28, 20
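The three-way meaning of `(+one -two three)` can be illustrated with a toy matcher in Python. This is not Lucene's scoring; the function and its crude score (1 plus the number of optional terms present) are assumptions made only to show the "must / must not / nice to have" split, and the sketch assumes at least one required term.

```python
def match(doc_terms, required, prohibited, optional):
    """Return a toy score for a document, or None if it does not match.

    required   -> '+term': every one must be present
    prohibited -> '-term': none may be present
    optional   -> 'term' : each one present raises the score
    """
    terms = set(doc_terms)
    if not set(required) <= terms:
        return None  # a '+' term is missing
    if set(prohibited) & terms:
        return None  # a '-' term is present
    return 1 + len(set(optional) & terms)
```

For the query in the message, a document containing "one" and "three" scores higher than one containing only "one", and any document containing "two" is excluded.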

RE: How to know if a field is null?

2008-08-25 Thread Lance Norskog
Has this been fixed in solr 1.3? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, August 25, 2008 5:44 AM To: solr-user@lucene.apache.org Subject: Re: How to know if a field is null? On Mon, Aug 25, 2008 at 5:33 AM, Erik Hatcher

RE: How to know if a field is null?

2008-08-23 Thread Lance Norskog
And, a negative query does not work, so if this is the only clause, you have to say: *:* AND -field:[* TO *] Where *:* is a special code for "all documents". It's like learning a language: there is the normal grammar, there are the unusual cases, and then there are the bizarre slang expressions.
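As a sketch of how this query might be assembled and URL-encoded from a client, here is a small Python example. The helper names, the base URL, and the `endDate` field are assumptions for illustration; only the query shape comes from the message.

```python
from urllib.parse import urlencode


def missing_field_query(field):
    """Query for documents that have no value in the given field."""
    # A purely negative query returns nothing, so anchor it on *:* (all docs).
    return f"*:* AND -{field}:[* TO *]"


def select_url(base_url, query, rows=10):
    """Build a /select URL with the query string safely encoded."""
    return f"{base_url}/select?" + urlencode({"q": query, "rows": rows})


# Hypothetical Solr location and field name.
url = select_url("http://localhost:8983/solr", missing_field_query("endDate"))
```

Encoding matters here because the brackets, colons, and spaces in the range clause are not safe to paste into a URL unescaped.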

RE: "Multicore" and snapshooter / snappuller

2008-08-22 Thread Lance Norskog
We looked into this a while back. Apparently XFS (Silicon Graphics originally) is great for really huge files. The Reiser file systems are tuned for many many very small files. (Unfortunately Mr. Hans Reiser is in jail for 15 yrs, but the file systems live on: http://en.wikipedia.org/wiki/Hans

RE: shards and performance

2008-08-21 Thread Lance Norskog
We found that searching by itself was faster with the Distributed multicore search over three cores in the same servlet engine than with just one core. Faceting and sorting use more memory than simple searches, and we could not do faceting on our one simple index. We needed this for data analysis.

RE: .wsdl for example....

2008-08-18 Thread Lance Norskog
Various Java web service libraries come with 'wsdl2java' and 'java2wsdl' programs. You just run 'java2wsdl' on the Java soap description. -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, August 18, 2008 6:53 PM To: solr-user@lucene.apache.org Subject: Re: .w

RE: Administrative questions

2008-08-13 Thread Lance Norskog
I wrote shell tasks that start, stop, and heartbeat the server and run them from cron (unix). Heartbeat means: 1) is the tomcat even running, 2) does tomcat return the Solr admin page, 3) does Solr return a search. For an indexer, 4) does solr return from a commit. Stopping the server via the tomca
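The ordered heartbeat described here (container up, admin page, search, commit) can be sketched in Python as a list of named checks run in sequence. The original tasks were shell scripts run from cron; the function name and the stub checks below are assumptions for illustration only.

```python
from typing import Callable, Optional, Sequence, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order; return the name of the first that fails, else None.

    A check that raises is treated as a failure, since a dead server
    usually shows up as a connection error rather than a clean False.
    """
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            return name
    return None


# Stub checks standing in for real probes (process lookup, HTTP requests).
example_checks = [
    ("tomcat running", lambda: True),
    ("admin page", lambda: True),
    ("search", lambda: False),
    ("commit", lambda: True),
]
failed = first_failure(example_checks)
```

Running the cheapest check first means a stopped container is reported as such instead of as a failed search.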

RE: Distributed Search Strategy / Shards

2008-08-08 Thread Lance Norskog
There are some known thread-locking bugs in java 1.5. They are somewhere in the Sun bug list. We switched to 1.6 and our lockup problems went away. Tuning memory use etc. made a big difference before that. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Y

BBC Radio project

2008-08-05 Thread Lance Norskog
http://www.bbc.co.uk/blogs/radiolabs/2008/06/wikipedia_plus_lucene_morelikethis.shtml A nice trick! He put all of Wikipedia directly into Lucene, one document per page, and then he does More-Like-This on it. Chris Sizemore, are you out there?

RE: Vote on a new solr logo

2008-07-31 Thread Lance Norskog
One of Parkinson's Laws is that the most trivial item on the agenda receives the most attention.

RE: Out of memory on Solr sorting

2008-07-29 Thread Lance Norskog
A sneaky source of OutOfMemory errors is the permanent generation. If you add this: -XX:PermSize=64m -XX:MaxPermSize=96m You will increase the size of the permanent generation. We found this helped. Also note that when you undeploy a war file, the old deployment has permanent storage that

RE: nested data structure definition

2008-07-28 Thread Lance Norskog
If you want to think of Solr in database terms, it has only one table. The fields in this table have very flexible type definitions. There can be many optional fields. They also can have various indexes which used together can search text in useful ways. If you want to model multiple tables, you

Simple mistake in Wiki

2008-07-24 Thread Lance Norskog
Should this refer to facet.mincount instead of facet.limit? "The default is true if facet.limit is greater than 0, false otherwise." http://wiki.apache.org/solr/SimpleFacetParameters facet.sort Set to "true", this parameter indicates that constraints should be sorted by their count. If "false

RE: UnicodeNormalizationFilterFactory

2008-06-24 Thread Lance Norskog
ISOLatin1AccentFilterFactory works quite well for us. It solves our basic euro-text keyboard searching problem, where "protege" should find protégé. ("protege" with two accents.) -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 24, 2008 4:05 PM To: sol

RE: scaling / sharding questions

2008-06-13 Thread Lance Norskog
ng the data in a backing store, but are storing all data in the index itself. We have found this "challenging". Cheers, Lance Norskog -Original Message- From: Jeremy Hinegardner [mailto:[EMAIL PROTECTED] Sent: Friday, June 13, 2008 3:36 PM To: solr-user@lucene.apache.org Subject

XSL scripting

2008-06-09 Thread Lance Norskog
This started out in the num-docs thread, but deserves its own. And a wiki page. There is a more complex and general way to get the number of documents in the index. I run a query against solr and postprocess the output with an XSL script. Install this xsl script as home/conf/xslt/numfound.xsl.

RE: Num docs

2008-06-07 Thread Lance Norskog
This appears in the stats.jsp page. Both the total of document 'slots' and the number of live documents. -Original Message- From: Marcus Herou [mailto:[EMAIL PROTECTED] Sent: Saturday, June 07, 2008 2:09 AM To: solr-user@lucene.apache.org Subject: Num docs Hi. Is there a way of retrieve

RE: How to describe 2 entities in dataConfig for the DataImporter?

2008-05-30 Thread Lance Norskog
You might try creating your whole transform as an SQL database view rather than with the Solr transformer toolkit. This would also make it easier to directly examine the data to be indexed. Lance -Original Message- From: Julio Castillo [mailto:[EMAIL PROTECTED] Sent: Thursday, May 29, 20

RE: Announcement of Solr Javascript Client

2008-05-27 Thread Lance Norskog
Nice! Another technique for the denial-of-service problem: you can regulate the number of simultaneous active servlets. Most servlet containers have a configuration for this somewhere. This will slow down legit users but will still avoid killing the server machine. -Original Message- From

RE: Indexing HTML Content

2008-05-22 Thread Lance Norskog
The HTMLStripReader tool worked very well for us. It handles garbled HTML well. The only hole we found was that it does not find alt-text attributes for images. Also, note that this code is written as a Java Reader class rather than a Solr class. This makes it useful for other projects. Given the

RE: SOLR OOM (out of memory) problem

2008-05-21 Thread Lance Norskog
We have had major OOM problems doing facet searches. Having 20 searches at once used up maybe 5G and one faceting request would blow up at 12. More important, when a facet request throws an OOM it seems like the memory is not released. When a normal search throws an OOM, the memory is released and

RE: Solr feasibility with terabyte-scale data

2008-05-09 Thread Lance Norskog
random because MD5 gives such a "perfectly" random hashcode. This should go on a wiki page 'SchemaDesignTips'. Cheers, Lance Norskog

RE: Multiple Index creation

2008-05-07 Thread Lance Norskog
To search against multiple Solrs, you can use http://wiki.apache.org/solr/DistributedSearch in Solr 1.3. This is not tied to the MultiCore feature. -Original Message- From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 06, 2008 9:28 PM To: solr-user@lucene.apache.org

RE: Help optimizing

2008-05-06 Thread Lance Norskog
There are two integer types, 'sint' and 'integer'. On an integer, you cannot do a range check (that makes sense). But! Lucene sort makes an array of integers for every record. On an integer field, it creates an integer array. On any other kind of field, each array item has a lot more. So, if you

RE: Help optimizing

2008-05-06 Thread Lance Norskog
One cause of out-of-memory is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this that you have to dig very deep to find. Lance Norskog

MultiCore and Distributed Search

2008-05-01 Thread Lance Norskog
Is Distributed Search () in the main line yet? Is it considered useable? And, how closely does it match the Wiki entry? https://issues.apache.org/jira/browse/SOLR-303 http://wiki.apache.org/solr/DistributedSearch

MultiCore on Wiki

2008-04-30 Thread Lance Norskog
The MultiCore writeup on the Wiki (http://wiki.apache.org/solr/MultiCore) says: ... Configuration->core->dataDir The data directory for a given core. (optional) How can a core not have its own dataDir? What happens if this is not set? Cheers, Lance Norskog

RE: Solr with Auto-suggest

2008-04-25 Thread Lance Norskog
This is what the spellchecker does. It makes a separate Lucene index of n-gram letters and searches those. Works pretty well and it is outside the main index. I did an experimental variation indexing word pairs as phrases, and it worked well too. Lance Norskog -Original Message- From: Ryan

Lucene Modules - LucQE [lucky] Lucene Query Expansion Module

2008-04-24 Thread Lance Norskog
http://lucene-qe.sourceforge.net/ This is a much smarter technique for doing query expansion with synonyms, using "Rocchio's Algorithm". Has anyone tried to shoehorn this into Solr? It's a little weird: it needs an analyser, a searcher, and a similarity function. It should be possible to refactor

Meta: Mail quirk of solr-user

2008-04-11 Thread Lance Norskog
Hi- When I reply to a solr-user mail, the To: address is the sender instead of solr-user. Didn't it used to be solr-user? Lance

Facet Query

2008-04-11 Thread Lance Norskog
What do facet queries do that is different from the regular query? What is a use case where I would use a facet.query in addition to the regular query? Thanks, Lance Norskog From the wiki: http://wiki.apache.org/solr/SimpleFacetParameters#head-529bb9b985632b36cbd46a37bde9753772e47

RE: synonyms

2008-03-28 Thread Lance Norskog
Lucas- Your examples are Portuguese and Spanish. You might find a Spanish-language stemmer that follows the very rigid conjugation in Spanish (and I'm assuming in Portuguese as well). Spanish follows conjugation rules that embed much more semantics than English, so a huge number of synonyms can b

RE: How to index multiple sites with option of combining results in search

2008-03-26 Thread Lance Norskog
In fact, 55m records works fine in Solr; assuming they are small records. The problem is that the index files wind up in the tens of gigabytes. The logistics of doing backups, snapping to query servers, etc. is what makes this index unwieldy, and why multiple shards are useful. Lance -Origina

RE: stopwords and phrase queries

2008-03-21 Thread Lance Norskog
d Of Music" will not bring up "Sound Of Music!" Cheers, Lance Norskog -Original Message- From: Phillip Farber [mailto:[EMAIL PROTECTED] Sent: Friday, March 21, 2008 11:11 AM To: solr-user@lucene.apache.org Subject: stopwords and phrase queries Am I correct that if I index

RE: Preferential boosting

2008-03-20 Thread Lance Norskog
u, Mar 20, 2008 at 3:13 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Suppose I have a schema with an integer field called 'duration'. I > want to find all records, but if the duration is 3 I want those > records to be boosted. > > The index has 10 records, with d

Preferential boosting

2008-03-20 Thread Lance Norskog
cords with duration 3 above the others? These do not work (at least for me): *:* OR duration:3^2.0 duration:[* TO *] duration:3^2.0 duration:3^2.0 OR -duration:3 Thanks, Lance Norskog

RE: sort by index id descending?

2008-03-19 Thread Lance Norskog
... another "magic" field name like "score" ... This could be done with a separate "magic" punctuation like $score, $mean (the mean score), etc., so $docid would work. Cheers, Lance -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 18, 2008 9:01 PM T

Finding an empty field

2008-03-13 Thread Lance Norskog
rBase.java:77) at org.apache.solr.core.SolrCore.execute(SolrCore.java:658) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159) Cheers, Lance Norskog

Empty fields - dynamic

2008-03-12 Thread Lance Norskog
s input. But is there anything available out of the box? Thanks, Lance Norskog

RE: Use of get instead of post may be causing some problems

2008-03-06 Thread Lance Norskog
I just switched to doing posts for queries. We have a bunch of filters etc. and Solr stopped working on tomcat. -Original Message- From: Benson Margulies [mailto:[EMAIL PROTECTED] Sent: Thursday, March 06, 2008 12:43 PM To: solr-user Subject: Use of get instead of post may be causing som

Fastest Solr query

2008-03-01 Thread Lance Norskog
The fastest Solr query I can find is any query on an unused dynamic field name: unused_dynamic_field_s:3 Is there another query style that should be faster? See this line in http://wiki.apache.org/solr/SolrConfigXml q=solr&version=2.0&start=0&rows=0 A better ping query would be q=un

RE: what's the schedule of the release of solr 1.3?

2008-03-01 Thread Lance Norskog
An alternative would be for someone to give a subversion checkout number against 1.3-dev which represents a solid working checkout. There are a lot of people using 1.3-dev in production, could you all please tell us what checkout number you are using? Cheers, Lance -Original Message-

RE: escaping special chars in query

2008-02-19 Thread Lance Norskog
You may also use Unicode escapes: \u for example. -Original Message- From: Reece [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 19, 2008 10:04 AM To: solr-user@lucene.apache.org Subject: Re: escaping special chars in query The bottom of the Lucene query syntax page: http://lucene

RE: Questions about filters and scoring

2008-02-18 Thread Lance Norskog
> 3) But then would not 'certificate anystopword found' match your phrase? I wound up making a separate index without stopwords just so that my phrase lookups would work. (I do not have the luxury of re-indexing, so now I'm stuck with this design even if there is a better one.) I also made one w

RE: solr to work for my web application

2008-02-13 Thread Lance Norskog
I strongly recommend that you switch from the latest nightly build to the Solr 1.2 release. Lance -Original Message- From: Thorsten Scherler [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 13, 2008 4:03 AM To: solr-user@lucene.apache.org Subject: Re: solr to work for my web applica

RE: Performance help for heavy indexing workload

2008-02-12 Thread Lance Norskog
1) autowarming: it means that if you have a cached query or similar, and do a commit, it then reloads each cached query. This is in solrconfig.xml 2) sorting is a pig. A sort creates an array of N integers where N is the size of the index, not the query. If the sorted field is anything but an integ
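The memory cost in point 2 can be estimated with simple arithmetic. This Python sketch assumes 4 bytes per entry, which matches the "array of N integers" description for an integer sort field; other field types need more per entry, by an amount this message does not specify.

```python
def sort_array_bytes(num_docs_in_index, bytes_per_entry=4):
    """Rough size of the per-field sort array: one entry per document
    in the whole index, regardless of how many documents the query matches."""
    return num_docs_in_index * bytes_per_entry


# Example: an index of 100 million documents sorted on an integer field.
example_bytes = sort_array_bytes(100_000_000)
example_mib = example_bytes / (1024 * 1024)
```

For 100 million documents that is 400,000,000 bytes, a little over 380 MiB, for each sorted field, which is why sorting on a large index is expensive even when the query matches only a handful of records.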

RE: upgrading to lucene 2.3

2008-02-12 Thread Lance Norskog
What will this improve? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, February 12, 2008 6:48 AM To: solr-user@lucene.apache.org Subject: Re: upgrading to lucene 2.3 On Feb 12, 2008 9:25 AM, Robert Young <[EMAIL PROTECTED]> wr

RE: range vs. filter queries

2008-02-11 Thread Lance Norskog
Is it not possible to make a grid of your boxes? It seems like this would be a more efficient query: grid:N100_S50_E250_W412 This is how GIS systems work, right? Lance -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, February 11, 2008 6:13 PM To:

RE: Lucene index verifier

2008-02-08 Thread Lance Norskog
't tried it, so > I don't know what effect it has on performance/search/indexing. > > -Grant > > > On Feb 7, 2008, at 11:15 PM, Lance Norskog wrote: > > > (Sorry, my Lucene java-user access is wonky.) > > > > I would like to verify that my snapshots are not

Lucene index verifier

2008-02-07 Thread Lance Norskog
amount of time? Thanks, Lance Norskog

RE: Memory improvements

2008-02-07 Thread Lance Norskog
Solr 1.2 has a bug where if you say "commit after N documents" it does not. But it does honor the "commit after N milliseconds" directive. This is fixed in Solr 1.3. -Original Message- From: Sundar Sankaranarayanan [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 3:30 PM To:

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Lance Norskog
Some people loathe UTF-8 and do all of their text in XML entities. This might work better for your punctuation needs. But it still won't help you with Prince :) -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 9:25 AM To: solr-user@luc

RE: Indexing Japanese & English

2008-02-07 Thread Lance Norskog
Here are the comments for CJKTokenizer. First, is this what you want? Remember, there are three Japanese writing systems. /** * CJKTokenizer was modified from StopTokenizer which does a decent job for * most European languages. It performs other token methods for double-byte * Characters: the

RE: Querying multiple dynamicField

2008-02-04 Thread Lance Norskog
You can use the <copyField> directive to copy all 'sentence_*' fields into one indexed field. You then have a named field that you can search against. Lance Norskog -Original Message- From: Renaud Delbru [mailto:[EMAIL PROTECTED] Sent: Friday, February 01, 2008 6:48 PM To:

RE: spellcheckhandler

2008-01-30 Thread Lance Norskog
We use Solr 1.2. I copied the 1.2 spellchecker and made an equivalent phrase pair index generator. Using this we can take an example spelling and find example word pairs for each suggestion. We have not deployed this. Lance Norskog -Original Message- From: Mike Klaas [mailto:[EMAIL

Log4j cookbook for request logging

2008-01-25 Thread Lance Norskog
Is it possible to log incoming requests? I'd love to have the incoming IP and request string. What is the exact set of class names for this? Thanks, Lance Norskog

RE: Solr feasibility with terabyte-scale data

2008-01-23 Thread Lance Norskog
We use two indexed copies of the same text, one with stemming and stopwords and the other with neither. We do phrase search on the second. You might use two different OCR implementations and cross-correlate the output. Lance -Original Message- From: Phillip Farber [mailto:[EMAIL PROT

RE: copyField limitation

2008-01-22 Thread Lance Norskog
A more interesting use case: Analyzing text and finding a number, like the mean word length or the mean number of repeated words. These are standard tools for spam detection. To create these, we would want to shovel text into a text processing chain that creates an integer. We then want to both st

RE: copyField limitation

2008-01-21 Thread Lance Norskog
://issues.apache.org/jira/browse/SOLR-464 Thanks for your time, Lance Norskog -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, January 17, 2008 2:53 PM To: solr-user@lucene.apache.org Subject: Re: copyField limitation On Jan 17, 2008 4:53 PM

RE: solr 1.3

2008-01-21 Thread Lance Norskog
Would someone please consider marking a label on the Subversion repository that says, "This is a clean version"? I only do HTTP requests and have no custom software, so I don't care about internal interfaces changing. Thanks, Lance Norskog -Original Message- From: Mik

Help - corrupted field in index

2008-01-21 Thread Lance Norskog
I have an 'integer' static field in my schema. Some of the index for this field is corrupted. When I search on this field it works. When I use this field to sort against, I get this exception. Does this mean that there is a string in one of my entries? It is possible the field was not required or defa

copyField limitation

2008-01-17 Thread Lance Norskog
e describing transformations in directives and that is much too involved. The advantage of using <copyField> between dissimilar types is that with defaulting, you exactly duplicate the information without relying on your feeding software. With 'date' field formula syntax, this is the only way to have duplicate fields for different purposes. Thanks for your time, Lance Norskog

RE: LNS - or - "now i know we've succeeded"

2008-01-14 Thread Lance Norskog
Now that Microsoft is buying FAST (!!) the open source world needs a matching technology :) -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 7:42 AM To: solr-user@lucene.apache.org Subject: Re: LNS - or - "now i know we've succeeded" Yes

RE: field:(-null) returns records where field was not specified

2008-01-14 Thread Lance Norskog
find records with empty fields. Lance Norskog -Original Message- From: Karen Loughran [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 7:51 AM To: solr-user@lucene.apache.org Cc: Erick Erickson Subject: Re: field:(-null) returns records where field was not specified Hi Erik, thanks fo

FW: Tomcat and Solr - out of memory

2008-01-08 Thread Lance Norskog
On Tomcat 5.5, an OutOfMemory on a query leaves the server in an OK state, and future queries work. But a facet query that runs out of ram does not free its undone state and all future requests get OutOfMemory. I have not tried the Solr 'luke' handler since it took 5 minutes to run on our index wh

RE: sizing/sanity check for huge(?) dataset

2007-12-28 Thread Lance Norskog
We have maybe 1.2k per record and a large index works fine. You want more RAM and more to the point fast disk I/O for reading. Striped/mirrored the more the better. Giant indexes fall down on sorting. Any sort creates an array with one integer for each record in the index for the field, that's 40

Non-sortable types in sample schema

2007-10-13 Thread Lance Norskog
The sample schema in Solr 1.2 supplies two variants of integers, longs, floats, doubles. One variant is sortable and one is not. What is the point of having both? Why would I choose the non-sorting variants? Do they store fewer bytes per record? Thanks, Lance Norskog

Re: solr tuple/tag store

2007-10-09 Thread Lance Norskog
y small data > sets) rather than selects). > > Without seeing the actual queries that are slow, it's difficult to > determine > what the problem is. Have you tried using EXPLAIN ( > http://dev.mysql.com/doc/refman/5.0/en/explain.html) to check if your > query > is using t

RE: solr tuple/tag store

2007-10-09 Thread Lance Norskog
You did not give your queries. I assume that you are searching against the 'entryID' and updating the tag list. MySQL has a "fulltext" index. I assume this is a KWIC index but do not know. A "fulltext" index on "entryID" should be very very fast since single-record results are what Lucene does be

RE: Spell Check Handler

2007-10-08 Thread Lance Norskog
Great! One comment: if I type a word that happens to be real, it may not be what I actually want. A spell checker should still recommend similar words. Computer programmers are all perfect spellers, and this can blind us to what matters to ordinary people :) Lance Norskog -Original

RE: Merging Fields

2007-10-05 Thread Lance Norskog
A gotcha here is that <copyField> creates multiple values. Each field copied in becomes a separate field. If you wanted a single-valued field this will not work. Lance Norskog -Original Message- From: Keene, David [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 10:50 AM To: solr-user

RE: Handling empty query

2007-10-04 Thread Lance Norskog
If a field is required, and always has data, this query will enumerate all documents: field:[* TO *] -Original Message- From: Guangwei Yuan [mailto:[EMAIL PROTECTED] Sent: Thursday, October 04, 2007 3:26 PM To: solr-user@lucene.apache.org Subject: Handling empty query Hi, Does Solr su

RE: how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Lance Norskog
" will do nicely. Cheers, Lance Norskog -Original Message- From: Britske [mailto:[EMAIL PROTECTED] Sent: Thursday, October 04, 2007 1:38 PM To: solr-user@lucene.apache.org Subject: Re: how to make sure a particular query is ALWAYS cached hossman wrote: > > > : I want

RE: Searching combined English-Japanese index

2007-10-02 Thread Lance Norskog
Python does not do Unicode strings natively, you have to do them explicitly. It is possible that your python receiver is not doing the right thing with the incoming strings. Also, Jetty has problems with UTF-8; the Wiki has more on this. Lance -Original Message- From: Maximilian Hütter

Questions about unit test assistant TestHarness

2007-10-01 Thread Lance Norskog
bject[])'. Are all of these problems fixed in the Solr 1.3 trunk? Should I just grab whatever's there and use them with 1.2? Thanks, Lance Norskog

RE: Searching combined English-Japanese index

2007-10-01 Thread Lance Norskog
Some servlet containers don't do UTF-8 out of the box. There is information about this on the wiki. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, October 01, 2007 9:45 AM To: solr-user@lucene.apache.org Subject: Re: Searching

RE: Index multiple languages with multiple analyzers with the same field

2007-09-28 Thread Lance Norskog
Other people custom-create a separate dynamic field for each language they want to support. The spellchecker in Solr 1.2 wants just one field to use as its word source, so this fits. We have a more complex version of this problem: we have content with both English and other languages. Searching

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Lance Norskog
My limited experience with larger indexes is: 1) the logistics of copying around and backing up this much data, and 2) indexing is disk-bound. We're on SAS disks and it makes no difference between one indexing thread and a dozen (we have small records). Smaller returns are faster. You need to li

Geographical distance searching

2007-09-26 Thread Lance Norskog
It is a "best practice" to store the master copy of this data in a relational database and use Solr/Lucene as a high-speed cache. MySQL has a geographical database option, so maybe that is a better option than Lucene indexing. Lance (P.s. please start new threads for new topics.) -Original M

RE: Strange behavior when searching with accents

2007-09-21 Thread Lance Norskog
ior when searching with accents On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote: > English and French are messy, so heuristic methods are the only possible. > Spanish is rigorously clean, and stemming should be done from the > declension rules and irregular conjugation tables. Th

RE: Strange behavior when searching with accents

2007-09-20 Thread Lance Norskog
English and French are messy, so heuristic methods are the only possible. Spanish is rigorously clean, and stemming should be done from the declension rules and irregular conjugation tables. This involves large (fast) tables in ram rather than small (slow) string-shuffling. Lance Norskog

"Select distinct" in Solr

2007-09-19 Thread Lance Norskog
I believe I saw in the Javadocs for Lucene that there is the ability to return the unique values for one field for a search, rather than each record. Is it possible to add this feature to Solr? It is the equivalent of 'select distinct' in SQL. Thanks, Lance Norskog

RE: Triggering snapshooter through web admin interface

2007-09-19 Thread Lance Norskog
file name or extension. If a solr command to do a snapshot is implemented, please make sure that it is 100% consistent. Thanks, Lance Norskog -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 18, 2007 11:11 AM To: solr-user@lucene.apache.org

Formula for open file descriptors

2007-09-18 Thread Lance Norskog
Hi- In early June Mike Klaas posted a formula for the number of file descriptors needed by Solr: For each segment, 7 + num indexed fields per segment. There should be log_{base mergefactor}(numDocs) * mergeFactor segments, approximately. Is this still true? Thanks, Lance
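The formula quoted in this message can be written out as a short Python calculation. The function names and the example numbers are assumptions; the formula itself is taken as stated, and whether it still holds is exactly the open question in the post.

```python
import math


def estimated_segments(num_docs, merge_factor):
    """Approximate segment count: log_{mergeFactor}(numDocs) * mergeFactor."""
    if num_docs < 1 or merge_factor <= 1:
        raise ValueError("need num_docs >= 1 and merge_factor > 1")
    return math.log(num_docs, merge_factor) * merge_factor


def estimated_file_descriptors(num_docs, merge_factor, indexed_fields):
    """Approximate open files: (7 + indexed fields) for every segment."""
    per_segment = 7 + indexed_fields
    return estimated_segments(num_docs, merge_factor) * per_segment


# Example: 1 million documents, mergeFactor 10, 20 indexed fields.
example = estimated_file_descriptors(1_000_000, 10, 20)
```

For the example values this gives about 60 segments at 27 descriptors each, roughly 1620 open files, which is already above a common default per-process limit of 1024.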

RE: Searching items with in the search results with SOLR

2007-09-18 Thread Lance Norskog
Question: if it is a filter query, will it be cached in the filter query cache? Follow-on questions if this is true: Is this the full result set of the filter query? What exactly is cached? Thanks, Lance Norskog -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent

Wiki mistake in using 'curl'

2007-09-15 Thread Lance Norskog
In the wiki are various examples of using 'curl' to post data. Curl requires "-X POST" arguments to do this. The examples do not have this. Also the nice way to post a file to 'curl' is with '-T filename'. Will someone with superpowers please fix? Thanks, Lance Norskog
