How to eliminate scoring from a query?

2010-07-14 Thread oferiko
In http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf, under the performance section, it mentions: "Queries that don’t sort by score can eliminate scoring, which speeds up queries". How exactly can I do that? If I don't

How to speed up solr search speed

2010-07-14 Thread marship
Hi. All. I got a problem with distributed solr search. The issue is I have 76M documents spread over 76 solr instances, each instance handles 1M documents. Previously I put all 76 instances on single server and when I tested I found each time it runs, it will take several times, most

Re: Multiple cores or not?

2010-07-14 Thread Otis Gospodnetic
Hello there, I'm guessing the sites will be searched separately. In that case I'd recommend a core for each site. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: "scr...@asia.com" >

Re: Solr index optimizing help

2010-07-14 Thread Otis Gospodnetic
Hi, The difference indicates deletes. Optimize the index (which expunges docs marked as deleted) and the difference disappears. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Karthik
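As a small illustration of the point above (the gap between maxDoc and numDocs reflects documents that are marked deleted but not yet expunged), here is a minimal Python sketch. The function name and the sample numbers are made up for illustration; they are not taken from the thread.

```python
def deleted_doc_count(max_doc: int, num_docs: int) -> int:
    """Return how many documents are marked deleted but not yet expunged.

    max_doc counts every document slot in the index, including deleted ones;
    num_docs counts only the live, searchable documents.
    """
    if num_docs > max_doc:
        raise ValueError("numDocs cannot exceed maxDoc")
    return max_doc - num_docs


# Hypothetical values as they might appear on the Solr statistics page.
print(deleted_doc_count(max_doc=1200, num_docs=1000))
```

After an optimize, both values should report the same number, so the function would return 0.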

about warm up

2010-07-14 Thread Li Li
I want to load full text into an external cache, so I added some code in newSearcher, where I found the warm-up takes place. I added my code before the Solr warm-up, which is configured in solrconfig.xml like this: ... public void newSearcher(SolrIndexSearcher newSearcher, Sol

Re: Solr index optimizing help

2010-07-14 Thread Karthik K
Yeah, that happened :( , and I lost a lot of data because of it. Can someone explain the terms numDocs and maxDoc? Will the difference indicate the duplicates? Thank you, karthik

Re: range faceting with integers

2010-07-14 Thread Chris Hostetter
: Subject: range faceting with integers : References: : In-Reply-To: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change

Re: How to find first document for the ALL search

2010-07-14 Thread Chris Hostetter
: I have found that this search crashes: : : /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id Ouch .. that exception is kind of hairy. It suggests that your index may have been corrupted in some way -- do you have any idea what happened? Have you tried using the CheckIndex tool to see what it s

Re: Strange "the" when search with dismax

2010-07-14 Thread Erick Erickson
If the other suggestions don't work, you need to show us the relevant portions of your schema.xml, and probably query output with &debug=on tacked on... Here are some pointers for getting help... http://wiki.apache.org/solr/UsingMailingLists Best Erick 2010/7/14 Jonathan Rochkind > "the" soun

Re: question on wild card

2010-07-14 Thread Erick Erickson
The best way to understand how things are parsed is to go to the solr admin page (Full interface link?) and click the "debug info" box and submit your query. That'll tell you exactly what happens. Alternatively, you can put &debugQuery=on on your URL... HTH Erick On Wed, Jul 14, 2010 at 8:48 AM,
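For anyone who prefers to script the second suggestion, the following Python sketch builds a select URL with debugQuery=on appended. The host, port, and sample query are assumptions chosen for illustration; only the debugQuery=on parameter comes from the message above.

```python
from urllib.parse import urlencode


def debug_query_url(base_url: str, query: str) -> str:
    """Build a select URL that asks Solr to include query debug output."""
    # urlencode percent-escapes the query so characters such as ':' and '"'
    # survive the trip through the URL.
    return base_url + "?" + urlencode({"q": query, "debugQuery": "on"})


# Hypothetical local Solr instance and query.
url = debug_query_url("http://localhost:8983/solr/select", 'text:"hello world"')
print(url)
```

The parsed form of the query should then appear in the debug section of the response.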

Re: Using stored terms for faceting

2010-07-14 Thread Chris Hostetter
: is it possible to use the stored terms of a field for a faceted search? No, the only thing stored fields can be used for is document centric operations (ie: once you have a small set of individual docIds, you can access the stored fields to return to the user, or highlight, etc...) : I mean

Re: stemmed terms and phrases in a combined query

2010-07-14 Thread Chris Hostetter
: My question is how do i query that? : q=text_clean:Nike's new text_orig:"running shoes" : seems like it would work, but not sure its the best way. that's a perfectly good way to do it. : Is there a way i can tell the parser(or extend it) so that every phrase : query it will use one field and f

Re: Solr index optimizing help

2010-07-14 Thread Erick Erickson
Does your schema have a unique id specified? If so, is it possible that you indexed many documents that had the same ID, thus deleting previous documents with the same ID? That would account for it, but it's a shot in the dark... Best Erick On Tue, Jul 13, 2010 at 6:20 AM, Karthik K wrote: > Th

Re: Solr search streaming/callback

2010-07-14 Thread Chris Hostetter
: I was wondering if anyone was aware of any existing functionality where : clients/server components could register some search criteria and be : notified of newly committed data matching the search when it becomes : available you can register a "postCommit" listener in your solrconfig.xml file

Re: Less convoluted way to query for an empty string?

2010-07-14 Thread Lukas Kahwe Smith
On 15.07.2010, at 00:09, Mat Brown wrote: > Hi all, > > I can't seem to find a way to query for an empty string that is > simpler than this: > > field_name:[* to ""] > > Things that don't work: > > field_name:"" > field_name["" TO ""] > > Is the one I'm using the simplest option? If so, is t

Re: csv response writer

2010-07-14 Thread Tommy Chheng
I fixed the path of the queryResponseWriter class in the example solrconfig.xml. This was successfully applied against solr 4.0 trunk. A few quirks: * When I didn't specify a default Delimiter, it printed out null as delimiter. I couldn't figure out why because init(NamedList args)

Less convoluted way to query for an empty string?

2010-07-14 Thread Mat Brown
Hi all, I can't seem to find a way to query for an empty string that is simpler than this: field_name:[* to ""] Things that don't work: field_name:"" field_name["" TO ""] Is the one I'm using the simplest option? If so, is there a particular reason the other ones I mention don't work? Just cur

Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score > that relevancy as a parameter. Is it possible to do

Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom Request Handler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult? On Wed, Jul 14, 2010

setting up clustering

2010-07-14 Thread Justin Lolofie
I'm trying to enable clustering in solr 1.4. I'm following these instructions: http://wiki.apache.org/solr/ClusteringComponent However, `ant get-libraries` fails for me. Before it tries to download the 4 jar files, it tries to compile lucene? Is this necessary? Has anyone gotten clustering worki

RE: limiting the total number of documents matched

2010-07-14 Thread Nagelberg, Kallin
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against
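The idea of taking the top N by score and then re-sorting by another field can be sketched on in-memory data. This is only an illustration of the ordering logic, assuming the hits have already been fetched; it is not the two-query approach itself, and the field names and sample documents are hypothetical.

```python
from operator import itemgetter


def top_n_then_resort(hits, n, sort_field):
    """Keep the n highest-scoring hits, then order them by sort_field."""
    # First pass: rank by relevancy score, highest first, and cut off at n.
    top = sorted(hits, key=itemgetter("score"), reverse=True)[:n]
    # Second pass: reorder only the survivors by the requested field.
    return sorted(top, key=itemgetter(sort_field))


# Hypothetical hits, as if returned by a score-sorted query.
hits = [
    {"id": "a", "score": 0.2, "title": "Apple"},
    {"id": "b", "score": 0.9, "title": "Zebra"},
    {"id": "c", "score": 0.7, "title": "Mango"},
]
print([h["id"] for h in top_n_then_resort(hits, 2, "title")])
```

The low-scoring document never reaches the second sort, which is the effect Paul is after when sorting by title.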

limiting the total number of documents matched

2010-07-14 Thread Paul
I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy d

Multiple cores or not?

2010-07-14 Thread scrapy
Hi, We are planning to host different websites that will use Solr on the same server. Which would be best? One core with a field in the schema (site1, site2, etc.) that is then added to every query, or one core per site? Thanks for your help

Re: Using hl.regex.pattern to print complete lines

2010-07-14 Thread Peter Spam
Any other thoughts, Chris? I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want. 1) I don't care how many characters it returns, I'd like entire lines all the time 2) I just want it to always return 3 lines: the line before, the actual line, and the line after. 3

Re: date boosting and dismax

2010-07-14 Thread Jonathan Rochkind
Shawn Heisey wrote: [* TO NOW-2YEARS]^1.0 I also seem to remember seeing something about how to do "less than" in range queries as well as the "less than or equal to" implied by the above, but I cannot find it now. Ranges with square brackets [] are inclusive. Ranges with parens () are

Re: dismax and date boosts

2010-07-14 Thread Shawn Heisey
I have finally figured out how to turn this off in Thunderbird 3: Go to Tools, Options, Display, and turn off "Display emoticons as graphics". On 4/12/2010 12:04 PM, Shawn Heisey wrote: On 4/12/2010 11:55 AM, Shawn Heisey wrote: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NO

Re: Foreign characters question

2010-07-14 Thread Robert Muir
On Wed, Jul 14, 2010 at 12:59 PM, Blargy wrote: > > Nevermind. Apparently my IDE (Netbeans) was set to "No encoding"... wtf. > Changed it to UTF-8 and recreated the file and all is good now. Thanks! > > fyi I created an issue with your example here: https://issues.apache.org/jira/browse/SOLR-2003

RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
Re: flexibility. This boost does decay over time, the further it gets from now the less of a boost it receives. You are right though, it doesn't allow a fine degree of control, particularly if you don't want to smoothly decay the boost. I hadn't considered your suggestion, so I'll keep it in mi

Re: date boosting and dismax

2010-07-14 Thread Shawn Heisey
One of the replies I got on a previous thread mentioned range queries, with this example: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 Something like this seems more flexible, and into it, I read an implication that the perform

Re: Foreign characters question

2010-07-14 Thread Blargy
Nevermind. Apparently my IDE (Netbeans) was set to "No encoding"... wtf. Changed it to UTF-8 and recreated the file and all is good now. Thanks!

Re: Foreign characters question

2010-07-14 Thread Blargy
How can I tell and/or create a UTF-8 synonyms file? Do I have to instruct solr that this file is UTF-8?

RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
I used this before my search term and it works well: {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)} Its enough that when I search for *:* the articles appear in chronological order. Tim -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, July 14, 2010
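To see how quickly that boost decays, here is a short Python sketch that evaluates the same expression by hand. It relies on Solr's recip(x,m,a,b) function being defined as a/(m*x+b); the helper names and the sample ages are illustrative assumptions.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # roughly 3.156e10 milliseconds


def recip(x: float, m: float, a: float, b: float) -> float:
    """Mirror Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms: float) -> float:
    """Boost for a document whose publishdate is age_ms milliseconds before NOW."""
    return recip(age_ms, 3.16e-11, 1, 1)


# Print the boost for documents aged 0, 1, 2, and 5 years.
for years in (0, 1, 2, 5):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

Because 3.16e-11 is approximately one divided by the number of milliseconds in a year, a brand-new document gets a multiplier of 1 and a one-year-old document gets about half of that.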

date boosting and dismax

2010-07-14 Thread Shawn Heisey
I've started a couple of previous threads on this topic, but I did not have a good date field in my index to use at the time. I now have a schema with the document's post_date in tdate format, so I would like to actually do some implementation. Right now, we are not doing relevancy ranking a

Re: Foreign characters question

2010-07-14 Thread Robert Muir
Is your synonyms file in UTF-8 encoding? On Wed, Jul 14, 2010 at 11:11 AM, Blargy wrote: > > Thanks for the reply but that didn't help. > > Tomcat is accepting foreign characters but for some reason when it reads > the > synonyms file and it encounters that character ñ it doesn't appear correctly

RE: Foreign characters question

2010-07-14 Thread Blargy
Thanks for the reply but that didn't help. Tomcat is accepting foreign characters but for some reason when it reads the synonyms file and it encounters that character ñ it doesn't appear correctly in the Field Analysis admin. It shows up as �. If I query exactly for ñ it will work but the synonyms

AW: MultiValue dynamicField and copyField

2010-07-14 Thread Jan Simon Winkelmann
I figured out where the problem was. The destination wildcard was actually matching the wrong field. I changed the fieldnames around a bit and now everything works fine. Thanks! > -Ursprüngliche Nachricht- > Von: kenf_nc [mailto:ken.fos...@realestate.com] > Gesendet: Mittwoch, 14. Juli 2

DIH: post-delta-import DB cleanup hook?

2010-07-14 Thread Joachim M
I'm updating my solr index using a "queue" table in my database. When records get updated, a row gets inserted into the queue table with pk, timestamp, deleted flag, and status. DIH made it easy to use this to identify new/updated records as well as deletes. I need to do some post processing how

Re: Strange "the" when search with dismax

2010-07-14 Thread Jonathan Rochkind
"the" sounds like it might be a stopword. Are you using stopwords in any of your fields covered by the dismax search? But not in some of the other fields covered by dismax? the combination of dismax and stopwords can result in unexpected behavior if you aren't careful. I wrote about this a bit her

Re: MultiValue dynamicField and copyField

2010-07-14 Thread kenf_nc
Yep, my schema does this all day long.

Re: Strange "the" when search with dismax

2010-07-14 Thread kenf_nc
Sounds like you want the 'text' fieldType (or equivalent) and are using 'string' or 'lowercase'. Those must match all exactly (well, case insensitively in the case of 'lowercase'). The TextType field types (like 'text') do tokenizations so matches will occur under many more conditions.

RE: DataImporter

2010-07-14 Thread Amdebirhan, Samson, VF-Group
Hi Bilgin, You're right, I have the same primary key, but I tested with the "preImportDeleteQuery" property in the entity tag of data_config.xml. Now it is working: it deletes only the indexes/docs for which I run the full-import, based on the field I declare for the preImportDelete

question on wild card

2010-07-14 Thread Mark N
I have a database field = hello world and I am indexing it to the *text* field with the standard analyzer (text is a copy field in Solr). Now when the user gives a query text:"hello world%", how is the query interpreted in the background? Are we actually searching text: hello OR text: world%(

Re: DataImporter

2010-07-14 Thread Bilgin Ibryam
Is it possible that you have the same IDs in both entities? Could you show here your entity mappings? Bilgin Ibryam On Wed, Jul 14, 2010 at 11:48 AM, Amdebirhan, Samson, VF-Group < samson.amdebir...@vodafone.com> wrote: > Hi all, > > > > Can someone help me in this ? > > > > Importing 2 differen

DataImporter

2010-07-14 Thread Amdebirhan, Samson, VF-Group
Hi all, Can someone help me in this ? Importing 2 different entities one by one (specifying through the entity parameter) why is the second import deleting the previous created index for first entity and vice-versa? The documentation provided by the solr website reports that : "enti

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I doubt about it. Caching system is a key value store. You have to use some compression library to compress and decompress your data. Caching system helps to retrieve fast. Anyways please take a look of each of the caching system features. Regards Aditya www.findbestopensource.com On Wed, Jul 1

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
> Trying to analyze PositionFilter: didn't understand why earlier the > search of 'Nina Simone I Put' failed since at least the phrase 'Nina > Simone' should have matched against title_0 field. Any clue? Please note that I have configured the ShingleFilter as bigrams without unigrams. [Honestly, I

Re: Cache full text into memory

2010-07-14 Thread Li Li
Thank you. I don't know which cache system to use. In my application, the cache system must support compression algorithm which has high compression ratio and fast decompression speed(because each time it get from cache, it must decompress). 2010/7/14 findbestopensource : > I have just provided yo

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve, Thanks, wrapping with PositionFilter actually worked the search and score -- I made a mistake while re-indexing last time. Trying to analyze PositionFilter: didn't understand why earlier the search of 'Nina Simone I Put' failed since at least the phrase 'Nina Simone' should have matched

MultiValue dynamicField and copyField

2010-07-14 Thread Jan Simon Winkelmann
Hi everyone, i was wondering if the following was possible somehow: As in: using copyField to copy a multiValued field into another multiValued field. Cheers, Jan

Re: Ranking position in solr

2010-07-14 Thread Ahmet Arslan
> I sent this command: curl http://localhost:8081/solr/update -F stream.body=' > ', but it doesn't reload. > > It doesn't reload automatically after every commit or > optimize unless I add > new document then i commit. Hmm. May be there is an easier way to force it? (add empty/dummy doc) But if y

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I have just provided you two options. Since you already store as part of the index, You could try external caching. Try using ehcache / Membase http://www.findbestopensource.com/tagged/distributed-caching . The caching system will do LRU and is much more efficient. On Wed, Jul 14, 2010 at 12:39 PM

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve, Thanks for your kind response. I checked PositionFilterFactory (re-indexed as well) but that also didn't solve the problem. Interestingly, the problem is not reproducible from Solr's Field Analysis page; it manifests only when it's in a query. I guess the subject for this post is not very c

Re: Cache full text into memory

2010-07-14 Thread Li Li
I have already stored it in the Lucene index. But it is on disk, and when a query comes, it must seek the disk to get it. I am not familiar with the Lucene cache. I just want to make full use of my memory by loading 10GB of it into memory, with an LRU strategy when the cache is full. To load more into memory, I want to compress

Re: Ranking position in solr

2010-07-14 Thread Chamnap Chhorn
I sent this command: curl http://localhost:8081/solr/update -F stream.body=' ', but it doesn't reload. It doesn't reload automatically after every commit or optimize unless I add new document then i commit. Any idea? On Tue, Jul 13, 2010 at 4:54 PM, Ahmet Arslan wrote: > > I'm using solr 1.4 a

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
You have two options 1. Store the compressed text as part of stored field in Solr. 2. Using external caching. http://www.findbestopensource.com/tagged/distributed-caching You could use ehcache / Memcache / Membase. The problem with external caching is you need to synchronize the deletions and