Re: Using Customized sorting in Solr

2012-04-30 Thread solr user
Hi, Any suggestions? Am I trying to do too much with Solr? Is there any other search engine which should be used here? I am looking into the Solr codebase and planning to modify QueryComponent. Will this be the right approach? Regards, Shivam On Fri, Apr 27, 2012 at 10:48 AM, solr user

solr.WordDelimiterFilterFactory query time

2012-04-30 Thread abhayd
Hi, I am using solr.WordDelimiterFilterFactory for a text_en field at query time. My document title is: blackberry torch 9810. The query torch9810 works fine: it splits at the letter/digit boundary and gets me the document. But when the query is blackberry9810, it splits to blackberry 9810 but I don't get
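
The splitting described in this message can be illustrated with a small standalone sketch. It only mimics the letter/digit boundary split and is not the real WordDelimiterFilterFactory, which has many more options; the function name `split_alnum` is made up for illustration.

```python
import re


def split_alnum(token):
    """Split a token at letter/digit transitions.

    Rough imitation of one behaviour of a word-delimiter filter;
    it ignores case changes, punctuation and all other filter options.
    """
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


if __name__ == "__main__":
    for query in ("torch9810", "blackberry9810"):
        print(query, "->", split_alnum(query))
```

Both queries split into two tokens, so the split itself does not explain why only one of them matches.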

Re: Does Solr fit my needs?

2012-04-30 Thread G.Long
Hi :) Thank you all for your answers. I'll try these solutions :) Kind regards, Gary Le 27/04/2012 16:31, G.Long a écrit : Hi there :) I'm looking for a way to save xml files into some sort of database and i'm wondering if Solr would fit my needs. The xml files I want to save have a lot of

Re: Java out of memory - with fieldcache faceting

2012-04-30 Thread Dan Tuffery
You need to add more memory to the JVM that is running Solr: http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors Dan On Mon, Apr 30, 2012 at 9:43 AM, Yuval Dotan yuvaldo...@gmail.com wrote: Hi Guys I have a problem and i need your assistance I get an exception when doing

Lucene FieldCache - Out of memory exception

2012-04-30 Thread Rahul R
Hello, I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application server on Solaris. I use embedded solr server. More details : Number of docs in solr index : 1.4 million Physical size of index : 640MB Total number of fields in the index : 700 (99% of these are dynamic fields) Total

Re: Weird query results with edismax and boolean operator +

2012-04-30 Thread Vadim Kisselmann
Hi Jan, thanks for your response! My qf parameter for edismax is: title. My defaultSearchField=text in schema.xml. In my app I generate a query with qf=title,text, so I think the default parameters in config/schema should be overridden, right? I eventually found 2 reasons for this behavior. 1.

Re: Dynamic creation of cores for this use case.

2012-04-30 Thread pprabhcisco123
Thanks kuli, for your response. We tried to implement it as per the instructions. But the problem again is how to create an index for every thirty customers separately. Is there any programmatic way to do this, or do we need to create a query in the configuration file? Thanks Prabakarab.P -- View this

Re: How do create dynamic core using SOLRJ

2012-04-30 Thread ayyappan
It seems to be working fine. But I have a few questions about indexing: 1) I want to index each customer as well as each partner. 2) How do I create an index for each partner (30 customers)? As of now I am indexing all customers using data-config.xml document name=content entity name=customertable

Saravanan Chinnadurai/Actionimages is out of the office.

2012-04-30 Thread Saravanan . Chinnadurai
I will be out of the office starting 30/04/2012 and will not return until 01/05/2012. Please email to itsta...@actionimages.com for any urgent issues. Action Images is a division of Reuters Limited and your data will therefore be protected in accordance with the Reuters Group Privacy / Data

Re: Java out of memory - with fieldcache faceting

2012-04-30 Thread Yuval Dotan
Thanks for the fast answer One more question: Is there a way to know (some formula) what is the size of memory i need for these actions? Thanks Yuval On Mon, Apr 30, 2012 at 11:50, Dan Tuffery dan.tuff...@gmail.com wrote: You need to add more memory to the JVM that is running Solr:

FW: Unsubscribe does not appear to be working

2012-04-30 Thread Kevin Bootz
I continue to receive posts from the solr group even after submitting an unsubscribe per the instructions from the ezmlm app. Is there perhaps a delay after I confirm the unsubscribe request? 14 posts received so far today. At this point I have a delete rule to auto trash any received but

FW: unsubscribe

2012-04-30 Thread Kevin Bootz
BTW, The first request to unsubscribe was sent in February if that helps track this down Thx From: Kevin Bootz Sent: Friday, February 24, 2012 7:55 AM To: 'solr-user-uc.1330079879.acnmkgjcnnlfgdhmmlkn-kbootz=caci@lucene.apache.org' Subject: unsubscribe

Re: Java out of memory - with fieldcache faceting

2012-04-30 Thread Dan Tuffery
There's a Lucene/Solr memory size estimator spreadsheet in the SVN: http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls Dan On Mon, Apr 30, 2012 at 11:39 AM, Yuval Dotan yuvaldo...@gmail.com wrote: Thanks for the fast answer One more question: Is there

Re: Weird query results with edismax and boolean operator +

2012-04-30 Thread Vadim Kisselmann
I tested it. With default qf=title text in solrconfig and mm=100% i get the same result(1) for nascar AND author:serg* and +nascar +author:serg*, great. With nascar +author:serg* i get 3500 matches, in this case the mm-parameter seems not to work. Here are my debug params for nascar AND

Re: commit fail

2012-04-30 Thread Erick Erickson
In the 3.6 world, LukeRequestHandler does some...er...really expensive things when you click into the admin/schema browser. This is _much_ better in trunk BTW. So, as Yonik says, LukeRequestHandler probably accounts for one of the threads. Does this occur when nobody is playing around with the

Re: change index/store at indexing time

2012-04-30 Thread Erick Erickson
Your idea of using a Transformer will work just fine, you have a lot more flexibility in a custom Transformer, see: http://wiki.apache.org/solr/DIHCustomTransformer You could also write a custom update handler that examined the document on the server side and implemented your logic, or even just

Re: Scaling Solr - Suggestions !!

2012-04-30 Thread Erick Erickson
I'd get to the root of why indexes are corrupt! This should be very unusual. If you're seeing this at all frequently, it indicates something is very wrong and starting bunches of JVMs up is a band-aid over a much more serious problem. Are you, by chance, doing a kill -9? or other hard-abort?

Re: Using Customized sorting in Solr

2012-04-30 Thread Erick Erickson
Consider writing a custom sort method or a custom function that you use for sorting. Be _very_ careful that anything you do here is very efficient, it'll be called a _lot_. Best Erick On Mon, Apr 30, 2012 at 2:10 AM, solr user solr.user...@gmail.com wrote: Hi, Any suggestions, Am I trying

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread Erick Erickson
Try attaching debugQuery=on to your query and seeing if that helps you understand what's going on. If that doesn't help, also look at admin/analysis. If all that doesn't help, post your schema definition for the field type and the results of debugQuery=on (you might look at:
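
A minimal sketch of attaching debugQuery=on to a request URL, for anyone following this advice from a script. The base URL below is an assumption (a default local install), and `debug_query_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query, **extra_params):
    """Build a select URL with debugQuery=on appended.

    base_url is whatever your select handler is; extra_params are passed
    through unchanged (for example defType or qs).
    """
    params = {"q": query, "debugQuery": "on"}
    params.update(extra_params)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed host, port and handler path; adjust to your installation.
    print(debug_query_url("http://localhost:8983/solr/select",
                          "blackberry9810", defType="dismax"))
```

The parsed query in the debug section of the response shows whether the split terms ended up as a phrase.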

Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
Hello, Over the weekend I experimented with extracting HTML content via cURL and just wondering why the extraction/indexing process does not include the HTML tags. It seems as though the HTML tags either being ignored or stripped somewhere in the pipeline. If this is the case, is it possible to

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread Jack Krupansky
When WDF filters blackberry9810 it will treat it as a sequence of tokens but as if it were a phrase, like blackberry 9810, with the two terms adjacent, at least with the edismax query parser. I'm not sure what the other query parsers do. If you are using edismax, you can set the QS (query
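
The phrase behaviour described here can be sketched outside Solr. This is a deliberately simplified model: it only checks in-order matches and counts the extra positions between the first and last term, which is not how Lucene computes slop in general. The function name is hypothetical.

```python
def phrase_matches(doc_tokens, phrase_tokens, slop=0):
    """Simplified in-order phrase match.

    The phrase terms must occur in the document in order, and the number
    of extra positions between the first and last term must not exceed slop.
    """
    n = len(phrase_tokens)
    if n == 0:
        return False

    def extend(idx, prev_pos, first_pos):
        if idx == n:
            # Extra positions beyond exact adjacency of the n terms.
            return (prev_pos - first_pos) - (n - 1) <= slop
        for pos in range(prev_pos + 1, len(doc_tokens)):
            if doc_tokens[pos] == phrase_tokens[idx] and extend(idx + 1, pos, first_pos):
                return True
        return False

    return any(tok == phrase_tokens[0] and extend(1, pos, pos)
               for pos, tok in enumerate(doc_tokens))


if __name__ == "__main__":
    title = "blackberry torch 9810".split()
    print(phrase_matches(title, ["torch", "9810"]))               # adjacent terms
    print(phrase_matches(title, ["blackberry", "9810"]))          # "torch" sits in between
    print(phrase_matches(title, ["blackberry", "9810"], slop=1))  # one position of slop allowed
```

Under this model, torch9810 matches the title with no slop, while blackberry9810 needs at least one position of slop because torch sits between the two terms.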

Re: Java out of memory - with fieldcache faceting

2012-04-30 Thread Otis Gospodnetic
Hi, Tell us more about: * what you facet on * how many facet values are in each facet * how much RAM you have * 32 or 64 bit * -Xmx you are using * faceting method you are using * ... Otis  Performance Monitoring for Solr - http://sematext.com/spm/solr-performance-monitoring

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread abhayd
Hi Erick, autoGeneratePhraseQueries=false is set for the field type, and it works fine with the standard query parser. The problem seems to start when I use dismax. As you suggested, I checked the analysis tool, and even after the word delimiter is applied I see the search term as blackberry 9801, so I don't think it

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread Erick Erickson
See Jack's comments about phrases, all your parsed queries are phrases, and your indexed terms aren't next to each other. Best Erick On Mon, Apr 30, 2012 at 10:54 AM, abhayd ajdabhol...@hotmail.com wrote: hi Erick, autoGeneratePhraseQueries=false is set for field type. And it works fine for

Newbie question on sorting

2012-04-30 Thread Jacek
Hello all, I'm facing this simple problem, yet impossible to resolve for me (I'm a newbie in Solr). I need to sort the results by score (it is simple, of course), but then what I need is to take top 10 results, and re-order it (only those top 10 results) by a date field. It's not the same as
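
One way to read this requirement is as a client-side post-processing step: fetch results ordered by score, keep the top 10, then re-order only those by date. The sketch below assumes each result is a dict with `score` and `date` keys (hypothetical field names) and that newest-first is wanted; it is not a built-in Solr feature.

```python
from datetime import date


def rerank_top_by_date(docs, top_n=10):
    """Keep the top_n documents by score, then order those by date, newest first."""
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:top_n]
    return sorted(top, key=lambda d: d["date"], reverse=True)


if __name__ == "__main__":
    results = [
        {"id": "a", "score": 3.2, "date": date(2012, 1, 5)},
        {"id": "b", "score": 2.9, "date": date(2012, 4, 1)},
        {"id": "c", "score": 0.4, "date": date(2012, 4, 30)},
    ]
    # With top_n=2 the low-scoring but newest document "c" is dropped first.
    print([d["id"] for d in rerank_top_by_date(results, top_n=2)])
```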

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread Jack Krupansky
The qs=1 request parameter should work for the dismax query parser as well as edismax. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Monday, April 30, 2012 10:58 AM To: solr-user@lucene.apache.org Subject: Re: solr.WordDelimiterFilterFactory query time See Jack's

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread abhayd
hi jack erick, Thanks I do have qs set in solrconfig for query handler dismax settings. str name=qs10/str Still does not work abhay -- View this message in context: http://lucene.472066.n3.nabble.com/solr-WordDelimiterFilterFactory-query-time-tp3950045p3951038.html Sent from the Solr - User

Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread Jack Krupansky
If by extracting HTML content via cURL you mean using SolrCell to parse html files, this seems to make sense. The sequence is that regardless of the file type, each file extraction parser will strip off all formatting and produce a raw text stream. Office, PDF, and HTML files are all treated

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread abhayd
hi jack, tried qs=10 but unfortunately it does not seem to help. Not sure what else could be wrong abhay -- View this message in context: http://lucene.472066.n3.nabble.com/solr-WordDelimiterFilterFactory-query-time-tp3950045p3951082.html Sent from the Solr - User mailing list archive at

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread abhayd
hi jack, tried qs=10 but unfortunately it does not seem to help. Not sure what else could be wrong abhay -- View this message in context: http://lucene.472066.n3.nabble.com/solr-WordDelimiterFilterFactory-query-time-tp3950045p3951083.html Sent from the Solr - User mailing list archive at

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread Jack Krupansky
Just to be clear, I used the Solr example schema and indexed two test documents, one with Blackberry 9810 and one with Blackberry torch 9810 in the sku field (which uses field type text_en_splitting_tight which uses WDF) and the following query returns both documents:

Re: change index/store at indexing time

2012-04-30 Thread Vazquez, Maria (STM)
Thanks Erick. I'm not concerned about the logic, all I want to achieve is sometimes storing/indexing a multi-valued field and sometimes not (same field with same name) based on some logic. In a transformer I cannot change the schema dynamically to do that, not that I know of at least. So if I

Re: Scaling Solr - Suggestions !!

2012-04-30 Thread Sujatha Arun
I was copying the indexes from webapp to cores ,when this happened .It could have been an error from my end ,but just worried that an issue with one core would reflect on webapp . Regards Sujatha On Mon, Apr 30, 2012 at 7:20 PM, Erick Erickson erickerick...@gmail.comwrote: I'd get to the root

Solr logo for print

2012-04-30 Thread Otis Gospodnetic
Hi, I'm trying to find a Solr logo in a vector or some other format suitable for print.  I found Lucene logo at http://svn.apache.org/repos/asf/lucene/site/publish/images/logo.eps , but can't find one for Solr.  Does anyone know where to find it? At the bottom of  

Re: change index/store at indexing time

2012-04-30 Thread Erick Erickson
OK, I took another look at what you were trying to accomplish and, I find the use-case kind of hard to figure out, but that's my problem G. But it is true that there's really no good way to _change_ the way the field is analyzed in Solr. Of course since Solr is built on Lucene, you could do a lot

Re: Ampersand issue

2012-04-30 Thread William Bell
One idea was to wrap the field with CDATA. Or base64 encode it. On Fri, Apr 27, 2012 at 7:50 PM, Bill Bell billnb...@gmail.com wrote: We are indexing a simple XML field from SQL Server into Solr as a stored field. We have noticed that the &amp; is output as &amp;amp; when using wt=XML.
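
The doubled entity is what you would expect when text that is already XML-escaped gets escaped a second time on output. A small standalone sketch of that effect, plus the base64 idea mentioned above; the sample value is made up, and nothing here is Solr-specific.

```python
import base64
from xml.sax.saxutils import escape


def escape_twice(raw):
    """Return (escaped once, escaped twice) to show how entities double up."""
    once = escape(raw)
    return once, escape(once)


def to_base64(raw):
    """Encode text so that it contains no XML-significant characters."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def from_base64(encoded):
    """Reverse of to_base64."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    once, twice = escape_twice("AT&T")
    print(once)   # escaped once
    print(twice)  # escaped a second time
    print(from_base64(to_base64("AT&T")))
```

Base64 sidesteps the escaping entirely, at the cost of making the stored value unreadable until the client decodes it.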

Re: change index/store at indexing time

2012-04-30 Thread Lee Carroll
Vazquez, Sorry I don't have an answer but I'd love to know what you need this for :-) I think the logic is going to have to bleed into your search app. In short copy field and your app knows which to search in. lee c On 30 April 2012 20:41, Erick Erickson erickerick...@gmail.com wrote: OK, I

Re: Solr logo for print

2012-04-30 Thread Dan Tuffery
Try this one: http://www.lucidimagination.com/sites/default/files/image/solr_logo_rgb.png Dan On Mon, Apr 30, 2012 at 8:38 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I'm trying to find a Solr logo in a vector or some other format suitable for print. I found Lucene logo at

post.jar failing

2012-04-30 Thread William Bell
I am getting a post.jar failure when trying to post the following CDATA field... It used to work on older versions. This is in Solr 3.6. add doc field name=idSP2514N/field field name=nameSamsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133/field field name=manuSamsung Electronics

Re: Solr logo for print

2012-04-30 Thread Lukáš Vlček
Otis, I think there was some JIRA ticket (Logo contents or something like that) which might have all the logo proposals, including the winning one, attached. Regards, Lukas On Mon, Apr 30, 2012 at 9:38 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I'm trying to find a Solr

RE: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-30 Thread Burton-West, Tom
Thanks wunder and Lance, In the discussions I've seen of Japanese IR in the English language IR literature, Hiragana is either removed or strings are segmented first by character class. I'm interested in finding out more about why bigramming across classes is desirable. Based on my limited

correct XPATH syntax

2012-04-30 Thread Twomey, David
Is this possible in DataImportHandler I want the following XML to all collapse into one Author field AuthorList CompleteYN=Y Author ValidYN=Y LastNameSørlie/LastName ForeNameT/ForeName InitialsT/Initials /Author Author ValidYN=Y LastNamePerou/LastName ForeNameC M/ForeName

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread abhayd
hi jack, thanks, i figured out the issue. It was settings during query and index time -- View this message in context: http://lucene.472066.n3.nabble.com/solr-WordDelimiterFilterFactory-query-time-tp3950045p3951811.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: correct XPATH syntax

2012-04-30 Thread Twomey, David
Sorry hit send too soon. Continued the email below On 4/30/12 4:46 PM, Twomey, David david.two...@novartis.com wrote: Is this possible in DataImportHandler I want the following XML to all collapse into one multi-valued Author field AuthorList CompleteYN=Y Author ValidYN=Y

Upgrading to 3.6 broke cachedsqlentityprocessor

2012-04-30 Thread Brent Mills
I've read some things in jira on the new functionality that was put into caching in the DIH but I wouldn't think it should break the old behavior. It doesn't look as though any errors are being thrown, it's just ignoring the caching part and opening a ton of connections. Also I cannot find

Re: Solr logo for print

2012-04-30 Thread Otis Gospodnetic
Thanks Lukas.  Yeah, I looked there, but as far as I can tell, all attachments are PNGs/GIFs/JPGs :( Otis   Performance Monitoring for Solr - http://sematext.com/spm/index.html From: Lukáš Vlček lukas.vl...@gmail.com To: solr-user@lucene.apache.org; Otis

Re: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-30 Thread Walter Underwood
You'll see katakana used with kanji in noun compounds where one of the words is foreign. In Japanese, Rice University is not written with the kanji word for rice. They use katakana for rice and kanji for university, like this: ライス大学. This is very common. I expect that President Obama uses

Re: solr.WordDelimiterFilterFactory query time

2012-04-30 Thread Jack Krupansky
Great. But could you tell us all what settings you had wrong and how you changed them so that somebody else with the problem searching the email archive will be able to see your solution? Thanks. -- Jack Krupansky -Original Message- From: abhayd Sent: Monday, April 30, 2012 4:51 PM

Setting margin in SimpleFragListBuilder from solrconfig

2012-04-30 Thread tobias roth
Hi, Can I set the constructor parameter margin of SimpleFragListBuilder from within solrconfig.xml? I would suspect that something has to be added to this configuration element in solrconfig.xml: fragListBuilder name=simple default=true

Proper commit behavior in a multi-writer environment

2012-04-30 Thread Adam Fields
If I have 40 writers all feeding the same index, do they all have to commit, or just one of them? Am I going to kill performance if they're all issuing individual commits, or would it be better to not have the individual writers commit at all and just have one process that does nothing but

Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
Great, thank you for the input. My understanding of HTMLStripCharFilter is that it strips HTML tags, which is not what I want ~ is this correct? I want to keep the HTML tags intact. On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky j...@basetechnology.comwrote: If by extracting HTML content

Re: Proper commit behavior in a multi-writer environment

2012-04-30 Thread Otis Gospodnetic
Adam, This is where autocommit (see solrconfig.xml) comes in handy.  Don't have them all commit, no. :) Otis  Performance Monitoring for Solr - http://sematext.com/spm/index.html From: Adam Fields fie...@street86.com To: solr-user@lucene.apache.org

Re: correct XPATH syntax

2012-04-30 Thread Twomey, David
Answering my own question: I think I can do this by writing a script that concats the LastName, ForeName and Initials and adding that to xpath = /AuthorList/Author Yes? On 4/30/12 4:49 PM, Twomey, David david.two...@novartis.com wrote: Sorry hit send too soon. Continued the email below On
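
The concatenation idea can be sketched as a standalone script, outside DataImportHandler. The sample XML is rebuilt from the fragment quoted earlier in the thread; the second author's Initials element is left out because it is cut off in the archive. `author_names` is a hypothetical helper, and inside DIH the same logic would have to live in a transformer instead.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Sørlie</LastName><ForeName>T</ForeName><Initials>T</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Perou</LastName><ForeName>C M</ForeName>
  </Author>
</AuthorList>"""


def author_names(xml_text):
    """Return one string per Author: LastName, ForeName and Initials joined by spaces.

    Missing child elements are skipped rather than treated as errors.
    """
    root = ET.fromstring(xml_text)
    names = []
    for author in root.findall("Author"):
        parts = [author.findtext(tag, default="").strip()
                 for tag in ("LastName", "ForeName", "Initials")]
        names.append(" ".join(p for p in parts if p))
    return names


if __name__ == "__main__":
    # Each entry would become one value of a multi-valued author field.
    print(author_names(SAMPLE))
```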

Re: extracting/indexing HTML via cURL

2012-04-30 Thread Jack Krupansky
I was thinking that you wanted to index the actual text from the HTML page, but have the stored field value still have the raw HTML with tags. If you just want to store only the raw HTML, a simple string field is sufficient, but then you can't easily do a text search on it. Or, you can have
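
A rough sketch of the "search the text, keep the raw HTML" idea, done on the client before sending a document to Solr. The field names `html_raw` and `html_text` are hypothetical and would need matching definitions in schema.xml; the tag stripping uses Python's standard HTML parser, not Solr's HTMLStripCharFilter.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def split_for_solr(raw_html):
    """Return a dict with the untouched HTML and a tag-free text version."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    text = " ".join(" ".join(parser.chunks).split())
    return {"html_raw": raw_html, "html_text": text}


if __name__ == "__main__":
    print(split_for_solr("<p>Hello <b>Solr</b></p>"))
```

The raw field would be stored but not analyzed as text, and the text field indexed for search.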

core sleep/wake

2012-04-30 Thread oferiko
I have a multicore Solr with a lot of cores that contain a lot of data (~50M documents) but are rarely used. Can I load a core from configuration, but keep it in sleep mode, where it has all the configuration available but hardly consumes resources, and based on a query or an update, it

Re: dynamically create unique key

2012-04-30 Thread solr_noob
Hello Christopher, I ran into the same problem. When I disable dedupe in the update handler, things work fine. It is only when I enable dedupe that I run into the multivalued error. I'm also using SolrJ to add documents. Were you able to resolve this? If so, would you kindly post your

RE: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-30 Thread Burton-West, Tom
Thanks wunder, I really appreciate the help. Tom

Re: Solr logo for print

2012-04-30 Thread Chris Hostetter
http://svn.apache.org/viewvc?rev=1332444view=rev : At the bottom of  http://wiki.apache.org/solr/PublicServers I found a : link : to https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/site/src/documentation/content/xdocs/images/ , : but that leads to 404. fixed. -Hoss

Re: Weird query results with edismax and boolean operator +

2012-04-30 Thread Jan Høydahl
Hi, I see that you have already commented on SOLR-2649 MM ignored in edismax queries with operators. So let's continue the way towards resolution there... -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 30. apr. 2012, at 14:28,

Re: hierarchical faceting?

2012-04-30 Thread Chris Hostetter
: Is there a tokenizer that tokenizes the string as one token? Using KeywordTokenizer at query time should do what you want. -Hoss