Reading Solr Index directly

2010-11-17 Thread Sasank Mudunuri
Hi, I've been poking around the JavaDocs a bit, and it looks like it's possible to directly read the index using the Solr Java API. Hoping to clarify a couple of things -- 1) Do I need to read the index with Solr APIs, or can I use Lucene (PyLucene is particularly attractive...)? If so, how wary

Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
Jayendra, Brilliant! A very simple solution. Thank you for your help. Kind regards, Gary On 17 Nov 2010 22:09, Jayendra Patil wrote: The way we implemented the same scenario is zipping all the attachments into a single zip file which can be passed to the Extra

Re: Master/Slave High CPU Usage

2010-11-17 Thread Ofer Fort
anybody? On Wed, Nov 17, 2010 at 12:09 PM, Ofer Fort wrote: > > Hi, I'm working with Erez, > we experienced this again, and this time the slave index folder didn't > contain the index.XXX folder, only one index folder. > if we shutdown the slave, the CPU on the master was normal, as soon as we

Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

2010-11-17 Thread Shanmugavel SRD
I tried this in JAVA where the SOLR runs. Is it problem due to the encoding? Code: GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("US/Eastern"), Locale.US); System.out.println(calendar.getTime()); calendar.set(0, 0, 0, 0, 0, 0); System.out.println(calendar.getTime()); Sy

Re: Save the file sent to the ExtractingRequestHandler locally on the server.

2010-11-17 Thread Kaustuv Royburman
A possible solution is to use a directory on the server to upload the files. Monitor the directory for new uploads and then post the documents to the solr using curl. If you are using a linux based server you can use inotifywatch to monitor the folder for new file uploads and then use the foll
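
The watch-and-post idea above can be sketched roughly as follows. This is not the command from the original message (which is cut off in the archive), only a minimal illustration under stated assumptions: the directory is polled rather than watched with inotify, the extracting handler is assumed to be mapped at /update/extract, the file name is used as the document id, and the helper names `new_files` and `curl_command` are hypothetical.

```python
import os
from urllib.parse import quote


def new_files(directory, seen):
    """Return paths of regular files in `directory` not yet recorded in `seen`."""
    fresh = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            fresh.append(path)
    return fresh


def curl_command(path, solr_url="http://localhost:8983/solr"):
    """Build (but do not run) a curl argv that posts one file for extraction.

    Assumptions: handler path /update/extract, file name used as literal.id.
    """
    doc_id = quote(os.path.basename(path))
    url = f"{solr_url}/update/extract?literal.id={doc_id}&commit=true"
    return ["curl", url, "-F", f"myfile=@{path}"]
```

The argv list could be handed to `subprocess.run` once the URL and id scheme match the actual installation.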

Re: my index has 500 million docs ,how to improve solr search performance?

2010-11-17 Thread Lance Norskog
This is pretty standard. I think the problem is basic probabilities: when there are multiple shards, the query waits until the final shard responds, then does another query which may wait for more than one shard. The nature of probabilities is that there will be "stragglers" (late responses) an
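
The straggler argument above can be illustrated with a short calculation. This is only a sketch under an assumed model (each shard is slow independently with the same probability); the value 0.01 and the shard counts are hypothetical, not measurements from the thread.

```python
def prob_any_straggler(p_slow, shards):
    """Probability that at least one of `shards` independent shards responds slowly."""
    return 1.0 - (1.0 - p_slow) ** shards


if __name__ == "__main__":
    # The chance of waiting on a straggler grows with the number of shards.
    for n in (1, 4, 16, 64):
        print(n, round(prob_any_straggler(0.01, n), 4))
```

Under this model the query latency is driven by the slowest shard, so adding shards raises the chance that any given query is delayed.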

Does edismax support wildcard queries?

2010-11-17 Thread Swapnonil Mukherjee
Hi Everybody, We have started to use the dismax query handler, but one serious limitation is that it does not support wildcard queries. I think I have 2 ways to overcome this problem: 1. Apply some old patches to the dismax parser itself from here https://issues.apache.org/jira/brows

Re: Save the file sent to the ExtractingRequestHandler locally on the server.

2010-11-17 Thread Lance Norskog
Upload the files independently of Solr. Solr is not a content management system. One problem is getting the links put together so that the link that comes out with the document can be turned into a link the user can open. Chad Salamon wrote: I would like to save files sent to the ExtractingReq

Dismax is failing with json response writer

2010-11-17 Thread sivaprasad
Hi, I am using the dismax query parser. When I want the response in JSON, I pass wt=json. It then throws the exception below. HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.search.DocSlice$1.score(DocSlice.java:121) at org.apache.solr.request.JSONWriter.writeDocList(

Solr server with utf-8 support in jetty

2010-11-17 Thread Kaustuv Royburman
I am running solr from the examples folder using the command java -jar start.jar When i run the test_utf8.sh file from the exampledocs folder I get the following output Solr server is up. HTTP GET is accepting UTF-8 HTTP POST is accepting UTF-8 HTTP POST defaults to UTF-8 ERROR: HTTP GET is no

Re: case insensitive sort and LowerCaseFilterFactory

2010-11-17 Thread Scott Yeadon
Sorry, looks like it was a data-related issue, apologies for the noise (although if anyone spots anything dodgy in the config feel free to let me know). Scott. On 18/11/10 2:21 PM, Scott Yeadon wrote: Hi, I'm running solr-tomcat 1.4.0 on Ubuntu and have an issue with the sorting of results.

Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

2010-11-17 Thread Dennis Gearon
I thought that that value was a perfectly valid one for ISO 8601 time? http://en.wikipedia.org/wiki/Year_zero Dennis Gearon

case insensitive sort and LowerCaseFilterFactory

2010-11-17 Thread Scott Yeadon
Hi, I'm running solr-tomcat 1.4.0 on Ubuntu and have an issue with the sorting of results. According to this page http://web.archiveorange.com/archive/v/AAfXfzy5Tm1uDy5mYW3B I should be able to configure the LowerCaseFilterFactory to ensure results will be indexed and returned in a case insen

Re: Must require quote with single word token query?

2010-11-17 Thread Chamnap Chhorn
Thanks for your reply. Here is some other details: 1. Keyphrase field definition: 2. I'm using solr 1.4. 3. My dismax definition is the original configuration after install solr: dismax explicit 0.01 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1

RE: Spell Checker

2010-11-17 Thread Eric Martin
Like a charm Dan, like a charm. I'm going to write this up and post it on Drupal. Thanks a ton! I have a much better idea of Solr and Did You Mean, Spell checker -Original Message- From: Dan Lynn [mailto:d...@danlynn.com] Sent: Tuesday, November 16, 2010 5:21 PM To: solr-user@lucene.apach

Re: Dismax - Boosting

2010-11-17 Thread Ahmet Arslan
> 2. How to use spell checker request handler along with > dismax? Just append this at the end of dismax request handler definition: spellcheck

Re: Dismax - Boosting

2010-11-17 Thread Ahmet Arslan
Wow you facet on many fields : author,pubyear,format,series,season,imprint,category,award,age,reading,grade,price The fields you facet on should be untokenized type: string, int, tint date etc. The fields you want full text search, e.g. the ones you specify in qf, pf parameter should be text t

RE: Per field facet limit

2010-11-17 Thread David Yang
Makes sense. The processing is already done and there is no reason not to return it, since it won't explode into a horribly long list, unlike a field facet. Thanks! -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, November 17, 2010 6:21 PM To: solr-

Re: Per field facet limit

2010-11-17 Thread Jonathan Rochkind
I don't think a facet.limit or facet.mincount apply to facet queries, it's not applicable, whether global or field-specific. Keep in mind that a single facet query just returns ONE count, for the query you supplied. It's up to you to supply a query that will give the count you want, it won't u

RE: Per field facet limit

2010-11-17 Thread David Yang
Sorry for the typo, I meant mincount, not limit... :p Cheers, David -Original Message- From: David Yang [mailto:dy...@nextjump.com] Sent: Wednesday, November 17, 2010 6:15 PM To: solr-user@lucene.apache.org Subject: RE: Per field facet limit Thanks! Is there any way to apply this to fa

RE: Per field facet limit

2010-11-17 Thread David Yang
Thanks! Is there any way to apply this to facet queries as well? (I could just apply a f.field.facet.limit to each and every field, and then apply a global facet.limit for facet queries.) Cheers david -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday,

Re: Per field facet limit

2010-11-17 Thread Jonathan Rochkind
f.name_of_field.facet.limit The f.name_of_field.original_value thing is a common pattern in Solr, but, yeah, sometimes it's hard to find it in the documentation. So same with any of the other facet parameters. f.name_of_field.facet.mincount, whatever. David Yang wrote: Hi, The wiki on
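
The f.name_of_field.facet.limit pattern described above can be sketched as a small request-parameter builder. The field names (`author`, `pubyear`) and the helper name `facet_params` are hypothetical; only the parameter naming pattern comes from the reply.

```python
from urllib.parse import urlencode


def facet_params(fields, default_limit, per_field_limits):
    """Build a facet query string with a global limit and per-field overrides."""
    params = [("facet", "true"), ("facet.limit", str(default_limit))]
    for field in fields:
        params.append(("facet.field", field))
    # Per-field overrides follow the f.<field>.<param> naming pattern.
    for field, limit in per_field_limits.items():
        params.append((f"f.{field}.facet.limit", str(limit)))
    return urlencode(params)


if __name__ == "__main__":
    print(facet_params(["author", "pubyear"], 10, {"author": 5}))
```

The same shape works for other facet parameters, for example f.name_of_field.facet.mincount.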

Per field facet limit

2010-11-17 Thread David Yang
Hi, The wiki on facet.limit (http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit) says "This parameter can be specified on a per field basis to indicate a separate limit for certain fields." But it is not specified how to specify a specific field. How do you do this? I tried

Save the file sent to the ExtractingRequestHandler locally on the server.

2010-11-17 Thread Chad Salamon
I would like to save files sent to the ExtractingRequestHandler on the server processing it, and provide a link to the file in the solr document. I currently am running a solr core as a part of a larger web app, and I would like to publish the files as a part of that same web app. This way, both so

Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Jayendra Patil
The way we implemented the same scenario is zipping all the attachments into a single zip file which can be passed to the ExtractingRequestHandler for indexing and included as a part of single Solr document. Regards, Jayendra On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor wrote: > Hi, > > We're t
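
The zipping step described above might look roughly like this. It is a sketch only: the attachment names are hypothetical, the archive is built in memory, and posting the result to the ExtractingRequestHandler is left out.

```python
import io
import zipfile


def bundle_attachments(attachments):
    """Pack a {file name: bytes} mapping into one zip archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in attachments.items():
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = bundle_attachments({"notes.txt": b"hello", "report.pdf": b"%PDF-"})
    print(len(payload), "bytes")
```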

Re: Dismax - Boosting

2010-11-17 Thread Solr User
Ahmet, Thanks for the reply and it was very helpful. The query that I used before changing to dismax was: /solr/tradecore/spell/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&fac

Re: Multi Word searches in Solr

2010-11-17 Thread Matthew Hall
Yeah, I actually don't use the default field at all. Well I learned something new and good today ^^ I just need to recheck my assumptions on how Solr works versus how core lucene worked and I think I'll be fine. The way solr is doing it makes sense too in a way, so I just need to readjust m

Re: Multi Word searches in Solr

2010-11-17 Thread Erick Erickson
Nope, you've got it right. Parentheses are what's necessary. This is actually similar to the Lucene world if you consider in your config to be equivalent to specifying a default field when you instantiate a parser. But that's a stretch. That said, it is surprising that you are getting the exa
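
The parentheses point can be sketched with a small helper that groups every term under the field prefix, so that only the first word is not the one bound to the field. The helper name `field_query` is hypothetical; the field and terms are the ones from this thread.

```python
def field_query(field, terms, operator=""):
    """Group all terms under one field, e.g. abstract:(mouse genome)."""
    joiner = f" {operator} " if operator else " "
    return f"{field}:({joiner.join(terms)})"


if __name__ == "__main__":
    print(field_query("abstract", ["mouse", "genome", "informatics"]))
    print(field_query("abstract", ["mouse", "genome"], operator="AND"))
```

Without the grouping, a query such as abstract: mouse genome applies the field only to the first term and sends the rest to the default field.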

Re: Is it possible to filter on particular field using terms component?

2010-11-17 Thread bbarani
Shalin, Thanks a lot for your reply. I will try using facetcomponent to achieve the term suggest.. Thanks, Barani -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-filter-on-particular-field-using-terms-component-tp1918148p1919314.html Sent from the Solr -

Re: Multi Word searches in Solr

2010-11-17 Thread Matthew Hall
Oh and to clarify what I expect to see. I expect to see the term in a multiword query to be put together with OR's (Unless I've set the default to be AND's in the solrconfig.xml) I'm guessing that what I'm going to need to do here is place all of my queries in parentheses for the fields that

Re: Multi Word searches in Solr

2010-11-17 Thread Matthew Hall
I'm getting the result set that matches what it would be if I just searched for the first word in the query. So I'm getting the results for mouse. And yes, abstract: is the name of the field. So a search for abstract: mouse would yield 69103 results abstract: mouse anythingelseIputhere yiel

Re: Multi Word searches in Solr

2010-11-17 Thread kenf_nc
Multi-word queries are the bread and butter of Solr/Lucene, so I'm not sure I understand the complete issue here. For clarity, is 'abstract' the name of your default text field, or is your query q=abstract: mouse genome if the latter, my thought was is it possible that the query is being convert

Multi Word searches in Solr

2010-11-17 Thread Matthew Hall
Good afternoon, We are running some queries against a default query field (of type text) that can be expected to be multiple words. For example, after parsing the query form I'm left with something like this: abstract: mouse genome informatics The strange behavior that I am seein

Re: Is it possible to filter on particular field using terms component?

2010-11-17 Thread Shalin Shekhar Mangar
On Wed, Nov 17, 2010 at 11:23 AM, bbarani wrote: > > Hi, > > I am using terms component for auto suggest feature and it works great on > the complete index. > > I am more interested in using terms component for particular subset of > index.. something like I want to add a field filter criteria so

Re: occasional exception

2010-11-17 Thread j...@nuatech.net
Thanks a million Robert. On 17 November 2010 11:36, Robert Muir wrote: > Thank you, > > Looks like the problem was > https://issues.apache.org/jira/browse/SOLR-1667. I backported it to > the 1.4 branch: > http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/ > > On Wed, Nov 17, 2010 a

Re: Issue with copyField when updating document

2010-11-17 Thread Erick Erickson
How are you looking at the document? You mention using admin, are you searching? Because if you're looking at *terms* rather than the document, you should be aware that deleting a document does NOT remove the terms from the index, it just marks the doc as deleted. An optimize will remove the dele

Is it possible to filter on particular field using terms component?

2010-11-17 Thread bbarani
Hi, I am using terms component for auto suggest feature and it works great on the complete index. I am more interested in using terms component for particular subset of index.. something like I want to add a field filter criteria so that the terms components returns the terms corresponding to th

Re: ranged and boolean query

2010-11-17 Thread Ken Stanley
On Wed, Nov 17, 2010 at 11:00 AM, Peter Blokland wrote: > hi, > > On Wed, Nov 17, 2010 at 10:54:48AM -0500, Ken Stanley wrote: > >> > pubdate:([* TO NOW] OR (NOT *)) > >> Instead of using NOT, try simply prefixing the field name with a minus >> sign. This tells SOLR to exclude the field. Otherwise
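
One commonly used way to write "in range, or field missing" is sketched below. It is not necessarily the exact query the thread settled on (the replies are cut off in the archive). The point it illustrates is that a purely negative clause inside parentheses matches nothing on its own, so it is paired with the match-all query *:*. The helper name `in_range_or_missing` is hypothetical.

```python
def in_range_or_missing(field, upper="NOW"):
    """Build a query matching docs whose field is <= upper or not set at all."""
    in_range = f"{field}:[* TO {upper}]"
    # Start from all documents, then subtract those that have any value.
    missing = f"(*:* -{field}:[* TO *])"
    return f"{in_range} OR {missing}"


if __name__ == "__main__":
    print(in_range_or_missing("pubdate"))
```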

Re: ranged and boolean query

2010-11-17 Thread Peter Blokland
hi, On Wed, Nov 17, 2010 at 10:54:48AM -0500, Ken Stanley wrote: > > pubdate:([* TO NOW] OR (NOT *)) > Instead of using NOT, try simply prefixing the field name with a minus > sign. This tells SOLR to exclude the field. Otherwise, the word NOT > would be treated as a term, and would be applied

Re: ranged and boolean query

2010-11-17 Thread Ken Stanley
On Wed, Nov 17, 2010 at 10:39 AM, Peter Blokland wrote: > hi. > > i'm using solr and am trying to limit my resultset to documents > that either have a publication date in the range * to now, or > have no publication date set at all (field is not present). > however, using this : > > (pubdate:[* TO

Re: DIH full-import failure, no real error message

2010-11-17 Thread Erik Fäßler
Hi Tommaso, I'm not sure I saw exactly that but there was a Solr-UIMA-contribution a few months ago and I had a look. I didn't go into details, because our search engine isn't upgraded to Solr yet (but is to come). But I will keep your link, perhaps this will prove useful to me, thank you!

ranged and boolean query

2010-11-17 Thread Peter Blokland
hi. i'm using solr and am trying to limit my resultset to documents that either have a publication date in the range * to now, or have no publication date set at all (field is not present). however, using this : (pubdate:[* TO NOW]) OR ( NOT pubdate:*) gives me only the documents in the range *

Re: DIH full-import failure, no real error message

2010-11-17 Thread Tommaso Teofili
Hi Erik 2010/11/17 Erik Fäßler > . But until this point it is necessary to retrieve the full documents, > otherwise I'd have to re-evaluate and partly rewrite our UIMA-Pipelines. Did you see https://issues.apache.org/jira/browse/SOLR-2129 for enhancing docs with UIMA pipelines just before they

Re: Solr context search

2010-11-17 Thread Peter Karich
Take a look at whether the 'more like this' handler can solve your problem. Hi. I wonder whether there is a built-in way to do context search in Solr? I have about 50k documents (mainly 'name' of char(150)), so I receive the content of a page and should show found documents. Of course I can

Re: How do I format this query with 2 search terms?

2010-11-17 Thread Jón Helgi Jónsson
Thanks a lot for that! I wanted to use dismax but hit a wall because I require trailing wildcards in some instances. Methods 1 and 3 do not work in my case. However upon further thinking I realized in the cases I required wildcard I'm only searching one field. So I'll just turn dismax on and off a

Re: Must require quote with single word token query?

2010-11-17 Thread Erick Erickson
Try qt=dismax or deftype=dismax, I was also getting 0 results with defType on 1.4.1. I'll see what's up with that... But if that doesn't work... May we see your dismax definition too? You shouldn't need the quotes, so something's wrong somewhere What version of Solr are you using? Also, ple

Re: How to limit result rows by field types?

2010-11-17 Thread Erick Erickson
It's already in trunk, so if you can use one of the nightly builds you could start using it now, see: https://hudson.apache.org/hudson/job/Solr-trunk/ Best Erick On Tue, Nov 16, 2010 at 9:30 PM, Peter Wang wrote: > Peter Wang writes: > > > repl

Re: How do I format this query with 2 search terms?

2010-11-17 Thread Ken Stanley
2010/11/17 Jón Helgi Jónsson : > I'm using index time boosting and need to specify every field I want > to search (not use copy fields) or else the boosting won't work. > > This query with 1 search term works fine, boosts look good: > > http://localhost:8983/solr/select/? > q=companyName:foo > +descr

How do I format this query with 2 search terms?

2010-11-17 Thread Jón Helgi Jónsson
I'm using index time boosting and need to specify every field I want to search (not use copy fields) or else the boosting won't work. This query with 1 search term works fine, boosts look good: http://localhost:8983/solr/select/? q=companyName:foo +descriptionTxt:verslun &fl=*%20score&rows=10&start

Solr context search

2010-11-17 Thread Denis Kuzmenok
Hi. I wonder whether there is a built-in way to do context search in Solr? I have about 50k documents (mainly 'name' of char(150)), so I receive the content of a page and should show found documents. Of course I can just join by OR and submit a search, but the accuracy would be not so goo

Re: occasional exception

2010-11-17 Thread Robert Muir
Thank you, Looks like the problem was https://issues.apache.org/jira/browse/SOLR-1667. I backported it to the 1.4 branch: http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/ On Wed, Nov 17, 2010 at 4:48 AM, j...@nuatech.net wrote: > Hi Richard, > My full schema.xml is below (and att

Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

2010-11-17 Thread Shanmugavel SRD
Thanks for your reply. I am indexing docs in XML using DIH. I am not using MySQL to import data.

Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
Hi, We're trying to use Solr to replace a custom Lucene server. One requirement we have is to be able to index the content of multiple binary files into a single Solr document. For example, a uniquely named object in our app can have multiple attached-files (eg. Word, PDF etc.), and we want

Problem with DIH delta-import delete.

2010-11-17 Thread Matti Oinas
Solr does not delete documents from index although delta-import says it has deleted n documents from index. I'm using version 1.4.1. The schema looks like uuid Relevant fields from database tables: TABLE: blogs and entries both have Field: id Type: int(11)

Re: sort desc and out of memory exception

2010-11-17 Thread Peter Karich
You are applying the sort against a (tokenized) text field? You should better sort against a number or a string. Probably using the copyField directive. Regards, Peter. hi all: I configure a solr application and there is a field of type text,and some kind like this 123456, that is a string

Re:Re: Updating Solr index - DIH delta vs. task queues

2010-11-17 Thread kafka0102
Does anyone care about this? I use a task queue for now. I think DIH delta cannot handle changed data very well. For the target DB, it needs more than a last_index_time column. If a row is deleted, DIH delta cannot know it. So it needs another boolean column marking whether the row is deleted. However, thus h
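
The soft-delete idea in this message can be sketched as follows. It is an illustration only: the row layout (`id`, `updated_at`, `deleted`) and the helper name `plan_delta` are hypothetical and not taken from DIH or from the poster's schema.

```python
from datetime import datetime


def plan_delta(rows, last_index_time):
    """Split rows changed since `last_index_time` into ids to index and ids to delete."""
    to_index, to_delete = [], []
    for row in rows:
        if row["updated_at"] <= last_index_time:
            continue  # unchanged since the previous run
        if row["deleted"]:
            to_delete.append(row["id"])
        else:
            to_index.append(row["id"])
    return to_index, to_delete


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2010, 11, 16), "deleted": False},
        {"id": 2, "updated_at": datetime(2010, 11, 17, 9), "deleted": False},
        {"id": 3, "updated_at": datetime(2010, 11, 17, 10), "deleted": True},
    ]
    print(plan_delta(rows, datetime(2010, 11, 17)))
```

The trade-off is that rows are never physically removed by the application, only flagged, so the flag and the timestamp have to be maintained on every write.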

Re: Unique ID with shared content field

2010-11-17 Thread Thyago
Hi, Yes, maybe I can use a multivalued field, with all my contents under a unique id. My field "code" could be a multivalued field, where each code is a different reference for the same content. Can I directly update or remove one code inside a multivalued field? Like this query: - Need a

Re: Possibilities of (near) real time search with solr

2010-11-17 Thread Peter Sturge
* I believe the NRT patches are included in the 4.x trunk. I don't think there's any support as yet in 3x (uses features in Lucene 3.0). * For merging, I'm talking about commits/writes. If you merge while commits are going on, things can get a bit messy (maybe on source cores this is ok, but I hav

Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

2010-11-17 Thread gwk
On 11/16/2010 1:41 PM, Shanmugavel SRD wrote: Hi, I am having a field as below in my feed. 0000-00-00T00:00:00Z I have configured the field as below in data-config.xml. But after indexing, the field value becomes like this 0002-11-30T00:00:00Z I want to have the value as '

Re: my index has 500 million docs ,how to improve solr search performance?

2010-11-17 Thread lu.rongbin
Thanks, Lance Norskog-2. I've tested EBS, but it's not better. So maybe I have to optimize my Solr config for EC2 m2.4xlarge. This kind of machine's config is: cpu units: 26 ECUs cpu cores: 8 memory: 68G solrconfig.xml content: ${solr.abortOnConfigurationError:true}

Re: Master/Slave High CPU Usage

2010-11-17 Thread Ofer Fort
Hi, I'm working with Erez, we experienced this again, and this time the slave index folder didn't contain the index.XXX folder, only one index folder. if we shutdown the slave, the CPU on the master was normal, as soon as we started the slave again, the CPU went up to 100% again. thanks for any hel

Re: my index has 500 million docs ,how to improve solr search performance?

2010-11-17 Thread lu.rongbin
Thanks Toke. I tried "EBS" once, thinking it could improve the I/O performance, but it was not obviously better. So maybe I/O is not the main problem. Thanks for your answer.

Master/Slave High CPU Usage

2010-11-17 Thread Erez Zarum
Hi all, We've been seeing this for the second time already. I have a solr (1.4.1) master and a slave. both are located on the same machine (16GB RAM, 4GB allocated to the slave and 3GB to the master) All our updates are going towards the master, and all the queries are towards the slave. Once in

Re: DIH full-import failure, no real error message

2010-11-17 Thread Erik Fäßler
Yes, I knew indexing and storing would pose a heavy load but I wanted to give it a try. Storing is required for the goal I'd like to achieve. We use a UIMA NLP-Pipeline to process the Medline documents and we already have a Medline-XML reader. Everything's fine with all this except until now we

Re: Issue with copyField when updating document

2010-11-17 Thread Lance Norskog
This very definitely should not happen. Can you remove the index and reindex everything? And then do these iterative tests? Which version of Solr are you running? If these changes do not cure the problem, can you post your schema.xml and solrconfig.xml? Pramod Goyal wrote: Hi, I am