Exception when using File based and Index based SpellChecker

2013-07-18 Thread smanad
I am trying to use the file-based and index-based spell checkers together and am getting this exception: "All checkers need to use the same StringDistance". They work fine individually, as expected, but not together. Any pointers? -Manasi
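The error means that every configured spellchecker must declare the same distance implementation. A minimal solrconfig.xml sketch (checker names, field and file names are placeholders, not taken from the original message) where both checkers explicitly share org.apache.lucene.search.spell.LevensteinDistance:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">index_checker</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker_index</str>
      <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">file_checker</str>
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellchecker_file</str>
      <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
    </lst>
  </searchComponent>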

Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

2013-07-18 Thread Tony Mullins
Eric, in freq:termfreq(product,'spider'), freq is an alias for the 'termfreq' function query, so I expected to have a field named 'freq' in the document response. This is the code I am using to get the document object, and there is no termfreq field in its fields collection. DocList docs =

Configuring Tomcat 6 with Solr431 with multiple cores

2013-07-18 Thread PeterKerk
Thanks to Sandeep in this post: http://lucene.472066.n3.nabble.com/HTTP-Status-503-Server-is-shutting-down-td4065958.html#a4078567 I was able to set up Tomcat 6 with Solr 4.3.1. However, I need a multicore implementation and am now stuck on how to do so. Here is what I did based on Sandeep's
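For Solr 4.3.x, multiple cores are declared in solr.xml inside the Solr home directory. A minimal sketch, assuming hypothetical core names and instance directories (not the original poster's actual layout):

  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="core0">
      <core name="core0" instanceDir="core0" />
      <core name="core1" instanceDir="core1" />
    </cores>
  </solr>

Each instanceDir needs its own conf/ directory with solrconfig.xml and schema.xml; the Solr home itself is pointed to via the solr/home JNDI entry in the Tomcat context or the -Dsolr.solr.home system property.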

Inconsistent solrcloud search

2013-07-18 Thread Vladimir Poroshin
Hi, I see strange behavior while searching my SolrCloud cluster: for a query like this: http://localhost/solr/my_collection/select?q=my+query (http://10.1.1.193:7006/solr-madaptive/collection_mapi/select?q=%22Sairauden+sanoma%22) Solr responds sometimes with one document and sometimes with

boost docs if token matches happen in the first 5 words

2013-07-18 Thread Anatoli Matuskova
I have a set of documents with a whitespace-tokenized field. I want to give more boost when the query match happens in the first 3 token positions of the field. Is there any way to do that? (I don't want to use payloads, as they mean one more seek to disk and therefore lower performance.)

RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Markus Jelsma
You must implement a SpanFirst query yourself. These are not implemented in any Solr query parser. You can easily extend the (e)dismax parsers and add support for it.

Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
Hi all, I am using a custom RequestHandlerBase where I am querying multiple different Solr instances and aggregating their output as an XML Document using DOM. Now, in the RequestHandler's handleRequestBody(SolrQueryRequest req, SolrQueryResponse resp) method, I want to output this XML Document as the response.

RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Anatoli Matuskova
Thanks for the quick answer Markus. Could you give me a guideline, or point me to where in the Solr source code to look, to see how to get it done?

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
This isn't a Solr issue. Maybe ask on the Xerces list?

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
Thanks for your response Shalin. So does that mean we can't return an XML object in SolrQueryResponse through a custom RequestHandler?

RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Markus Jelsma
You'll need to import the org.apache.lucene.search.spans package in Solr's ExtendedDismaxQParserPlugin and add SpanFirstQuerys to the main query. Something like: query.add(new SpanFirstQuery(new SpanTermQuery(new Term(field, term)), distance), BooleanClause.Occur.SHOULD);
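As a rough illustration of that suggestion (a sketch against the Lucene 4.x API, not the actual (e)dismax modification; field name, term and boost are placeholders):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.spans.SpanFirstQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  public class SpanFirstBoostExample {
      // Builds a query that matches "spider" anywhere in the body field,
      // plus an optional clause that only matches (and therefore boosts)
      // documents where the term occurs within the first 5 token positions.
      public static Query build() {
          BooleanQuery query = new BooleanQuery();
          query.add(new TermQuery(new Term("body", "spider")), BooleanClause.Occur.MUST);

          SpanFirstQuery first = new SpanFirstQuery(
                  new SpanTermQuery(new Term("body", "spider")), 5);
          first.setBoost(10.0f);
          query.add(first, BooleanClause.Occur.SHOULD);
          return query;
      }
  }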

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
Solr's response writers support only a few known types. Look at the writeVal method in TextResponseWriter: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java

Re: autoCommit and performance

2013-07-18 Thread Aditya
Hi, it totally depends on what you can afford. If you can, go for more RAM, an SSD drive, and a 64-bit OS. Benchmark your application: with a certain set of docs, measure how much RAM it takes, indexing time, search time, etc. Increase the document count and perform the benchmarking tasks again. This will

Re: autoCommit and performance

2013-07-18 Thread Ayman Plaha
Thanks Shawn and Aditya, I really appreciate your help. Based on your advice and reading the SolrPerformance article Shawn linked me to, I ended up getting an Intel dual-core i3 3220 3.3GHz with 36GB RAM and 2 x 125GB SSD drives for $227 per month. It's still expensive for me but I got it

Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

2013-07-18 Thread Jack Krupansky
As detailed in a previous email, termfreq is not a field - it is a transformer or function. Technically, it is actually a ValueSource. If you look at the TextResponseWriter.writeVal method you can see how it kicks off the execution of transformers when writing documents. -- Jack Krupansky
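In other words, the freq pseudo-field only comes into existence while the response is being written; it is never stored on the document. A request such as the following (field and term are just examples) asks the response writer to evaluate the function for each returned document:

  http://localhost:8983/solr/collection1/select?q=product:spider&fl=id,score,freq:termfreq(product,'spider')&wt=xml

A custom SearchComponent that needs the same number would have to evaluate the termfreq ValueSource (or read term frequencies from the index) itself rather than look for a stored field on the document.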

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
But it seems there is even something called XMLResponseWriter: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/XMLResponseWriter.java Wouldn't that be appropriate in my case? I have not implemented it yet, but surely there must be some way to

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Jack Krupansky
It would probably be better to integrate the responses (document lists). Solr response writers do a lot of special processing of the response data, so you can't just throw arbitrary objects into the response. You may need to explain your use case a little more clearly. -- Jack Krupansky

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
So does that mean there is no way we can write an XML or JSON object to the SolrQueryResponse and expect it to be formatted?

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
Okay, let me explain. If you construct your combined response (why are you doing that again?) in the form of a Solr NamedList or SolrDocumentList, then the XMLResponseWriter (which, by the way, uses TextResponseWriter) has no problem writing it out as XML. The problem here is that you are giving it an object it does not know how to serialize.
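A minimal sketch of that idea (handler class, key names and values are hypothetical): the handler puts its aggregated data into a NamedList/SimpleOrderedMap and adds it to the SolrQueryResponse, and the standard response writers then serialize it as XML or JSON.

  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.common.util.SimpleOrderedMap;
  import org.apache.solr.handler.RequestHandlerBase;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;

  public class AggregatingHandler extends RequestHandlerBase {
      @Override
      public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
          // Instead of a DOM Document, build a NamedList that the
          // response writers (XML, JSON, ...) know how to serialize.
          NamedList<Object> combined = new SimpleOrderedMap<Object>();
          combined.add("sourceA", "... values extracted from the first Solr instance ...");
          combined.add("sourceB", "... values extracted from the second Solr instance ...");
          rsp.add("combined", combined);
      }

      @Override
      public String getDescription() { return "Aggregates responses from several Solr instances"; }

      @Override
      public String getSource() { return ""; }
  }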

Sort by document similarity counts

2013-07-18 Thread zygis
Hi, is it possible to sort search results based on the count of similar documents a document has? Say we have a document A which has 4 other similar documents in the index and a document B which has 10; then the order Solr returns them in should be B, A. Sorting on moreLikeThis counts for each

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
My case is this: I have a few Solr instances, and I query them and get their XML responses. Out of that XML I have to extract a group of specific XML nodes; later I combine the other Solr responses into a single XML and make a DOM document out of it. So, as you mentioned in your last

Re: Sort by document similarity counts

2013-07-18 Thread Koji Sekiguchi
I have tried doing this via a custom SearchComponent, where I can find all similar documents for each document in the current search result, then add a new field to the document, hoping to use the sort parameter (q=*&sort=similarityCount). I don't understand this part very well, but: But this will not

Re: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Furkan KAMACI
Hi Shawn; This is what I see when I look at mbeans: <lst name="UPDATEHANDLER"><lst name="updateHandler"><str name="class">org.apache.solr.update.DirectUpdateHandler2</str><str name="version">1.0</str><str name="description">Update handler that efficiently directly updates the on-disk main lucene index</str><str

RE: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Markus Jelsma
Not your updateHandler - that only shows numbers about what it's doing, and it can be restarted. Check your cores: host:port/solr/admin/cores
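For example (host, port and core name are placeholders), the CoreAdmin STATUS action reports per-core index statistics that survive handler restarts:

  http://localhost:8983/solr/admin/cores?action=STATUS&core=collection1

The response contains, per core, an "index" section with entries such as numDocs, maxDoc and segmentCount.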

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
This sounds like a bad idea. You could have done this much more simply inside your own application, using libraries that you know well. That being said, instead of creating a DOM document, create a Solr NamedList object, which can be serialized by XMLResponseWriter.

Getting a large number of documents by id

2013-07-18 Thread Brian Hurt
I have a situation, common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time,

Re: Clearing old nodes from zookeper without restarting solrcloud cluster

2013-07-18 Thread Luis Carlos Guerrero Covo
Hey Andre, that isn't a possibility for us right now since we are terminating nodes using AWS autoscaling policies. We'll have to either change our policies so that we can have some kind of graceful shutdown where we get the chance to unload cores, or update ZooKeeper's cluster state every

Two-steps queries with different sorting criteria

2013-07-18 Thread Fabio Amato
Hi all, I need to execute a Solr query in two steps: in the first step, a generic limited-results query ordered by relevance; in the second step, a reordering of the first step's results according to a given sorting criterion (different from relevance). This two-step query is

Re: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Furkan KAMACI
Hi Markus; it doesn't give me how many documents were updated since the last commit.

RE: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Markus Jelsma
No, nothing will. If you must know, you'll have to track it on the client side and make sure autocommit is disabled.

Re: Getting a large number of documents by id

2013-07-18 Thread Alexandre Rafalovitch
You could start by using id:(12345 23456) to reduce the query length and possibly speed up parsing. You could also move the query from the 'q' parameter to the 'fq' parameter, since you probably don't care about ranking ('fq' does not rank). If these are unique every time, you could probably look at not
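A sketch of both suggestions combined (ids, core name and row count are placeholders): the id list goes into a non-ranking filter query while q stays a match-all.

  http://localhost:8983/solr/collection1/select?q=*:*&fq=id:(12345 23456 34567)&rows=1000&fl=id,name

Since the filter does all the selection, the fq result is also cached; for id sets that are unique every time that cache entry is wasted, which is presumably what the last (truncated) sentence above is getting at.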

Re: Getting a large number of documents by id

2013-07-18 Thread Jack Krupansky
Solr really isn't designed for that kind of use case. If it happens to work well for your particular situation, great, but don't complain when you are well outside the normal usage for a search engine (10, 20, 50, 100 results paged at a time, with modest-sized query strings). If you must get

Re: Getting a large number of documents by id

2013-07-18 Thread Michael Della Bitta
Brian, have you tried the realtime get handler? It supports multiple documents. http://wiki.apache.org/solr/RealTimeGet - Michael Della Bitta, appinions inc.
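A sketch of what that looks like (core name and ids are placeholders); the /get handler requires the updateLog to be enabled in solrconfig.xml:

  http://localhost:8983/solr/collection1/get?ids=12345,23456,34567&fl=id,name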

Re: Solr with Hadoop

2013-07-18 Thread Matt Lieber
Rajesh, if you require an integration between Solr and Hadoop or NoSQL, I would recommend using a commercial distribution. I think most are free to use as long as you don't require support. I inquired about the Cloudera Search capability, but it seems that so far it is just preliminary:

Re: Getting a large number of documents by id

2013-07-18 Thread Roman Chyla
Look at the speed of reading the data - likely, it takes a long time to assemble a big response, especially if there are many long fields - you may want to try SSD disks, if you have that option. Also, to gain a better understanding: start your Solr, start jvisualvm and attach it to your running Solr. Start

RE: Solr with Hadoop

2013-07-18 Thread Saikat Kanjilal
I'm familiar with and have used the DSE cluster, and am in the process of evaluating Cloudera Search. In general, Cloudera Search has tight integration with HDFS and takes care of replication and sharding transparently by using the pre-existing HDFS replication and sharding; however

Re: Getting a large number of documents by id

2013-07-18 Thread Alexandre Rafalovitch
And I guess, if only a subset of fields is being requested but there are other large fields present, there could be the cost of loading those extra fields into memory before discarding them. In which case, using enableLazyFieldLoading may help. Regards, Alex.
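That setting lives in the <query> section of solrconfig.xml; with it enabled, fields not listed in fl are only loaded from disk if something later asks for them. For example (field names are placeholders):

  <query>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
  </query>

  http://localhost:8983/solr/collection1/select?q=id:(12345 23456)&fl=id,name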

XInclude and Document Entity not working on schema.xml

2013-07-18 Thread Elodie Sannier
Hello, I am using the Solr nightly version 4.5-2013-07-18_06-04-44 and I want to use a Document Entity in schema.xml. I get this exception: java.lang.RuntimeException: schema fieldtype string(org.apache.solr.schema.StrField) invalid arguments:{xml:base=solrres:/commonschema_types.xml} at

Re: solr autodetectparser tikaconfig dataimporter error

2013-07-18 Thread Andreas Owen
I have now changed some things and the import runs without error. In schema.xml I haven't got the field "text" but "contentsExact". Unfortunately the text (from the file) isn't indexed even though I mapped it to the proper field. What am I doing wrong? data-config.xml: <dataConfig> <dataSource

Luke's analysis of Trie Dates

2013-07-18 Thread JohnRodey
I have a TrieDateField dynamic field setup in my schema, pretty standard... <dynamicField name="*_tdt" type="tdate" indexed="true" stored="false"/> <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/> In my code I only set one field, creation_tdt, and

Re: Luke's analysis of Trie Dates

2013-07-18 Thread Yonik Seeley
On Thu, Jul 18, 2013 at 12:53 PM, JohnRodey timothydd...@yahoo.com wrote: I have a TrieDateField dynamic field setup in my schema, pretty standard... <dynamicField name="*_tdt" type="tdate" indexed="true" stored="false"/> <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"

Re: JVM Crashed - SOLR deployed in Tomcat

2013-07-18 Thread neoman
Thanks for your reply. Yes, it worked. No more crashes after switching to 1.6.0_30.

Indexing into SolrCloud

2013-07-18 Thread Beale, Jim (US-KOP)
Hey folks, I've been migrating an application which indexes about 15M documents from straight-up Lucene to SolrCloud. We've set up 5 Solr instances and a 3-node ZooKeeper ensemble, using HAProxy for load balancing. The documents are processed on a quad-core machine with 6 threads and indexed

Re: Getting a large number of documents by id

2013-07-18 Thread Brian Hurt
Thanks everyone for the responses. On Thu, Jul 18, 2013 at 11:22 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: You could start from doing id:(12345 23456) to reduce the query length and possibly speed up parsing. I didn't know about this syntax - it looks useful. You could also move

Auto-sharding and numShard parameter

2013-07-18 Thread Flavio Pompermaier
Hi to all, probably this question has a simple answer but I just want to be sure of the potential drawbacks. When I run SolrCloud I start the main Solr instance with the -numShards option (e.g. 2). Then, as data grows, shards could potentially become a huge number. If I had to restart all nodes

Need ideas to perform historical search

2013-07-18 Thread SolrLover
I am trying to implement historical search using Solr. Ex: if I search on the address "800 5th Ave" and provide a time range, it should list the names of the people who were living at that address during that time period. I am trying to figure out a way to store the data without redundancy. I can do a

Spellcheck questions

2013-07-18 Thread smanad
Exploring the various spell checkers in Solr, I have a few questions: 1. Which algorithm is used for generating suggestions when using IndexBasedSpellChecker? I know it's Levenshtein (with edit distance=2, the default) in DirectSolrSpellChecker. 2. If I have 2 indices, can I set up multiple

Re: Spellcheck questions

2013-07-18 Thread SolrLover
Check the link below to get more info on IndexBasedSpellChecker: http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/

additional requests sent to solr

2013-07-18 Thread alxsss
Hello, I send to Solr (to server1 in a cluster of two servers) the following request

Solr 4.3 open a lot more files than solr 3.6

2013-07-18 Thread Zhang, Lisheng
Hi, after upgrading Solr from 3.6 to 4.3, we found that Solr opens a lot more files compared to Solr 3.6 (when a core is open). Since we have many cores (more than 2K and still growing), we would like to reduce the number of open files. We already use shareSchema and sharedLib, and we also shared

Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-18 Thread Erick Erickson
Thank you for adding to the wiki! It's always appreciated... On Wed, Jul 17, 2013 at 5:18 PM, Ali, Saqib docbook@gmail.com wrote: Thanks Erick! I have added the instructions for running SolrCloud on JBoss: http://wiki.apache.org/solr/SolrCloud%20using%20Jboss I will refine the

Re: Need ideas to perform historical search

2013-07-18 Thread Alexandre Rafalovitch
Why do you care about redundancy? That's the search engine's architectural tradeoff (as far as I understand). And the tokens are all normalized under the covers, so it does not take as much space as you might expect. Specifically regarding your issue, maybe you should store 'occupancy' as the record.