customize posting size(or block size) ...

2016-11-14 Thread Jason
Hi, Our search patterns mostly use SpanNearQuery with PrefixQuery. In addition, a single search query can include many PrefixQuery clauses, and we place no constraints on PrefixQuery use. As a result, JVM heap usage is often high, and while it is, other queries also hang. I'd like to

Re: Solr shards: very sensitive to swap space usage !?

2016-11-14 Thread Chetas Joshi
Thanks everyone! The discussion is really helpful. Hi Toke, can you explain exactly what you mean by "the aggressive IO for the memory mapping caused the kernel to start swapping parts of the JVM heap to get better caching of storage data"? Which JVM are you talking about? Solr shard? I have

Re: Parallelize Cursor approach

2016-11-14 Thread Chetas Joshi
I got it when you said to form N queries. I just wanted to try the "get all cursorMarks first" approach, but realized it would be very inefficient, as you said: the cursor mark is a serialized version of the last sort value you received, so you are still reading the results from Solr although

Re: Parallelize Cursor approach

2016-11-14 Thread Erick Erickson
You're executing all the queries to parallelize before even starting. Seems very inefficient. My suggestion doesn't require this first step. Perhaps it was confusing because I mentioned "your own cursorMark". Really I meant bypass that entirely, just form N queries that were restricted to N
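The "form N queries restricted to N ranges" idea can be sketched as follows. This is only an illustration, assuming a hypothetical numeric field `id_num` with known bounds; the field name, the bounds, and the helper `range_filters` are not from the thread.

```python
def range_filters(field, lo, hi, n):
    """Split the half-open range [lo, hi) into n contiguous Solr range filters."""
    step, extra = divmod(hi - lo, n)
    filters = []
    start = lo
    for i in range(n):
        # Spread the remainder over the first `extra` slices.
        end = start + step + (1 if i < extra else 0)
        # [a TO b} includes a and excludes b, so neighbouring slices never overlap.
        filters.append(f"{field}:[{start} TO {end}}}")
        start = end
    return filters


# Each filter would be sent as an fq alongside the same q, one per worker.
for fq in range_filters("id_num", 0, 10, 3):
    print(fq)
```

Each worker can then page through its own slice independently, so no preliminary pass over the whole result set is needed.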

Re: Parallelize Cursor approach

2016-11-14 Thread Chetas Joshi
Thanks Joel for the explanation. Hi Erick, One of the ways I am trying to parallelize the cursor approach is by iterating the result set twice. (1) Once just to get all the cursor marks val q: SolrQuery = new solrj.SolrQuery() q.set("q", query) q.add("fq", query) q.add("rows",
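For reference, a single pass over a cursor looks roughly like the sketch below. It assumes a caller-supplied, hypothetical `fetch_page` function that sends the parameters to Solr and returns the decoded JSON response; the sort passed in `params` must include the uniqueKey field for cursors to work.

```python
def iterate_cursor(fetch_page, params):
    """Yield every document by following Solr's cursorMark protocol.

    fetch_page is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        response = fetch_page({**params, "cursorMark": cursor})
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            break
        cursor = next_cursor
```

Because each mark is derived from the last document returned, the marks cannot be collected without reading the results themselves, which is why a separate "collect all marks" pass doubles the work.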

Re: Editing schema and solrconfig files

2016-11-14 Thread Erick Erickson
Oh, and of course there's the whole managed schema capability, where you use API endpoints to modify the schema file, and a similar API for some parts of solrconfig.xml. That said, though, for any kind of serious installation I'd still be pulling the modified configs off of ZK and putting them in

Re: RTF Rich text format

2016-11-14 Thread Alexandre Rafalovitch
The logical place to do that (if you cannot do it outside of Solr) would be in an UpdateRequestProcessor. Unfortunately, there is no TikaExtract URP, though other similar ones exist (e.g. for language guessing). The full list is here: http://www.solr-start.com/info/update-request-processors/ But you

Re: Editing schema and solrconfig files

2016-11-14 Thread Reth RM
There's a way to add/update/delete schema fields, this is helpful. https://jpst.it/Pqqz although no way to add field-Type On Wed, Nov 9, 2016 at 2:20 PM, Erick Erickson wrote: > We had the bright idea of allowing editing of the config files through > the UI... but the

how to tell HttpSolrServer client to accept/ignore all certs?

2016-11-14 Thread Robert Hume
I'm using HttpSolrServer (in Solr 3.6) to connect to a Solr web service and perform a query. The certificate at the other end has expired and so connections now fail. It will take the IT at the other end too many days to replace the cert (this is out of my control). How can I tell the

Re: index and data directories

2016-11-14 Thread Erick Erickson
Theoretically, perhaps. And it's quite true that stored data for fields marked stored=true are just passed through verbatim and compressed on disk while the data associated with indexed=true fields go through an analysis chain and are stored in a much different format. However these different data

RE: index and data directories

2016-11-14 Thread Prateek Jain J
By data, I mean documents which are to be indexed. Some fields can be stored="true" but that doesn’t matter. For example: App1 creates an object (AppObj) to be indexed and sends it to SOLR via solrj. Some of the attributes of this object can be declared to be used for storage. Now, my

Re: index and data directories

2016-11-14 Thread Erick Erickson
The question is pretty opaque. What do you mean by "data" as opposed to "indexes"? Are you talking about where Lucene puts stored="true" fields? If not, what do you mean by "data"? If you are talking about where Lucene puts the stored="true" bits, then no, there's no way to segregate that out from

Re: Filtering a field when some of the documents don't have the value

2016-11-14 Thread Erick Erickson
You want something like: q=name:x&fq=population:[10 TO *] OR (*:* -population:[* TO *]) Best, Erick On Mon, Nov 14, 2016 at 10:29 AM, Gintautas Sulskus wrote: > Hi, > > I have an index with two fields "name" and "population". Some of the > documents have the "population"
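A minimal sketch of how such a request could be assembled, assuming the main query goes in `q` and the population condition in a filter query. The helper name is hypothetical; the `(*:* -population:[* TO *])` clause matches documents that have no value in `population`.

```python
from urllib.parse import urlencode


def population_filter_params(name_value, min_population):
    """Build q/fq so that documents without a population value still match."""
    return {
        "q": f"name:{name_value}",
        # Either population is at least the minimum, or the field is absent.
        "fq": f"population:[{min_population} TO *] OR (*:* -population:[* TO *])",
    }


# URL-encoded form, ready to append to a /select request.
query_string = urlencode(population_filter_params("x", 10))
print(query_string)
```

The leading `*:*` matters: a purely negative clause inside parentheses matches nothing on its own, so it needs a set of all documents to subtract from.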

Filtering a field when some of the documents don't have the value

2016-11-14 Thread Gintautas Sulskus
Hi, I have an index with two fields "name" and "population". Some of the documents have the "population" field empty. I would like to search for a value X in field "name" with the following condition: 1. if the field is empty - return results for name:X 2. else set the minimum value for the

Re: RTF Rich text format

2016-11-14 Thread Sergio García Maroto
Thanks for the response. I am afraid I can't use the DataImportHandler. I do the indexing with an indexing service that joins data from several places. I end up with a final XML containing plenty of data, and one of the fields is the RTF field. That's the XML I send to Solr using /update. I am wondering if it

Re: RTF Rich text format

2016-11-14 Thread Alexandre Rafalovitch
I think DataImportHandler with nested entity (JDBC, then Tika with FieldReaderDataSource) should do the trick. Have you tried that? Regards, Alex. Solr Example reading group is starting November 2016, join us at http://j.mp/SolrERG Newsletter and resources for Solr beginners and

RTF Rich text format

2016-11-14 Thread marotosg
Hi, I have a use case where I need to index information coming from a database where there is a field which contains rich text format. I would like to convert that text into simple plain text, same as Tika does when indexing documents. Is there any way to achieve that having a field only where I

Re: sorting by date not working on dates earlier than EPOCH

2016-11-14 Thread marotosg
Hi there. I have found a possible solution for this issue. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-by-date-not-working-on-dates-earlier-than-EPOCH-tp4303456p4305770.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: index and data directories

2016-11-14 Thread Prateek Jain J
Hi Alex, I am unable to get it correctly. Is it possible to store indexes and data separately? Regards, Prateek Jain

Re: DIH problem with multiple (types of) resources

2016-11-14 Thread Alexandre Rafalovitch
On 15 November 2016 at 02:19, Peter Blokland wrote: > > Attribute names are case sensitive as far as I remember. Try 'dataSource' for the second definition. Regards, Alex.

Re: index and data directories

2016-11-14 Thread Alexandre Rafalovitch
solr.xml also has a bunch of properties under the core tag: You can get the Reference Guide for your specific version here: http://archive.apache.org/dist/lucene/solr/ref-guide/ Regards, Alex.

index and data directories

2016-11-14 Thread Prateek Jain J
Hi All, We are using Solr 4.8.1 and would like to know if it is possible to store data and indexes in separate directories? I know the following tag exists in the solrconfig.xml file: <dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir> Regards, Prateek Jain

DIH problem with multiple (types of) resources

2016-11-14 Thread Peter Blokland
hi, I'm porting an old data-import configuration from 4.x to 6.3.0. a minimal config is this : http://site/nl/${page.pid}; format="text"> when I try to do a full import with this, I get : 2016-11-14 12:31:52.173 INFO

Suggestions

2016-11-14 Thread Arkadi Colson
Is there a chance that suggestions will be generated at indexing time and not afterwards based on indexed data? This will make it possible to suggest on fields which are not "stored". Or is there another way to make suggestion like behavior possible? Thx! Arkadi

Re: price sort

2016-11-14 Thread Emir Arnautovic
Hi Midas, You can boost results by the reciprocal value of price, but that does not guarantee that an irrelevant result will not come first just because it is cheap. Emir On 14.11.2016 11:19, Midas A wrote: Thanks for replying , i want to maintain relevancy along with price sorting for
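A rough sketch of the reciprocal boost, assuming the edismax parser (whose `boost` parameter multiplies the score) and a hypothetical numeric field named `price`. Solr's `recip(x,m,a,b)` function computes `a/(m*x+b)`; the constants below are illustrative and would need tuning against real prices.

```python
def recip(x, m, a, b):
    """Mirror of Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def price_boost_params(user_query, price_field="price"):
    """Request parameters that multiply relevancy by a reciprocal of price."""
    return {
        "defType": "edismax",
        "q": user_query,
        # With a == b the boost is 1.0 at price 0 and decays as price grows.
        "boost": f"recip({price_field},1,1000,1000)",
    }


# Multiplier applied to the score of a free item versus one priced at 1000.
print(recip(0, 1, 1000, 1000), recip(1000, 1, 1000, 1000))
```

Since the boost only scales the score, a cheap but weakly matching item can still outrank a strong match, which is the caveat raised in the message.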

Re: price sort

2016-11-14 Thread Midas A
Thanks for replying. I want to maintain relevancy along with price sorting. For example, if I search "nike shoes": according to relevance, "nike shoes" come first, then t-shirts (other products) from Nike. Now if we sort the results, t-shirts from Nike come out on top. This is something

Collection synchronization and AWS instance autoscale

2016-11-14 Thread Iván Martínez Castro
Hi, I have a SolrCloud 4.9.1 setup with 4 nodes, 50 collections / 1 shard and 4 replicas per collection. Question one: What happens to collection data when I shut down one node? When I start this node again, will ZK update the collection data? Question two: If I setup an auto scale load based

Re: spell checking on query

2016-11-14 Thread Emir Arnautovic
Hi Midas, You can use Solr's spellcheck component: https://cwiki.apache.org/confluence/display/solr/Spell+Checking Emir On 14.11.2016 08:37, Midas A wrote: How can we do the query time spell checking with help of solr .
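As a small illustration, query-time spell checking is switched on through request parameters, assuming the request handler already has the spellcheck component configured in solrconfig.xml. The helper name and the misspelled sample query are made up for this sketch.

```python
from urllib.parse import urlencode


def spellcheck_params(user_query, count=5):
    """Request parameters asking Solr for spelling suggestions and a collation."""
    return {
        "q": user_query,
        "spellcheck": "true",             # enable the component for this request
        "spellcheck.count": str(count),   # maximum suggestions per term
        "spellcheck.collate": "true",     # also return a corrected whole query
    }


print(urlencode(spellcheck_params("nkie shoes")))
```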

Re: price sort

2016-11-14 Thread Emir Arnautovic
Hi Midas, Sorting by price means that score (~relevancy) is ignored or used only as a secondary sort criterion. My assumption is that you have a long tail of false positives causing sort by price to sort cheap, unrelated items first just because they matched by some stop word. Or I missed your question?

Re: facet query performance

2016-11-14 Thread Toke Eskildsen
On Mon, 2016-11-14 at 11:36 +0530, Midas A wrote: > How to improve facet query performance 1) Don't shard unless you really need to. Replicas are fine. 2) If the problem is the first facet call, then enable DocValues and re-index. 3) Keep facet.limit <= 100, especially if you shard. and most
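The `facet.limit` advice can be sketched as a small parameter builder. The helper name and the field `brand` in the usage line are hypothetical, and the cap of 100 simply encodes the rule of thumb from the message rather than any hard Solr limit.

```python
MAX_FACET_LIMIT = 100  # rule of thumb from the thread, not a Solr constant


def facet_params(query, field, limit=MAX_FACET_LIMIT):
    """Request parameters for a field facet, with facet.limit capped at 100."""
    return {
        "q": query,
        "facet": "true",
        "facet.field": field,
        "facet.limit": str(min(limit, MAX_FACET_LIMIT)),
    }


print(facet_params("*:*", "brand", limit=500))
```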