Re: solr OOM Crash

2014-01-09 Thread Sébastien Michel
Hi Sandra, Excuse me for the late reply. We use the LotsOfCores (http://wiki.apache.org/solr/LotsOfCores) Solr feature, with around 100 simultaneously loaded cores. But the issue is also reproducible with fewer cores. We also have a high rate of indexing, and also reindexing (atomic update). We are indexing
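With LotsOfCores, the number of transient cores actually held in memory is bounded by an LRU cache sized by `transientCacheSize`, which is what keeps heap usage in check when many cores exist. Below is a minimal plain-Java sketch of that eviction idea, assuming an access-ordered `LinkedHashMap`; it is not Solr's `CoreContainer` code, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache of loaded cores, in the spirit of LotsOfCores' transientCacheSize.
public class TransientCoreCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxLoadedCores;

    public TransientCoreCache(int maxLoadedCores) {
        // accessOrder=true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxLoadedCores = maxLoadedCores;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > maxLoadedCores;
        if (evict) {
            // A real implementation would close the evicted core here to release its heap.
            System.out.println("unloading " + eldest.getKey());
        }
        return evict;
    }

    public static void main(String[] args) {
        TransientCoreCache<String, String> cache = new TransientCoreCache<>(2);
        cache.put("core1", "c1");
        cache.put("core2", "c2");
        cache.get("core1");        // core1 becomes the most recently used
        cache.put("core3", "c3");  // evicts core2, the least recently used
        System.out.println(cache.keySet()); // [core1, core3]
    }
}
```

The cap bounds how many cores are resident at once, not how many exist on disk; a core evicted here would simply be reloaded on its next request.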

Re: Zookeeper as Service

2014-01-09 Thread Karthikeyan.Kannappan
I am hosting on Windows.

Re: Shard splitting error: cannot uncache file=_1.nvm

2014-01-09 Thread rafal janik
Greg Preston wrote: [qtp243983770-60] ERROR org.apache.solr.core.SolrCore – java.io.IOException: cannot uncache file=_1.nvm: it was separately also created in the delegate directory at org.apache.lucene.store.NRTCachingDirectory.unCache(NRTCachingDirectory.java:297) at

Re: PeerSync Recovery fails, starting Replication Recovery

2014-01-09 Thread Anca Kopetz
Hi, We tried to understand why we get a Connection reset exception on the leader when it tries to forward documents to one of its replicas. We analyzed the GC logs and did not see any long GC pauses around the time the exception was thrown. For 24 hours of GC logs, the max full GC pause

solr text analysis showing a red bar error

2014-01-09 Thread Umapathy S
Hi, I am new to Solr/Lucene. I am trying to do a text analysis on my index. The error below (screenshot) is shown when I increase the length of the field value. I have searched in vain for any length-specific restrictions in solr.TextField. No error text or exception is thrown.

Re: solr text analysis showing a red bar error

2014-01-09 Thread Aruna Kumar Pamulapati
Can you copy and paste the error? For some reason I cannot see the image of the screenshot you posted. On Thu, Jan 9, 2014 at 7:52 AM, Umapathy S nsupat...@gmail.com wrote: Hi, I am a new to solr/lucene. I am trying to do a text analysis on my index. The below error (screenshot) is shown

Checking for similar text (duplicates)

2014-01-09 Thread Cristian Bichis
Hi, I have one app where the search part is currently based on something other than Solr. However, as the scale, demand, and complexity grow, I am looking at Solr as a potentially better fit, including for some features currently implemented in the scripting layer (and so not on the search

Re: solr text analysis showing a red bar error

2014-01-09 Thread Umapathy S
Thanks. Actually there is no error thrown. Just a red bar appears on top. I have pasted it on http://snag.gy/U9IiJ.jpg On 9 January 2014 12:56, Aruna Kumar Pamulapati apamulap...@gmail.com wrote: Can you copy paste the error, for some reason I can not see the image of the screenshot you

Re: solr text analysis showing a red bar error

2014-01-09 Thread Aruna Kumar Pamulapati
Thanks, can you paste the text that you were trying to analyze? On Thu, Jan 9, 2014 at 8:10 AM, Umapathy S nsupat...@gmail.com wrote: Thanks. Actually there is no error thrown. Just a red bar appears on top. I have pasted it on http://snag.gy/U9IiJ.jpg On 9 January 2014 12:56, Aruna

Re: Checking for similar text (duplicates)

2014-01-09 Thread Mikhail Khludnev
Hello Cristian, Have you seen http://wiki.apache.org/solr/Deduplication ? On Thu, Jan 9, 2014 at 5:01 PM, Cristian Bichis cri...@imagis.ro wrote: Hi, I have one app where the search part is based currently on something else than Solr. However, as the scale/demand and complexity grows I am

Re: solr text analysis showing a red bar error

2014-01-09 Thread Umapathy S
I checked that before. I am using Solr 4.6.0, so maxFieldLength is not applicable. On 9 January 2014 13:23, Aruna Kumar Pamulapati apamulap...@gmail.com wrote: If you are using a Solr version before 4.0 you should look into solrconfig.xml: <maxFieldLength>1</maxFieldLength> What is

Re: solr text analysis showing a red bar error

2014-01-09 Thread Aruna Kumar Pamulapati
See if this helps: https://groups.google.com/forum/#!topic/lily-discuss/IaQLpNVJRi8 On Thu, Jan 9, 2014 at 8:33 AM, Umapathy S nsupat...@gmail.com wrote: I checked that before. I am using solr-4.6.0. maxFieldLength is not applicable. On 9 January 2014 13:23, Aruna Kumar Pamulapati

Re: Checking for similar text (duplicates)

2014-01-09 Thread Cristian Bichis
Hi Mikhail, I have seen the deduplication page as well, but I have some concerns: 1. Is deduplication supposed to work in a check-only request as well (one that does not actually add the new record to the index)? That is, can I just check whether there could be duplicates of some text? 2. As far as I have seen, the
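For the check-only question, one approach (an assumption on our part, not something the Deduplication page prescribes) is to compute the same signature on the client that you store in a signature field at index time, then query that field instead of adding the document. The sketch uses MD5 over lightly normalized text. It only catches exact duplicates after normalization, may not produce the same value as Solr's own `MD5Signature`, and the field name `signature` is hypothetical. Near-duplicate detection would need a fuzzy scheme such as Solr's `TextProfileSignature`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical client-side signature for a "check before add" duplicate lookup.
public class TextSignature {

    // Normalize so trivial differences (case, repeated whitespace) do not change the hash.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    // MD5 of the normalized text as lowercase hex.
    static String signature(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String sig = signature("Hello   World");
        // Query the hypothetical signature field instead of adding the document.
        System.out.println("q=signature:" + sig);
        System.out.println(sig.equals(signature("hello world"))); // true
    }
}
```

The same `signature()` would have to be applied when indexing, so the stored field and the check-time value agree.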

solr increase number of digits that tint fields can store

2014-01-09 Thread Hakim Benoudjit
Hi, I have a price field of type tint, from which I will generate a range facet. Some items in my index now exceed the tint type's limit (the maximum integer value). How do I increase the tint maximum value? Here is the tint definition in schema.xml: fieldType name=tint class=solr.TrieIntField

Re: Zookeeper as Service

2014-01-09 Thread Charlie Hull
On 09/01/2014 09:44, Karthikeyan.Kannappan wrote: I am hosting in windows OS. There are various ways to 'servicify'

Re: Range queries with Grouping is slow?

2014-01-09 Thread Smiley, David W.
It won't hit the filter cache if you set the {!cache=false} local param. On 1/8/14, 12:18 PM, Kranti Parisa kranti.par...@gmail.com wrote: yes that's the key, these time ranges change frequently and hitting filtercache then is a problem. I will try few more samples and probably debug thru it.
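Following David's point, a minimal sketch of tagging only the frequently changing range filter with the `cache=false` local param, while a stable filter stays cacheable. The field names `type` and `start_time` are hypothetical; note that the local-param prefix must come at the very start of the `fq` value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field names; shows where the cache=false local param goes in an fq.
public class FilterQueries {

    // Prefix a filter with the cache=false local param so it bypasses the filterCache.
    static String uncached(String filter) {
        return "{!cache=false}" + filter;
    }

    public static void main(String[] args) {
        List<String> fqs = new ArrayList<>();
        fqs.add("type:event");                              // stable filter: worth caching
        fqs.add(uncached("start_time:[NOW-1HOUR TO NOW]"));  // changes per request: skip the cache
        for (String fq : fqs) {
            System.out.println("fq=" + fq);
        }
    }
}
```

Keeping the filters as separate `fq` parameters lets the stable one be cached and reused on its own while the volatile one is evaluated fresh each time.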

Re: Range queries with Grouping is slow?

2014-01-09 Thread Mikhail Khludnev
Hello, Here is a workaround for caching separate clauses in OR filters: http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html No coding is required; just experiment with the request parameters. On Wed, Jan 8, 2014 at 9:11 PM, Erick Erickson erickerick...@gmail.com wrote:

Re: Searchquery on field that contains space

2014-01-09 Thread PeterKerk
@Ahmet: Thanks, but I also need to be able to search via wildcard, and I just found that a hyphen (-) might be causing unwanted results. E.g. when using this query:

Re: How to boost documents ?

2014-01-09 Thread Anca Kopetz
Hi, I tested the BoostQueryParser and it works on the simplified example. But we need to keep the edismax query parser, so I tried the following query and it seems to work (I defined a local bf='' for qq): q=beautiful Christmas tree&mm=2&qf=title^12 description^2&defType=edismax
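Parameters like these need URL encoding when sent over plain HTTP; spaces and `^` in particular must be escaped. A small sketch, assuming a plain HTTP client rather than SolrJ, that builds the query string from an ordered map. The rest of Anca's request is cut off above, so only the parameters shown are included, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: encodes request parameters in insertion order.
public class EdismaxParams {

    // URL-encode one key or value; UTF-8 is always supported, so the checked exception cannot occur.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "beautiful Christmas tree");
        params.put("mm", "2");
        params.put("qf", "title^12 description^2");
        params.put("defType", "edismax");
        // q=beautiful+Christmas+tree&mm=2&qf=title%5E12+description%5E2&defType=edismax
        System.out.println(toQueryString(params));
    }
}
```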

Re: Zookeeper as Service

2014-01-09 Thread Nazik Huq
From your email I gather your main concern is starting ZooKeeper when the server starts up. You may want to look at these non-native, service-oriented options too: Create a script (cmd or bat) to start ZK on server bootup. This method may not restart ZK if ZK itself crashes (as opposed to the server). Create C#

Re: Checking for similar text (duplicates)

2014-01-09 Thread Mikhail Khludnev
On Thu, Jan 9, 2014 at 5:39 PM, Cristian Bichis cri...@imagis.ro wrote: Hi Mikhail, I seen deduplication part as well but I have some concerns: 1. Is deduplication supposed to work as well into a check-only (not try to actually add new record to index) request ? So if I just check to see if

Re: Searchquery on field that contains space

2014-01-09 Thread PeterKerk
Basically, a user starts typing the first letters of a city and I want to return city names that start with those letters, case-insensitively and without splitting the city name into separate words (whether the separator is whitespace or a hyphen). But although the search of a user is case-insensitive, I want

Re: solr increase number of digits that tint fields can store

2014-01-09 Thread Chris Hostetter
A TrieIntField field can never contain a value greater than Java's Integer.MAX_VALUE; it doesn't matter what settings you use. If you want to store larger values, you need to use a TrieLongField and re-index.
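Plain Java shows the limit Chris describes: a 32-bit int tops out at `Integer.MAX_VALUE` (2147483647), so a larger price cannot be parsed as an int but parses fine as a long. This is a standalone illustration, not Solr's field-parsing code, and the sample value is made up.

```java
// Standalone illustration of the int vs. long limit; not Solr's parsing code.
public class IntVsLongRange {

    // True if the string parses as a 32-bit signed int (the range a tint field holds).
    static boolean parsesAsInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String price = "3000000000"; // hypothetical price above the int limit
        System.out.println(Integer.MAX_VALUE);      // 2147483647
        System.out.println(parsesAsInt(price));     // false: too big for an int
        System.out.println(Long.parseLong(price));  // 3000000000: fits in a long
    }
}
```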

Re: Zookeeper as Service

2014-01-09 Thread Peter Keegan
There's also: http://www.tanukisoftware.com/ On Thu, Jan 9, 2014 at 11:18 AM, Nazik Huq nazik...@yahoo.com wrote: From your email I gather your main concern is starting zookeeper on server startups. You may want to look at these non-native service oriented options too: Create a script(

Re: Solr 4.6.0: DocValues (distributed search)

2014-01-09 Thread ku3ia
Today I set up a simple SolrCloud with two shards. It seems the same. When I'm debugging a distributed search I can't hit a breakpoint in the Lucene codec file, but when I'm using faceted search everything looks fine: the debugger stops. Can anyone help me with my question? Thanks.

Re: Searchquery on field that contains space

2014-01-09 Thread Ahmet Arslan
Hi Peter, Use KeywordTokenizerFactory instead of the Whitespace tokenizer. Also, you might be interested in this: http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ Ahmet On Thursday, January 9, 2014 6:35 PM, PeterKerk vettepa...@hotmail.com wrote: Basically a user starts

Re: solr increase number of digits that tint fields can store

2014-01-09 Thread Hakim Benoudjit
Thanks, that's the response I was searching for. And I have confirmed that I need to reindex my data, because tlong isn't compatible with tint. 2014/1/9 Chris Hostetter hossman_luc...@fucit.org A TrieIntField field can never contain a value greater then java's Integer.MAX_VALUE -- it doesn't

Re: Searchquery on field that contains space

2014-01-09 Thread PeterKerk
Hi Ahmet, Thanks. Also for that link, although it's too advanced for my use case. I see that by using KeywordTokenizerFactory it almost works now, but when I search for "new y", no results are found, while when I search for "new", I do get New York. So the space in the search query is still causing
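A likely reason "new y" finds nothing is that the standard query parser splits the input on whitespace before analysis, so the keyword-tokenized field is searched for "new" and "y" separately. One workaround, sketched under that assumption, is to escape the whitespace (and other query-syntax characters) so the typed prefix stays a single term, lowercase it on the client, and append `*`. The helper is hypothetical and only similar in spirit to SolrJ's `ClientUtils.escapeQueryChars`; the field name `city_prefix` is an assumption.

```java
import java.util.Locale;

// Hypothetical helper: keeps a typed multi-word prefix as one query term.
public class PrefixQueryBuilder {

    // Query-syntax characters to escape; whitespace is handled separately below.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Lowercase on the client, since prefix terms may not pass through the full query analyzer.
    static String prefixQuery(String field, String typed) {
        return field + ":" + escape(typed.trim().toLowerCase(Locale.ROOT)) + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("city_prefix", "New Y")); // city_prefix:new\ y*
    }
}
```

Escaping the hyphen as well keeps names like the ones PeterKerk mentions from being read as a NOT operator.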

Solr Cloud Query Scaling

2014-01-09 Thread Sir Gilligan
Question: Does adding replicas help with query load? Scenario: 3 physical machines, 3 shards. Query any machine, get results. Standard SolrCloud stuff. Update Scenario: 6 physical machines, 3 shards. M = Machine, S = Shard, -L = Leader: M1S1-L, M2S2, M3S3, M4S1, M5S2-L, M6S3-L. Incoming Query to

RE: Solr Cloud Query Scaling

2014-01-09 Thread Tim Potter
Absolutely, adding replicas helps you scale query load. Queries do not need to be routed to leaders; they can be handled by any replica in a shard. Leaders are only needed for handling update requests. In general, a distributed query has two phases, driven by a controller node (what you called
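A simplified in-memory sketch of the two phases Tim describes: in phase 1 each shard (served by any one of its replicas) returns only the ids and scores of its local top-k, and the controller merges them; in phase 2 it fetches stored fields for just the winning ids. Real Solr also handles sort options, paging, and tie-breaking, which this ignores; all names and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scatter-gather sketch; not Solr's distributed search code.
public class TwoPhaseQuerySketch {

    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    // Phase 1: merge each shard's local top-k (ids and scores only) into a global top-k.
    static List<String> mergeTopIds(List<List<Hit>> shardResults, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : shardResults) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, all.size()); i++) {
            ids.add(all.get(i).id);
        }
        return ids;
    }

    // Phase 2: fetch stored fields only for the winning ids (a map stands in for the shards).
    static List<String> fetchStoredFields(List<String> ids, Map<String, String> store) {
        List<String> docs = new ArrayList<>();
        for (String id : ids) {
            docs.add(store.get(id));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<List<Hit>> shardResults = Arrays.asList(
            Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.4)),  // shard 1
            Arrays.asList(new Hit("c", 0.7), new Hit("d", 0.2)),  // shard 2
            Arrays.asList(new Hit("e", 0.8)));                    // shard 3
        List<String> top = mergeTopIds(shardResults, 3);
        Map<String, String> store = new HashMap<>();
        for (String id : Arrays.asList("a", "b", "c", "d", "e")) {
            store.put(id, "doc " + id);
        }
        System.out.println(top);                           // [a, e, c]
        System.out.println(fetchStoredFields(top, store)); // [doc a, doc e, doc c]
    }
}
```

Because either phase can be answered by any replica of a shard, adding replicas spreads this work across more machines.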

Return only distinct combinations of 2 field values

2014-01-09 Thread PeterKerk
I'm searching on cities and returning city and province. Some cities exist in different provinces, which is OK. However, I have some duplicates, meaning the same city occurs twice in the same province. In that case I only want to return 1 result. I therefore need to have a distinct and unique city+province
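If the duplicates are few, one option is to de-duplicate on the client using a combined city+province key, as sketched below. This is an assumption, not a Solr feature: it makes `numFound` and paging inaccurate, so indexing a combined field (hypothetically `city_province`) and grouping or faceting on it in Solr may fit better. The case-insensitive comparison and the `|` separator are also assumptions, and the rows are sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical client-side de-duplication on (city, province).
public class DistinctCityProvince {

    // Keep the first row for each (city, province) pair, compared case-insensitively.
    static List<String[]> distinct(List<String[]> rows) {
        Set<String> seen = new HashSet<>();
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            // Assumes neither value contains the "|" separator.
            String key = row[0].toLowerCase(Locale.ROOT) + "|" + row[1].toLowerCase(Locale.ROOT);
            if (seen.add(key)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"Bergen", "Noord-Holland"},
            new String[] {"Bergen", "Limburg"},        // same city, other province: kept
            new String[] {"Bergen", "Noord-Holland"}); // duplicate pair: dropped
        for (String[] row : distinct(rows)) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```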

Index size - to determine storage

2014-01-09 Thread Amit Jha
Hi, I would like to know: if I index a file, e.g. a 100 KB PDF, what would the size of the index be? What factors should be considered to determine the disk size? Rgds AJ

Re: Index size - to determine storage

2014-01-09 Thread Michael Della Bitta
Hi Amit, It really boils down to how much of that 100kb is actually text, and how you analyze and store the text. Meaning, it's really hard for us to say. You're probably going to need to experiment to figure out what the storage needs for your use case are. Michael Della Bitta Applications

Re: Range queries with Grouping is slow?

2014-01-09 Thread Kranti Parisa
Thank you, will take a look at it. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Thu, Jan 9, 2014 at 10:25 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Here is workaround for caching separate clauses in OR filters.

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format exception while deleting 30k records

2014-01-09 Thread gpssolr2020
Thanks. We will try with more heap. We also noticed that ZooKeeper (OpenJDK) and Solr (Sun JDK) are using different JVMs. Could this really cause the OOM issue?

RE: Solr Cloud Query Scaling

2014-01-09 Thread Garth Grimm
As a follow-up question on this: one would want to use some kind of load balancing 'above' the SolrCloud installation for search queries, correct? To ensure that the initial requests get distributed evenly to all nodes? If you don't have that, and send all requests to M2S2 (IRT OP),

Re: Solr Cloud Query Scaling

2014-01-09 Thread Shawn Heisey
On 1/9/2014 4:09 PM, Garth Grimm wrote: As a follow-up question on this One would want to use some kind of load balancing 'above' the SolrCloud installation for search queries, correct? To ensure that the initial requests would get distributed evenly to all nodes? If you don't have

Re: need help on OpenNLP with Solr

2014-01-09 Thread Lance Norskog
There is no way to do these things with LUCENE-2899. On Mon, Jan 6, 2014 at 8:07 AM, rashi gandhi gandhirash...@gmail.com wrote: Hi, I have applied OpenNLP (LUCENE 2899.patch) patch to SOLR-4.5.1 for nlp searching and it is working fine. Also I have designed an analyzer for this:

Re: Solr Cloud Query Scaling

2014-01-09 Thread Joel Bernstein
You do need to load balance the initial query request across the SolrCloud nodes. SolrJ's CloudSolrServer and LBHttpSolrServer can perform the load balancing for you in the client. Or you can use a hardware load balancer. Joel Bernstein Search Engineer at Heliosearch On Thu, Jan 9, 2014 at 5:58
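The simplest client-side policy is round robin over the nodes. The sketch shows only that selection logic with hypothetical node URLs; the SolrJ classes Joel mentions also deal with failed servers (and CloudSolrServer reads the cluster state from ZooKeeper), none of which is modeled here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector; not SolrJ's LBHttpSolrServer.
public class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    // Thread-safe pick of the next node; floorMod keeps the index valid after the counter overflows.
    String next() {
        return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(Arrays.asList(
            "http://m1:8983/solr", "http://m2:8983/solr", "http://m3:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next()); // m1, m2, m3, then m1 again
        }
    }
}
```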

Re: Index size - to determine storage

2014-01-09 Thread Alexandre Rafalovitch
Try running the PDF through standalone Tika and see what comes back. That's the size of the input. It is usually quite a small proportion of the PDF size, possibly down to metadata only and no text if your PDF does not include a text layer. Then, it depends on your storing and indexing options, your

Re: Searchquery on field that contains space

2014-01-09 Thread Alexandre Rafalovitch
On Thu, Jan 9, 2014 at 11:34 PM, PeterKerk vettepa...@hotmail.com wrote: Basically a user starts typing the first letters of a city and I want to return citynames that start with those letters, case-insensitive and not splitting the cityname on separate words (whether the separator is a

Copying Index

2014-01-09 Thread anand chandak
Hi, I am testing the replication feature of Solr 4.x with a large index; unfortunately, the index that we had was in 3.x format. So I copied the index files and ran the index upgrade utility to convert it to 4.x format. The utility did what it is supposed to do and I got a 4.x index (verified it with

Re: Searchquery on field that contains space

2014-01-09 Thread Ahmet Arslan
Hi Peter, Here are two different ways to do it. 1) Use a phrase query, q=yourField:"new y", with the following type. fieldType name=prefix_full class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.TrimFilterFactory / filter
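To see how a phrase query against a keyword-tokenized field can match a partial multi-word input, here is a plain-Java simulation of one possible index-time chain: keyword tokenization, trim, lowercase, then edge n-grams of the whole value. The filters after `TrimFilterFactory` are cut off in the message above, so the lowercase and edge n-gram steps are assumptions, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation of an analysis chain; not Lucene's analyzer classes.
public class EdgeNGramSimulation {

    // Keyword "tokenizer": the whole value is one token; then trim and lowercase.
    static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }

    // Edge n-grams of the single token, from minGram up to the full length.
    static List<String> edgeNGrams(String token, int minGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNGrams(normalize("New York"), 1);
        // At query time only normalization runs, so "new y" must equal one indexed gram.
        System.out.println(indexed.contains(normalize("new y"))); // true
        System.out.println(indexed.contains(normalize("york")));  // false: not a prefix
    }
}
```

Since the space is kept inside every gram, "new y" matches without any wildcard, and a mid-name word like "york" deliberately does not.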