Re: SolrJ getHighlighting() does not return results in order

2017-03-22 Thread leoperezpulido
Hello, Yes, getHighlighting() returns a Map<String, Map<String, List<String>>>, so I first get the map = response.getHighlighting(); Then I initialize a TreeMap with the map object just obtained above (new TreeMap<>(map)). I then get a collection-view of this treeMap object, like: set =

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread aruninfo100
Hi, I applied the LUCENE-2899.patch, which provides OpenNLP capabilities to Solr. One feature it provides is lemmatization, which helps to match the root word. But with it integrated, indexing became far too time-consuming. It provides you with POS, Sentence

Re: Regex Phrases

2017-03-22 Thread Erick Erickson
Susheel: That'll work, but the options you've specified for WordDelimiterFilterFactory pretty much make it so it's doing nothing. I realize it's commented out... That said, it's true that if you have a very specific pattern you want to recognize a Regex can do the trick. WDFF is a bit more

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hi - We don't use that OpenNLP patch, nor do we use such kind of lemmatizer. We just rely on POS-tagging via a CharFilter with custom trained maxent models and it is fast enough. So, do you really need that analyzer that is giving you a hard time? I don't know what that lemmatizer does but you

Re: SolrJ getHighlighting() does not return results in order

2017-03-22 Thread Bryan Bende
Hello, I believe getHighlighting() returns Map<String, Map<String, List<String>>>. Generally Maps are not expected to iterate in order unless you know the underlying implementation of the Map, for example LinkedHashMap will iterate in the insertion order and HashMap will not. You should be able to take the doc id from one of
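
The lookup suggested above can be sketched in plain Java. This is only an illustration under assumptions: the class name `HighlightOrder`, the doc ids, and the field name `text` are hypothetical, and the ranked id list stands in for ids read from `response.getResults()`, while the nested map stands in for what `response.getHighlighting()` returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HighlightOrder {
    // Walks the ranked doc ids and looks each one up in the highlighting map,
    // so the output follows the result order instead of the map's iteration order.
    static List<String> snippetsInResultOrder(
            List<String> rankedIds,
            Map<String, Map<String, List<String>>> highlighting,
            String field) {
        List<String> ordered = new ArrayList<>();
        for (String id : rankedIds) {
            Map<String, List<String>> perField = highlighting.get(id);
            if (perField == null) {
                continue; // no highlighting entry for this document
            }
            List<String> snippets = perField.get(field);
            if (snippets != null && !snippets.isEmpty()) {
                ordered.add(id + ": " + snippets.get(0));
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical ranked ids, as they would appear in the result list.
        List<String> rankedIds = Arrays.asList("doc7", "doc2", "doc9");

        // A HashMap on purpose: its iteration order is unrelated to ranking.
        Map<String, Map<String, List<String>>> highlighting = new HashMap<>();
        highlighting.put("doc2", Collections.singletonMap("text", Arrays.asList("second <em>hit</em>")));
        highlighting.put("doc9", Collections.singletonMap("text", Arrays.asList("third <em>hit</em>")));
        highlighting.put("doc7", Collections.singletonMap("text", Arrays.asList("first <em>hit</em>")));

        for (String line : snippetsInResultOrder(rankedIds, highlighting, "text")) {
            System.out.println(line);
        }
    }
}
```

Driving the loop from the result list means the Map implementation no longer matters, and copying into a TreeMap (which sorts by key, not by rank) becomes unnecessary.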

Re: Custom FieldTypes

2017-03-22 Thread Ronald Wood
Thanks, Alex. I’ll experiment with it. -R On 3/22/17, 4:38 PM, "Alexandre Rafalovitch" wrote: You could provide the URP chain name (or individual URPs) when you index a particular document type, but that requires you to send all document types to put signature

Re: Custom FieldTypes

2017-03-22 Thread Alexandre Rafalovitch
You could provide the URP chain name (or individual URPs) when you index a particular document type, but that requires you to send all document types to put signature on together. Or you could have a custom URP that skips other ones (they are chained), though that's messier. And I think you want

Re: Concatenating streams in streaming expressions

2017-03-22 Thread Joel Bernstein
There isn't a cat function yet. The closest function we have currently is a merge function: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-merge But I've been meaning to add a cat function so feel free to create the jira. Joel Bernstein
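
To make the difference between concatenating and merging concrete, here is a plain-Java analogy. It is not Solr's implementation: `StreamCombine`, `ScoredDoc`, and both method names are hypothetical, and the lists stand in for two streams that are each already sorted by score descending.

```java
import java.util.ArrayList;
import java.util.List;

class StreamCombine {
    // Hypothetical stand-in for one tuple carrying an id and a score.
    static final class ScoredDoc {
        final String id;
        final double score;

        ScoredDoc(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // "cat"-style combination: everything from a, then everything from b.
    static List<ScoredDoc> concat(List<ScoredDoc> a, List<ScoredDoc> b) {
        List<ScoredDoc> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // "merge"-style combination: both inputs must already be sorted by score
    // descending; the output interleaves them and keeps that ordering.
    static List<ScoredDoc> mergeByScoreDesc(List<ScoredDoc> a, List<ScoredDoc> b) {
        List<ScoredDoc> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).score >= b.get(j).score) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Convenience helper to inspect the resulting order.
    static List<String> ids(List<ScoredDoc> docs) {
        List<String> out = new ArrayList<>();
        for (ScoredDoc d : docs) {
            out.add(d.id);
        }
        return out;
    }
}
```

With two sorted inputs, the concatenation keeps all of the first stream ahead of the second, while the merge yields one stream that is still sorted by score.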

Re: Solr Delete By Id Out of memory issue

2017-03-22 Thread Rohit Kanchan
For commits we are relying on auto commits. We have defined the following in our configs: 1 3 false 15000 One thing which I would like to mention is that we are not calling deleteById directly from the client.

Re: Regex Phrases

2017-03-22 Thread Susheel Kumar
I have used PatternReplaceFilterFactory in some of these situations. e.g. below On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson wrote: > Awesome, thank you much! > > On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson > wrote: > > > Take a
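
The regex replacement idea can be tried outside Solr with plain java.util.regex. This is a sketch only: the class `PhraseJoiner`, the pattern, and the underscore convention are assumptions made for illustration, the filter configuration from the thread is not shown here, and whether the joined form survives as one token depends on the tokenizer in use.

```java
import java.util.regex.Pattern;

class PhraseJoiner {
    // Hypothetical pattern: the word "vitamin", whitespace, then a single
    // letter with optional digits (A, C, B12, ...), matched case-insensitively.
    private static final Pattern VITAMIN =
            Pattern.compile("(?i)\\b(vitamin)\\s+([a-z]\\d*)\\b");

    // Rewrites each match so that both parts are joined by an underscore.
    static String joinPhrases(String text) {
        return VITAMIN.matcher(text).replaceAll("$1_$2");
    }

    public static void main(String[] args) {
        System.out.println(joinPhrases("Vitamin A and vitamin B12 tablets"));
    }
}
```

The same rewrite would have to be applied to both indexed text and query text for the joined form to match.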

Concatenating streams in streaming expressions

2017-03-22 Thread Matt Magnusson
Hello, does anyone know of a way to concatenate source streams? For example, if I have two searches search(prod,q="content:cat",fl="id,score",sort="score desc") search(prod,q="content:dog",fl="id,score",sort="score desc") is there a way to have these come out as one stream? I've been

Re: Custom FieldTypes

2017-03-22 Thread Ronald Wood
Thanks. I had seen that page but had passed it over since I don’t want to do de-duping (text fields with the exact same text are possible and not cause for de-dupe). If I want just to store the signature, it looks like I define the signatureField in the configuration and set overwriteDupes to

Re: Custom FieldTypes

2017-03-22 Thread Alexandre Rafalovitch
You'd use CloneField URP http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html Then you do your custom algorithm. Or - as I just remembered - use one of the hash ones described in dedupe section:

Re: Custom FieldTypes

2017-03-22 Thread Ronald Wood
I suppose it could be, but the flexibility of using copy directives is appealing for handling multiple fields as defined in the schema. Since I have rarely looked at the UpdateRequestProcessor, I guess I don’t know if it could take multiple fields to hash, and if so how that would be expressed.

Re: Regex Phrases

2017-03-22 Thread Mark Johnson
Awesome, thank you much! On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson wrote: > Take a close look at WordDelimiterFilterFactory, it's designed to deal > with things like part numbers, phone numbers and the like, and the > example you gave is in the same class of

Re: Regex Phrases

2017-03-22 Thread Erick Erickson
Take a close look at WordDelimiterFilterFactory, it's designed to deal with things like part numbers, phone numbers and the like, and the example you gave is in the same class of problem I think. It'll take a bit to get your head around what it does, but it'll perform better than regexes, assuming

Re: Custom FieldTypes

2017-03-22 Thread Alexandre Rafalovitch
Can this be done at the UpdateRequestProcessor stage? Regards, Alex On 22 Mar 2017 1:48 PM, "Ronald Wood" wrote: I have been mulling over the usefulness of a new Hash field type for being able to validate data that is indexed but not stored. Basically, I’d use copy

Tuple object implementing Serializable

2017-03-22 Thread Kiran Chitturi
Hi, Is there any reason that Tuple object does not implement Serializable like SolrDocumentBase which does implement

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread aruninfo100
Hi, Thanks for the reply. Kindly find the field type schema I am using: Should the *opennlp_text* field be indexed="true"? Here the en-lemmatizer.txt is 7 MB in size. Without lemmatization the whole indexing process usually takes on average

Regex Phrases

2017-03-22 Thread Mark Johnson
Is it possible to configure Solr to treat text that matches a regex as a phrase? I have a database full of products, and the Title and Description fields are text_en, tokenized via the StandardTokenizerFactory. This works in most cases, but a number of products have names like: - Vitamin A -

Custom FieldTypes

2017-03-22 Thread Ronald Wood
I have been mulling over the usefulness of a new Hash field type for being able to validate data that is indexed but not stored. Basically, I’d use copy directives to copy all fields to be hashed to the new hash field and store a SHA-256 hash as a string. I’m still not sure how valuable it
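
A minimal sketch of the hashing step described above, using only the JDK. The class `FieldHasher` and the method `sha256Hex` are hypothetical, and the length prefix written before each value is an assumption added here so that different splits of the same text across fields do not collide; the original message does not specify how the copied values would be combined.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

class FieldHasher {
    // Hashes the given field values, in order, into one SHA-256 digest and
    // returns it as a 64-character lowercase hex string.
    static String sha256Hex(List<String> fieldValues) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String value : fieldValues) {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                // Assumption: prefix each value with its byte length so that
                // ["ab", "c"] and ["a", "bc"] produce different hashes.
                digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                digest.update(bytes);
            }
            byte[] hash = digest.digest();
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

Validation would then mean recomputing the hash from the source values in the same field order and comparing it with the stored string.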

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hi - We are not having large issues using OpenNLP for POS-tagging in Lucene. But you mention commits; committing with or without POS payloads is hardly any different, so commits should be unaffected. Maybe you have another issue? Perhaps use a sampler to pinpoint the problem. Markus

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread aruninfo100
Hi, I am finding it really difficult to index documents using the OpenNLP lemmatizer. The indexing is taking too much time (including commit). Is there a way to optimize or increase the performance? It would also be helpful to know about different OpenNLP lemmatizer implementations which are also good

Re: Solr Delete By Id Out of memory issue

2017-03-22 Thread Chris Hostetter
: OK, The whole DBQ thing baffles the heck out of me so this may be : totally off base. But would committing help here? Or at least be worth : a test? this isn't DBQ -- the OP specifically said deleteById, and that the oldDeletes map (only used for DBI) was the problem according to the heap

Both Nodes in shard think they are leader

2017-03-22 Thread philippa griggs
Hello, I’m using Solr Cloud version 5.4.1. I have two cores in a shard (a leader and a replica). Every so often they both go into recovery/down and then come back up. However, when they come back, they both think they are leader. I then have to manually step in, stop them both, start one and

Re: model building

2017-03-22 Thread Joel Bernstein
I did a review of the code and it was definitely written to support having multiple training sets in the same collection. So, it sounds like something is not working as designed. I planned on testing out model building with different types of training sets anyway, so I can comment on my

Re: model building

2017-03-22 Thread Joe Obernberger
Thank you Tim. I appreciated the tips. At this point, I'm just trying to understand how to use it. The 30 tweets that I've selected so far, are, in fact threatening. The things people say! My favorite so far is 'disingenuous twat waffle'. No kidding. The issue that I'm having is not

SolrJ getHighlighting() does not return results in order

2017-03-22 Thread leoperezpulido
Hi, Implementing highlighting with *SolrJ* does not return results in the proper order while I "page" through results. This does not seem to be a problem with the RESTful API. // ... query.setQuery("text"); /* The problem is when I set start to get different "pages", the results returned by

Re: Stored value for highlighting from different field?

2017-03-22 Thread Matthew Caruana Galizia
An ICIJ engineer, Julien Martin, has since developed a patch for this. We’d appreciate any feedback and attention that might help get this integrated: https://issues.apache.org/jira/browse/SOLR-1105 > On 1 Mar 2017, at 17:03, Matthew Caruana

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hello - you need to increase the heap to work around the out of memory exception. There is not much you can do to increase the indexing speed using OpenNLP. Regards, Markus -Original message- > From:aruninfo100 > Sent: Wednesday 22nd March 2017 12:27 > To:

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread aruninfo100
Hi, I was able to resolve the issue. But when I run the indexing process it takes too long to index bigger documents, and sometimes I get a Java heap memory exception. How can I improve the performance while using dictionary lemmatizers? Thanks and Regards, Arun

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hello - there is an underlying AIOOBE causing you trouble: at java.lang.Thread.run(Thread.java:745) *Caused by: java.lang.ArrayIndexOutOfBoundsException: 1* at opennlp.tools.lemmatizer.SimpleLemmatizer.<init>(SimpleLemmatizer.java:46) Regards, Markus -Original message- >

Re: dataimport to a smaller Solr farm

2017-03-22 Thread Mikhail Khludnev
Hello, Dean. DIH is shard agnostic. How do you try to specify "a shard from the new collection"? On Tue, Mar 21, 2017 at 8:24 PM, deansg wrote: > Hello, > My team often uses the /dataimport & /dih handlers to move items from one > Solr collection to another. However, all the

Re: dynamic field sorting

2017-03-22 Thread Mikhail Khludnev
Since it hits heap, moving to docValues might make sense. On Wed, Mar 22, 2017 at 7:47 AM, Midas A wrote: > waiting for reply . Actually Heap utilization increases when we sort with > dynamic fields > > On Tue, Mar 21, 2017 at 10:37 AM, Midas A