Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-08 Thread Bernd Fehling
Hi Greg, thanks a lot, that's it. After setting q.op to OR it works _nearly_ as before with 4.10.4. But how stupid is this? I have in my schema and also had q.op set to AND to make sure my default _is_ AND, meant as conjunction between terms. But now I have q.op set to OR and defaultOperator in schema to AN

Solr Collection Create API queries

2016-09-08 Thread Swathi Singamsetty
Hi Team, To implement the feature "Persist and use the replicationFactor,maxShardsPerNode at Collection&Shard level" I am following the steps mentioned in the JIRA ticket https://issues.apache.org/jira/browse/SOLR-4808. I used the "smartCloud" and "autoManageCluster" properties to create a collecti

Re: High load, frequent updates, low latency requirement use case

2016-09-08 Thread Erick Erickson
Use the SolrJ CloudSolrClient class and use the client.add(doclist) form. Best, Erick On Thu, Sep 8, 2016 at 8:56 PM, Brent wrote: > Emir Arnautovic wrote >> There should be no problems with ingestion on 24 machines. Assuming 1 >> replication, that is roughly 40 doc/sec/server. Make sure you bul
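"Bulking" here means sending many documents per update request instead of one request per document; Erick's client.add(doclist) form in SolrJ does exactly that. The language-neutral sketch below shows only the batching logic. The batch size of 500 and the send_batch callable are hypothetical placeholders, standing in for whatever actually sends one request (for example a SolrJ add call with a list, or one HTTP POST of several documents).

```python
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, object]


def batched(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Group documents into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


def index_all(docs: Iterable[Doc],
              send_batch: Callable[[List[Doc]], None],
              size: int = 500) -> int:
    """Hand each batch to `send_batch` (hypothetical: one update request per
    call) and return how many requests were made."""
    requests = 0
    for batch in batched(docs, size):
        send_batch(batch)
        requests += 1
    return requests


# Usage: 1,200 docs in batches of 500 -> 3 requests instead of 1,200.
sent: List[List[Doc]] = []
print(index_all(({"id": str(i)} for i in range(1200)), sent.append, size=500))
```

The right batch size depends on document size and cluster; 500 is only an illustrative value.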

Re: High load, frequent updates, low latency requirement use case

2016-09-08 Thread Brent
Emir Arnautovic wrote > There should be no problems with ingestion on 24 machines. Assuming 1 > replication, that is roughly 40 doc/sec/server. Make sure you bulk docs > when ingesting. What is bulking docs, and how do I do it? I'm guessing this is some sort of batch loading of documents? Thank

Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-08 Thread Greg Pendlebury
I forgot to mention the tickets: SOLR-2649 and SOLR-8812 On 9 September 2016 at 13:38, Greg Pendlebury wrote: > Under 4.10 q.op was ignored by the edismax parser and always forced to OR. > 5.5 is looking at the q.op=AND you requested. > > There are also some changes to the default values selecte

Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-08 Thread Greg Pendlebury
Under 4.10 q.op was ignored by the edismax parser and always forced to OR. 5.5 is looking at the q.op=AND you requested. There are also some changes to the default values selected for mm, but I doubt those apply here since you are setting it explicitly. On 8 September 2016 at 00:35, Mikhail Khlud
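A toy model of the difference Greg describes: if 4.10 edismax always behaved as OR while 5.5 honours q.op=AND, the same request can match fewer documents after the upgrade. The toy_match function is hypothetical and ignores fields, phrases, scoring, and percentage mm values; it only contrasts "all terms required" with "at least mm terms required".

```python
from typing import Iterable, Set


def toy_match(doc_terms: Set[str], query_terms: Iterable[str],
              q_op: str = "OR", mm: int = 1) -> bool:
    """Toy model only: AND needs every query term in the doc; OR needs at
    least `mm` of them. Real edismax also handles fields, phrases, and
    percentage mm values, none of which is modelled here."""
    terms = list(query_terms)
    hits = sum(1 for t in terms if t in doc_terms)
    if q_op == "AND":
        required = len(terms)
    elif q_op == "OR":
        required = min(mm, len(terms))
    else:
        raise ValueError("q_op must be 'AND' or 'OR'")
    return hits >= required


doc = {"solr", "query", "parser"}
query = ["solr", "query", "release"]
print(toy_match(doc, query, "OR"))   # 2 of 3 terms present, OR accepts
print(toy_match(doc, query, "AND"))  # 'release' missing, AND rejects
```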

shingle query matching keyword tokenized field

2016-09-08 Thread Gandham, Satya
Can anyone help with this question that I posted on Stack Overflow? http://stackoverflow.com/questions/39399321/solr-shingle-query-matching-keyword-tokenized-field Thanks in advance.

Re: Solr [Streaming Expressions/Parallel SQL Interface] Not supporting Multi Value using mapReduce option

2016-09-08 Thread Erick Erickson
It's the same problem. The "GROUP BY" has to sort the returned rows in order to partition the result docs (and thus do aggregations by group). We kind of skipped explaining that ;) Best, Erick On Thu, Sep 8, 2016 at 10:08 AM, Praveen Babu wrote: > Hi Erik/Joel, > > I am not sure , did I con
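Why GROUP BY needs a sort can be seen with a streaming group-by such as Python's itertools.groupby: it only merges adjacent rows with equal keys, so rows must be ordered by the key first. This is an analogy, not Solr's implementation. A row whose key is a multi-valued field has no single position in that order, which is the same open question Erick raises about sorting.

```python
from itertools import groupby
from operator import itemgetter


def group_sum(rows, key_field, value_field):
    """Sum value_field for each distinct key_field value.

    groupby() only merges *adjacent* rows with equal keys, so the rows are
    sorted by the key first; without that, one key can appear in several
    separate groups."""
    ordered = sorted(rows, key=itemgetter(key_field))
    return {key: sum(r[value_field] for r in grp)
            for key, grp in groupby(ordered, key=itemgetter(key_field))}


rows = [{"g": "a", "v": 1}, {"g": "b", "v": 2}, {"g": "a", "v": 3}]
print(group_sum(rows, "g", "v"))                            # sorted first
print([k for k, _ in groupby(rows, key=itemgetter("g"))])   # unsorted: 'a' appears twice
```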

Re: Solr [Streaming Expressions/Parallel SQL Interface] Not supporting Multi Value using mapReduce option

2016-09-08 Thread Praveen Babu
Hi Erick/Joel, I am not sure whether I confused you. I was talking about "GROUP BY" on a multivalued field, not sorting. Example: I have 1 TB of data and I want to aggregate a multiValued field using the stream API with aggmode=map_reduce. Regards, S.Praveen Technical Architect LinkedIn: https://www.linkedin.com/

Re: Solr Grouping, Aggregations and Custom Functions

2016-09-08 Thread Roshni
Hi Joel, Thanks for responding. For full-fledged data analytics powered by Solr, group by and aggregations are needed. The basic aggregations are available, but we often have calculated fields like the one I mentioned, sum(a)/sum(b). It would be great to have these in Solr. Such calculations cann
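Until compound functions are available, one possible workaround is to request SUM(a) and SUM(b) per group (both plain SUMs, which Joel lists as supported) and divide on the client. The sketch below computes the same ratio locally from raw rows as a stand-in for those two SUM columns; the field names g, a, and b are hypothetical. Note that a ratio of sums is not the same as averaging per-row ratios, so AVG over a precomputed a/b field would not be a substitute.

```python
from collections import defaultdict


def ratio_of_sums(rows, group_field, a_field, b_field):
    """Per group, compute SUM(a) / SUM(b) from the two separate sums
    (None when SUM(b) is zero)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        s = sums[r[group_field]]
        s[0] += r[a_field]
        s[1] += r[b_field]
    return {g: (sa / sb if sb else None) for g, (sa, sb) in sums.items()}


rows = [{"g": "x", "a": 1, "b": 1}, {"g": "x", "a": 3, "b": 2}]
print(ratio_of_sums(rows, "g", "a", "b"))  # 4/3, not the mean of 1/1 and 3/2 (1.25)
```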

Re: Default stop word list

2016-09-08 Thread Walter Underwood
I recommend that you remove StopFilterFactory from every analysis chain. In the tf.idf scoring model, rare words are automatically weighted more than common words. I have an index with 11.6 million documents. “the” occurs in 9.9 million of those documents. “cat” occurs in 16,000 of those documen
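Plugging Walter's numbers into the idf part of Lucene's classic TF-IDF similarity shows the automatic down-weighting. The formula used, idf = 1 + ln(numDocs / (docFreq + 1)), is my understanding of Lucene's classic similarity (the Solr 5 default); other similarities such as BM25 use a different idf, so treat this as an illustration rather than the exact scores in any given index.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    # Lucene classic TF-IDF idf, as assumed here: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 11_600_000
idf_the = classic_idf(9_900_000, NUM_DOCS)  # "the": in 9.9M of 11.6M docs
idf_cat = classic_idf(16_000, NUM_DOCS)     # "cat": in 16,000 docs
print(f"the: {idf_the:.2f}  cat: {idf_cat:.2f}")
```

This gives about 1.16 for "the" and about 7.59 for "cat", so a match on "cat" already counts several times more than a match on "the" without any stop word list.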

MapReduceIndexerTool erroring with max_array_length

2016-09-08 Thread Darshan Pandya
Hello, While this may be a question for cloudera, I wanted to tap the brains of this very active community as well. I am trying to use the MapReduceIndexerTool to index data in a hive table to Solr Cloud / Cloudera Search. The tool is failing the job with the following error 1799 [main] INFO

Re: Solr [Streaming Expressions/Parallel SQL Interface] Not supporting Multi Value using mapReduce option

2016-09-08 Thread Erick Erickson
The basic problem is "what does sorting on a multi-valued field mean"? If you have a numeric field with values 1, 5, 7 how should sorting rank that doc? Use 1? 7? the average? Median? Sum? There is some limited ability in the rest of Solr to sort by min/max but that's it. Best, Erick On Thu,
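Erick's point can be made concrete: collapsing a multi-valued field to one sort key requires choosing a reduction, and different reductions give different orders. The field name price and the two documents are hypothetical.

```python
from statistics import mean, median

REDUCERS = {"min": min, "max": max, "mean": mean, "median": median, "sum": sum}


def sort_docs(docs, field, reducer):
    """Ascending sort of docs by a multi-valued field, collapsed to one value."""
    reduce_fn = REDUCERS[reducer]
    return [d["id"] for d in sorted(docs, key=lambda d: reduce_fn(d[field]))]


docs = [{"id": "A", "price": [1, 5, 7]}, {"id": "B", "price": [3, 4]}]
for name in REDUCERS:
    print(name, sort_docs(docs, "price", name))
```

Only "min" puts A first here; every other reduction puts B first, so no single answer is obviously right.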

Re: Solr Grouping, Aggregations and Custom Functions

2016-09-08 Thread Praveen Babu
Hi Joel Bernstein, Thanks for the update. If you get a chance to provide that feature soon, it will be a great benefit to Solr users. Regards, S.Praveen Technical Architect LinkedIn: https://www.linkedin.com/in/praveen-babu-73232889?trk=nav_responsive_tab_profile On Thu, Sep 8, 2016 at 5

Re: Default stop word list

2016-09-08 Thread Steven White
Hi Walter and all. Sorry for the late reply, I was out of town. Are you saying the stop words in the stop word file should be removed? I understand the issues I will run into because of the stop word list, but all along, my understanding of stop word list being in the stop word file is -- to e

Re: solr 5.5.2 dump threads - threads blocked in org.eclipse.jetty.util.BlockingArrayQueue

2016-09-08 Thread elisabeth benoit
Well, we re-provisioned the machine with Puppet, restarted Solr and now it seems OK. Don't know what happened. 2016-09-08 11:38 GMT+02:00 elisabeth benoit : > > Hello, > > > We are perf testing solr 5.5.2 (with a limit test, i.e. sending as many > queries/sec as possible) and we see the cpu never goes o

Re: extract metadata

2016-09-08 Thread Alexandre Rafalovitch
That's what the extract handler does. But look at the examples that ship with Solr, including the examples/files one. Or you can use Tika directly and send only the extracted fields to Solr. Regards, Alex On 8 Sep 2016 8:39 PM, "KRIS MUSSHORN" wrote: > How would one get all metadata/properties from a
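A sketch of building a request URL for the extract handler (Solr Cell). As far as I know, literal.&lt;field&gt;, uprefix, fmap.&lt;source&gt;, and commit are standard parameters of that handler; uprefix prefixes every extracted metadata field the schema does not define, which keeps all document properties provided the schema has a matching dynamic field (attr_* is assumed here). The base URL, the collection name docs, and the target field text are hypothetical. The file itself would then be POSTed to this URL.

```python
from urllib.parse import urlencode


def extract_url(base: str, collection: str, doc_id: str,
                unknown_prefix: str = "attr_") -> str:
    """Build an /update/extract URL (all names here are illustrative)."""
    params = {
        "literal.id": doc_id,        # supply the uniqueKey value explicitly
        "uprefix": unknown_prefix,   # prefix for metadata fields not in the schema
        "fmap.content": "text",      # hypothetical field for the extracted body text
        "commit": "true",
    }
    return f"{base}/{collection}/update/extract?{urlencode(params)}"


print(extract_url("http://localhost:8983/solr", "docs", "doc1"))
```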

extract metadata

2016-09-08 Thread KRIS MUSSHORN
How would one get all metadata/properties from a .doc/pdf/xls etc into fields into solr?

AW: Wrong highlighting in stripped HTML field

2016-09-08 Thread Neumann, Dennis
Hello, thank you very much for your answers. As described in the SOLR-4686 issue, the problem only occurs when you use inline HTML tags (like or ). So in my case the solution is actually to use a block element and force it to be inline: bla highlighting: bla Cheers and thanks again, Dennis

Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan
Hi, I think AutomatonQuery is used. http://opensourceconnections.com/blog/2013/02/21/lucene-4-finite-state-automaton-in-10-minutes-intro-tutorial/ https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/AutomatonQuery.html Ahmet On Thursday, September 8, 2016 3:54 PM, Sandeep Khanzo

Custom Function-based Fields

2016-09-08 Thread Sandeep Khanzode
Hi, Can someone please direct me to some documentation that shows how to do this? I need to write a non-trivial function that will return a new custom (not in schema) field but which is more complicated than a simple sum/avg/etc. I want to create a function that looks at a few dateranges in

Re: StrField with Wildcard Search

2016-09-08 Thread Sandeep Khanzode
Hi, Okay. So it seems that the wildcard searches will perform a (sort-of) dictionary search where they will inspect every (full keyword) token at search time, and do a match instead of a match on pre-created index-time tokens with TextField. However, the wildcard/fuzzy functionality will still b

Re: Solr Grouping, Aggregations and Custom Functions

2016-09-08 Thread Joel Bernstein
Parallel SQL only supports the following functions currently: (SUM, AVG, MIN, MAX, COUNT). More functions and compound functions are on the roadmap. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 8, 2016 at 12:11 AM, Praveen Babu wrote: > Hi All, > > I am also new to Solr and I have

Re: Solr [Streaming Expressions/Parallel SQL Interface] Not supporting Multi Value using mapReduce option

2016-09-08 Thread Joel Bernstein
Yes, sorting on multi-valued fields isn't supported with Streaming Expressions. Multi-valued fields can be exported but not used for sorting. There currently isn't a plan to add sorting on multi-valued fields, but if other areas in Solr are supporting this perhaps we could use the same techni

Re: Streaming expression in solr doesnot support collection alias

2016-09-08 Thread Joel Bernstein
Getting aliases working is a high priority and fairly easy to do. We should have this in for Solr 6.3. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 8, 2016 at 3:18 AM, Tali Finelt wrote: > Hi All, > > We saw there is an open issue regarding this subject: > https://issues.apache.org

Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan
Hi, EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or starts-with search. Let's say, wildcard enumerates the whole inverted index, thus it may get slower for very large databases. With this one no index time manipulation is required. EdgeNGram does its magic at index
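The trade-off Ahmet describes, sketched with a deliberately naive model: edge n-grams do the work at index time so a prefix becomes a single lookup, while a wildcard is resolved at query time against the terms. Here the wildcard tests every term; Lucene actually intersects an automaton with the term dictionary (the AutomatonQuery linked earlier in the thread), but the query-time cost still depends on the terms it has to visit. All function names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Leading substrings of a token, like an edge n-gram filter emits."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_edge_index(terms: Iterable[str]) -> Dict[str, Set[str]]:
    """Index time: map every prefix gram to the original terms containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term in terms:
        for gram in edge_ngrams(term):
            index[gram].add(term)
    return index


def wildcard_terms(terms: Iterable[str], pattern: str) -> List[str]:
    """Query time, naive model: test every term against the pattern."""
    return [t for t in terms if fnmatchcase(t, pattern)]


terms = ["lucene", "solr", "solrcloud"]
print(sorted(build_edge_index(terms)["sol"]))  # one lookup, work done at index time
print(wildcard_terms(terms, "sol*"))           # scans the whole term list
```

The price of the n-gram approach is a larger index, since every prefix is stored as its own term.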

Re: Wrong highlighting in stripped HTML field

2016-09-08 Thread Alan Woodward
Hi, see https://issues.apache.org/jira/browse/SOLR-4686 - this is an ongoing point of contention! Alan Woodward www.flax.co.uk > On 8 Sep 2016, at 09:38, Duck Geraint (ext) GBJH > wrote: > > As far as I can tell, that is how it's currently s

solr 5.5.2 dump threads - threads blocked in org.eclipse.jetty.util.BlockingArrayQueue

2016-09-08 Thread elisabeth benoit
Hello, We are perf testing Solr 5.5.2 (with a limit test, i.e. sending as many queries/sec as possible) and we see the CPU never goes over 20%, and threads are blocked in org.eclipse.jetty.util.BlockingArrayQueue, as we can see in the Solr admin interface thread dumps qtp706277948-757 (757) java.ut

StrField with Wildcard Search

2016-09-08 Thread Sandeep Khanzode
Hello, There are quite a few links that detail the difference between StrField and TextField. Also links that explain that, even though the field is indexed, it is not tokenized and stored as a single keyword, as can be verified by the debug analysis on Solr admin and CURL debugQuery options. Wh

Re: [JSON Faceting] Domain filter query

2016-09-08 Thread Alessandro Benedetti
Another solution that came to mind is to use stats. Given product_id as the collapsing field, for the facet where I want the collapsed count I can do something like: { brands:{ terms : { // terms facet creates a bucket for each indexed term in the field field :
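One way to get a flat count and a collapsed count in the same bucket is a nested distinct-count aggregation. The request below uses the JSON Facet long form with a unique(product_id) sub-facet; it is an illustration, not a reconstruction of Alessandro's truncated example, and the facet field brand and the label collapsed_count are assumptions. The helper function is a hypothetical local model of what each bucket would report.

```python
import json
from collections import defaultdict

# Hypothetical JSON Facet request: terms facet on "brand" with a nested
# distinct count of product_id, i.e. the "collapsed" count per bucket.
facet_request = {
    "brands": {
        "type": "terms",
        "field": "brand",
        "facet": {"collapsed_count": "unique(product_id)"},
    }
}


def flat_and_collapsed(docs, facet_field, collapse_field):
    """Local model: per bucket, (number of docs, distinct collapse values)."""
    flat = defaultdict(int)
    distinct = defaultdict(set)
    for d in docs:
        flat[d[facet_field]] += 1
        distinct[d[facet_field]].add(d[collapse_field])
    return {k: (flat[k], len(distinct[k])) for k in flat}


print(json.dumps(facet_request, indent=2))
```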

[JSON Faceting] Domain filter query

2016-09-08 Thread Alessandro Benedetti
Hi guys, I was thinking about this problem: Given a set of flat documents I want to calculate facets on: 1) the flat result set 2) the collapsed result set. Specifically, some of my field facets will need to be on the flat result set and some of them will need to be calculated over a collapsed result set (

RE: Wrong highlighting in stripped HTML field

2016-09-08 Thread Duck Geraint (ext) GBJH
As far as I can tell, that is how it's currently set up (it does the same on mine at least). The HTML stripper seems to exclude the pre tag, but include the post tag when it generates the start and end offsets of each text token. I couldn't say why though... (This may just have avoided needing to b

Streaming expression in solr doesnot support collection alias

2016-09-08 Thread Tali Finelt
Hi All, We saw there is an open issue regarding this subject: https://issues.apache.org/jira/browse/SOLR-9077 We would very much like to use this feature in our new production version. This issue currently prevents us from using streaming. We were wondering if there is any plan to fix this so