Re: Parallelize Cursor approach

2016-11-04 Thread Erick Erickson
Have you considered the /xport functionality? On Fri, Nov 4, 2016 at 5:56 PM, Yonik Seeley wrote: > No, you can't get cursor-marks ahead of time. > They are the serialized representation of the last sort values > encountered (hence not known ahead of time). > > -Yonik > > > On

Re: Parallelize Cursor approach

2016-11-04 Thread Yonik Seeley
No, you can't get cursor-marks ahead of time. They are the serialized representation of the last sort values encountered (hence not known ahead of time). -Yonik On Fri, Nov 4, 2016 at 8:48 PM, Chetas Joshi wrote: > Hi, > > I am using the cursor approach to fetch results
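The point above can be sketched in code: each request needs the nextCursorMark returned by the previous response, so the pages have to be fetched one after another. This is a minimal Python sketch, not a definitive client. `fetch_page` is a hypothetical callable standing in for an HTTP call to /select, and the `id asc` sort assumes the uniqueKey field is named `id`.

```python
from typing import Callable, Dict, Iterator, List


def iterate_cursor(fetch_page: Callable[[Dict[str, str]], dict],
                   query: str = "*:*",
                   rows: int = 100000,
                   sort: str = "id asc") -> Iterator[List[dict]]:
    """Yield pages of documents, following Solr cursor marks in sequence.

    fetch_page is a hypothetical helper: it takes the request parameters
    and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # starting cursor value for the first request
    while True:
        params = {"q": query, "rows": str(rows), "sort": sort,
                  "cursorMark": cursor}
        response = fetch_page(params)
        docs = response["response"]["docs"]
        if docs:
            yield docs
        next_cursor = response["nextCursorMark"]
        # An unchanged cursor mark means there are no more results.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

The loop cannot be split across workers as written, because the cursor for page N+1 only exists once page N has been read.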

Parallelize Cursor approach

2016-11-04 Thread Chetas Joshi
Hi, I am using the cursor approach to fetch results from Solr (5.5.0). Most of my queries return millions of results. Is there a way I can read the pages in parallel? Is there a way I can get all the cursors well in advance? Let's say my query returns 2M documents and I have set rows=100,000.

Re: CodaHale metrics for Solr 6?

2016-11-04 Thread Jeff Wartes
Expanding on my comment on the ticket, I’m really quite happy with using codahale/dropwizard metrics with Solr. I don’t know if I’m comfortable just sharing a screenshot of the resulting grafana dashboard, but I’ve got, per-host: - Percentile latencies and rates for GET vs POST (which in

Re: Facets based on sampling

2016-11-04 Thread Yonik Seeley
Sampling has been on my TODO list for the JSON Facet API. How much it would help depends on where the bottlenecks are, but that in conjunction with a hashing approach to collection (assuming field cardinality is high) should definitely help. -Yonik On Fri, Nov 4, 2016 at 3:02 PM, John Davis

Re: Facets based on sampling

2016-11-04 Thread Jeff Wartes
https://issues.apache.org/jira/browse/SOLR-5894 had some pretty interesting looking work on heuristic counts for facets, among other things. Unfortunately, it didn’t get picked up, but if you don’t mind using Solr 4.10, there’s a jar. On 11/4/16, 12:02 PM, "John Davis"

Re: Facets based on sampling

2016-11-04 Thread Alexandre Rafalovitch
I believe that's what the JSON Facet API does by default. Have you tried that? Regards, Alex. Solr Example reading group is starting November 2016, join us at http://j.mp/SolrERG Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 5 November 2016

Re: Custom user web interface for Solr

2016-11-04 Thread Erik Hatcher
What kind of graphical format? > On Nov 4, 2016, at 14:01, "tesm...@gmail.com" wrote: > > Hi, > > My search query comprises of more than one fields like search string, date > field and a one optional field). > > I need to represent these on the web interface to the users. >

Re: Custom user web interface for Solr

2016-11-04 Thread Alexandre Rafalovitch
Unless you secure your Solr instance well, you should not be exposing Solr directly to the client. Anyone who can see the Admin UI or the /browse handler can also delete all your documents. I am mentioning this just in case. So, you usually need a middleware that maps your requests to Solr. Either with

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread Furkan KAMACI
Yes, it works with hours too. You can run a sum function in each hourly facet bucket. On Nov 4, 2016 10:14 PM, "William Bell" wrote: > How about hours? > > NOW+1HR > NOW+2HR > NOW+12HR > NOW-4HR > > Can we add that? > > > On Fri, Nov 4, 2016 at 12:25 PM, Furkan
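A sum per time bucket can be sketched as a JSON Facet API request body. This is a minimal Python sketch under assumptions: the field names `timestamp` and `count` come from the sample documents in this thread, the facet name `buckets_by_time` and the sub-facet name `total` are made up for illustration, and the date math uses the `HOUR` unit rather than `HR`.

```python
import json


def range_sum_facet(date_field: str, value_field: str,
                    start: str, end: str, gap: str) -> dict:
    """Build a JSON request body: a range facet with a sum() per bucket."""
    return {
        "query": "*:*",
        "limit": 0,  # only the facet buckets are wanted, no documents
        "facet": {
            "buckets_by_time": {          # hypothetical facet name
                "type": "range",
                "field": date_field,
                "start": start,
                "end": end,
                "gap": gap,
                # Nested aggregation, evaluated once per range bucket.
                "facet": {"total": f"sum({value_field})"},
            }
        },
    }


hourly = range_sum_facet("timestamp", "count",
                         "NOW/HOUR-12HOURS", "NOW/HOUR+1HOUR", "+1HOUR")
print(json.dumps(hourly, indent=2))
```

Switching to daily buckets would only change the last three arguments, for example `"NOW/DAY-3DAYS"`, `"NOW/DAY+1DAY"`, `"+1DAY"`.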

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread William Bell
How about hours? NOW+1HR NOW+2HR NOW+12HR NOW-4HR Can we add that? On Fri, Nov 4, 2016 at 12:25 PM, Furkan KAMACI wrote: > I have documents like that > > id:5 > timestamp:NOW //pseudo date representation > count:13 > > id:4 > timestamp:NOW //pseudo date representation

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread Furkan KAMACI
Seems that Solrj doesn't support JSON Facet API yet. On Fri, Nov 4, 2016 at 9:08 PM, Furkan KAMACI wrote: > Fantastic! Thanks Yonik, I could do the stuff that I want with JSON Facet > API. > > On Fri, Nov 4, 2016 at 8:42 PM, Yonik Seeley wrote: > >>

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread Furkan KAMACI
Fantastic! Thanks Yonik, I could do the stuff that I want with JSON Facet API. On Fri, Nov 4, 2016 at 8:42 PM, Yonik Seeley wrote: > On Fri, Nov 4, 2016 at 2:25 PM, Furkan KAMACI > wrote: > > I mean, I have to facet by dates and aggregate values

Facets based on sampling

2016-11-04 Thread John Davis
Hi, I am trying to improve the performance of queries with facets. I understand that for queries with high facet cardinality and large number results the current facet computation algorithms can be slow as they are trying to loop across all docs and facet values. Does there exist an option to

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread Yonik Seeley
On Fri, Nov 4, 2016 at 2:25 PM, Furkan KAMACI wrote: > I mean, I have to facet by dates and aggregate values inside that facet > range. Is it possible to do that without multiple queries at Solr? This (old) blog shows a percentiles calculation under a range facet:

Re: Custom user web interface for Solr

2016-11-04 Thread KRIS MUSSHORN
https://cwiki.apache.org/confluence/display/solr/Velocity+Search+UI You might be able to customize velocity. K - Original Message - From: "Binoy Dalal" To: solr-user@lucene.apache.org Sent: Friday, November 4, 2016 2:33:24 PM Subject: Re: Custom user web

Re: Indexing and Disk Writes

2016-11-04 Thread Andrew Dinsmore
Erick, We currently have ramBufferSizeMB at 1024M. For this indexing activity, the cluster is "offline" thus no queries coming in so not worried about any user impact or delays should Solr terminate and need to replay. The thinking was that increasing these values (ramBuffer, commit times, etc)

Re: Custom user web interface for Solr

2016-11-04 Thread Binoy Dalal
See this link for more details => https://lucidworks.com/blog/2015/12/08/browse-new-improved-solr-5/ On Sat, Nov 5, 2016 at 12:02 AM Binoy Dalal wrote: > Have you checked out the /browse handler? It provides a pretty rudimentary > UI for displaying the results. It is

Re: Custom user web interface for Solr

2016-11-04 Thread Binoy Dalal
Have you checked out the /browse handler? It provides a pretty rudimentary UI for displaying the results. It is nowhere close to what you would want to present to your users but it is a good place to start off. On Fri, Nov 4, 2016 at 11:32 PM tesm...@gmail.com wrote: Hi, My

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread David Santamauro
I believe your answer is in the subject => facet.range https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-RangeFaceting // On 11/04/2016 02:25 PM, Furkan KAMACI wrote: I have documents like that id:5 timestamp:NOW //pseudo date representation count:13 id:4 timestamp:NOW

Aggregate Values Inside a Facet Range

2016-11-04 Thread Furkan KAMACI
I have documents like that id:5 timestamp:NOW //pseudo date representation count:13 id:4 timestamp:NOW //pseudo date representation count:3 id:3 timestamp:NOW-1DAY //pseudo date representation count:21 id:2 timestamp:NOW-1DAY //pseudo date representation count:29 id:1 timestamp:NOW-3DAY

Custom user web interface for Solr

2016-11-04 Thread tesm...@gmail.com
Hi, My search query comprises more than one field: a search string, a date field, and one optional field. I need to represent these on the web interface to the users. Secondly, I need to represent the search data in graphical format. Is there some Solr web client that provides the above

Re: Solrj facet.date

2016-11-04 Thread Furkan KAMACI
Hi Shawn, You are right, ClientUtils.escapeQueryChars() breaks the functionality. My expectation was this: Solrj has addDateRangeFacet; however, there is no direct method for a facet.date query. Kind Regards, Furkan KAMACI On Fri, Nov 4, 2016 at 7:04 PM, Shawn Heisey

Re: Solrj facet.date

2016-11-04 Thread Shawn Heisey
On 11/4/2016 10:22 AM, Furkan KAMACI wrote: > I send a query to Solr to get information about each day of current week > via this way: > > =*:* > =type:dps > =0 > =true > =date > =NOW/DAY-6DAYS > =NOW/DAY%2B1DAY > =%2B1DAY > > I want to make that query over Solrj. This code would do it: /*

Solrj facet.date

2016-11-04 Thread Furkan KAMACI
Hi, I send a query to Solr to get information about each day of current week via this way: =*:* =type:dps =0 =true =date =NOW/DAY-6DAYS =NOW/DAY%2B1DAY =%2B1DAY I want to make that query over Solrj. This facet.date definition at source code (5.5.3): public static final String FACET_DATE =
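The role of `%2B` in the values above can be illustrated with a small Python sketch that URL-encodes the request. The parameter names are an assumption: the archive shows only the values, so `q`, `fq`, `rows`, `facet`, and the `facet.date.*` names below are guessed from the `facet.date` mentioned in the post and are not confirmed by it.

```python
from urllib.parse import urlencode


def weekly_date_facet_query(date_field: str = "date") -> str:
    """Return a URL-encoded query string for a one-week date facet.

    Parameter names are assumed (see the note above); values come from
    the post, written with a literal '+' before encoding.
    """
    params = [
        ("q", "*:*"),
        ("fq", "type:dps"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.date", date_field),
        ("facet.date.start", "NOW/DAY-6DAYS"),
        ("facet.date.end", "NOW/DAY+1DAY"),
        ("facet.date.gap", "+1DAY"),
    ]
    # urlencode turns the literal '+' into %2B, so it is not read as a space.
    return urlencode(params)


print(weekly_date_facet_query())
```

A client library that encodes parameters itself would be given the unencoded `+1DAY` form; encoding or escaping the value twice changes what the server receives.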

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread Fuad Efendi
*Worth* mentioning: I run Solr on port 8080, and the firewall blocks *port* 8080. That is not really securing by IP address! “block by IP” vs. “block by port number”; “block *all* services running on a machine by IP address” vs. “block only Jetty”; etc. Still need option for Jetty, it will

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread Fuad Efendi
Yes, we need that documented: http://stackoverflow.com/questions/8924102/restricting-ip-addresses-for-jetty-and-solr Of course a firewall is a must for high-security environments / large corporations, DMZs, etc.; iptables is the simplest solution if you run Linux; my vendor 1and1.com

Re: Indexing and Disk Writes

2016-11-04 Thread Erick Erickson
Every time your ramBufferSizeMB limit is exceeded, a segment is created that's eventually merged. In terms of _throughput_, making this large usually doesn't help much after about 100M (the default). It'd be interesting to see if it changes your I/O activity though. BTW, I'd hard commit

Re: Sitecore deleting Solr documents

2016-11-04 Thread Erick Erickson
Hmm, I'm not quite sure we can help you as this sounds like Sitecore-specific functionality. Here's my total guess anyway. The docs are somehow getting indexed directly to CD and CD is a slave to CM. So the next time a replication is triggered (see the settings in solrconfig.xml) the index from CM

Indexing and Disk Writes

2016-11-04 Thread Andrew Dinsmore
We are using Solr 5.4 to index TBs of documents in a bulk fashion to get the cluster up and running. Indexing is over HTTP round robin as directed by ZooKeeper. Each of the 13 nodes is receiving about 6-8 MB/s on the NIC but Solr is writing around 20 to 25 thousand times per second (4k block
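For a sense of scale, the write rate can be converted to throughput. This is a rough back-of-the-envelope sketch in Python that assumes every write is exactly one 4 KiB block, which the truncated message does not confirm.

```python
def write_throughput_mib(writes_per_second: int, block_bytes: int = 4096) -> float:
    """Convert a write rate to MiB/s, assuming a fixed block size per write."""
    return writes_per_second * block_bytes / (1024 * 1024)


# The two ends of the range quoted in the message.
for rate in (20000, 25000):
    print(rate, "writes/s is about", write_throughput_mib(rate), "MiB/s")
```

Under that assumption the disk traffic per node would be roughly ten times the 6-8 MB/s arriving on the NIC.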

Re: Different Sorts based on Different Groups

2016-11-04 Thread Fuad Efendi
Hi Gustatec, Relevancy tuning is a really *huge* area; check this book when you have a chance: https://www.manning.com/books/relevant-search Default Solr sorting is based on the TF/IDF algorithm, and sorting is not necessarily ‘relevancy’. A trivial solution for the clothes store domain would be this one,

Re: Search performance

2016-11-04 Thread Alessandro Benedetti
Seconding Shawn: if your queries always target the active documents, at a high level this is what is going to happen: A) You need to run your query + a filter query that will retrieve only active documents. The filter query results will be cached. Solr will query over the entire
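The "query + filter query" combination in A) can be sketched as request parameters. This is a minimal Python sketch; the field name `active` comes from the original question, and the example query `title:solr` is made up for illustration.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def active_only_params(user_query: str) -> List[Tuple[str, str]]:
    """Pair the user's query with a filter query restricting to active docs."""
    return [
        ("q", user_query),       # scored part of the request
        ("fq", "active:true"),   # filter: does not affect scoring
    ]


print(urlencode(active_only_params("title:solr")))
```

Keeping the restriction in `fq` rather than in `q` is what lets the same filter be reused across different user queries.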

Re: Apache Solr Question

2016-11-04 Thread Chien Nguyen
Great! Thank you so much. ^^ -- View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-Question-tp4304308p4304437.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Different Sorts based on Different Groups

2016-11-04 Thread Alessandro Benedetti
Hi Gustatec, your problem seems to be a fairly basic relevance problem. Instead of elevating documents, why don't you include the category as part of the main query? To make it simple: in Solr you have a query component, which affects the score, and the filter queries, which don't. If in your case you add

Different Sorts based on Different Groups

2016-11-04 Thread Gustatec
Hello everyone! I'm currently using Solr in a project (pretty much an e-commerce POC) and came across the following sort situation: I have two products, one called Product1 and the other called Product2; both of them belong to the same categories, Shirt (ID 1) and Tank-Top (ID 2). When I

Re: Search performance

2016-11-04 Thread Shawn Heisey
On 11/4/2016 8:22 AM, Vincenzo D'Amore wrote: > Given 2 collection A and B: > > - A collection have 5 M documents with an attribute active: true/false. > - B collection have only 2.5 M documents, but all the documents have > attribute active:true > - in any case, A or B, I can only search upon

RE: UpdateProcessor as a batch

2016-11-04 Thread Markus Jelsma
Thanks all for sharing your thoughts! -Original message- > From:Joel Bernstein > Sent: Friday 4th November 2016 1:28 > To: solr-user@lucene.apache.org > Subject: Re: UpdateProcessor as a batch > > This might be useful. In this scenario you load you content into

Search performance

2016-11-04 Thread Vincenzo D'Amore
Hi all, it's trivia time :) hope you enjoy the question. Given 2 collections, A and B: - Collection A has 5 M documents with an attribute active: true/false. - Collection B has only 2.5 M documents, but all the documents have the attribute active:true - in any case, A or B, I can only search upon

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread David Smiley
Not to knock the other suggestions, but a benefit to securing Jetty like this is that *everyone* can do this approach. On Fri, Nov 4, 2016 at 9:54 AM john saylor wrote: > hi > > any firewall worth it's name should be able to do this. in fact, that is > one of several

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread john saylor
hi any firewall worth its name should be able to do this. in fact, that is one of several things that a firewall was designed to do. also, you are stopping this traffic at the application, which is good; but you'd prolly be better off stopping it at the network interface [using a firewall,

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread GW
I run a small SolrCloud on a set of internal IP addresses. I connect with a routed OpenVPN so I hit Solr on 10.8.0.1:8983 from my desktop. Only my web clients are on public IPs and only those clients can talk to the inside cluster. That's how I manage things... On 4 November 2016 at 09:27, David

How-To: Secure Solr by IP Address

2016-11-04 Thread David Smiley
I was just researching how to secure Solr by IP address and I finally figured it out. Perhaps this might go in the ref guide but I'd like to share it here anyhow. The scenario is where only "localhost" should have full unfettered access to Solr, whereas everyone else (notably web clients) can

Re: Fields with stored=false are stored though

2016-11-04 Thread Alexandre Rafalovitch
docValues are enabled (in the type) and with the latest schema version, docValues can be returned even if stored is off. You can disable docValues or disable them returning a value unless requested explicitly in fl param. Regards, Alex. P.s. I am not saying that was a smart idea to do in the

Fields with stored=false are stored though

2016-11-04 Thread Reinhard Budenstecher
I'm using Solr 6.2.1. Schema is static (schema.xml) and some fields look like and so on. But when querying in web browser GUI I can see, that these fields are stored though and values are returned on query. How can this happen? Looking into web schema browser I can see fields with

Re: Local parameter query and multiple fields

2016-11-04 Thread Gintautas Sulskus
To add: I am passing parameter defType=edismax. On Fri, Nov 4, 2016 at 11:41 AM, Gintautas Sulskus < gintautas.suls...@gmail.com> wrote: > Hi, > > If I search for "London" with the following query, I get London city at > the top. > > name:London^10 > category:City^5 > category:Organization^1 > >

Local parameter query and multiple fields

2016-11-04 Thread Gintautas Sulskus
Hi, If I search for "London" with the following query, I get London city at the top. name:London^10 category:City^5 category:Organization^1 Now I would like to store this query in SearchHandler with a parameter $term instead of the hard-coded word "London". However, I am not sure how the query

Re: facet on dynamic field

2016-11-04 Thread Erik Hatcher
You'll have to enumerate them (see the Luke request handler) and specify them explicitly. > On Nov 4, 2016, at 03:40, Midas A wrote: > > i want to create facet on all dynamic field (by_*) . what should be the > query ?
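The enumerate-then-specify approach can be sketched as follows. This is a minimal Python sketch under assumptions: the Luke handler's JSON response is assumed to hold a `fields` object keyed by field name, and the response is passed in as an already parsed dict rather than fetched here.

```python
from typing import Dict, List, Tuple


def facet_params_for_prefix(luke_response: Dict,
                            prefix: str = "by_") -> List[Tuple[str, str]]:
    """Build facet parameters with one facet.field per matching field name."""
    names = sorted(name for name in luke_response.get("fields", {})
                   if name.startswith(prefix))
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    # facet.field does not accept wildcards, so each field is listed explicitly.
    params.extend(("facet.field", name) for name in names)
    return params


# Hypothetical, hand-written stand-in for a parsed Luke response.
sample = {"fields": {"id": {}, "by_color": {}, "by_size": {}}}
print(facet_params_for_prefix(sample))
```

The list of fields would need to be refreshed whenever new `by_*` fields appear in the index.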

Sitecore deleting Solr documents

2016-11-04 Thread Joshua Campbell
Hi All, I'm having an odd issue with Solr, and am looking for some help or suggestions. We're using Solr (on a Sitecore website) for search and some search-driven pages. CM is pointing to a sitecore_master_index in Solr, while CD is pointing to a sitecore_web_index. We're using the

facet on dynamic field

2016-11-04 Thread Midas A
I want to create facets on all dynamic fields (by_*). What should the query be?