Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

2021-02-19 Thread Gus Heck
Actually I suspect it's there because the ability to exclude fields rather than include them is still pending... https://issues.apache.org/jira/browse/SOLR-3191 See also https://issues.apache.org/jira/browse/SOLR-10367 https://issues.apache.org/jira/browse/SOLR-9467 All of these and lazy field

Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Gus Heck
Congratulations :) On Fri, Feb 19, 2021 at 6:51 AM Juan Eduardo Hernandez < juaneduard...@gmail.com> wrote: > Congratulations Jan!! > > El vie, 19 feb 2021 a las 5:56, Atita Arora () > escribió: > > > Congratulations Jan! > > > > On Fri, Feb 19, 2021 at 9:41 AM Dawid Weiss > wrote: > > > > >

Re: Solr 8.6.3

2020-10-15 Thread Gus Heck
Shameless self-plug... JesterJ (which I maintain) has a StAX-based XML extractor https://github.com/nsoft/jesterj/wiki/Document-Processors#staxextractingprocessor if you want to try it out. On Thu, Oct 15, 2020 at 4:32 PM Alexandre Rafalovitch wrote: > Why not do an XSLT transformation on it

Re: Recent and upcoming deprecations

2020-07-17 Thread Gus Heck
Deprecation announces an intention to remove. One of the main reasons given in the jira tickets I saw for deprecation sooner rather than later, is to ensure discussions happen and replacements, migrations and I assume even possibly decisions not to deprecate after all can be well sorted out and

Re: Developing update processor/Query Parser

2020-06-26 Thread Gus Heck
During the request, the parser plugin is retrieved from a PluginBag on the SolrCore object, so it should be reloaded at the same time as the update component (which comes from another PluginBag on SolrCore). If the components are deployed with consistent configuration in solrconfig.xml, any given

Re: Hardware requirements to host Apache Solr application

2019-07-15 Thread Gus Heck
And you could run a small single core on hardware much smaller than the latest smart phones if it didn't have to serve many requests... On Mon, Jul 15, 2019 at 11:24 AM Walter Underwood wrote: > One of our clusters got as large as 40 c4.8xlarge, another is happy with 4 > m4.xlarge and could

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-20 Thread Gus Heck
Hi Rahul, Did you try the patch in that issue? Also food for thought: https://issues.apache.org/jira/browse/SOLR-13457 -Gus On Tue, Jun 18, 2019 at 5:52 PM Rahul Goswami wrote: > Hello, > > I was looking into the code to try to get to the root of this issue. Looks > like this is an issue

Re: Enabling/disabling docValues

2019-06-11 Thread Gus Heck
On Mon, Jun 10, 2019 at 10:53 PM John Davis wrote: > You have made many assumptions which might not always be realistic a) > TextField is always tokenized Well, you could of course change configuration or code to do something else but this would be a very odd and misleading thing to do and we

Re: Unexpected behaviour when Solr 6 Admin UI pages are cached and server is Solr 8?

2019-06-05 Thread Gus Heck
Experiences that force the user to think about the browser cache are sub-par :). Anything that changes the URL will interrupt caching so just adding a query parameter &_v=8.1.1 (or whatever) to every request would probably do the trick, there's no need to mess with file names or file locations IF
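
The cache-busting idea in this message can be sketched roughly as follows. This is only an illustration in Python, not the Solr Admin UI's own code: the function name `with_version` and the example URL are assumptions, while the `_v` parameter and the `8.1.1` value come from the message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url, version):
    """Return url with a _v=<version> query parameter added or replaced."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any stale _v value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_v"
    ]
    params.append(("_v", version))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical asset URL; any change to the version string yields a new URL,
# so the browser treats it as a resource it has not cached yet.
print(with_version("http://localhost:8983/solr/js/app.js", "8.1.1"))
```

Because the browser keys its cache on the full URL, the file names and locations on the server can stay as they are.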

Re: SolrCloud indexing triggers merges and timeouts

2019-06-05 Thread Gus Heck
Probably not a solution, but something I notice off the bat... generally you want Xmx and Xms set to the same value so the JVM doesn't have to spend time asking for more and more memory, and also to reduce the chance that the memory is not available by the time Solr needs it. On Wed, Jun 5, 2019,

Re: not able to optimize

2019-06-04 Thread Gus Heck
Hi Midas, Your question will probably attract more useful answers if you provide better details: what version of Solr, how many nodes, and any associated error messages from the logs. I see you asking questions that nobody can answer because we don't know the details of your system, or why you

Re: Compound Primary Keys

2019-04-24 Thread Gus Heck
Hi Vivek, Solr is not a database, nor should one try to use it as such. You'll need to adjust your thinking some in order to make good use of Solr. In Solr there is normally an id field and it should be unique across EVERY document in the entire collection. Thus there's no concept of a primary

Re: How to prevent solr from deleting cores when getting an empty config from zookeeper

2019-04-10 Thread Gus Heck
Deleting data on a zookeeper hiccup does sound bad if it's really solr's fault. Can you work up a set of steps to reproduce? Something like install solr, index tech products example, shut down solr, perform some editing to zk, start solr, observe data gone (but with lots of details about exact

Re: CDCR issues

2019-03-24 Thread Gus Heck
This sounds worthy of a jira. Especially if you can cite steps to reproduce. On Fri, Mar 22, 2019, 10:51 PM Jay Potharaju wrote: > This might be causing the high CPU in 7.7.x. > > >

Re: edismax: sorting on numeric fields

2019-02-14 Thread Gus Heck
Hi Niclolas, Solr has no difficulty sorting on numeric fields if they are indexed as a numeric type. Just use "=weight asc" If your field is indexed as text of course it won't sort properly, but then you should fix your schema. -Gus On Thu, Feb 14, 2019 at 4:10 PM David Hastings wrote: >

Under-utilization during streaming expression execution

2019-02-14 Thread Gus Heck
Hi Folks, I'm looking for ideas on how to speed up processing for a streaming expression. I can't post the full details because it's customer related, but the structure is shown here: https://imgur.com/a/98sENVT What that does is take the results of two queries, join them and push them back into

Re: Large Number of Collections takes down Solr 7.3

2019-01-28 Thread Gus Heck
Does it all have to be in a single cloud? On Mon, Jan 28, 2019, 10:34 PM Shawn Heisey On 1/28/2019 8:12 PM, Monica Skidmore wrote: > > I would have to negotiate with the middle-ware teams - but, we've used a > core per customer in master-slave mode for about 3 years now, with great > success.

Re: Solr admin UI new features

2019-01-27 Thread Gus Heck
Clear js cache. (Was solution for me) On Wed, Jan 23, 2019, 10:23 PM Dwane Hall Thanks Erick, very helpful as always ...we're up and going now. Before the > install I spun up a stand alone instance to check comparability and the > process did not shut down cleanly from the looks of things. I'm

Re: regarding debugging solr in eclipse

2019-01-26 Thread Gus Heck
A little late to this thread, but there's also a script here: https://issues.apache.org/jira/browse/SOLR-11492 for easy setup of a local cluster with debugging ports already open. Been waiting for some sign that someone other than me finds it useful before actually adding it to the project

Re: Curator in SOLR

2019-01-14 Thread Gus Heck
Unfortunately, the answer is no, we don't quite have the same thing for TRAs yet. What's needed to bridge the gap are auto-scaling features that allow such migrations of older collections to different hardware to take place automatically. Dave Smiley and I have definitely discussed the possibility

Re: Web Server HTTP Header Internal IP Disclosure SOLR port

2019-01-09 Thread Gus Heck
This sounds like something that might crop up if the admin UI were exposed to an alternate (or public) network space through a tunnel or proxy. The server knows nothing about the proxy/tunnel, and the cloud page has nice clickable machine names that point at the internal dns or ip names of the

Re: how to recover state.json files

2019-01-09 Thread Gus Heck
Not a direct solution, but manipulating data in Zookeeper can be made easier with https://github.com/rgs1/zk_shell On Wed, Jan 9, 2019 at 10:26 AM Erick Erickson wrote: > How did you "lose" the data? Exactly what happened? > > Where does the dataDir variable point in your > zoo.cfg file? By

Re: SOLR v7 Security Issues Caused Denial of Use - Sonatype Application Composition Report

2019-01-04 Thread Gus Heck
Hi Bob, Wrt licensing keep in mind that multi licensed software allows you to choose which license you are using the software under. Also there's some good detail on the Apache policy here: https://www.apache.org/legal/resolved.html#what-can-we-not-include-in-an-asf-project-category-x One has

Re: Query kills Solrcloud

2019-01-02 Thread Gus Heck
Are you able to re-index a subset into a new collection? For control of timeouts I would suggest Postman or curl, or some other non-browser client. On Wed, Jan 2, 2019 at 2:55 PM Webster Homer < webster.ho...@milliporesigma.com> wrote: > We are still having serious problems with our solrcloud

Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-02 Thread Gus Heck
If you mean attach a debugger, solr is just like any other java program. Pass in the standard java options at start up to have it listen or connect as usual. The port is just a TCP port so ssh tunneling the debugger port can bridge the gap with a remote machine (or a vpn). That said the prior

Re: Solr Cloud: Zookeeper failure modes

2019-01-02 Thread Gus Heck
I thought jar files for custom code were meant to go into the '.system' collection, not zookeeper. Did I miss a new/old storage option? On Wed, Jan 2, 2019, 12:25 PM Erick Erickson 1> no. At one point, this could be done in the sense that the > collections would be reconstructed, (legacyCloud)

Re: How to access the Solr Admin GUI (2)

2019-01-02 Thread Gus Heck
If the keys line up nicely across jumps... On Wed, Jan 2, 2019, 10:49 AM Kay Wrobel In case of multiple "jumps", might I suggest the -J switch which allows > you to specify a jump host. > > Kay > > > On Jan 2, 2019, at 9:37 AM, Gus Heck wrote: > > > > I

Re: How to access the Solr Admin GUI (2)

2019-01-02 Thread Gus Heck
I typically resolve this sort of situation with a ssh proxy such as ssh -f user@123.456.789.012 -L :127.0.0.1:8983 -N Then I can access the solr GUI from localhost: on my machine, and all the traffic is secured by SSH. Pick your local port ( here) as desired of course. Sometimes I

Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2019-01-01 Thread Gus Heck
Although Vincenzo and Alexandre's suggestions may be helpful in the right circumstances, there is a continuum of answers to the original question here. This continuum is mostly relevant if indexing and querying is likely to happen simultaneously or the data volume is large enough relative to the

Re: How to access the Solr Admin GUI

2019-01-01 Thread Gus Heck
Why would you want to expose the administration GUI on the web? This is a very hazardous thing to do. Never mind that it normally also runs on 8983 and all its functionality relies on the ability to interact with 8983 hosted API end points. What are you actually trying to solve? On Dec 31, 2018

Re: Federated / Distributed search using Solr

2018-12-29 Thread Gus Heck
If the indexes are on the same cluster, you can use an alias for querying. Updating via aliases doesn't work well (updates go to the first collection listed only) unless it's a time routed alias. http://lucene.apache.org/solr/guide/7_6/collections-api.html#createalias On Sat, Dec 29, 2018 at

Re: unsubscribe

2018-12-07 Thread Gus Heck
Not how you unsubscribe... please use the link on this page: http://lucene.apache.org/solr/community.html#mailing-lists-irc On Fri, Dec 7, 2018 at 1:47 PM John Santosuosso wrote: > Unsubscribe > > > Sent from Yahoo Mail for iPhone > > > On Friday, December 7, 2018, 9:57 AM, samuel kim > wrote:

Re: solr crashes

2018-12-05 Thread Gus Heck
3x heap is larger than usual, but significant RAM beyond heap is a good idea if you can't fit the whole index in 31 GB of memory, since the OS will cache files in ram. Note also the use of 32 GB through about 45 GB heap settings gives you LESS heap than 31 GB due to an increase in pointer sizes

Re: Time-Routed Alias Not Distributing Wrongly Placed Docs

2018-11-29 Thread Gus Heck
Hi John, TRA's really do require that you index via the alias. Internally the code is wrapping the Distributed Update Processor with an additional processor to handle the time routing when (and only when) the TRA alias is detected. If the alias is not used, none of the TRA code runs (by design,

Re: Streaming Expressions GET vs POST

2018-11-29 Thread Gus Heck
Related... https://issues.apache.org/jira/browse/SOLR-9759 On Sat, Nov 24, 2018, 3:38 PM Jan Høydahl Filed SOLR-13014 > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > 23. nov. 2018 kl. 16:03 skrev Jan

Re: PSA: Activate 2018 videos are now available

2018-11-28 Thread Gus Heck
I noticed some were out a few days ago, but I don't think they're all there yet (mine isn't) On Wed, Nov 28, 2018 at 12:46 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Thanks Alex, and thanks to everyone who was part of organizing the > conference! > > On Wed, Nov 28, 2018 at

Re: Sort index by size

2018-11-21 Thread Gus Heck
Just as a sanity check, is this getting replicated many times, or further scaled up... it sounds like about $3.50/mo of disk space on AWS and it should all fit in ram on any decent sized server.. (i.e. any server that looks like half or quarter of a decent laptop) As a question, it's interesting

Re: Disabling jvm properties from ui

2018-11-08 Thread Gus Heck
That's an interesting feature, and it addresses X, but there are lots of ways to discover system properties. In a managed schema, enter a field name ${java.version} and you'll get a field named 1.8.0_144 (or whatever). I still think it's important to address Y they are trying to hide the system

Re: Disabling jvm properties from ui

2018-11-07 Thread Gus Heck
This sounds like an XY problem. Why do you want to do that? Can you give more detail? What sort of information is exposed that you don't want someone to see, and who is that someone? Particularly, how is it they can use the admin UI which has the ability to delete all

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Gus Heck
Tomáš, One thing that causes a clusterstatus call is alias resolution if the HttpClusterStateProvider is in use instead of the ZkClusterStateProvider. I've just been fixing spurious error messages generated by this in SOLR-12938. -Gus On Tue, Nov 6, 2018 at 1:08 PM Zimmermann, Thomas <

Re: streaming expressions substring-evaluator

2018-10-31 Thread Gus Heck
Probably ReplaceWithSubstringOperation (similar to ReplaceWithFieldOperation, though that would probably add another class and be subject to https://issues.apache.org/jira/browse/SOLR-9661) On Wed, Oct 31, 2018 at 8:32 AM Joel Bernstein wrote: > I don't think there is a substring or similar

Re: Tesseract language

2018-10-21 Thread Gus Heck
Hi Martin, I wrote a framework (https://github.com/nsoft/jesterj) that is meant to help with small to medium custom solutions. It's not (yet) ready for cases where you need multiple machines feeding data, but so long as a single box can do the work it should be useful. It has a basic Tika stage

Re: Type ahead functionality using complex phrase query parser

2018-08-16 Thread Gus Heck
Yes, that's a common strategy, and it's also fairly common to index two (or more) versions of the field, with different tokenizations (or not tokenized) if there is a need to perform different types of search on the field. This duplication can be achieved either with in the schema,

Re: copy field

2018-07-12 Thread Gus Heck
XY question notwithstanding, this is exactly the sort of thing one might want to do in their indexing pipeline. For example: https://github.com/nsoft/jesterj/blob/master/code/ingest/src/main/java/org/jesterj/ingest/processors/SimpleDateTimeReformatter.java On Thu, Jul 12, 2018 at 1:34 PM, Erick

Re: AddReplica to shard with lowest node count

2018-07-05 Thread Gus Heck
will move replicas from the most loaded nodes to the new node. That does > not take care of your use-case. Can you please open a Jira to add this > feature? > > On Thu, Jul 5, 2018 at 6:45 AM Gus Heck wrote: > > > Perhaps the rule based replica placement stuff would do the trick?

Re: AddReplica to shard with lowest node count

2018-07-04 Thread Gus Heck
Perhaps the rule based replica placement stuff would do the trick? https://lucene.apache.org/solr/guide/7_3/rule-based-replica-placement.html I haven't used it myself but I've seen lots of work going into it lately... On Wed, Jul 4, 2018 at 12:35 PM, Duncan, Adam wrote: > Hi all, > > Our team

Re: Solr - zoo with more than 1000 collections

2018-07-04 Thread Gus Heck
Hi Bertrand, Are you by any chance using the new Time Routed Aliases feature? You didn't mention it so I suspect not, but you might want to look... It's still pretty new, but it would be interesting to get your feedback on it if it looks like it would help. I'm wondering how you get to that many

Re: Indexing part of Binary Documents and not the entire contents

2018-07-04 Thread Gus Heck
You might consider using a free tool like JesterJ (www.jesterj.org) which can possibly also automate the acquisition of the documents and transmission to solr. As well as provide a framework for massaging the contents of the document in between (including Tika processing) (Disclaimer: I'm the

Re: Extracting top level URL when indexing document

2018-06-19 Thread Gus Heck
I don't understand the inclusion of 'n' in the character classes in this pattern... it's pretty clear that the broken examples in the OP were where the letter n occurred in the domain name. I expect a similar problem for user parts that contain n... ^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+) On
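
The effect described here can be reproduced with a short Python check. This is a sketch under assumptions: the example URL is made up, and the second pattern is only a guess at what was intended (a newline `\n` in the character classes instead of the letter `n`, plus an escaped dot).

```python
import re

# Pattern as quoted in the message: the letter 'n' sits inside both negated classes.
quoted = re.compile(r"^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+)")

# Assumed intent: exclude a newline (\n) rather than the letter n, and escape the dot.
intended = re.compile(r"^https?://(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)")

url = "https://www.amazon.com/some/page"  # hypothetical URL with an 'n' in the domain

print(quoted.match(url).group(1))    # capture stops at the first 'n'
print(intended.match(url).group(1))  # capture runs up to the first ':' or '/'
```

With the quoted pattern the captured domain is cut off at the first `n`, which matches the broken examples described in the thread.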

Re: solr query

2018-03-14 Thread Gus Heck
nitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 14 Mar 2018, at 16:55, Gus Heck <gus.h...@gmail.com> wrote: > > > > I think you can specify the current month

Re: solr query

2018-03-14 Thread Gus Heck
I think you can specify the current month with birthDate:[NOW/MONTH TO NOW/MONTH+1MONTH} does that work for you? On Wed, Mar 14, 2018 at 6:32 AM, Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Actually you don’t have to add another field - there is function ms that > converts date to
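
To make the range in this message concrete, here is a small Python sketch of the interval that `[NOW/MONTH TO NOW/MONTH+1MONTH}` describes: the square bracket makes the start inclusive and the curly brace makes the end exclusive. The function names are illustrative, and the sketch assumes UTC, which is what Solr date math uses unless a time zone is supplied.

```python
from datetime import datetime, timezone


def current_month_range(now):
    """Return (start, end) mirroring [NOW/MONTH TO NOW/MONTH+1MONTH}."""
    # NOW/MONTH: round down to the first instant of the current month.
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # NOW/MONTH+1MONTH: the first instant of the following month.
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def in_current_month(value, now):
    """True if value falls in the range: start inclusive, end exclusive."""
    start, end = current_month_range(now)
    return start <= value < end


now = datetime(2018, 3, 14, 16, 55, tzinfo=timezone.utc)
print(current_month_range(now))
```

Note that this matches dates in the current month of the current year only; it is not a month-of-any-year filter.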

Re: Adding a child doc incrementally

2018-01-17 Thread Gus Heck
If the document routing can be arranged such that the children and the parent are always co-located in the same shard, and share an identifier, the graph query can pull back the parent plus any arbitrary number of "children" that have been added at any time in any order. In this scheme "children"

Re: Spatial search - indexing WKT data

2018-01-17 Thread Gus Heck
It's been a while since I did it, but I'm pretty sure that when I indexed polygons a couple years ago, I just sent WKT text for the field value... I think I do recall some niggle where there was some slight mismatch in WKT accepted by the javascript library I wanted to use and Solr. (One was

Re: Combine Results with Two different Collections.

2018-01-17 Thread Gus Heck
If you just want docs from both collections in the same results, create an alias across the 2 collections. https://lucene.apache.org/solr/guide/6_6/collections-api.html On Thu, Jan 11, 2018 at 11:12 PM, Suman Saurabh wrote: > Try using solr streaming api. >
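
As a rough illustration of the alias suggestion, the Collections API call can be assembled like this. The host, alias name, and collection names are placeholders rather than anything from the thread; `CREATEALIAS` with its `name` and `collections` parameters is the Collections API action the linked guide page documents.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API CREATEALIAS URL for the given collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # The API takes a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and names; queries sent to the alias "both" would then
# return documents from either collection.
print(create_alias_url("http://localhost:8983/solr", "both", ["products", "reviews"]))
```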

Re: Ingestion not scaling horizontally as I add more cores to Solr

2018-01-10 Thread Gus Heck
Shashank > > On 1/10/18, 12:34 PM, "Gus Heck" <gus.h...@gmail.com> wrote: > > Ingested how? Sounds like your document sending mechanism is maxed, > not the > solr cluster... > > On Wed, Jan 10, 2018 at 2:58 PM, Shas

Re: Ingestion not scaling horizontally as I add more cores to Solr

2018-01-10 Thread Gus Heck
Ingested how? Sounds like your document sending mechanism is maxed, not the solr cluster... On Wed, Jan 10, 2018 at 2:58 PM, Shashank Pedamallu wrote: > Hi, > > > > I’m trying to find the upper thresholds of ingestion and I have tried the > following. In each of the

Re: SolrJ with Async Http Client

2018-01-05 Thread Gus Heck
s less memory usage and an > >> often better scaling. I would however expect that the main advantage > would > >> be on the server side. > >> > >> > >> On 02.01.2018 22:02, Gus Heck wrote: > >> > >>> It's not very clear (to me)

Re: SolrJ with Async Http Client

2018-01-02 Thread Gus Heck
It's not very clear (to me) what your use case is, but generally speaking, asynchronous requests can be achieved by using threads/executors/futures (java) or ajax (javascript). The link seems to be a scala project, I'm sure scala has analogous facilities. On Tue, Jan 2, 2018 at 10:31 AM, RAUNAK
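
The threads/executors/futures approach mentioned here looks roughly like the sketch below. It is written in Python rather than Java, and `fake_query` is a stand-in for whatever blocking client call would actually be made; none of these names come from SolrJ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_query(q):
    """Stand-in for a blocking search request; a real client call would go here."""
    return f"results for {q}"


def run_queries(queries, max_workers=4):
    """Submit each query to a thread pool and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the query that produced it.
        futures = {pool.submit(fake_query, q): q for q in queries}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


print(run_queries(["title:solr", "title:lucene"]))
```

The calling thread is free to do other work between submitting the requests and collecting the futures, which is the sense in which a blocking client can be used asynchronously.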

Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Gus Heck
<mailto:solr-user-h...@lucene.apache.org> > List-Unsubscribe: <mailto:solr-user-unsubscr...@lucene.apache.org> > List-Post: <mailto:solr-user@lucene.apache.org> > > of all messages posted to the list. > > > Original Message > > Date: Mo

Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Gus Heck
While this has been the traditional response, and it's accurate and helpful, the user that complained about no unsubscribe link has a point. This is the normal expectation in this day and age. Maybe Apache should consider appending a "You are receiving this because you are subscribed to (list)

Re: Problem managing Solr configsets on Zookeeper

2017-02-22 Thread Gus Heck
Hi Chris, Are you perhaps using (by default) ManagedIndexSchemaFactory? https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig If so, on first boot the schema.xml file is copied and then subsequently ignored in favor of the managed copy. If you do not wish to

Re: Best way to generate multivalue fields from streaming API

2016-09-22 Thread Gus Heck
Hi Mike, Bit late on this, but just saw it... Using streaming to ingest has occurred to me too but I think it's not really right for that except in fairly trivial cases. The very first big problem you will have in the example you give is that you won't be able to mark things as already ingested,

Re: Miserable Experience Using Solr. Again.

2016-09-14 Thread Gus Heck
While Stack Overflow is a great place, and the more good info that exists there, the merrier, I think Solr should have its own complete docs, in addition to anything found on 3rd party sites. Each hop to a new location is a chance for the user to get lost, and the content on 3rd party sites could

Gradle Plugin

2016-08-19 Thread Gus Heck
The other day, I finally got around to automating solr config deployment with gradle, so now I can have this workflow: - Run a gradle task to get the current config from zookeeper, - See any changes not checked in (my ide happily lights these up if I had the files under source control)