SolrDeletionPolicy & Core Reload

2021-01-02 Thread John Davis
Hi, Does Core Reload pick up changes to SolrDeletionPolicy in solrconfig.xml or does the solr server need to be restarted? And what would be the best way to check the current values of SolrDeletionPolicy (eg maxCommitsToKe

Blocking certain queries

2020-02-03 Thread John Davis
Hello, Is there a way to block certain queries in solr? For eg a delete for *:* or if there is a known query that causes problems, can these be blocked at the solr server layer.

Severe performance issues of Solr 6.6.0 with debug logging

2020-01-30 Thread Davis
I have recently observed severe performance issues with a 1 collection, 2 shard, 4 server SolrCloud (Solr 6.6.0 running on Windows, using AdoptOpenJDK 1.8 JRE, NSSM was used to run Solr as a Windows service). During recovery of a replica the network utilization of the server hosting the replica (that is

RE: Using Tesseract OCR to extract PDF files in EML file attachment

2019-10-11 Thread Davis, Daniel (NIH/NLM) [C]
Nuance and ABBYY provide OCR capabilities as well. Looking at higher level solutions, both indexengines.com and Commvault can do email remediation for legal issues. > -Original Message- > From: Retro > Sent: Friday, October 11, 2019 8:06 AM > To: solr-user@lucene.apache.org > Subject: Re

Solr Payloads

2019-09-20 Thread John Davis
We are using solr payload field and noticed the values extracted using payload() sometimes don't match the value stored in the field. Is there a lossy encoding for the payload value? fq=payload_field:*, fl=payload_field,payload(payload_field, 573131) "payload_field": "573131|*1568263581*"
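One possible explanation, offered only as a sketch: if the payload is encoded as a 32-bit float (an assumption; the snippet does not show the field type or the encoder in use), then integers above 2^24 cannot all be represented exactly. The helper name `float32_round_trip` is made up for illustration; the value is the one shown in the message.

```python
import struct


def float32_round_trip(value: int) -> float:
    """Pack a number as an IEEE 754 single-precision float and unpack it again."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


stored = 1568263581  # value shown in payload_field above
decoded = float32_round_trip(stored)

# Prints both values and whether the round trip preserved the original.
print(stored, decoded, decoded == stored)
```

If the decoded value differs from the stored one in the same way payload() does, a float encoding would be a likely cause; an integer-typed payload encoder would be the thing to check next.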

Re: Enabling/disabling docValues

2019-06-11 Thread John Davis
time & resources, and if we empower power users to understand the system better it will help making more informed tradeoffs. On Tue, Jun 11, 2019 at 6:52 AM Gus Heck wrote: > On Mon, Jun 10, 2019 at 10:53 PM John Davis > wrote: > > > You have made many assumptions which might

Re: Enabling/disabling docValues

2019-06-10 Thread John Davis
ly happens…. > > Best, > Erick > > P.S. I _think_ Lucene tries to use the definition from the first segment, > but since whether the lists of segments to be merged don’t look at the > field definitions at all. Whether the first segment in the list has > SortableText or not

Re: Enabling/disabling docValues

2019-06-09 Thread John Davis
tructing low-level analysis chains. > > So I’d _strongly_ recommend you re-index your corpus to a new collection > with the current definition, then perhaps use CREATEALIAS to seamlessly > switch. > > Best, > Erick > > > On Jun 9, 2019, at 12:50 PM, John Davis >

Enabling/disabling docValues

2019-06-09 Thread John Davis
Hi there, We recently changed a field from TextField + no docValues to SortableTextField which has docValues enabled by default. Once I did this I do not see any facet values for the field. I know that once all the docs are re-indexed facets should work again, however can someone clarify the curren

Re: Solr Heap Usage

2019-06-07 Thread John Davis
gure out questions like number of shards/replicas, heap size, memory etc. > Hard data, good process and regular testing will trump guesswork every time > > Greg > > On Tue, Jun 4, 2019 at 9:22 AM John Davis > wrote: > > > You might want to test with softcommit of hours

Re: Solr Heap Usage

2019-06-04 Thread John Davis
overhead associated with it. On Tue, Jun 4, 2019 at 8:03 AM Erick Erickson wrote: > I need to update that, didn’t understand the bits about retaining internal > memory structures at the time. > > > On Jun 4, 2019, at 2:10 AM, John Davis > wrote: > > > > Erick -

Re: Solr Heap Usage

2019-06-04 Thread John Davis
settings, they’d be something like this: > Do a hard commit with openSearcher=false every 60 seconds. > Do a soft commit every 5 minutes. > > I’d actually be surprised if you were able to measure differences between > those settings and just hard commit with openSearcher=true every 60

RE: Using Solr as a Database?

2019-06-03 Thread Davis, Daniel (NIH/NLM) [C]
I think the sweet spot of Cassandra and Solr should be mentioned in this discussion. Cassandra is more scalable/clusterable than an RDBMS, without losing all of the structure that is desirable in an RDBMS. In contrast, if you use a full document store such as MongoDB, you lose some of the

Adding Multiple JSON Documents

2019-06-02 Thread John Davis
Hi there, I was looking at the solr documentation for indexing multiple documents via json and noticed an inconsistency in the docs. Should the POST url be /update/json/docs instead of just /update? It does look like the former does work, unless both will work just fine? https://lucene.apache.org/sol

Re: Solr Heap Usage

2019-06-02 Thread John Davis
see: https://issues.apache.org/jira/browse/SOLR-12962. > > In short, there’s not enough information until you dive in and test > bunches of stuff to tell. > > Best, > Erick > > > > On Jun 2, 2019, at 2:22 AM, John Davis > wrote: > > > > This makes sense

Re: Solr Heap Usage

2019-06-02 Thread John Davis
ments and does streaming merge it shouldn't matter? On Sat, Jun 1, 2019 at 9:24 AM Walter Underwood wrote: > > On May 31, 2019, at 11:27 PM, John Davis > wrote: > > > > 2. Merging segments - does solr load the entire segment in memory or > chunks > > of it? if

Solr Heap Usage

2019-05-31 Thread John Davis
I've read a bunch of the wikis on solr heap usage and wanted to confirm my understanding of what solr uses the heap for: 1. Indexing new documents - until committed? if not how long are the new documents kept in heap? 2. Merging segments - does solr load the entire segment in memory or c

Re: Facet count incorrect

2019-05-23 Thread John Davis
leValued or vice versa (particularly with docValues) > etc. are all “fraught”. > > My usual reply is “if you’re going to reindex everything anyway, why not > just do it to a new collection and alias when you’re done?” It’s much safer. > > Best, > Erick > > > On May 22,

Facet count incorrect

2019-05-22 Thread John Davis
Hi there - Our facet counts are incorrect for a particular field and I suspect it is because we changed the type of the field from StrField to TextField. Two questions: 1. If we do re-index all the documents in the index, would these counts get fixed? 2. Is there a "safe" way of changing field typ

Re: Optimizing fq query performance

2019-04-18 Thread John Davis
FYI https://issues.apache.org/jira/browse/SOLR-11437 https://issues.apache.org/jira/browse/SOLR-12488 On Thu, Apr 18, 2019 at 7:24 AM Shawn Heisey wrote: > On 4/17/2019 11:49 PM, John Davis wrote: > > I did a few tests with our instance solr-7.4.0 and field:* vs field:[* TO > >

Re: Optimizing fq query performance

2019-04-17 Thread John Davis
awn Heisey wrote: > On 4/17/2019 1:21 PM, John Davis wrote: > > If what you describe is the case for range query [* TO *], why would > lucene > > not optimize field:* similar way? > > I don't know. Low level lucene operation is a mystery to me. > > I have seen f

Re: Optimizing fq query performance

2019-04-17 Thread John Davis
If what you describe is the case for range query [* TO *], why would lucene not optimize field:* in a similar way? On Wed, Apr 17, 2019 at 10:36 AM Shawn Heisey wrote: > On 4/17/2019 10:51 AM, John Davis wrote: > > Can you clarify why field:[* TO *] is a lot more efficient than field:* &

Re: Optimizing fq query performance

2019-04-17 Thread John Davis
Can you clarify why field:[* TO *] is a lot more efficient than field:* On Sun, Apr 14, 2019 at 12:14 PM Shawn Heisey wrote: > On 4/13/2019 12:58 PM, John Davis wrote: > > We noticed a sizable performance degradation when we add certain fq > filters > > to the query even tho

Re: Optimizing fq query performance

2019-04-13 Thread John Davis
. > field1:* is slow in general for indexed fields because all terms for the > field need to be iterated (e.g. does term1 match doc1, does term2 match > doc1, etc) > One can optimize this by indexing a term in a different field to turn it > into a single term query (i.e. exists:field1) &
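The optimization described in the quoted reply, indexing a term in a separate field so that an existence check becomes a single term query, can be sketched on the client side as below. This is only an illustration under assumptions: the function name `add_exists_terms` and the sample document are hypothetical, and the marker field name `exists` is taken from the `exists:field1` example in the message.

```python
from typing import Any, Dict, Iterable, List


def add_exists_terms(doc: Dict[str, Any], tracked_fields: Iterable[str],
                     marker_field: str = "exists") -> Dict[str, Any]:
    """Return a copy of doc with a marker field naming the tracked fields that have a value."""
    present: List[str] = [
        name for name in tracked_fields
        if doc.get(name) not in (None, "", [])
    ]
    enriched = dict(doc)
    if present:
        # One term per populated field, so a filter can match a single term.
        enriched[marker_field] = present
    return enriched


doc = add_exists_terms({"id": "1", "field1": "abc"}, ["field1", "field2"])
print(doc)
```

Documents enriched this way before indexing could then be filtered with fq=exists:field1 rather than fq=field1:*, assuming the marker field is defined as a multi-valued string field in the schema.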

Optimizing fq query performance

2019-04-13 Thread John Davis
Hi there, We noticed a sizable performance degradation when we add certain fq filters to the query even though the result set does not change between the two queries. I would've expected solr to optimize internally by picking the most constrained fq filter first, but maybe my understanding is wron

Re: What causes new searcher to be created?

2019-03-10 Thread John Davis
at until a new searcher is created all the > > newly indexed docs will not be visible > > This should be the case. So regardless of what the admin says, _can_ > you see newly indexed documents? > > Best, > Erick > > > On Mar 9, 2019, at 7:24 PM, John Davis >

What causes new searcher to be created?

2019-03-09 Thread John Davis
Hi there, I couldn't find an answer to this in the docs: if openSearcher is set to false in the autocommit with no softcommits, what triggers a new one to be created? My assumption is that until a new searcher is created all the newly indexed docs will not be visible. Based on the solr admin consol

RE: Load balance writes

2019-02-11 Thread Davis, Daniel (NIH/NLM) [C]
I think that the container orchestration framework takes care of that for you, but I am not an expert. In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT. In Kubernetes, a "L

RE: what are the best client interface ?

2019-01-11 Thread Davis, Daniel (NIH/NLM) [C]
WordPress and Drupal both have ways to interface with Solr through plugins/modules. Not sure that describes your PHP website. I like Ruby on Rails "projectblacklight" for an easy and usable discovery layer. We are a Python/Django shop - we've had good luck with Django-haystack and pysolr. >

RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

2019-01-04 Thread Davis, Daniel (NIH/NLM) [C]
DIH is also not designed to multi-thread very well. One way I've handled this is to have a DIH XML that breaks-up a database query into multiple processes by taking the modulo of a row, as follows: This allows me to do sub-queries within the entity, but it is often better to just write
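The DIH XML referred to in this message did not survive in the archive, so the following is not that configuration. It is only a minimal sketch of the general idea of splitting work by the modulo of a row identifier; the function name `partition_by_modulo` and the use of integer row ids are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def partition_by_modulo(row_ids: Iterable[int], num_workers: int) -> Dict[int, List[int]]:
    """Assign each row id to the worker whose index equals row_id % num_workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    buckets: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    for row_id in row_ids:
        buckets[row_id % num_workers].append(row_id)
    return buckets


# Each worker would import only the rows in its own bucket.
print(partition_by_modulo(range(10), 4))
```

In a database-backed import the same split would more likely be expressed in the query itself, with each process selecting only the rows whose id modulo the process count equals its own index.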

RE: Solr OCR Support

2018-11-02 Thread Davis, Daniel (NIH/NLM) [C]
I think that you also have to process a PDF pretty deeply to decide if you want it to be OCR. I have worked on projects where all of the PDFs are really like faxes - images are encoded in JBIG2 black and white or similar, and there is really one image per page, and no text. I have also worke

RE: Solr cluster tuning

2018-10-24 Thread Davis, Daniel (NIH/NLM) [C]
Usually, responses are due to I/O waits getting the data off of the disk. So, to me, this seems more likely because as you bombard the server with queries, you cause more and more of the data needed to answer the query to be loaded into memory. To verify this, I'd bombard your server with queries to warm i

RE: Securying ONLY the web interface console

2018-10-22 Thread Davis, Daniel (NIH/NLM) [C]
I think that it is not really Solr's job to solve this. I'm sure that there are many Java ways to solve this with Jetty configuration of JAAS, but the *safest* ways involve ports and rights. In other words, port 8983 and zookeeper ports are then for Solr nodes to communicate with each other

RE: How to restrict solr 7.4 to use TLS 1.2 only?

2018-10-10 Thread Davis, Daniel (NIH/NLM) [C]
Best Option - Put a load balancer/distributor in front of it. Other Option - Edit jetty.xml. Solr uses Jetty, and so the key is in the HTTPConfiguration for jetty. This file, in my installation, is in solr-X.Y.Z/etc/jetty.xml. There is some documentation at https://www.eclipse.org/jetty/java

Ignored fields and copyfield

2018-08-06 Thread John Davis
Hi there, If a field is set as "ignored" (indexed=false, stored=false) can it be used for another field as part of copyfield directive which might index/store it. John

Index size by document fields

2018-08-04 Thread John Davis
Hi, Is there a way to monitor the size of the index broken down by individual fields across documents? I understand there are different parts - the inverted index and the stored fields - and an estimate would be a good start. Thanks John

RE: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Davis, Daniel (NIH/NLM) [C]
Elastic allows the mappings to be set all at once, either in the template or as index settings. That is an important feature because it allows the field definitions to be source code artifacts, which can be deployed very easily by an automatic script. Solr's Managed Schema API allows multiple

Re: Sort by payload value

2018-05-25 Thread John Davis
ts of the payload > calcs. > > FYI, ties are broken by the internal Lucene doc ID. If the theory that > you are getting > no matches, then your sort order is determined by this value which you > don't really > have much access to. > > Best, > Erick > > On

Sort by payload value

2018-05-24 Thread John Davis
Hello, We are trying to use payload values as described in [1] and are running into issues when issuing *sort by* payload value. Would appreciate any pointers to what we might be doing wrong. We are running solr 6.6.0. * Here's the payload value definition:

RE: Some performance questions....

2018-03-16 Thread Davis, Daniel (NIH/NLM) [C]
Deepak, A better test of multi-user support might be to vary the queries and try to simulate a realistic 'working set' of search data. I've made this same performance analysis mistake with the search index of www.indexengines.com, which I developed (in part). Somewhat different from Lucene,

RE: Resend: Authorization on 6.6.0

2018-03-13 Thread Davis, Daniel (NIH/NLM) [C]
I believe that Joe needs to be given some level of access for him to be able to see the collections, and joe should always be required to give his/her/its password to access any collection. -Original Message- From: Terry Steichen [mailto:te...@net-frame.com] Sent: Monday, March 12, 2018

RE: CDCR performance issues

2018-03-09 Thread Davis, Daniel (NIH/NLM) [C]
These are general guidelines, I've done loads of networking, but may be less familiar with SolrCloud and CDCR architecture. However, I know it's all TCP sockets, so general guidelines do apply. Check the round-trip time between the data centers using ping or TCP ping. Throughput tests may b

RE: SolrCloud: How best to do backups?

2018-02-08 Thread Davis, Daniel (NIH/NLM) [C]
I would suggest you have a separate EBS to save the backup from each server. These EBS volumes would be mounted all the time, but only modified by a backup. Then, you can create an AWS Lambda function that runs on a periodic trigger from CloudWatch, and does the following: - run the backu

Solr needs a restart to recover from "No space left on device"

2018-02-06 Thread John Davis
Hi there! We ran out of disk on our solr instance. However even after cleaning up the disk solr server did not realize that there is free disk available. It only got fixed after a restart. Is this a known issue? Or are there workarounds that don't require a restart? Thanks John

RE: Fusion or DIY w/Solr?

2018-02-06 Thread Davis, Daniel (NIH/NLM) [C]
Norconex filesystem collector should be able to handle XML output by Sovren very flexibly. I am a big fan. You can use a DOMSplitter to split a single large XML document into multiple smaller ones. I started with Norconex because I found Heritrix a bit of a pain to configure, as it is more

Matching within list fields

2018-01-29 Thread John Davis
Hi there! We have a use case where we'd like to search within a list field, however the search should not match across different elements in the list field -- all terms should match a single element in the list. For eg if the field is a list of comments on a product, search should be able to find

RE: SolrCloud installation troubles...

2018-01-29 Thread Davis, Daniel (NIH/NLM) [C]
Trying 127.0.0.1 could help. We kind of tend to think localhost is always 127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent of 127.0.0.1. I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does hap

RE: SolrCloud installation troubles...

2018-01-29 Thread Davis, Daniel (NIH/NLM) [C]
To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall. I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside. You should be able to "fake it out" if you set up your zookeeper

RE: Profanity

2018-01-08 Thread Davis, Daniel (NIH/NLM) [C]
Fun topic. Same complicated issues as normal search: Multilingual support? Is "Merde" profanity too, or just in French? Multi-word synonyms? Does "God Damn" become "goddamn", or do you treat "Damn" and "God damn" the same because you drop "God" "Me

Re: SolrCloud

2017-12-15 Thread John Davis
== > new_collection, basically all your routing is the same. You can create > aliases pointing to multiple collections or specify multiple > collections on the query, don't know if that fits your use case or not > though. > > > Best, > Erick > > On Fri, Dec 15, 2017 a

SolrCloud

2017-12-15 Thread John Davis
Hello, We are thinking about migrating to SolrCloud. Our current setup is: 1. Multiple replicas and shards. 2. Each query typically hits a single shard only. 3. We have an external system that assigns a document to a shard based on its origin and is also used by solr clients when querying to find

Solr index size statistics

2017-12-02 Thread John Davis
Hello, Is there a way to get index size statistics for a given solr instance? For eg broken down by each field stored or indexed. The only thing I know of is running du on the index data files and getting counts per field indexed/stored, however each field can be quite different wrt size. Thanks John

RE: Anyone have any comments on current solr monitoring favorites?

2017-11-06 Thread Davis, Daniel (NIH/NLM) [C]
I have used Java Melody for this purpose on past Java based servers, but I haven't tried to embed it in Jetty. -Original Message- From: Petersen, Robert (Contr) [mailto:robert.peters...@ftr.com] Sent: Monday, November 06, 2017 4:50 PM To: solr-user@lucene.apache.org Subject: Re: Anyone h

Re: Facets based on sampling

2017-10-24 Thread John Davis
analysed field to multivalue string field - that > requires changes in indexing flow. > > > > HTH, > > Emir > > -- > > Monitoring - Log Management - Alerting - Anomaly Detection > > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > >

Re: Facets based on sampling

2017-10-23 Thread John Davis
Docvalues don't work for multivalued fields. I just started a separate thread with more debug info. It is a bit surprising why facet computation is so slow even when the query matches hundreds of docs. On Mon, Oct 23, 2017 at 6:53 AM, alessandro.benedetti wrote: > Hi John, > first of all, I may

Really slow facet performance in 6.6

2017-10-23 Thread John Davis
Hello, We are seeing really slow facet performance with new solr release. This is on an index of 2M documents. A few things we've tried: 1. method=uif however that didn't help much (the facet fields have docValues=false since they are multi-valued). Debug info below. 2. changing query (q=) that

Re: Facets based on sampling

2017-10-20 Thread John Davis
een on my TODO list for the JSON Facet API. > How much it would help depends on where the bottlenecks are, but that > in conjunction with a hashing approach to collection (assuming field > cardinality is high) should definitely help. > > -Yonik > > > On Fri, Nov 4, 2016 at 3

Schemaless detecting multivalued fields

2017-10-18 Thread John Davis
Hi, I know about the schemaless configuration defaulting to multivalued fields of the corresponding type. I was just wondering if there was a way to first detect if the incoming value is a list or singleton, and based on it pick the corresponding types. Ideally if the value is a long then use tlong

RE: AEM SOLR integaration

2017-09-22 Thread Davis, Daniel (NIH/NLM) [C]
Gunalan, I think this depends on your system environment. It is a general "service discovery" issue. On-premise, my organization uses f5 BigIP as a load balancer, and so we merely have f5 LTM direct traffic from one name to any of a number of Solr instances. If they are all SolrCloud, it

RE: Customizing JSON response of a query

2017-09-07 Thread Davis, Daniel (NIH/NLM) [C]
ar I am not having any trouble querying >children/parent document since I have all of this stored using fully >qualified names in each document in the collection. > > > > > >Regards, > >Sarvo > > > >On Wed, Sep 6, 2017 at 3:52 PM, Rick Leir wrote: > >

RE: Customizing JSON response of a query

2017-09-06 Thread Davis, Daniel (NIH/NLM) [C]
It should be possible with a custom response handler. -Original Message- From: Sarvothaman Madhavan [mailto:relad...@gmail.com] Sent: Wednesday, September 06, 2017 10:17 AM To: solr-user@lucene.apache.org Subject: Customizing JSON response of a query Hello all, After a week of research I

RE: "What is Solr" in Google search results

2017-08-31 Thread Davis, Daniel (NIH/NLM) [C]
Wikipedia seems to be better now. Thank you, Peaceray. Honestly, though, by the numbers, I think the comment was correct. Elasticsearch has a much smoother on-ramp for IT developers, but it is much harder to customize relevancy and integrate with BigData pipelines. IT developers are the

RE: Solr config upgrade tool

2017-08-11 Thread Davis, Daniel (NIH/NLM) [C]
Hrishikesh Gadre, I'm interested in how that might integrate with continuous integration. I briefly worked on a tool to try a configuration out with SolrCloud, e.g. upload the config, create a collection, run some stuff, test some stuff. I got the first two working, but not the "run some stuf

RE: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Davis, Daniel (NIH/NLM) [C]
ething similar is to limit the # of shards/replicas used for date-specific collections. Hope this helps, Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems, National Library of Medicine, NIH -Original Message- From: Walter Underwood

RE: How are people using the ICUTokenizer?

2017-06-20 Thread Davis, Daniel (NIH/NLM) [C]
? Really curious about if this would to return the "interesting Phrases" On Tue, Jun 20, 2017 at 12:01 PM, Davis, Daniel (NIH/NLM) [C] < daniel.da...@nih.gov> wrote: > Joel, > > I think the issue is doing word-breaking according to ICU rules. So, if > you are trying to

RE: How are people using the ICUTokenizer?

2017-06-20 Thread Davis, Daniel (NIH/NLM) [C]
Joel, I think the issue is doing word-breaking according to ICU rules. So, if you are trying to make sure your index breaks words properly on eastern languages, just use ICU Tokenizer. Unless your text is already in an ICU normal form, you should always use the ICUNormalizer character filte

RE: Solr in NAS or Network Shared Drive

2017-05-19 Thread Davis, Daniel (NIH/NLM) [C]
"read only"/"listen" state to do no writing to the index, but keep referencing the index properties/version files. On Fri, May 19, 2017 at 1:26 PM, Davis, Daniel (NIH/NLM) [C] < daniel.da...@nih.gov> wrote: > Better off to just do Replication to the slave using th

RE: Solr in NAS or Network Shared Drive

2017-05-19 Thread Davis, Daniel (NIH/NLM) [C]
Better off to just do Replication to the slave using the replication handler. However, if there is no network connectivity, e.g. this is an offsite cold/warm spare, then here is a solution: The NAS likely supports some Copy-on-write/snapshotting capabilities. If your systems people will work

RE: Solr Query Performance benchmarking

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
percentiles are $pct95" echo `date` ": full results are in ${test}" wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 28, 2017, at 12:00 PM, Davis, Daniel (NIH/NLM) [C] > wrote: > > Walter, > > If you can s

RE: Solr Query Performance benchmarking

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Walter, If you can share a pointer to that JMeter add-on, I'd love it. -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Friday, April 28, 2017 2:53 PM To: solr-user@lucene.apache.org Subject: Re: Solr Query Performance benchmarking I use production logs to

RE: Import Handler using shell scripts

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Attached is a Python script I use, with slight redactions, on several data import jobs. The main points here are: * Watch the job until the import finishes * Always send email whether it succeeds or fails * Put the hostname, and whether it was a success, in the subject for quick removal * Alway

RE: Poll: Master-Slave or SolrCloud?

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
I am also very surprised. Even though I am no longer using my solr-config-tool, the main thing I like about SolrCloud is how easy it is to bring up a new collection and set up the schema and fields that you want. I also like that I don't need to manage replication in the solr configuration.

RE: Does DIH queues up requests

2017-01-25 Thread Davis, Daniel (NIH/NLM) [C]
DIH is not multi-threaded, and so the idea of "queueing" up requests is a misnomer. You might be better off using something other than DataImportHandler. LogStash can pull what it calls "events" from a database and then push them into Solr, and you have some of the same row transformation capa

Re: Empty facets on TextField

2017-01-06 Thread John Davis
On Tue, Oct 18, 2016 at 11:02 PM, Yonik Seeley wrote: > > A delete-by-query of *:* may do it (because it special cases to > > removing the index). > > The underlying issue is when lucene merges a segment without docvalues > > with a segment that has them. > > -Yo

RE: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-18 Thread Davis, Daniel (NIH/NLM) [C]
t this can often be handled on the RDBMS side by creating a view, maybe using functions to provide some rows. Many RDBMS systems also support federation and the import of XML from files, so that this brings XML processing into the picture. Hoping this helps, Dan Davis, Systems/Applications

RE: How to stop long running/memory eating query

2016-11-17 Thread Davis, Daniel (NIH/NLM) [C]
Mikhail, If the query is not asynchronous, it would certainly be OK to stop the long-running query if the client socket is disconnected. I know that is a feature of the niche indexer used in the products of www.indexengines.com, because I wrote it. We did not have asynchronous queries, and

RE: Multi word synonyms

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
kenFilter is implemented and the rate of indexing you need to maintain. Depending on your experience, you can do this even if you are new to Solr, as you've mentioned. -Original Message- From: Davis, Daniel (NIH/NLM) [C] Sent: Tuesday, November 15, 2016 10:22 AM To: solr-user@lucene.apa

RE: Multi word synonyms

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
I'm not as expert as some on this list, but reading the article suggested, https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/, what you do is this: - Have one field that takes text as normal - Copy that field to another fiel

Measuring the entropy of a field

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
. I know more at this point about tackling this stuff from Python. Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems, National Library of Medicine, NIH

Facets based on sampling

2016-11-04 Thread John Davis
Hi, I am trying to improve the performance of queries with facets. I understand that for queries with high facet cardinality and large number results the current facet computation algorithms can be slow as they are trying to loop across all docs and facet values. Does there exist an option to comp

RE: Apache Solr Question

2016-11-03 Thread Davis, Daniel (NIH/NLM) [C]
Case in point - https://collections.nlm.nih.gov/ has one index (core) for documents and another index (core) for pages within the documents. I think LOC (Library of Congress) does something similar from a presentation they gave at Lucene/DC Exchange. -Original Message- From: Doug Turnbul

RE: PDF writer

2016-10-21 Thread Davis, Daniel (NIH/NLM) [C]
If the PDF report is truly a report, I agree with this. We have a use-case with IBM InfoSphere Watson Explorer where our users want a PDF report on the results for their query to be generated on the fly. They can then save the query and have the report emailed to them :) Not only is Solr m

RE: Solr with logstash solr_http output plugin and geoip filter

2016-10-21 Thread Davis, Daniel (NIH/NLM) [C]
Don Tavoletti, I'm not sure you mean "me" by Daniel, despite that being my name. There is a LogStash output plugin to output to Solr: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-solr_http.html For really simple use cases, there is also a LogStash input plugin for JDBC: htt

Re: Empty facets on TextField

2016-10-18 Thread John Davis
l exist in the index for this > field (just with no values), and that normal faceting would use those. > Forcing facet.method=enum forces the use of the index instead of > docvalues (or the fieldcache if the field is configured w/o > docvalues). > > -Yonik > > On Tue, Oct 18

Empty facets on TextField

2016-10-18 Thread John Davis
Hi, I have converted one of my fields from StrField to TextField and am not getting back any facets for that field. Here's the exact configuration of the TextField. I have tested it with 6.2.0 on a fresh instance and it repros consistently. From reading through past archives and documentation, it

RE: Solr and Drupal

2016-08-09 Thread Davis, Daniel (NIH/NLM) [C]
John/Rose, With Drupal 7, the module John pointed to was the module to use. With Drupal 8, I have no idea. -Original Message- From: John Bickerstaff [mailto:j...@johnbickerstaff.com] Sent: Tuesday, August 09, 2016 2:38 PM To: solr-user@lucene.apache.org Subject: Re: Solr and Drupal Rose

RE: Installing Solr with Ivy

2016-08-03 Thread Davis, Daniel (NIH/NLM) [C]
an unreasonable amount of load on the archive servers. I'd still love in theory to find a solution that's a little more future-proof than "build a URL and download from it," but for now, I think this will get me through. Thanks again! - Demian -----Original Message---

RE: Installing Solr with Ivy

2016-08-02 Thread Davis, Daniel (NIH/NLM) [C]
Yet it is very common to do that with database servers, and in fact doing this is a common way to avoid siloed applications. Unfortunately, HTTP auth is not quite good enough for me; but it is only my own fault I haven't contributed something more. Dan Davis, Systems/Applications A

RE: Access Solr via Apache's mod_proxy_balancer or mod_jk (AJP)

2016-07-06 Thread Davis, Daniel (NIH/NLM) [C]
Again I have to insert the larger company view: * if your company is largish, you may have a load balancer hardware already in use by systems. * If you are using a Cloud system for the Solr, then you can probably use a load balancer provided by the cloud provider, and this may be cheaper th

RE: deploy solr on cloud providers

2016-07-05 Thread Davis, Daniel (NIH/NLM) [C]
storage guys to just measure storage - you will want to care about three things for indexing primarily: Sequential Write Throughput Random Read Throughput Random Read Response Time/Latency Hope this helps, Dan Davis, Systems/Applications Architect (Contractor), Office

RE: Access Solr via Apache's mod_proxy_balancer or mod_jk (AJP)

2016-07-05 Thread Davis, Daniel (NIH/NLM) [C]
very commonly used for this sort of thing – and if you also have other things behind the balancer (such as WordPress or Drupal), then varnish is becoming a better way to go. Hope this helps, Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems

RE: Regarding CDCR SOLR 6

2016-06-14 Thread Davis, Daniel (NIH/NLM) [C]
I must chime in to clarify something - in case 2, would the source cluster eventually start a log reader on its own? That is, would the CDCR heal over time, or would manual action be required? -Original Message- From: Renaud Delbru [mailto:renaud@siren.solutions] Sent: Tuesday, June 1

RE: Help: Lucidwork Fusion documentation

2016-06-02 Thread Davis, Daniel (NIH/NLM) [C]
Is the Solr Reference Guide what you are looking for? https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.0.pdf I don't know how to find older versions. From: Aman Tandon [amantandon...@gmail.com] Sent: Thursday, June 02, 2

RE: Suspicious message with attachment

2016-05-16 Thread Davis, Daniel (NIH/NLM) [C]
I know the sender, he often posts to this list, and I don't download attachments until I've vetted them anyway. -Original Message- From: postmas...@ssww.com [mailto:postmas...@ssww.com] On Behalf Of h...@ssww.com Sent: Monday, May 16, 2016 11:54 AM To: solr-user@lucene.apache.org Subject

RE: Using Ping Request Handler in SolrCloud within a load balancer

2016-05-12 Thread Davis, Daniel (NIH/NLM) [C]
Shawn, that's a great idea for how to integrate f5 with Solr. I'd thought about having Apache httpd in-front of Solr, but I suppose I could just have f5 BigIP on its own. -Original Message- From: Sandy Foley [mailto:sandy.fo...@verndale.com] Sent: Thursday, May 12, 2016 2:38 PM To: so

RE: Using updateRequest Processor with DIH

2016-05-02 Thread Davis, Daniel (NIH/NLM) [C]
I don't know whether that works; but you can use the ScriptTransformer with DIH to achieve similar results. I've only used JavaScript (Rhino) scripts, but they worked for me. More recently, I've found that most of my transformations can be accomplished with the TemplateTransformer. -Orig

RE: Solr 5.2.1 on Java 8 GC

2016-04-30 Thread Davis, Daniel (NIH/NLM) [C]
Bram, on the subject of brute force - if your script is "clever" and uses binary first search, I'd love to adapt it to my environment. I am trying to build a truly multi-tenant Solr because each of our indexes is tiny, but all together they will eventually be big, and so I'll have to repeat thi

RE: dataimport db-data-config.xml

2016-04-29 Thread Davis, Daniel (NIH/NLM) [C]
Kishor, Data Import Handler doesn't know how to randomly access rows from the CSV to "JOIN" them to rows from the MySQL table at indexing time. However, both MySQL and Solr know how to JOIN rows/documents from multiple tables/collections/cores. Data Import Handler could read the CSV first, and

Remedial Map-Reduce logic

2016-04-20 Thread Davis, Daniel (NIH/NLM) [C]
ed me to stuff I should have read a long, long time ago: http://dl.acm.org/citation.cfm?doid=1629175.1629197 So, yes, Solr providing Streaming Expressions and JDBC is a powerful good thing. Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems, Nat

RE: Streaming with facets

2016-04-19 Thread Davis, Daniel (NIH/NLM) [C]
ated on-the-fly as facet buckets are being streamed). -Yonik On Tue, Apr 19, 2016 at 4:48 PM, Davis, Daniel (NIH/NLM) [C] wrote: > So, can someone clarify how faceting works with streaming expressions? > > I can see how document search can return documents as it finds them, using &
