Re: Is there a way to force content extraction with a given encoding

2019-11-07 Thread Jörn Franke
I would convert them to UTF-8 before posting and use UTF-8 in your application. Most of the web and most applications use UTF-8. If you use other encodings you will always run into problems. > On 08.11.2019 at 07:47, lala wrote: > > I am using the /update/extract request handler to push
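
A minimal sketch of the "convert to UTF-8 before posting" suggestion, using only Python's standard codecs. The function name and the sample text are illustrative assumptions, not anything from the thread. Note that windows-1255 is the Hebrew code page and Arabic text is more commonly windows-1256, so the source encoding is left as a parameter rather than hard-coded.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # errors="strict" (the default) raises UnicodeDecodeError if the
    # declared encoding does not match the bytes, instead of silently
    # producing unreadable text.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Illustrative input: Arabic text stored in a legacy Windows code page.
legacy_bytes = "مرحبا".encode("windows-1256")
utf8_bytes = transcode_to_utf8(legacy_bytes, "windows-1256")
```

The resulting bytes could then be sent to /update/extract in place of the original file; whether that resolves the extraction problem in a given setup is not confirmed by the thread.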

Re: Using solr API to return csv results

2019-11-07 Thread Paras Lehana
Hi Rhys, There's already a JIRA for this: https://issues.apache.org/jira/browse/SOLR-2731. You can comment on the ticket. I also recommend reading about the /export handler. On Fri, 8 Nov 2019 at 01:39, rhys J wrote: > If I

Is there a way to force content extraction with a given encoding

2019-11-07 Thread lala
I am using the /update/extract request handler to push documents into Solr, but some text documents that are encoded as windows-1255 (Arabic texts) are not extracted properly; the resulting text is not readable. I searched the web and the Solr documentation and found nothing. I need to send the file

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Paras Lehana
Hi Guilherme, > By accident, I ended up querying using the default handler (/select) > and it worked. You've just found the culprit. Thanks for giving the material I requested. Your analysis chain is working as expected. I don't see any issue in either StopWordFilter or your boosts. I also use

Re: ConcurrentModificationException in SolrInputDocument writeMap

2019-11-07 Thread Shawn Heisey
On 11/6/2019 8:17 AM, Tim Swetland wrote: I'm currently running into a ConcurrentModificationException ingesting data as we attempt to upgrade from Solr 8.1 to 8.2. It's not every document, but it definitely appears regularly in our logs. We didn't run into this problem in 8.1, so I'm not sure

Re: ConcurrentModificationException in SolrInputDocument writeMap

2019-11-07 Thread Edward Ribeiro
You probably hit https://issues.apache.org/jira/projects/SOLR/issues/SOLR-8028 Regards, Edward On Wed, 6 Nov 2019 at 13:23, Mikhail Khludnev wrote: > Hello, Tim. > Please confirm my understanding. Does the exception happen in a standalone Java > ingesting app? > If so, does it reuse

Re: Cursor mark page duplicates

2019-11-07 Thread Chris Hostetter
: I'm using Solr's cursor mark feature and noticing duplicates when paging : through results. The duplicate records happen intermittently and appear : at the end of one page, and the beginning of the next (but not on all : pages through the results). So if rows=20 the duplicate records would

Re: Good Open Source Front End for Solr

2019-11-07 Thread A Adel
It depends on the use case. There are several front-ends that work with Solr; each has its own use cases, and they vary in how tightly they integrate. Banana (https://github.com/lucidworks/banana) is a visualization frontend that works only with Solr. It allows creating interactive, real-time dashboards

Using solr API to return csv results

2019-11-07 Thread rhys J
If I am using the Solr API to query the core, is there a way to tell how many documents are found if I use wt=csv? Thanks, Rhys
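
One possible workaround, sketched here as an assumption rather than as the answer given in the thread: issue the same query a second time with rows=0 and wt=json, and read numFound from the JSON response. The base URL and core name below are hypothetical, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode


def count_query_url(base_url: str, query: str) -> str:
    """Build a /select URL that returns only the match count, no documents."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def num_found(response_body: str) -> int:
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(response_body)["response"]["numFound"]


# Hypothetical core URL; replace with your own host and core name.
url = count_query_url("http://localhost:8983/solr/mycore", "*:*")

# A trimmed example of the JSON shape Solr returns for rows=0.
sample_body = '{"response": {"numFound": 42, "start": 0, "docs": []}}'
total = num_found(sample_body)
```

The CSV request itself stays unchanged; the count comes from the separate, cheap rows=0 request.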

Re: Good Open Source Front End for Solr

2019-11-07 Thread David Hastings
Well, that's pretty slick. On Thu, Nov 7, 2019 at 1:59 PM Erik Hatcher wrote: > Blacklight: http://projectblacklight.org/ > > ;) > > > > > On Nov 6, 2019, at 11:16 PM, Java Developer > wrote: > > > > Hi, > > > > What is the best open source front-end for Solr >

Re: Good Open Source Front End for Solr

2019-11-07 Thread Erik Hatcher
Blacklight: http://projectblacklight.org/ ;) > On Nov 6, 2019, at 11:16 PM, Java Developer wrote: > > Hi, > > What is the best open source front-end for Solr > > Thanks

Re: Solr healthcheck fails all the time

2019-11-07 Thread Houston Putman
Hello, Could you provide some more information about your cloud, for example: - The number of requests that it handles per minute - How much data you are indexing - If there is any memory pressure The ping handler merely sends a query to the collection and makes sure that it responds

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Walter Underwood
I normally use a weight of 8 for the most important field, like title. Other fields might get a 4 or 2. I add a “pf” field with the weights doubled, so that phrase matches have a higher weight. The weight of 8 comes from experience at Infoseek and Inktomi, two early web search engines. With
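
The weighting scheme described above can be sketched as a small helper that builds edismax parameters: one qf entry per field, and a pf entry with each weight doubled. The field names (title, summary, body) and the function name are hypothetical; only the 8/4/2 weights and the doubling come from the message.

```python
def edismax_boosts(field_weights: dict) -> dict:
    """Build edismax qf/pf parameters, doubling each weight for phrase matches."""
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    pf = " ".join(f"{field}^{weight * 2}" for field, weight in field_weights.items())
    return {"defType": "edismax", "qf": qf, "pf": pf}


# Hypothetical field names; the most important field gets the weight of 8.
params = edismax_boosts({"title": 8, "summary": 4, "body": 2})
```

Keeping the weights in one mapping means the qf and pf values cannot drift apart when a field is added or re-weighted.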

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Guilherme Viteri
Hi Wunder, My indexer takes quite a few hours to run. I am shortening it to run faster, but I also need to make sure it gives what we are expecting. This implementation has been in place for >4y and is heavily used. > In your edismax handlers, weights of 20, 50, and 100 are extremely high. I

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread David Hastings
Ha, funnily enough I still use qf/pf boosts starting at 100 and going down; it gives me room to add boosting to more fields without making them equal. Maybe excessive, but I haven't noticed a performance issue. On Thu, Nov 7, 2019 at 9:44 AM Walter Underwood wrote: > Thanks for posting the files. Looking at

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Walter Underwood
Thanks for posting the files. Looking at schema.xml, I see that you still are using StopFilterFactory. The first advice we gave you was to remove that. Remove StopFilterFactory everywhere and reindex. You will continue to have problems matching stopwords until you do that. In your edismax

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Guilherme Viteri
Hi Paras, everyone, Thank you again for your input and suggestions. I am sorry to hear you had trouble with the attachments. I will host them somewhere and share the links. I don't tweak my index; I get the data from the graph database, create the documents as they are, and save them to Solr. So, I am

Re: Cursor mark page duplicates

2019-11-07 Thread Erick Erickson
Dwane: Nice writeup. This is puzzling. First, theoretically the two replicas shouldn’t have any effect. Shawn’s comment was more that somehow two _different_ shards had a duplicate ID. Do both replicas have exactly the same document count? You can find this out by

Re: Query regarding truncated Date Sort

2019-11-07 Thread Erick Erickson
The easiest and most efficient would be to store the date (or a copy) at day resolution and sort on that field instead. > On Nov 7, 2019, at 3:00 AM, Paras Lehana wrote: > > Hi Inderjeet, > > Wouldn't sorting on the default format yield documents sorted > date-wise? The time won't impact
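
A minimal sketch of producing the day-resolution copy at index time, assuming timestamps arrive in Solr's usual UTC format (for example 2019-11-07T15:42:10Z). The function name is hypothetical, as is any field the result would be stored in.

```python
from datetime import datetime, timezone


def day_resolution(timestamp: str) -> str:
    """Truncate a UTC timestamp string to midnight of the same day."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Zero out the time-of-day so every document from the same day
    # carries an identical sort value.
    truncated = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    return truncated.strftime("%Y-%m-%dT%H:%M:%SZ")


day_value = day_resolution("2019-11-07T15:42:10Z")
```

The truncated value would go into a separate date field that is used only for sorting, leaving the original timestamp field untouched.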

Cursor mark page duplicates

2019-11-07 Thread Dwane Hall
Hey Solr community, I'm using Solr's cursor mark feature and noticing duplicates when paging through results. The duplicate records happen intermittently and appear at the end of one page, and the beginning of the next (but not on all pages through the results). So if rows=20 the duplicate
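
For reference, the usual cursor-paging loop looks roughly like the sketch below: start with cursorMark=*, sort on a clause that includes the uniqueKey field, and stop when the returned nextCursorMark equals the cursor that was sent. The fetch_page callable is a hypothetical stand-in for whatever HTTP client issues the request, and the `id asc` sort assumes the uniqueKey field is named id. This is not the poster's code and does not by itself explain the duplicates.

```python
def iterate_cursor(fetch_page, query="*:*", rows=20, sort="id asc"):
    """Yield every document for a query by following Solr cursor marks.

    fetch_page is a caller-supplied function (hypothetical here) that takes
    a dict of request parameters and returns the parsed JSON response.
    """
    cursor = "*"  # the initial cursor mark
    while True:
        page = fetch_page(
            {"q": query, "rows": rows, "sort": sort, "cursorMark": cursor}
        )
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            break
        cursor = next_cursor
```

Passing the fetch function in keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.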

Re: Query regarding truncated Date Sort

2019-11-07 Thread Paras Lehana
Hi Inderjeet, Wouldn't sorting on the default format yield documents sorted date-wise? The time won't affect the date order, or do you also have different time zones? On Thu, 7 Nov 2019 at 12:52, Inderjeet Singh wrote: > Hi > > I am currently using Solr 7.1.0. I have indexed a few