RE: Unstemmed searching

2015-02-27 Thread Markus Jelsma
Hello Robert. Unstemmed terms have slightly higher IDF so they gain more weight, but stemmed tokens usually have slightly higher TF, so differences are marginal at best, especially when using standard TFIDFSimilarity. However, by setting a payload for stemmed terms, you can recognize them at sea

RE: Integrating Solr with Nutch

2015-03-01 Thread Markus Jelsma
Hello Baruch! You are now pointing to a directory of segments, not a specific segment. You must either point to a directory with the -dir option: bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments/ Or point to a segment: bin/nutch solrin

RE: Log numfound, qtime, ...

2015-03-04 Thread Markus Jelsma
Hello - This patch may be more straightforward https://issues.apache.org/jira/browse/SOLR-4018 -Original message- > From:Ahmed Adel > Sent: Wednesday 4th March 2015 19:39 > To: solr-user@lucene.apache.org > Subject: Re: Log numfound, qtime, ... > > Hi, I believe a better approach than

RE: Cores and ranking (search quality)

2015-03-05 Thread Markus Jelsma
Hello - facetting will be the same and distributed more like this is also possible since 5.0, and there is a working patch for 4.10.3. Regular search will work as well since 5.0 because of distributed IDF, which you need to enable manually. Behaviour will not be the same if you rely on average d

RE: Delimited payloads input issue

2015-03-06 Thread Markus Jelsma
Well, the only work-around we found to actually work properly is to override the problem-causing tokenizer implementations one by one. Regarding the WordDelimiterFilter, the quickest fix is enabling keepOriginal; if you don't want the original to stick around, the filter implementation must be mo

4.10.4 - nodes up, shard without leader

2015-03-08 Thread Markus Jelsma
Hello - i stumbled upon an issue i've never seen earlier, a shard with all nodes up and running but no leader. This is on 4.10.4. One of the two nodes emits the following error log entry: 2015-03-08 05:25:49,095 WARN [solr.cloud.ElectionContext] - [Thread-136] - : cancelElection did not find el

RE: backport Heliosearch features to Solr

2015-03-09 Thread Markus Jelsma
Ok, so what's next? Do you intend to open issues and send the links over here so interested persons can follow them? Clearly some would like to see features to merge. Let's see what the PMC thinks about it :) Cheers, M. -Original message- > From:Yonik Seeley > Sent: Monday 9th March

RE: Delimited payloads input issue

2015-03-11 Thread Markus Jelsma
> state when introducing a new token than to clear it! > > ~ David Smiley > Freelance Apache Lucene/Solr Search Consultant/Developer > http://www.linkedin.com/in/davidwsmiley > > On Fri, Mar 6, 2015 at 1:16 PM, Markus Jelsma > wrote: > > > Well, the o

RE: SSD endurance

2015-03-12 Thread Markus Jelsma
Thanks for sharing Toke! Reliability should not be a problem for a Solr cloud environment. A corrupted index cannot be loaded due to exceptions so the core should not enter an active state. However, what would happen if parts of the data become corrupted but can still be processed by the codec

RE: [Poll]: User need for Solr security

2015-03-12 Thread Markus Jelsma
Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps but it would adhere to such a client's gui

RE: SSD endurance

2015-03-12 Thread Markus Jelsma
--- > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 12 March 2015 at 18:39, Markus Jelsma wrote: > > Thanks for sharing Toke! > > > > Reliability should not be a problem for a Solr cloud environment. A >

RE: backport Heliosearch features to Solr

2015-03-12 Thread Markus Jelsma
> > SOLR-7216 JSON Request API > > > > Regards, > >Alex. > > P.s. Oh, the power of GMail filters :-) > > > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > > http://www.solr-start.com/ > > > > > > On 9 Ma

RE: Relevancy : Keyword stuffing

2015-03-16 Thread Markus Jelsma
Hello - setting (e)dismax' tie breaker to 0 or much lower than the default would `solve` this for now. Markus -Original message- > From:Mihran Shahinian > Sent: Monday 16th March 2015 16:29 > To: solr-user@lucene.apache.org > Subject: Relevancy : Keyword stuffing > > Hi all, > I have a use
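To illustrate the tie breaker advice above, here is a minimal sketch of how a disjunction-max score combines per-field scores: the best field score plus the tie value times the sum of the remaining field scores. The class name, method name and the field scores are made up for illustration; they are not taken from the thread or from any Solr configuration.

```java
class DisMaxTieSketch {
    // Disjunction-max style scoring: best field score plus tie * (sum of the other field scores).
    static double disMaxScore(double[] fieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double score : fieldScores) {
            max = Math.max(max, score);
            sum += score;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores: a document stuffed with the keyword in many
        // fields versus a document with a single, stronger match.
        double[] stuffed = {2.0, 1.5, 1.5, 1.5};
        double[] focused = {2.5};
        for (double tie : new double[] {0.0, 1.0}) {
            System.out.println("tie=" + tie
                    + " stuffed=" + disMaxScore(stuffed, tie)
                    + " focused=" + disMaxScore(focused, tie));
        }
    }
}
```

With a tie of 0 only the best matching field counts, so repeating the keyword across more fields no longer raises the score; with a high tie the stuffed document overtakes the focused one.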

RE: Relevancy : Keyword stuffing

2015-03-16 Thread Markus Jelsma
Hello - Chris' suggestion is indeed a good one but it can be tricky to properly configure the parameters. Regarding position information, you can override dismax to have it use SpanFirstQuery. It allows for setting strict boundaries from the front of the document to a given position. You can als

RE: Distributed IDF performance

2015-03-18 Thread Markus Jelsma
Anshum, Jack - don't any of you have a cluster at hand to get some real results on this? After testing the actual functionality for quite some time while the final patch was in development, we have not had the chance to work on performance tests. We are still on Solr 4.10 and have to port lots

Re: Error trying to index files to Solr

2015-03-23 Thread Markus Jelsma
Hello Majisha, Nutch' Solr indexing plugin has support for stripping non-utf8 character codepoints from the input, but it does so only on the content field if i remember correctly. However, that stripping method was not built with the invalid middle byte exception in mind, and i have not seen

RE: Creating facets based on the content field

2015-03-23 Thread Markus Jelsma
Hi - trying to extract entities for facets or whatever using IDF is bad at best. MLT works well because of scoring, not for entity extraction, because it doesn't extract entities. The OpenNLP Lucene filters do what you need, but it depends on the model you built. The freely available maxent model

RE: Read or Capture Solr Logs

2015-03-24 Thread Markus Jelsma
Hello, you can either process the logs, or make a simple SearchComponent implementation that reads SolrQueryRequest. Markus -Original message- > From:Nitin Solanki > Sent: Tuesday 24th March 2015 11:38 > To: solr-user@lucene.apache.org > Subject: Read or Capture Solr Logs > > Hello

RE: Read or Capture Solr Logs

2015-03-24 Thread Markus Jelsma
Logs > > Hi Markus, > Can you please help me. How to do that? > Using both "Process the logs" > or "make a simple SearchComponent implementation that reads > SolrQueryRequest." > > On Tue, Mar 24, 2015 at 4:17 PM, Markus Jelsma > wrot

RE: German Compound Splitter words.fst causing problems.

2015-03-25 Thread Markus Jelsma
Hello Chris - i don't know that token filter you mention but i would like to recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably well if you provide the hyphenation rules and a dictionary. It has some flaws such as decompounding to irrelevant subwords, overlapping subwords

RE: Confusing SOLR 5 memory usage

2015-04-21 Thread Markus Jelsma
Hi - what do you see if you monitor memory over time? You should see a typical saw tooth. Markus -Original message- > From:Tom Evans > Sent: Tuesday 21st April 2015 12:22 > To: solr-user@lucene.apache.org > Subject: Confusing SOLR 5 memory usage > > Hi all > > I have two SOLR 5 serve

Loading custom codec and SPI problem

2015-06-22 Thread Markus Jelsma
Hi - i've created a small Maven project with just a custom Lucene 4.3 codec. It is a basic MyCodec extends FilterCodec implementation, disabling compression. I have added the FQCN to the src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec. I have checked the jar itself to see if

Attaching payload on spatial RPT?

2015-06-24 Thread Markus Jelsma
Hi we have a multiValued spatial RPT field. Each document has 0 or more coordinate pairs attached to it. I derived the coordinate pairs from the spatial facetting heatmap and i have the count that comes with it, to do so i had to translate the heatmap grid to a list of coordinate pairs with the

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Markus Jelsma
Hello - you can use HMMChineseTokenizerFactory instead. http://lucene.apache.org/core/5_2_0/analyzers-smartcn/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.html -Original message- > From:Zheng Lin Edwin Yeo > Sent: Thursday 25th June 2015 11:02 > To: solr-user@lucene.apac

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Markus Jelsma
订从速,读者可登陆www.wbsub.com.sg,或拨打客服专线6319 > 1800订购。 \n > 此外,一年一度的晚报保健美容展,将在本月23日和24日,在新达新加坡会展中心401、402展厅举行。 > \n > 现场将开设《联合晚报》订阅展摊,读者当场订阅晚报,除了可获得丰厚的赠品,还有机会参与“"], > "author":["Edwin"]}}} > > > Is there any suitable filter factory to solve this issu

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Markus Jelsma
":["chinese3"], > "content":[")组成的我国女队在今天的东运会保龄球女子三人赛中, > 以六局3963总瓶分夺冠,为新加坡赢得本届赛会第三枚金牌。队友陈诗桦(Jazreel)、梁蕙芬和陈诗静以3707总瓶分获得亚军,季军归菲律宾女队。(联合早报记者:郭嘉惠) > \n "], > "author":["Edwin"]}, > "chinese4":{ > &

RE: Correcting text at index time

2015-06-29 Thread Markus Jelsma
Hello - why not just use synonyms or StemmerOverrideFilter? Markus -Original message- > From:hossmaa > Sent: Monday 29th June 2015 14:08 > To: solr-user@lucene.apache.org > Subject: Correcting text at index time > > Hi everyone > > I'm wondering if it's possible in Solr to correct t

RE: Attaching payload on spatial RPT?

2015-07-01 Thread Markus Jelsma
Apologies for the bump, but there should be someone with a clever idea? :) Cheers, Markus -Original message- > From:Markus Jelsma > Sent: Wednesday 24th June 2015 11:20 > To: solr-user > Subject: Attaching payload on spatial RPT? > > Hi we have a multiValued spatial RPT field. Each do

RE: language identification during solrj indexing

2015-07-02 Thread Markus Jelsma
https://wiki.apache.org/solr/LanguageDetection -Original message- > From:Alessandro Benedetti > Sent: Thursday 2nd July 2015 11:06 > To: solr-user@lucene.apache.org > Subject: Re: language identification during solrj indexing > > SolrJ is simply a java client to access Solr REST API.

RE: Tokenizer and Filter Factory to index Chinese characters

2015-07-06 Thread Markus Jelsma
> > > > > > > > "highlighting":{ > > > > > > "chinese1":{ > > > > > > "text":["1月份的制造业产值同比仅增长0 \n \n 新加坡 我国1月份的制造业产值同比仅增长0.9%。 > > > 虽然制造业结束连续两个月的萎缩,但比经济师普遍预估的增长3.3%疲软得多。这也意味着,我国今年第一季度的经济很可能让

RE: Tokenizer and Filter Factory to index Chinese characters

2015-07-07 Thread Markus Jelsma
s ourselves before we can use it in 5.x? > > Regards, > Edwin > > On 6 July 2015 at 18:44, Markus Jelsma wrote: > > > Yes, analyzers slightly changed since 5.x. > > https://issues.apache.org/jira/browse/LUCENE-5388 > > > > -Original message- >

RE: function query result without queryNorm

2015-07-07 Thread Markus Jelsma
Hello - you can either use a similarity that does not use query normalization, or you can just ignore it, it is relative anyway. Also, consider using boost parameter instead of bf, it is multiplicative where bf is just additive, which offers less control. You may also want to reduce time resolut
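The difference between an additive bf and a multiplicative boost can be shown with plain arithmetic. This is only a sketch: the query scores and function values below are invented, and the two methods stand in for how the function value is combined with the query score, not for any Solr API.

```java
class BoostVsBfSketch {
    // bf-style combination: the function value is added to the query score.
    static double additive(double queryScore, double functionValue) {
        return queryScore + functionValue;
    }

    // boost-style combination: the function value multiplies the query score.
    static double multiplicative(double queryScore, double functionValue) {
        return queryScore * functionValue;
    }

    public static void main(String[] args) {
        double oldDoc = 1.0;    // hypothetical function value for an old document
        double recentDoc = 2.0; // hypothetical function value for a recent document
        for (double queryScore : new double[] {0.2, 20.0}) {
            double addRatio = additive(queryScore, recentDoc) / additive(queryScore, oldDoc);
            double mulRatio = multiplicative(queryScore, recentDoc) / multiplicative(queryScore, oldDoc);
            System.out.println("queryScore=" + queryScore
                    + " additive ratio=" + addRatio
                    + " multiplicative ratio=" + mulRatio);
        }
    }
}
```

The additive ratio shrinks as the query score grows, so the effect of bf depends on the absolute scale of the scores, while the multiplicative ratio stays the same whatever the query score is.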

RE: To the experts: howto force opening a new searcher?

2015-07-15 Thread Markus Jelsma
Well yes, a simple empty commit won't do the trick, the searcher is not going to reload on recent versions. Reloading the core will. -Original message- > From:Bernd Fehling > Sent: Wednesday 15th July 2015 13:42 > To: solr-user@lucene.apache.org > Subject: Re: To the experts: howto for

RE: To the experts: howto force opening a new searcher?

2015-07-15 Thread Markus Jelsma
See SOLR-5783. -Original message- > From:Alessandro Benedetti > Sent: Wednesday 15th July 2015 14:48 > To: solr-user@lucene.apache.org > Subject: Re: To the experts: howto force opening a new searcher? > > 2015-07-15 12:44 GMT+01:00 Markus Jelsma : > > >

Programmatically find out if node is overseer

2015-07-16 Thread Markus Jelsma
Hello - i need to run a thread on a single instance of a cloud so need to find out if current node is the overseer. I know we can already programmatically find out if this replica is the leader of a shard via isLeader(). I have looked everywhere but i cannot find an isOverseer. I did find the el

RE: SolrCloud replicas consistently out of sync

2016-05-17 Thread Markus Jelsma
Hi, that's a known issue and unrelated: https://issues.apache.org/jira/browse/SOLR-9120 M. -Original message- > From:Stephen Weiss > Sent: Tuesday 17th May 2016 23:10 > To: solr-user@lucene.apache.org; Aleksey Mezhva ; > Hans Zhou > Subject: Re: SolrCloud replicas consistently out of

RE: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Markus Jelsma
Hello - that sounds odd indeed. Did you check query and indexing analysis? M. -Original message- > From:Dmitry Kan > Sent: Thursday 19th May 2016 9:36 > To: solr-user@lucene.apache.org > Subject: puzzling StemmerOverrideFilterFactory > > Hello! > > Puzzling case: there is a diction

RE: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Markus Jelsma
> Hi, > > Yes, I have checked the analysis page and there everything is logical, > stemming is done as expected. So by analysis page the search should not > return anything. > > On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma > wrote: > > > Hello - that sounds odd i

RE: Stemming nouns ending in 'y'

2016-05-19 Thread Markus Jelsma
Hello - try the KStem filter. It is better suited for English and doesn't show this behaviour. Markus -Original message- > From:Mark Vega > Sent: Thursday 19th May 2016 19:55 > To: solr-user@lucene.apache.org > Subject: Stemming nouns ending in 'y' > > I am using Apache Nutch v1.10

RE: Import html data in mysql and map schemas using only SolrCELL+TIKA+DIH [scottchu]

2016-05-24 Thread Markus Jelsma
Hello - did you find this manual page? It explains how HTML can be uploaded. https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Markus -Original message- > From:scott.chu > Sent: Tuesday 24th May 2016 7:48 > To: solr-user > Subject: Re:

RE: SolrJ & json.facet?

2016-05-25 Thread Markus Jelsma
Hi - with some crazy casting to a NamedList and List you can finally get a SimpleOrderedMap that contains a bucket you can read. It's a bit tedious but it works fine. Markus -Original message- > From:Bram Van Dam > Sent: Wednesday 25th May 2016 16:48 > To: solr-user@lucene.apache.or

Unit tests, Session expired for ...state.json in AbstractFullDistribZkTestBase

2016-05-26 Thread Markus Jelsma
Hi, We have a bunch of tests extending AbstractFullDistribZkTestBase on 6.0 and our builds sometimes fail with the following message: org.apache.solr.common.SolrException: Could not load collection from ZK: collection1 at io. Caused by: org.apache.zookeeper.KeeperException$SessionExpiredEx

RE: Unit tests, Session expired for ...state.json in AbstractFullDistribZkTestBase

2016-05-26 Thread Markus Jelsma
Also, tests sometimes fail with: org.apache.solr.common.SolrException: No registered leader was found after waiting for 1ms , collection: collection1 slice: shard1 Despite having waitForThingsToLevelOut(45); If anyone has a suggestion for this as well, it would be much appreciated :) Markus

Small setFacetLimit() terminates Solr

2016-06-02 Thread Markus Jelsma
Hello, I ran across an awkward situation where i collect all ~7.000.000 distinct values for a field via facetting. To keep things optimized and reduce memory consumption i don't do setFacetLimit(-1) but a reasonable limit of 10.000 or 100.000. To my surprise, Solr just stops or crashes. So, i

RE: Small setFacetLimit() terminates Solr

2016-06-03 Thread Markus Jelsma
I'll have a look at it! Thanks guys! Markus -Original message- > From:Toke Eskildsen > Sent: Thursday 2nd June 2016 15:49 > To: solr-user@lucene.apache.org > Subject: Re: Small setFacetLimit() terminates Solr > > On Thu, 2016-06-02 at 09:26 -0400, Yonik Seeley wrote: > > My guess wou

AtomicUpdateDocumentMerger Unknown operation for the an atomic update, operation ignored

2016-06-03 Thread Markus Jelsma
Hi, Just now i indexed ~15k docs to a newly made core and schema, running 6.0 locally this time. It was just regular indexing, nothing fancy and very small documents. Then the following popped up in the logs: 2496200 WARN (qtp97730845-17) [ x:documents] o.a.s.u.p.AtomicUpdateDocumentMerger Unkn

RE: All Datanodes are Bad

2016-06-21 Thread Markus Jelsma
Hello Joseph, Your datanodes are in a bad state, you probably overwhelmed them when indexing. Check your max open files on those nodes. The usual default of 1024 is way too low. Markus -Original message- > From:Joseph Obernberger > Sent: Monday 20th June 2016 19:36 > To: solr-user@lucene

RE: Automatic Language Identification

2016-06-22 Thread Markus Jelsma
Hello, I recommend using the langdetect language detector, it supports many more languages and has much higher precision than Tika's detector. Markus -Original message- > From:Alexandre Rafalovitch > Sent: Wednesday 22nd June 2016 12:32 > To: solr-user > Subject: Re: Automatic Lan

RE: Solr node crashes while indexing - Too many open files

2016-06-30 Thread Markus Jelsma
Mads, some distributions require different steps for increasing max_open_files. Check how it works for CentOS specifically. Markus -Original message- > From:Mads Tomasgård Bjørgan > Sent: Thursday 30th June 2016 10:52 > To: solr-user@lucene.apache.org > Subject: Solr node crashes wh

RE: Solr node crashes while indexing - Too many open files

2016-06-30 Thread Markus Jelsma
over 4000 files without closing them > properly? Is it for example possible to adjust autoCommit-settings I > solrconfig.xml for forcing Solr to close the files? > > Any help is appreciated :-) > > -Original Message- > From: Markus Jelsma [mailto:markus.jel...@openinde

RE: How to best serialize/deserialize a SolrInputDocument?

2016-06-30 Thread Markus Jelsma
Hello - we use GZipped output streams too for buffering large sets of SolrInputDocuments to disk before indexing. Works fine and SolrInputDocument is very easily compressed as well. Markus -Original message- > From:Sebastian Riemer > Sent: Thursday 30th June 2016 13:56 > To: solr-
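A minimal sketch of the buffering idea above, using only the JDK: documents are written through a GZIP stream to a temporary file and read back later. To keep it self-contained, a plain Map stands in for SolrInputDocument; the class and method names are hypothetical, and whether the real SolrJ class round-trips the same way through Java serialization is an assumption the thread does not confirm.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDocBufferSketch {
    // Serialize the documents through a GZIP stream into the given file.
    static void write(Path file, List<Map<String, Object>> docs) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(new ArrayList<>(docs));
        }
    }

    // Read the documents back from the compressed file.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(Files.newInputStream(file)))) {
            return (List<Map<String, Object>>) in.readObject();
        }
    }

    // Buffer to a temporary file and read back; the file is removed afterwards.
    static List<Map<String, Object>> roundTrip(List<Map<String, Object>> docs) {
        try {
            Path file = Files.createTempFile("solr-docs", ".bin.gz");
            try {
                write(file, docs);
                return read(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc-1");
        doc.put("content", "some text to index later");
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc);
        System.out.println(roundTrip(docs));
    }
}
```

In a real indexer the read side would hand each batch to the client for indexing instead of printing it.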

RE: Strip HTML Tags and Store

2016-07-13 Thread Markus Jelsma
Hello - just as mentioned in the thread: > Add to your solrconfig: > > > > features > > > > And point your update handler config to this html-strip-features update processor chain. M. -Original message- > From:Kalpana > Sent: Wednesday 13th

RE: POST options (UNCLASSIFIED)

2016-07-13 Thread Markus Jelsma
Hello - this is the 101 documentation on indexing: https://cwiki.apache.org/confluence/display/solr/Indexing+and+Basic+Data+Operations ..and this describes stuff on indexing stuff with POST etc: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers ..and things that

RE: Simple Post Tool result question (UNCLASSIFIED)

2016-07-14 Thread Markus Jelsma
Hello - Multiple slashes beyond the URL scheme should be normalized, but some crazy CMS' or their doubtful modules actually rely on it! It's madness. M. -Original message- > From:Jamal, Sarfaraz > Sent: Thursday 14th July 2016 21:39 > To: solr-user@lucene.apache.org > Subject: RE: S
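As a rough illustration of the normalization mentioned above, the following sketch collapses runs of slashes that occur after the URL scheme while leaving the query string and fragment untouched. The class and method names are hypothetical and this is not the code of any crawler or of the Simple Post Tool; since some sites rely on repeated slashes, such a step would have to be optional.

```java
class SlashNormalizerSketch {
    // Collapse repeated slashes after the scheme, leaving query and fragment as they are.
    static String collapseSlashes(String url) {
        int schemeEnd = url.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3;
        int end = url.length();
        int query = url.indexOf('?', start);
        int fragment = url.indexOf('#', start);
        if (query >= 0) {
            end = Math.min(end, query);
        }
        if (fragment >= 0) {
            end = Math.min(end, fragment);
        }
        String collapsed = url.substring(start, end).replaceAll("/{2,}", "/");
        return url.substring(0, start) + collapsed + url.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(collapseSlashes("http://example.org//a///b/?x=//y"));
    }
}
```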

RE: Are there issues with the use of SolrCloud / embedded Zookeeper in non-HA deployments?

2016-07-28 Thread Markus Jelsma
Hello - all our production environments are deployed as a cloud, even when just a single Solr instance is used. We did this for the purpose of having a single method of deployment / provisioning and just because we have the option to add replicas with ease if we need to. We never use embedded Zook

RE: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Markus Jelsma
Depending on your settings, Nutch does this as well. It is even possible to set up different inc/decremental values per mime-type. The algorithms are pluggable and overridable at any point of interest. You can go all the way. -Original message- > From:Walter Underwood > Sent: Wednes

RE: [Non-DoD Source] Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Markus Jelsma
No, just run it continuously, always! By default everything is refetched (if possible) every 30 days. Just read the descriptions for adaptive schedule and its javadoc. It is simple to use, but sometimes hard to predict its outcome, just because you never know what changes, at whatever time. You

RE: Out of sync deletions causing differing IDF

2016-08-04 Thread Markus Jelsma
Hello - your similarity should rely on numDocs instead; it solves the problem. I believe it is already fixed in trunk, but i am not sure. Markus -Original message- > From:Upayavira > Sent: Thursday 4th August 2016 13:59 > To: solr-user@lucene.apache.org > Subject: Out of sync deletions c
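The effect described above can be sketched with a classic TF-IDF style formula, idf = 1 + ln(docCount / (docFreq + 1)). The formula choice, the class name and all numbers are assumptions for illustration: two replicas hold the same live documents, but one still carries deleted documents in its segments, so its maxDoc is higher. The document frequency is kept equal on both replicas for simplicity, although in a real index it can also include deleted documents until they are merged away.

```java
class IdfSketch {
    // Classic TF-IDF style inverse document frequency: 1 + ln(docCount / (docFreq + 1)).
    static double idf(long docCount, long docFreq) {
        return 1.0 + Math.log(docCount / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long docFreq = 100;   // hypothetical number of documents containing the term
        long numDocs = 1000;  // live documents, identical on both replicas
        long maxDocA = 1000;  // replica A: deletions already merged away
        long maxDocB = 1400;  // replica B: 400 deleted documents still present

        System.out.println("maxDoc based: A=" + idf(maxDocA, docFreq) + " B=" + idf(maxDocB, docFreq));
        System.out.println("numDocs based: A=" + idf(numDocs, docFreq) + " B=" + idf(numDocs, docFreq));
    }
}
```

Based on maxDoc the two replicas produce different IDF values for the same term, which is the out-of-sync behaviour in the subject; based on the live document count they agree.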

cannot override coord(int,int) in org.apache.lucene.search.similarities.PerFieldSimilarityWrapper

2016-08-31 Thread Markus Jelsma
Hi - I get this compile error when upgrading a Maven project to Solr 6.2.0 and i cannot find a reference in CHANGES.txt. Any ideas? Thanks, Markus cannot override coord(int,int) in org.apache.lucene.search.similarities.PerFieldSimilarityWrapper

RE: cannot override coord(int,int) in org.apache.lucene.search.similarities.PerFieldSimilarityWrapper

2016-08-31 Thread Markus Jelsma
Oh forget it, i missed it completely! LUCENE-7395, SOLR-9315: Fix PerFieldSimilarityWrapper to also delegate query norm and coordination factor using a default similarity added as ctor param. (Uwe Schindler, Sascha Markus) Sorry -Original message- > From:Markus Jelsma > Sent: Wedne

Streaming expressions, delete

2016-09-01 Thread Markus Jelsma
Hi, I've read up on the streaming expressions on the cwiki. The update decorator is ideal for quick imports from various sources such as jdbc, and combined with daemon it can be used for periodic delta imports too, which is very nice. But besides the update function, i would also expect a delet

Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Markus Jelsma
Hello - we've just spotted the weirdest issue on Solr 6.1. We have a Solr index full of logs, new items are added every few minutes. We also have an application that shows charts based on what's in the index, Banana style. Yesterday we saw facets for a specific field were missing. Today we chec

RE: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Markus Jelsma
> different definition for your field. Optimize would have to resolve > this somehow, perhaps that process made the magic happen? > > NOTE: I'm not conversant with the internals of merge, so this may be > totally bogus.. > > Best, > Erick > > On Wed, Sep 14, 2

RE: Facetting on a field doesn't work, until i optimized the index

2016-09-15 Thread Markus Jelsma
n that query? > > On Wed, Sep 14, 2016 at 4:15 PM, Markus Jelsma > wrote: > > > Hello - we've just spotted the weirdest issue on Solr 6.1. > > > > We have a Solr index full of logs, new items are added every few minutes. > > We also have an application tha

RE: Search with the start of field

2016-09-21 Thread Markus Jelsma
Yes, definitely SpanFirstQuery! But i didn't know you can invoke it via XMLQueryParser, thanks Mikhail for that! There is a tiny drawback to SpanFirst: there is no gradient boosting depending on distance from the beginning. Markus -Original message- > From:Mikhail Khludnev > Sent:

Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Markus Jelsma
Hi, I just spotted something weird, again. A regular search popped up a weird candidate for first result, so i've reproduced it on our production system. Digging deeper, it appears that the fl parameter has something to do with it. Not the order of results but the scores / explain in the debug

RE: Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Markus Jelsma
ps the "get fields" phase because it already has all the right > information. > > You can force the single pass for the first request as well by adding > distrib.singlePass=true as a request parameter. It might be interesting to > get that output as well and compare it w

RE: Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Markus Jelsma
the normal two phase search? Please open a jira issue. This > warrants more investigation. > > On Tue, Sep 27, 2016 at 6:26 PM, Markus Jelsma > wrote: > > > Shalin, that does the trick indeed! > > > > Also noted that the manual is incorrect, setting it to a blank or false

RE: SolrJ & Ridiculously Large Queries

2016-10-14 Thread Markus Jelsma
Yes, you can use HTTP POST with SolrJ for queries. SolrRequest request = new QueryRequest((SolrParams)query, SolrRequest.METHOD.POST); QueryResponse response = new QueryResponse(client.request(request), client); https://lucene.apache.org/solr/6_2_1/solr-solrj/org/apache/solr/client/solrj/SolrReq

RE: PDF writer

2016-10-17 Thread Markus Jelsma
Did someone miss https://pdfbox.apache.org/ ? It can write PDF documents, is ASF and has a ton of examples to learn from. M. -Original message- > From:John Bickerstaff > Sent: Monday 17th October 2016 22:05 > To: solr-user@lucene.apache.org > Subject: Re: PDF writer > > It's not fun

RE: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Markus Jelsma
In case you're not up for Doug or Jan's answers; we have relied on HTTP proxies (nginx) to solve the problem of restriction for over 6 years. Very easy if visibility is your only problem. Of course, the update handlers are hidden (we perform indexing for clients with crawlers) so we don't expose

RE: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Markus Jelsma
ase that client's key is fairly static, yes? It doesn't change at > any time, but tends to live on the data more or less permanently? > > On Tue, Oct 18, 2016 at 4:07 PM, Markus Jelsma > wrote: > > > In case you're not up for Doug or Jan's anwers; w

RE: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Markus Jelsma
ManifoldCF can do this really flexibly, with Filenet or Sharepoint, or both, i don't remember that well. This means a variety of users can have changing privileges at any time. The backend determines visibility, ManifoldCF just asks how visible it should be. This also means you need those back

RE: Related Search

2016-10-26 Thread Markus Jelsma
Indeed, we have similar processes running of which one generates a 'related query collection' which just contains a (normalized) query and its related queries. I would not know how this is even possible without continuously processing query and click logs. M. -Original message- > Fr

RE: Getting NullPointerException in an attempt to boost query result by date

2016-10-31 Thread Markus Jelsma
Does it work if you comment out any of the two local param queries? I'd doubt passing two sets of local params ever worked at all without wrapping one in the other. -Original message- > From:Gintautas Sulskus > Sent: Monday 31st October 2016 21:19 > To: solr-user@lucene.apache.org > S

UpdateProcessor as a batch

2016-11-03 Thread Markus Jelsma
Hi - i need to process a batch of documents on update but i cannot seem to find a point where i can hook in and process a list of SolrInputDocuments, not in UpdateProcessor nor in UpdateHandler. For now i let it go and implemented it on a per-document basis, it is fast, but i'd prefer batches.

RE: UpdateProcessor as a batch

2016-11-03 Thread Markus Jelsma
> Sent: Thursday 3rd November 2016 18:57 > To: solr-user > Subject: Re: UpdateProcessor as a batch > > Markus: > > How are you indexing? SolrJ has a client.add(List) > form, and post.jar lets you add as many documents as you want in a > batch > > Best, > Eri

RE: UpdateProcessor as a batch

2016-11-03 Thread Markus Jelsma
ly no > batching at that level that I know of. I'm pretty sure that even > indexing batches of 1,000 documents from, say, SolrJ go through this > method. > > I don't think there's much to be gained by any batching at this level, > it pretty immediately tells Luce

RE: UpdateProcessor as a batch

2016-11-04 Thread Markus Jelsma
e they're removed from the > > >> incoming batch and passed on, but I admit I have no > > >> clue where to do that. Possibly in an update chain? If > > >> so, you'd need to be careful to only augment when > > >> they'd reached their final s

RE: Custom Response writer

2017-06-16 Thread Markus Jelsma
Yes, index the employee and item names instead of only their ID's. And if you can't for some reason, i'd implement a DocTransformer instead of a ResponseWriter. Regards, Markus -Original message- > From:mganeshs > Sent: Friday 16th June 2017 16:19 > To: solr-user@lucene.apache.org

IndexSchema uniqueKey is not stored

2017-06-16 Thread Markus Jelsma
Hi, Moving over to docValues as stored field i got this: o.a.s.s.IndexSchema uniqueKey is not stored - distributed search and MoreLikeThis will not work But distributed and MLT still work. Is the warning actually obsolete these days? Regards, Markus

RE: Estimating CPU

2017-06-20 Thread Markus Jelsma
To add to Erick's reply: the first thing that comes to mind is that you also have a huge heap. Do you really need it to be that large? If it is not absolutely necessary, reduce it. If you need it because of FieldCache, consider DocValues instead and reduce the heap again. Use tools like VisualVM to see what the CPU is

RE: How can I enable NER Plugin in Solr 6.x

2017-06-22 Thread Markus Jelsma
Solr hasn't got built in support for NER, but you can try its UIMA integration with external third-party suppliers: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration -Original message- > From:FOTACHE CHRISTIAN > Sent: Thursday 22nd June 2017 19:03 > To: Solr-user > S

RE: Questions about typical/simple clustered Solr software and hardware architecture

2017-06-23 Thread Markus Jelsma
Hello, see inline. -Original message- > From:ken edward > Sent: Friday 23rd June 2017 21:07 > To: solr-user@lucene.apache.org > Subject: Questions about typical/simple clustered Solr software and hardware > architecture > > Hello, > > I am brand new to Solr, and trying to ramp up qui

RE: Proximity searches with a wildcard

2017-06-23 Thread Markus Jelsma
Sure: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser -Original message- > From:Michael Craven > Sent: Friday 23rd June 2017 22:06 > To: solr-user@lucene.apache.org > Subject: Proximity searches with a wildcard > > I apologize in

SolrJ 6.6.0 Connection pool shutdown

2017-06-27 Thread Markus Jelsma
Hi, We have a process checking presence of many documents in a collection, just a simple client.getById(id). It sometimes begins throwing lots of these exceptions in a row: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down Then, as su

RE: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Markus Jelsma
> Sent: Tuesday 27th June 2017 23:02 > To: solr-user@lucene.apache.org > Subject: Re: SolrJ 6.6.0 Connection pool shutdown > > On 6/27/2017 6:50 AM, Markus Jelsma wrote: > > We have a proces checking presence of many documents in a collection, just > > a simple client.g

RE: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Markus Jelsma
firewall). Yes, that's using the hammer to swat a fly :-) > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 29 June 2017 at 08:21, Markus Jelsma wrote: > > Hi, > > > > Everything is 6.6

RE: High disk write usage

2017-07-05 Thread Markus Jelsma
Try a mergeFactor of 10 (the default), which should be fine in most cases. If you have an extreme case, either create more shards or consider better hardware (SSDs). -Original message- > From:Antonio De Miguel > Sent: Wednesday 5th July 2017 16:48 > To: solr-user@lucene.apache.org > Subject: R

RE: OpenNLP and Solr

2017-07-06 Thread Markus Jelsma
Hi - There is no out-of-the-box integration of OpenNLP in Lucene at this moment, but there is an ancient patch if you are adventurous. Regards, LUCENE-2899 -Original message- > From:meenu > Sent: Thursday 6th July 2017 16:26 > To: solr-user@lucene.apache.org > Subject: OpenNLP and

Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
Hello, This morning i spotted our QTime suddenly go up. This has been going on for a few hours by now and coincides with a serious increase in heap consumption. No node ran out of memory so far but either that is going to happen soon, or the nodes become unusable in another manner. I restarted

RE: Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
w much total cache > consumption may go based on your current solrconfig.xml settings. Also 2 > shards and 3 replca's are on 6 such machines i assume. > > Thanks, > Susheel > > On Fri, Jul 7, 2017 at 7:01 AM, Markus Jelsma > wrote: > > > Hello, > > > > Thi

RE: Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
> > What changed in the system? > > Has there been a code change, increased QPS or different types of queries > being run? > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Fri, Jul 7, 2017 at 8:07 AM, Markus Jelsma > wrote: > >

RE: Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
ixes made in Solr > 6.6 with PayloadScoreQuery in this regard. See LUCENE-7808 and LUCENE-7481 > > Erik > > > > On Jul 7, 2017, at 7:01 AM, Markus Jelsma > > wrote: > > > > Hello, > > > > This morning i spotted our QTime suddenly go up.

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
> Regards, > >    Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > > > > On 29 June 2017 at 08:21, Markus Jelsma wrote: > > > Hi, > > > > > > Everything is 6.6.0. I could include a stac

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
with stack trace > > Do you see any errors etc. in solr.log during this time? > > On Tue, Jul 18, 2017 at 7:10 AM, Markus Jelsma > wrote: > > > The problem was never resolved but Shawn asked for the stack trace, here > > it is: > >

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
nt/connections happening concurrently. > > Thanks, > Susheel > > On Tue, Jul 18, 2017 at 8:43 AM, Markus Jelsma > wrote: > > > Hello Susheel, > > > > No, nothing at all. I've check all six nodes, they are clean. > > > > Thanks, > > Markus > >

6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello, Another peculiarity here, our six node (2 shards / 3 replicas) cluster is going crazy after a good part of the day has passed. It starts eating CPU for no good reason and its latency goes up. Grafana graphs show the problem really well. After restarting 2/6 nodes, there is also quite a

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
> On July 19, 2017 5:35:32 AM EDT, Markus Jelsma > wrote: > >Hello, > > > >Another peculiarity here, our six node (2 shards / 3 replica's) cluster > >is going crazy after a good part of the day has passed. It starts > >eating CPU for no good reason a

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
hey start to sync shards due to some reason? > > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma > wrote: > > > Hello, > > > > Another peculiarity here, our six node (2 shards / 3 replica's) cluster is > > going crazy after a good part of the day has passed
