[ANNOUNCE] Luke 5.4.0 released

2016-02-14 Thread Dmitry Kan
earlier, but not announced separately on this list: luke running on Apache Pivot instead of the Thinlet library. It supports lucene 5.2.1. Grab it here: https://github.com/DmitryKey/luke/releases/tag/pivot-luke-5.2.1 Your feedback is appreciated! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey

change default id in results clustering

2016-02-18 Thread Dmitry Kan
Hi, Is it possible to change the id field, that defaults to 'id' in carrot based result clustering? I have another field, 'externalId', that is stamped on each document and would like to return it in clusters instead. -- Dmitry Kan Luke Toolbox: http://github.com/Dmitr

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-04 Thread Dmitry Kan
> > > Kelkoo SAS > Simplified joint-stock company (Société par Actions Simplifiée) > Share capital: € 4.168.964,30 > Registered office: 158 Ter Rue du Temple 75003 Paris > 425 093 069 RCS Paris > > This message and its attachments are confidential and intended > exclusively for their addressees. If you are not the > intended recipient of this message, please destroy it and notify > the sender. > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Dmitry Kan
- Add or delete documents from the main collection >solrClient.add(doc, 180) // commitWithin > == 30 mn > solrClient.deleteById(doc, 180) // commitWithin == 30 mn > > Maybe you will spot something obviously wrong ? > > Thanks >

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Dmitry Kan
Thanks Shawn, Missed the openSearcher=false setting. So another thing to check really is whether there are concurrent commitWithin calls ever to the same shard. On March 10, 2016 at 4:39 PM, "Shawn Heisey" wrote: > On 3/10/2016 3:05 AM, Dmitry Kan wrote: > > Th

[ANNOUNCEMENT] Luke 5.5.0 released

2016-03-19 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-5.5.0 Fixed in this release: #50 <https://github.com/DmitryKey/luke/issues/50> (Literally, the upgrade to Lucene 5.5.0) Enjoy! -- Dmi

[ANNOUNCEMENT] Luke 6.0.0 released

2016-04-18 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-6.0.0 Major upgrade to new Lucene 6.0.0 API. #55 <https://github.com/DmitryKey/luke/pull/55> Enjoy! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com T

[JOB] Financial search engine company AlphaSense is looking for Search Engineers

2015-08-03 Thread Dmitry Kan
Send your CV over and let's have a chat. Please e-mail me, if you have any questions. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

how to extend JavaBinCodec and make it available in solrj api

2015-08-05 Thread Dmitry Kan
lugin framework such that JavaBinCodec is extended and used for the new data structure? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-07 Thread Dmitry Kan
> What do you mean by a custom format? As long as your custom component > is writing primitives or NamedList/SimpleOrderedMap or collections > such as List/Map, any response writer should be able to handle them. > > On Wed, Aug 5, 2015 at 5:08 PM, Dmitry Kan wrote: > >

Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-08 Thread Dmitry Kan
changing the response writers. Instead, if you just used > nested maps/lists or SimpleOrderedMap/NamedList then every response > writer should be able to just directly write the output. Nesting is > not a problem. > > On Fri, Aug 7, 2015 at 6:09 PM, Dmitry Kan wrote: > > S

Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-17 Thread Dmitry Kan
Shekhar Mangar > wrote: > > No, I'm afraid you will have to extend the XmlResponseWriter in that > case. > > > > On Sat, Aug 8, 2015 at 2:02 PM, Dmitry Kan wrote: > >> Shalin, > >> > >> Thanks, can I also introduce custom entity tags

modular QueryParser in contrib

2015-09-21 Thread Dmitry Kan
modularity and customizability. Can you point to what the exact class is? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: modular QueryParser in contrib

2015-09-21 Thread Dmitry Kan
pache/lucene/queryparser/flexible/standard/package-summary.html > > > > The original Jira: > > https://issues.apache.org/jira/browse/LUCENE-1567 > > > > This new query parser was dumped into Lucene some years ago, but I > haven't > > noticed any real ac

[ANNOUNCE] Luke 5.3.0 released

2015-09-28 Thread Dmitry Kan
, please file an issue on the luke's github: https://github.com/DmitryKey/luke Luke Team -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

2015-11-02 Thread Dmitry Kan
Hi solr fans, Is there a way to influence the strategy behind SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite? As it seems, at the moment the rewrite method loads at most N terms that maximize the term score. How can this be changed to load the top terms by frequency, for example? -- Dmitry Kan

similarity as a parameter

2015-12-15 Thread Dmitry Kan
Hi guys, Is there a way to alter the similarity class at runtime, with a parameter? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
se field but > then had the desired alternate similarity, using SchemaSimilarityFactory. > > See: > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements > > > -- Jack Krupansky > > > On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan wrote: > > > Hi guys

Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
structed query is going to be the > simplest/cleanest solution regardless of whether #1 or #2 makes the most > sense -- perhaps even achieving #2 by using #1 so that createWeight in > your new QueryWrapper class does the IndexSearcher wrapping before > delegating. > > > >

Re: Weird Solr Replication Slave out of sync

2015-02-17 Thread Dmitry Kan
Because these types > of issues are going to > be hard to find especially when there are no errors. > > What could be happening, and how can I avoid this from happening ? > > > Thanks, > Summer > > -- Dmitry Kan Luke Toolbox: http://github.com/Dmi

unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
indexing. What else could be the artifact of such a difference -- Solr or JVM? Can it only be explained by the mass indexing? What is worrisome is that the 4.10.2 shard reserves 8x what it uses. What can be done about this? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
> > > This unusual spike happened during mass data indexing. > > > > What else could be the artifact of such a difference -- Solr or JVM? Can > it > > only be explained by the mass indexing? What is worrisome is that the > > 4.10.2 shard reserves

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
> -Xmx25600m > > > > > > > > The RAM consumption remained the same after the load has stopped on > the > > > > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via > > > > jvisualvm dropped the used RAM from 8,5G to 0,5G. But t

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
-XX:CMSInitiatingOccupancyFraction=40 Dmitry On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen wrote: > On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote: > > Solr: 4.10.2 (high load, mass indexing) > > Java: 1.7.0_76 (Oracle) > > -Xmx25600m > > > > > > Solr: 4.3.1 (normal load, no ma

Re: Internal document format for Solr 4.10.2

2015-02-18 Thread Dmitry Kan
ity to store this internal document in xml format ? > > -- > Best Regards, > Dinesh Naik > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

highlighting the boolean query

2015-02-23 Thread Dmitry Kan
the standard highlighter? Can it be mitigated? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[ANNOUNCE] Luke 4.10.3 released

2015-02-23 Thread Dmitry Kan
iation changed from ASL 2.0 to ALv2 Thanks to respective contributors! P.S. waiting for lucene 5.0 artifacts to hit public maven repositories for the next major release of luke. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitt

Re: highlighting the boolean query

2015-02-23 Thread Dmitry Kan
Erick, nope, we are using std lucene qparser with some customizations, that do not affect the boolean query parsing logic. Should we try some other highlighter? On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson wrote: > Are you using edismax? > > On Mon, Feb 23, 2015 at 3:28 AM, D

Re: highlighting the boolean query

2015-02-24 Thread Dmitry Kan
oes look like something with the highlighter. Whether other > highlighters are better for this case.. no clue ;( > > Best, > Erick > > On Mon, Feb 23, 2015 at 9:36 AM, Dmitry Kan wrote: > > Erick, > > > > nope, we are using std lucene qparser with some customization

Re: Integration Tests with SOLR 5

2015-02-24 Thread Dmitry Kan
so no local repository cache is present > - How to deploy your schema.xml, stopwords, solr plug-ins etc. for testing > in an isolated environment > - What does a maven boilerplate code look like? > > Any ideas would be appreciated. > > Kind regards, > > Thomas > --

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-24 Thread Dmitry Kan
't have any opinions, just want to understand current status and avoid > duplicate works. > > Apologize for a bit annoying post. > > Many thanks, > Tomoko > > > > 2015-02-24 0:00 GMT+09:00 Dmitry Kan : > > > Hello, > > > > Luke 4.10.3 has been r

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-25 Thread Dmitry Kan
long way to go for Pivot's version, of course, I'd like to also > make pull requests to enhance github's version if I can. > > Thanks, > Tomoko > > 2015-02-24 23:34 GMT+09:00 Dmitry Kan : > > > Hi, Tomoko! > > > > Thanks for being a fan of luke

Re: highlighting the boolean query

2015-02-25 Thread Dmitry Kan
>> within the document. Been a while since I dug into the HighlightComponent, >> so maybe there’s some other options available out of the box? >> >> — >> Erik Hatcher, Senior Solutions Architect >> http://www.lucidworks.com <http://www.lucidworks.com/> >

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-25 Thread Dmitry Kan
s, > Tomoko > > 2015-02-25 18:37 GMT+09:00 Dmitry Kan : > > > Ok, sure. The plan is to make the pivot branch in the current github repo > > and update its structure accordingly. > > Once it is there, I'll let you know. > > > > Thank you, > >

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-26 Thread Dmitry Kan
> > // compile and make jars and run > $ ant dist > ... > BUILD SUCCESSFUL > $ java -cp "dist/*" org.apache.lucene.luke.ui.LukeApplication > ... > > > Thanks, > Tomoko > > 2015-02-26 16:39 GMT+09:00 Dmitry Kan : > > > Hi Tomoko, > > > > Thanks for t

Re: [ANNOUNCE] Luke 4.10.3 released

2015-02-26 Thread Dmitry Kan
> Seems something wrong around Pivot's, but I have no idea about it. > Would you tell me java version you're using ? > > Tomoko > > 2015-02-26 21:15 GMT+09:00 Dmitry Kan : > > > Thanks, Tomoko, it compiles ok! > > >

Re: [ANNOUNCE] Luke 4.10.3 released

2015-03-01 Thread Dmitry Kan
's versions at same place, as you suggested. > > Thanks, > Tomoko > > 2015-02-26 22:15 GMT+09:00 Dmitry Kan : > > > Sure, it is: > > > > java version "1.7.0_76" > > Java(TM) SE Runtime Environment (build 1.7.0_76-b13) > > Java Ho

Re: Conditional invocation of HTMLStripCharFactory

2015-03-02 Thread Dmitry Kan
> View this message in context: > http://lucene.472066.n3.nabble.com/Conditional-invocation-of-HTMLStripCharFactory-tp4190010.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blo

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-03-10 Thread Dmitry Kan
This freed up couple dozen GBs on the solr server! On Tue, Feb 17, 2015 at 1:47 PM, Dmitry Kan wrote: > Thanks Toke! > > Now I consistently see the saw-tooth pattern on two shards with new GC > parameters, next I will try your suggestion. > > The current params are:

Re: Missing doc fields

2015-03-11 Thread Dmitry Kan
":true}, > { > "name":"id", > "type":"string", > "multiValued":false, > "indexed":true, > "required":true, > "stored":true}, > { > "name":"ymd", > "type":"tdate", > "indexed":true, > "stored":true}], > > > > Yet, when I display $results in the richtext_doc.vm Velocity template, > documents only contain three fields (id, _version_, score): > > SolrDocument{id=3, _version_=1495262517955395584, score=1.0}, > > > How can I increase the number of doc fields? > > Many thanks. > > Philippe > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: Missing doc fields

2015-03-12 Thread Dmitry Kan
":"*","rows":"3","wt":"json"}},"response":{"numFound":160238,"start":0,"docs":[{"id":"10","_version_":1495262519674011648},{"id":"1","_version_":1495262517261238272},{

Re: DocumentAnalysisRequestHandler

2015-03-12 Thread Dmitry Kan
Is /analysis/document deprecated in SOLR 5? > >class="solr.DocumentAnalysisRequestHandler" > startup="lazy" /> > > > What is the modern equivalent of Luke? > > Many thanks. > > Philippe > -- Dmitry Ka

Re: [Poll]: User need for Solr security

2015-03-12 Thread Dmitry Kan
uture version of > Solr. > Examples: Local user management, AD/LDAP integration, SSL, authenticated > login to Admin UI, authorization for Admin APIs, e.g. admin user vs > read-only user etc > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com

Re: [Poll]: User need for Solr security

2015-03-13 Thread Dmitry Kan
"run" in > "run" match "running", "runner" "runs" etc. Any but trivial encryption > will break that, and the trivial encryption is easy to break. > > So putting all this over an encrypting filesystem is an approach > that's often used.

Re: [Poll]: User need for Solr security

2015-03-13 Thread Dmitry Kan
a wildcard search I need to have the "run" in > > "run" match "running", "runner" "runs" etc. Any but trivial encryption > > will break that, and the trivial encryption is easy to break. > > > > So putting all this over an en

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-16 Thread Dmitry Kan
nd just to be really clear, you _only_ seeing more segments being > >> added, right? If you're only counting files in the index directory, it's > >> _possible_ that merging is happening, you're just seeing new files take > >> the place of old ones. > >> > >> Best, > >> Erick > >> > >> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey > wrote: > >>> On 3/4/2015 4:12 PM, Erick Erickson wrote: > >>>> I _think_, but don't know for sure, that the merging stuff doesn't get > >>>> triggered until you commit, it doesn't "just happen". > >>>> > >>>> Shot in the dark... > >>> > >>> I believe that new segments are created when the indexing buffer > >>> (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that > >>> anytime a new segment is created, the merge policy is checked to see > >>> whether a merge is needed. > >>> > >>> Thanks, > >>> Shawn > >>> > > > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

[ANNOUNCE] Luke 4.10.4 released

2015-03-16 Thread Dmitry Kan
now distributed as a tar.gz with the luke binary and a launcher script. There is currently luke atop apache pivot cooking in its own branch. You can try it out already for some basic index loading and search operations: https://github.com/DmitryKey/luke/tree/pivot-luke -- Dmitry Kan Luke Toolbox

phraseFreq vs sloppyFreq

2015-04-22 Thread Dmitry Kan
er phraseFreq increase the final similarity score? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: Odp.: phraseFreq vs sloppyFreq

2015-04-22 Thread Dmitry Kan
1k? > > @LAFK_PL > Original message > From: Dmitry Kan > Sent: Wednesday, 22 April 2015 09:26 > To: solr-user@lucene.apache.org > Reply-To: solr-user@lucene.apache.org > Subject: phraseFreq vs sloppyFreq > > Hi guys. I'm executing the following proximity query: "

payload similarity

2015-04-24 Thread Dmitry Kan
TermQuery(new Term("body", "dogs")); termQuery.setBoost(1.1f); TopDocs topDocs = searcher.search(termQuery, 10); printResults(searcher, termQuery, topDocs); [/code] -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter

Re: payload similarity

2015-04-24 Thread Dmitry Kan
eldNorm(doc=1) 1.0 = MaxPayloadFunction.docScore() On Fri, Apr 24, 2015 at 2:50 PM, Dmitry Kan wrote: > Hi, > > > Using the approach here > http://lucidworks.com/blog/getting-started-with-payloads/ I have > implemented my own PayloadSimilarity class. When debugging the code I have > noticed, tha

Re: payload similarity

2015-04-24 Thread Dmitry Kan
Ahmet, exactly. As I have just illustrated with code, simultaneously with your reply. Thanks! On Fri, Apr 24, 2015 at 4:30 PM, Ahmet Arslan wrote: > Hi Dmitry, > > I think, it is activated by PayloadTermQuery. > > Ahmet > > > > On Friday, April 24, 2015 2:51 PM

Re: payload similarity

2015-04-25 Thread Dmitry Kan
see: > > http://lucidworks.com/blog/end-to-end-payload-example-in-solr/ > > Best, > Erick > > On Fri, Apr 24, 2015 at 6:33 AM, Dmitry Kan wrote: > > Ahmet, exactly. As I have just illustrated with code, simultaneously with > > your reply. Thanks! > > > > On Fri, Apr 2

Re: Proximity Search

2015-04-30 Thread Dmitry Kan
> > SolrJ > > > > > Query API? > > > > > > > > > > Thanks & Regards > > > > > Vijay > > > > > > > > > > > > > -- > > > > The contents of this e-mail are confidential and for the exclusive > use > > of > > > > the intended recipient. If you receive this e-mail in error please > > delete > > > > it from your system immediately and notify us either by e-mail or > > > > telephone. You should not copy, forward or otherwise disclose the > > content > > > > of the e-mail. The views expressed in this communication may not > > > > necessarily be the view held by WHISHWORKS. > > > > > > > > > > > -- > > The contents of this e-mail are confidential and for the exclusive use of > > the intended recipient. If you receive this e-mail in error please delete > > it from your system immediately and notify us either by e-mail or > > telephone. You should not copy, forward or otherwise disclose the content > > of the e-mail. The views expressed in this communication may not > > necessarily be the view held by WHISHWORKS. > > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Dmitry Kan
one after > another > (around 5-10minutes), I start getting many OOM exceptions. > > > Thank you. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html > Sent fr

storeOffsetsWithPositions does not reflect in the index

2015-05-11 Thread Dmitry Kan
e fine. Any ideas how to make storeOffsetsWithPositions work? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

bug in search with sloppy queries

2015-06-14 Thread Dmitry Kan
panNear([Contents:eä, Contents:commerceä], 0, true)], 300, false) This query produces words as hits, like: From E-Tail In the inner spanNear query we expect that e and commerce will occur within 0 slop in that order. Can somebody shed light on what is going on? -- Dmitry Kan Luke Toolbox: http

Re: bug in search with sloppy queries

2015-06-15 Thread Dmitry Kan
"verbose" box checked > and you'll see the position of each token after analysis to see if my guess > is accurate. > > Best, > Erick > > On Sun, Jun 14, 2015 at 4:34 AM, Dmitry Kan wrote: > > Hi guys, > > > > We observe some strange bug in solr

Re: bug in search with sloppy queries

2015-06-15 Thread Dmitry Kan
To clarify additionally: we use StandardTokenizer & StandardFilter in front of the WDF. Already following ST's transformations e-tail gets split into two consecutive tokens On Mon, Jun 15, 2015 at 10:08 AM, Dmitry Kan wrote: > Thanks, Erick. Analysis page shows the positions are gro

Re: bug in search with sloppy queries

2015-06-15 Thread Dmitry Kan
termStats); } } [/code] as query we get the above structure, from which all terms are extracted without keeping the query structure? Could someone shed light on the logic behind this weight calculation? On Mon, Jun 15, 2015 at 10:23 AM, Dmitry Kan wrote: > To clarify additionally: we use St

MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Dmitry Kan
.jpg Ideally, we would like to have start and end offset respecting the remapped token. Can this be achieved with settings? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: MappingCharFilterFactory and start and end offsets

2015-06-25 Thread Dmitry Kan
intent is for offsets to > map to the *original* text. You can work around this by performing the > substitution prior to Solr analysis, e.g. in an update processor like > RegexReplaceProcessorFactory. > > Steve > www.lucidworks.com > > > On Jun 18, 2015,
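Steve's point, that offsets are meant to refer to the original text, can be pictured with a toy model. The sketch below is not Lucene's MappingCharFilter; it is a plain-Python illustration under stated assumptions: the function names are hypothetical, only single-character mapping keys are handled, and tokens are split on whitespace.

```python
import re


def char_filter(text, mapping):
    """Apply a character mapping and remember where each output char came from.

    Returns (filtered_text, origin) where origin[i] is the offset in the
    ORIGINAL text of the character that produced filtered_text[i].
    Assumption: mapping keys are single characters.
    """
    out = []
    origin = []
    for i, ch in enumerate(text):
        for c in mapping.get(ch, ch):
            out.append(c)
            origin.append(i)
    return "".join(out), origin


def tokens_with_original_offsets(text, mapping):
    """Tokenize the filtered text, but report offsets into the original text."""
    filtered, origin = char_filter(text, mapping)
    result = []
    for m in re.finditer(r"\S+", filtered):
        start = origin[m.start()]
        end = origin[m.end() - 1] + 1  # exclusive end in the original text
        result.append((m.group(), start, end))
    return result
```

For the input `"Müller GmbH"` with the mapping `{"ü": "ue"}`, the token text is the remapped `Mueller`, while its offsets still cover the six characters of `Müller` in the original string, which is what a highlighter working on the stored text needs.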

issue with highlighting in solr 4.10.2

2015-06-26 Thread Dmitry Kan
feature? Is there any way to debug the highlighter using solr admin? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: issue with highlighting in solr 4.10.2

2015-06-29 Thread Dmitry Kan
nippet > size you've specified? > > Shot in the dark, > Erick > > On Fri, Jun 26, 2015 at 3:22 AM, Dmitry Kan wrote: > > Hi, > > > > When highlighting hits for the following query: > > > > (+Contents:apple +Contents:watch) Contents:iphone > >

[ANNOUNCE] Luke 5.2.0 released

2015-07-07 Thread Dmitry Kan
m/DmitryKey/luke/pull/27> Lucene 5x support #28 <https://github.com/DmitryKey/luke/pull/28> Added LUKE_PATH env variable to luke.sh #30 <https://github.com/DmitryKey/luke/pull/30> Luke 5.2 -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blog

[JOB] Financial search engine company AlphaSense is looking for Search Engineers

2015-07-09 Thread Dmitry Kan
Revolution, ApacheCon, Berlin buzzwords), review books on Solr. Send your CV over and let's have a chat. -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
? This is on solr 4.10.2. Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info

Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
an. Still > > searching with organization finds it in the index. Anybody has an idea > why > > this happens? > > > > This is on solr 4.10.2. > > > > Thanks, > > Dmitry > > > > -- > > Dmitry Kan > > Luke Toolbox: http://github.com/DmitryKey/l

Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
"solr.StemmerOverrideFilterFactory" > > > > dictionary="stemdict.txt" /> on query side, but not indexing. One > rule is > > > > mapping organization onto organiz (on query). On indexing > > > > SnowballPorterFilterFactory will

Re: puzzling StemmerOverrideFilterFactory

2016-06-30 Thread Dmitry Kan
Hi, It appears, the issue was due to a mis-config I did in schema. After StemmerOverrideFilterFactory was added on both query and index sides, the problem has disappeared. Thanks, Dmitry On Thu, May 19, 2016 at 9:01 PM, Shawn Heisey wrote: > On 5/19/2016 5:26 AM, Dmitry Kan wrote: >

RE: Where is Stored values resides ?

2016-07-22 Thread Dmitry Kan
Hi, To the best of my knowledge, the getopt luke is not supported anymore. Use this instead: https://github.com/DmitryKey/luke Regards, Dmitry Hi Prabaharan, You can use Luke to open an index. http://www.getopt.org/luke/ -Original Message- From: Rajendran, Prabaharan [mailto:rajendra...@d

Re: [Result Query Solr] How to retrieve the content of pdfs

2016-09-20 Thread Dmitry Kan
Hi Alexandre, Could you add fl=* to your query and check the output? Alternatively, have a look at your schema file and check what could look like content field: text or similar. Dmitry On Sept 14, 2016 at 1:27 AM, "Alexandre Martins" < alexandremart...@gmail.com> wrote: > Hi Guys, >
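A minimal sketch of the suggested check, building a select URL that asks for every stored field with fl=*. The base URL, the core name `docs`, and the helper name `select_url` are assumptions for illustration; the code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def select_url(base, core, query="*:*", rows=1):
    """Build a Solr /select URL that returns all stored fields (fl=*).

    `base` and `core` are placeholders for your own installation.
    """
    params = {"q": query, "fl": "*", "rows": rows, "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to your setup.
    print(select_url("http://localhost:8983/solr", "docs"))
```

Looking at a single returned document this way shows which stored field, if any, actually holds the extracted PDF text.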

ClassCastException in RelevanceComparator

2017-09-19 Thread Dmitry Kan
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532) at java.lang.Thread.run(Thread.java:745) Would tint fields be causing this? If so, should they be defined as Floats? Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http

tf function query

2017-10-05 Thread Dmitry Kan
don't use edismax parser to apply multifield boosts, but instead use a custom ranking function. Would appreciate any thoughts, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer:

Re: tf function query

2017-10-12 Thread Dmitry Kan
expect as output? tf(field, "a OR b AND c NOT d"). I'm > not sure what term frequency would even mean in that situation. > > tf is a pretty simple function, it expects a single term and there's > no way I know of to do what you're asking. > > Best,

[ANNOUNCEMENT] Luke 6.4.1 released

2017-02-12 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-6.4.1 Upgrade to Lucene 6.4.1. Supports: Apache Solr 6.4.1 Elasticsearch 5.2.0 Pull-requests: #79 <https://github.com/DmitryKey/luke/pull/79> and #80 <https://github.com/DmitryKey/luke/pull/80>. --

sort by function with cursor based result fetching

2017-03-05 Thread Dmitry Kan
fixed in solr 6.x? Thanks! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan Insider Solutions: https://semanticanalyzer.info

null's in logging

2017-04-07 Thread Dmitry Kan
org.apache.solr.update.DirectUpdateHandler2 *null* - Reordered DBQs detected. Is this a known issue to have *null* or a misconfig on our part? Thanks, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan

Re: Partial Counts in SOLR

2014-03-10 Thread Dmitry Kan
Salman, It looks like what you describe has been implemented at Twitter. Presentation from the recent Lucene / Solr Revolution conference in Dublin: http://www.youtube.com/watch?v=AguWva8P_DI On Sat, Mar 8, 2014 at 4:16 PM, Salman Akram < salman.ak...@northbaysolutions.net> wrote: > The issue

Luke 4.7.0 released

2014-03-10 Thread Dmitry Kan
simple Windows launch script: In Windows, Luke can now be launched easily by executing luke.bat. Script sets MaxPermSize to 512m because Luke was found to crash on lower settings. Best regards, Dmitry Kan -- Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan

Re: Partial Counts in SOLR

2014-03-12 Thread Dmitry Kan
As Hoss pointed out above, different projects have different requirements. Some want to sort by date of ingestion reverse, which means that having posting lists organized in a reverse order with the early termination is the way to go (no such feature in Solr directly). Some other projects want to c

Re: Partial Counts in SOLR

2014-03-13 Thread Dmitry Kan
rt. > > I wanted to avoid creating multiple indexes (maybe based on years) but > seems that to search on partial data that's the only feasible way. > > > > > On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan wrote: > > > As Hoss pointed out above, different projec

[solr 4.7.0] analysis page: issue with HTMLStripCharFilterFactory

2014-03-15 Thread Dmitry Kan
Hello, The following type does not get analyzed properly on the solr 4.7.0 analysis page: Example text: fox jumps Screenshot: http://pbrd.co/1

Re: [solr 4.7.0] analysis page: issue with HTMLStripCharFilterFactory

2014-03-15 Thread Dmitry Kan
UI: > > https://issues.apache.org/jira/browse/SOLR-5800 > > It was unclear to me if it would be part of a 4.7.1 release. I hope so, > as it'll probably save people a lot of time from thinking their > analyzers are broken. > > > Sent from my Windows Phone From: Dmitry

Re: [solr 4.7.0] analysis page: issue with HTMLStripCharFilterFactory

2014-03-23 Thread Dmitry Kan
se :) > > > > -Stefan > > > > On Saturday, March 15, 2014 at 6:58 PM, Dmitry Kan wrote: > > > > > Hello, > > > > > > The following type does not get analyzed properly on the solr 4.7.0 > > > analysis page: > > > >

Re: Singles in solr for bigrams,trigrams in parsed_query

2014-03-24 Thread Dmitry Kan
Hi, Query rewrite happens down the chain, after query parsing. For example a wildcard query triggers an index based query rewrite where terms matching the wildcard are added into the original query. In your case, looks like the query rewrite will generate the ngrams and add them into the original
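The wildcard case mentioned above can be sketched as a toy rewrite step: the pattern is expanded against the terms present in the index, and the matching terms become clauses of the final query. This is a simplified illustration, not Lucene's rewrite code; the function names and the in-memory term dictionary are assumptions.

```python
from fnmatch import fnmatchcase


def rewrite_wildcard(pattern, term_dictionary):
    """Expand a wildcard pattern into the sorted list of indexed terms it matches."""
    return sorted(t for t in term_dictionary if fnmatchcase(t, pattern))


def as_boolean_query(field, terms):
    """Render the expanded terms as a disjunction over one field."""
    return " OR ".join(f"{field}:{t}" for t in terms)


if __name__ == "__main__":
    indexed_terms = {"run", "runner", "running", "rung", "sun"}
    expanded = rewrite_wildcard("run*", indexed_terms)
    print(as_boolean_query("body", expanded))
```

The point of the sketch is the ordering: the parser only sees `run*`, and the concrete terms appear later, once the index is consulted.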

Re: Using Sentence Information For Snippet Generation

2014-03-24 Thread Dmitry Kan
Hi Furkan, I have done an implementation with a custom filler (special character) sequence in between sentences. A better solution I landed at was increasing the position of each sentence's first token by a large number, like 1 (perhaps, a smaller number could be used too). Then a user search
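The position-jump idea can be sketched in a few lines. This is a toy positional index, not Solr analysis code: the gap of 1000 is an assumed value (the number in the message above is cut off), the sentence splitter is a naive regex, and `near` is a simplified stand-in for a sloppy phrase match.

```python
import re

SENTENCE_GAP = 1000  # assumed value, chosen to exceed any slop a user would type


def positions(text, gap=SENTENCE_GAP):
    """Return {term: [positions]}, jumping `gap` positions at each sentence start."""
    index = {}
    pos = 0
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    for i, sentence in enumerate(sentences):
        if i > 0:
            pos += gap  # push the next sentence far away from the previous one
        for token in sentence.lower().split():
            index.setdefault(token, []).append(pos)
            pos += 1
    return index


def near(index, a, b, slop):
    """True if some occurrence of `a` and `b` has at most `slop` positions between them."""
    return any(
        abs(pa - pb) - 1 <= slop
        for pa in index.get(a, [])
        for pb in index.get(b, [])
    )
```

With the gap in place, a proximity query with a small slop can match `fox` and `jumps` inside one sentence but not `jumps` and `lazy` across the sentence boundary; with `gap=0` the cross-sentence match comes back.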

Re: Fixing corrupted index?

2014-03-24 Thread Dmitry Kan
Hi, Have a look at: http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/CheckIndex.html HTH, Dmitry On Mon, Mar 24, 2014 at 8:16 PM, zqzuk wrote: > My Lucene index - built with Solr using Lucene4.1 - is corrupted. Upon > trying > to read the index using the following code I get

Re: Fixing corrupted index?

2014-03-25 Thread Dmitry Kan
Oh, somehow missed that in your original e-mail. How do you run the checkindex? Do you pass the -fix option? [1] You may want to try luke [2] to open index without opening the IndexReader and run the Tools->Check Index tool from the luke. [1] http://java.dzone.com/news/lucene-and-solrs-checkindex

Re: Fixing corrupted index?

2014-03-25 Thread Dmitry Kan
1. Luke: if you leave the IndexReader on, does the index even open? Can you access the CheckIndex? 2. The command line CheckIndex: what does the CheckIndex -fix do? On Tue, Mar 25, 2014 at 10:54 AM, zqzuk wrote: > Thank you. > > I tried Luke with IndexReader disabled, however it seems the index

Re: Fixing corrupted index?

2014-03-25 Thread Dmitry Kan
right. If you have cfs files in the index directory, there is a thread discussing the method of regenerating the segment files: http://www.gossamer-threads.com/lists/lucene/java-user/39744 backup before doing changes! source on SO: http://stackoverflow.com/questions/9935177/how-to-repair-corrupt

Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-28 Thread Dmitry Kan
Hi Rishi, Do you really need a soft commit every second? Can you make it 10 minutes, for example? What is happening (to be confirmed by checking your logs) is that several commits (it looks like 2 in your case) are arriving in quick succession. The system then starts warming up the searchers, one per each
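
Editor's sketch of why frequent commits lead to overlapping warming searchers. This is a toy timing model in plain Python, not Solr's implementation. It assumes each commit opens one searcher that warms for a fixed time, and `max_overlapping_searchers` is a hypothetical name. The warm-up time of 1.5 seconds in the usage lines is made up for illustration.

```python
from typing import Iterable


def max_overlapping_searchers(commit_times: Iterable[float], warmup_seconds: float) -> int:
    """Return the peak number of searchers warming at the same time.

    Assumption: a commit at time t opens a searcher that warms during
    the half-open interval [t, t + warmup_seconds).
    """
    events = []
    for t in commit_times:
        events.append((t, 1))                    # searcher starts warming
        events.append((t + warmup_seconds, -1))  # searcher finishes warming
    # At equal timestamps, process finishes (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    # Commits every second, each needing 1.5 s to warm: searchers overlap.
    print(max_overlapping_searchers(range(10), 1.5))
    # Commits every 10 minutes with the same warm-up time: no overlap.
    print(max_overlapping_searchers([0, 600, 1200], 1.5))
```

In this model the overlap disappears once the commit interval exceeds the warm-up time, which is the reasoning behind suggesting a longer soft-commit interval.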

Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-28 Thread Dmitry Kan
In addition, if you begin to see more onDeckSearchers warming up simultaneously, just bumping up maxWarmingSearchers only postpones a proper fix for the core problem [1]. We have been through this ourselves! [1] http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits On Fri, Mar 28,

Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Dmitry Kan
Hi, We have noticed something like this as well, but with an older version of Solr, 3.4. In our setup we delete documents pretty often. Internally in Lucene, when a client requests that a document be deleted, it is not physically deleted, but only marked as "deleted". Our original optimization assum
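
Editor's sketch of the "marked as deleted" behaviour described above. This is a toy model in plain Python, not Lucene's segment format. `Segment` and `merge` are hypothetical names. It shows a delete only flagging a document, and the space being reclaimed when segments are rewritten by a merge or optimize.

```python
from typing import Dict, Iterable, Set


class Segment:
    """Toy stand-in for an index segment: stored docs plus a set of deletion marks."""

    def __init__(self, docs: Dict[int, str]):
        self.docs: Dict[int, str] = dict(docs)  # doc id -> stored content
        self.deleted: Set[int] = set()          # ids marked deleted, still taking up space

    def delete(self, doc_id: int) -> None:
        # A delete only records a mark; the stored content stays in place.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def live_count(self) -> int:
        return len(self.docs) - len(self.deleted)


def merge(segments: Iterable[Segment]) -> Segment:
    """Rewrite the given segments into one, dropping docs that were marked deleted."""
    live: Dict[int, str] = {}
    for seg in segments:
        for doc_id, content in seg.docs.items():
            if doc_id not in seg.deleted:
                live[doc_id] = content
    return Segment(live)


if __name__ == "__main__":
    seg = Segment({1: "a", 2: "b", 3: "c", 4: "d"})
    seg.delete(2)
    seg.delete(3)
    print(len(seg.docs), seg.live_count())  # stored vs. live before the merge
    merged = merge([seg])
    print(len(merged.docs), merged.live_count())  # stored vs. live after the merge
```

Until a merge happens, the marked documents still occupy space in the segment, which fits the slowdown over time discussed in this thread.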

Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Dmitry Kan
, we'll add that in the delete > process. But from what I read, this is done in the optimize process (cf. > > http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html > ). > Or maybe not? > > Thanks again, > Elisabeth &

Re: Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Dmitry Kan
ted docs in the > index. This can increase index size by 50%!! Dmitry Kan < > solrexp...@gmail.com> wrote: Elisabeth, > > Yes, I believe you are right in that the deletes are part of the optimize > process. If you delete often, you may consider (if not already) the > Tie

Re: Luke 4.7.0 released

2014-04-03 Thread Dmitry Kan
" > > Exception: java.lang.OutOfMemoryError thrown from the > > UncaughtExceptionHandler in thread "main" > > > > Any ideas? > > > > On Monday, March 10, 2014 5:20:05 PM UTC-4, Dmitry Kan wrote: > >> > >> Hello! > >

Re: Luke 4.7.0 released

2014-04-03 Thread Dmitry Kan
welcome! there will be a shell script in the next luke release: https://github.com/DmitryKey/luke/blob/master/luke.sh On Thu, Apr 3, 2014 at 3:39 PM, simon wrote: > adding that worked - thanks. > > > On Thu, Apr 3, 2014 at 4:18 AM, Dmitry Kan wrote: > > > Hi Joshua,

Re: Using Sentence Information For Snippet Generation

2014-04-07 Thread Dmitry Kan
y scanner? > > Thanks; > Furkan KAMACI > > > > 2014-03-24 21:14 GMT+02:00 Dmitry Kan : > > > Hi Furkan, > > > > I have done an implementation with a custom filler (special character) > > sequence in between sentences. A better solution I landed at was >

converting 4.7 index to 4.3.1

2014-04-07 Thread Dmitry Kan
Dear list, We have been generating solr indices with the solr-hadoop contrib module (SOLR-1301). Our current solr in use is of 4.3.1 version. Is there any tool that could do the backward conversion, i.e. 4.7->4.3.1? Or is the upgrade the only way to go? -- Dmitry Blog: http://dmitrykan.blogspot.
