Re: How to use BitDocSet within a PostFilter

2015-08-03 Thread Roman Chyla
Hi, inStockSkusBitSet.get(currentChildDocNumber) Is that child a lucene id? If yes, does it include offset? Every index segment starts at a different point, but docs are numbered from zero. So to check them against the full index bitset, I'd be doing Bitset.exists(indexBase + docid) Just one thin
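The offset arithmetic described in this message can be sketched as follows. This is only an illustration built on java.util.BitSet, not Lucene's own API; the class name, the docBase parameter and the sample numbers are assumptions made for the example.

```java
import java.util.BitSet;

final class SegmentDocIds {
    /** Converts a segment-local doc id to an index-wide id by adding the segment's base offset. */
    static int toGlobal(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    /** Checks a segment-local doc id against a bitset that spans the whole index. */
    static boolean isSet(BitSet indexWideBits, int docBase, int segmentDocId) {
        return indexWideBits.get(toGlobal(docBase, segmentDocId));
    }

    public static void main(String[] args) {
        BitSet inStock = new BitSet();
        inStock.set(105);      // index-wide id 105 is marked
        int docBase = 100;     // hypothetical: this segment starts at index-wide id 100

        // Local id 5 in this segment maps to index-wide id 105.
        System.out.println(isSet(inStock, docBase, 5));
        // Dropping the offset looks up index-wide id 5 instead.
        System.out.println(isSet(inStock, 0, 5));
    }
}
```

If the bitset was built per segment rather than for the whole index, no offset is needed; which case applies depends on where the bitset came from.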

Re: Reverse query?

2015-10-02 Thread Roman Chyla
I'd like to offer another option: you say you want to match long query into a document - but maybe you won't know whether to pick "Mad Max" or "Max is" (not mentioning the performance hit of "*mad max*" search - or is it not the case anymore?). Take a look at the NGram tokenizer (say size of 2; or
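To make the n-gram suggestion concrete, here is a rough sketch of what character n-grams of size 2 look like. It is plain Java, not the Lucene NGram tokenizer itself (which has its own options and token handling); the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class CharNGrams {
    /** Returns all contiguous character n-grams of the given size, in order of appearance. */
    static List<String> ngrams(String text, int size) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + size <= text.length(); i++) {
            grams.add(text.substring(i, i + size));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Both the indexed text and the query would be broken up the same way,
        // so overlapping grams can match without a leading-wildcard search.
        System.out.println(ngrams("mad max", 2));
    }
}
```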

Re: Scramble data

2015-10-08 Thread Roman Chyla
Or you could also apply XSL to returned records: https://wiki.apache.org/solr/XsltResponseWriter On Thu, Oct 8, 2015 at 5:06 PM, Uwe Reh wrote: > Hi, > > my suggestions are probably to simple, because they are not a real > protection of privacy. But maybe one fits to your needs. > > Most simple:

Re: Forking Solr

2015-10-17 Thread Roman Chyla
I've taken the route of extending solr, the repo checks out solr and builds on top of that. The hard part was to figure out how to use solr test classes and the default location for integration tests, but once there, it is relatively easy. Google for montysolr, the repo is on github. Roman On Oct 1

Jetty refuses connections

2016-05-16 Thread Roman Chyla
Hi, I'm hoping someone has seen/encountered a similar problem. We have solr instances with all Jetty threads in BLOCKED state. The application does not respond to any http requests. It is SOLR 4.9 running inside docker on Amazon EC2. Jetty is 8.1 and there is an nginx proxy in front of it (with p

New UI for SOLR-based projects

2015-01-30 Thread Roman Chyla
Hi everybody, There exists a new open-source implementation of a search interface for SOLR. It is written in Javascript (using Backbone), currently in version v1.0.19 - but new features are constantly coming. Rather than describing it in words, please see it in action for yourself at http://ui.ads

Re: New UI for SOLR-based projects

2015-01-30 Thread Roman Chyla
hanks, Roman On 30 Jan 2015 21:51, "Shawn Heisey" wrote: > On 1/30/2015 1:07 PM, Roman Chyla wrote: > > There exists a new open-source implementation of a search interface for > > SOLR. It is written in Javascript (using Backbone), currently in version > > v1.0.19 - bu

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
I'm not sure I understand - the autophrasing filter will allow the parser to see all the tokens, so that they can be parsed (and multi-token synonyms) identified. So if you are using the same analyzer at query and index time, they should be able to see the same stuff. are you using multi-token syn

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
TE 20 [MART.],SORBIMACROGOL LAURATE > 300,POLYSORBATE 20 [FHFI],FEMA NO. 2915,POLYSORBATE 20 [FCC],POLYSORBATE 20 > [WHO-DD],POLYSORBATE 20 [VANDF] > > *Autophrase.txt...* > > Has all the above phrases in one column > > *Indexed document....* > > > 31 > Poly

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
> "parsedquery": "name:tweenx20", > "parsedquery_toString": "name:tweenx20", > "explain": {}, > > Thank you, > > Kaushik > > > On Wed, Apr 29, 2015 at 4:00 PM, Roman Chyla > wrote: > > > Pls post o

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
ose to the solution. > Any thoughts there? > > I appreciate your help on this matter. > > Thank you, > > Kaushik > > > > On Wed, Apr 29, 2015 at 5:48 PM, Roman Chyla > wrote: > > > Hi Kaushik, I meant to compare tween 20 against "tween 20

Re: Injecting synonymns into Solr

2015-05-04 Thread Roman Chyla
It shouldn't matter. Btw try a url instead of a file path. I think the underlying loading mechanism uses java File , it could work. On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo" wrote: > Would like to check, will this method of splitting the synonyms into > multiple files use up a lot of memory?

storing large text fields in a database? (instead of inside index)

2018-02-20 Thread Roman Chyla
Hello, We have a use case of a very large index (slave-master; for unrelated reasons the search cannot work in the cloud mode) - one of the fields is a very large text, stored mostly for highlighting. To cut down the index size (for purposes of replication/scaling) I thought I could try to save it

Re: storing large text fields in a database? (instead of inside index)

2018-02-20 Thread Roman Chyla
well at least. > > On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla > wrote: > > > Hello, > > > > We have a use case of a very large index (slave-master; for unrelated > > reasons the search cannot work in the cloud mode) - one of the fields is > a >

Re: storing large text fields in a database? (instead of inside index)

2018-02-21 Thread Roman Chyla
Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 20 Feb 2018, at 20:39, Roman Chyla wrote: > > > > Say there is a high load and I'd like to bring a new machine and let it > > replicate the index, if 100gb and more can be shaved, i

Re: Anti-Pattern in lucent-join jar?

2014-12-04 Thread Roman Chyla
+1, additionally (as it follows from your observation) the query can get out of sync with the index, if eg it was saved for later use and ran against newly opened searcher Roman On 4 Dec 2014 10:51, "Darin Amos" wrote: > Hello All, > > I have been doing a lot of research in building some custom

Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
> onto segment keys, hence it exclude such leakage across different > searchers. > > On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla wrote: > > > +1, additionally (as it follows from your observation) the query can get > > out of sync with the index, if eg it was saved for

Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
parser or parser plugin? > > I might not have followed you, this discussing challenges my understanding > of Lucene and SOLR. > > Darin > > > > > On Dec 5, 2014, at 12:47 PM, Roman Chyla wrote: > > > > Hi Mikhail, I think you are right, it won't be pro

Re: Queries not supported by Lucene Query Parser syntax

2015-01-01 Thread Roman Chyla
Hi Leonid, I didn't look into solr qparser for a long time, but I think you should be able to combine different query parsers in one query. Look at the SolrQueryParser code, maybe now you can specify custom query parser for every clause (?), st like: foo AND {!lucene}bar I dont know, but worth e

Re: SOLR - any open source framework

2015-01-06 Thread Roman Chyla
Hi Vishal, Alexandre, Here is another one, using Backbone, just released v1.0.16 https://github.com/adsabs/bumblebee you can see it in action: http://ui.adslabs.org/ While it primarily serves our own needs, I tried to architect it to be extendible (within reasonable limits of code, man power)

Re: SOLR - any open source framework

2015-01-06 Thread Roman Chyla
, but that was one year ago... On Tue, Jan 6, 2015 at 5:20 PM, Vishal Swaroop wrote: > Thanks Roman... I will check it... Maybe it's off topic but how about > Angular... > On Jan 6, 2015 5:17 PM, "Roman Chyla" wrote: > > > Hi Vishal, Alexandre, > >

Re: shards per disk

2015-01-20 Thread Roman Chyla
I think this makes sense to (ie. the setup), since the search is getting 1K documents each time (for textual analysis, ie. they are probably large docs), and use Solr as a storage (which is totally fine) then the parallel multiple drive i/o shards speed things up. The index is probably large, so it

The most efficient way to get un-inverted view of the index?

2016-08-16 Thread Roman Chyla
I need to read data from the index in order to build a special cache. Previously, in SOLR4, this was accomplished with FieldCache or DocTermOrds. Now, I'm struggling to see which API to use; there are many of them: on lucene level: UninvertingReader.getNumericDocValues (and others) .getNumericValue

Re: The most efficient way to get un-inverted view of the index?

2016-08-17 Thread Roman Chyla
values are available. --roman On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein wrote: > You'll want to use org.apache.lucene.index.DocValues. The DocValues api has > replaced the field cache. > > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On

Re: The most efficient way to get un-inverted view of the index?

2016-08-17 Thread Roman Chyla
&& !(i < liveDocs.length() && liveDocs.get(i))) { i++; continue; } transformer.process(docBase, i); i++; } } } } On Wed, Aug 17, 2016 at 1:22 PM, Roman Chyla wrote: > Joel, thanks, but which of them? I'v

Re: List of Solr Query Parsers

2013-05-06 Thread Roman Chyla
Hi Jan, Please add this one http://29min.wordpress.com/category/antlrqueryparser/ - I can't edit the wiki This parser is written with ANTLR and on top of lucene modern query parser. There is a version which implements Lucene standard QP as well as a version which includes proximity operators, mult

Re: List of Solr Query Parsers

2013-05-06 Thread Roman Chyla
f! The NEAR/5 syntax is really something I think we should get into > the default lucene parser. Can't wait to have a look at your code. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > 6. mai 2013 kl. 15:41 skrev Roman Chyla : > >

RE: Solr Cloud with large synonyms.txt

2013-05-07 Thread Roman Chyla
We have synonym files bigger than 5MB so even with compression that would be probably failing (not using solr cloud yet) Roman On 6 May 2013 23:09, "David Parks" wrote: > Wouldn't it make more sense to only store a pointer to a synonyms file in > zookeeper? Maybe just make the synonyms file acces

RE: Solr Cloud with large synonyms.txt

2013-05-08 Thread Roman Chyla
David, have you seen the finite state automata the synonym lookup is built on? The lookup is very efficient and fast. You have a point though, it is going to fail for someone. Roman On 8 May 2013 03:11, "David Parks" wrote: > I can see your point, though I think edge cases would be one concern, i

Re: Portability of Solr index

2013-05-10 Thread Roman Chyla
Hi Mukesh, This seems like something lucene developers should be aware of - you have probably spent quite some time finding the problem/solution. Could you create a JIRA ticket? Roman On 10 May 2013 03:29, "mukesh katariya" wrote: > There is a problem with Base64 encoding. There is a project specifi

Re: List of Solr Query Parsers

2013-05-22 Thread Roman Chyla
Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > 6. mai 2013 kl. 19:58 skrev Roman Chyla : > > > Hi Jan, > > My login is RomanChyla > > Thanks, > > > > Roman > > On 6 May 2013 10:00, "Jan Høydahl" wrote: >

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
You are right that starting to parse the query before the query component can soon get very ugly and complicated. You should take advantage of the flex parser; it is already in lucene contrib - but if you are interested in the better version, look at https://issues.apache.org/jira/browse/LUCENE-501

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
> > Any other solution/work-around? How do other production environments of > Solr overcome this issue? > you can also try modifying the standard solr parser, or even the JavaCC generated classes I believe many people do just that (or some sort of preprocessing) roman > > > O

Re: Solr/Lucene Analayzer That Writes To File

2013-05-28 Thread Roman Chyla
You can store them and then use different analyzer chains on it (stored, doesn't need to be indexed) I'd probably use the collector pattern se.search(new MatchAllDocsQuery(), new Collector() { private AtomicReader reader; private int i = 0; @Override public boolean a

Re: how are you handling killer queries?

2013-06-03 Thread Roman Chyla
I think you should take a look at the TimeLimitingCollector (it is used also inside SolrIndexSearcher). My understanding is that it will stop your server from consuming unnecessary resources. --roman On Mon, Jun 3, 2013 at 4:39 AM, Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: > How ar

Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the se

Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
LOAD&core=collection1 " But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla wrote: > Hello, > > I need your expert advice. I am thinking about

Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
t meet your > requirements? It was geared explicitly for this purpose, including the > automatic discovery of changes to the data on the index master. > > Jason > > On Jun 4, 2013, at 1:50 PM, Roman Chyla wrote: > > > OK, so I have verified the two instances can run alongside

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
mechanism between the 2 instance processes to tell the > searcher the index has changed, then call commit when called (more complex > coding, but good if the index changes on an ad-hoc basis). > Note, doing things this way isn't really suitable for an NRT environment. > > HTH,

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
aster:http://localhost }/solr/admin/cores?wt=json&action=RELOAD&core=collection1 This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla wrote: > Hi Peter, > > Thank

Re: Two instances of solr - the same datadir?

2013-06-07 Thread Roman Chyla
e suggestion of using autoCommit to reload the index. If I'm > reading that right, you'd set an autoCommit on 'zero docs changing', or > just 'every N seconds'? Did that work? > > Best of luck! > > Tim > > > On 5 June 2013 10:19, Roman Chyla

Re: New operator.

2013-06-17 Thread Roman Chyla
Hello Yanis, We are probably using something similar - eg. 'functional operators' - eg. edismax() to treat everything inside the bracket as an argument for edismax, or pos() to search for authors based on their position. And invenio() which is exactly what you describe, to get results from externa

Re: Avoiding OOM fatal crash

2013-06-17 Thread Roman Chyla
I think you can modify the response writer and stream results instead of building them first and then sending in one go. I am using this technique to dump millions of docs in json format - but in your case you may have to figure out how to dump during streaming if you don't want to save data to dis
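The difference between building a response first and streaming it can be sketched as below. This is a generic Java illustration, not Solr's response writer interface; the class name, the use of pre-serialized JSON strings and the iterator source are assumptions made to keep the example self-contained.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Iterator;

final class StreamingJsonDump {
    /**
     * Writes each already-serialized JSON document to the output as soon as it is
     * pulled from the iterator, so the full result list is never held in memory.
     */
    static void dump(Iterator<String> jsonDocs, PrintWriter out) {
        out.write("[");
        boolean first = true;
        while (jsonDocs.hasNext()) {
            if (!first) {
                out.write(",");
            }
            out.write(jsonDocs.next());
            first = false;
        }
        out.write("]");
        out.flush();
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        dump(Arrays.asList("{\"id\":1}", "{\"id\":2}").iterator(), new PrintWriter(buffer));
        System.out.println(buffer);
    }
}
```

In a real dump the iterator would pull documents lazily from the searcher and the writer would wrap the HTTP response or a file rather than an in-memory buffer.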

Re: UnInverted multi-valued field

2013-06-19 Thread Roman Chyla
On Wed, Jun 19, 2013 at 5:30 AM, Jochen Lienhard < lienh...@ub.uni-freiburg.de> wrote: > Hi @all. > > We have the problem that after an update the index takes to much time for > 'warm up'. > > We have some multivalued facet-fields and during the startup solr creates > the messages: > > INFO: UnInv

Re: cores sharing an instance

2013-06-29 Thread Roman Chyla
Cores can be reloaded, they are inside solrcore loader /I forgot the exact name/, and they will have different classloaders /that's servlet thing/, so if you want singletons you must load them outside of the core, using a parent classloader - in case of jetty, this means writing your own jetty init

Re: cores sharing an instance

2013-07-01 Thread Roman Chyla
ches > > > class="com.name.Project.AppCaches"/> > > > So each core has its own so specific cachedResources handler. Where in > SOLR would I need to place the AppCaches code to make it visible to all > other cores then? > > t

Re: Solr large boolean filter

2013-07-02 Thread Roman Chyla
Hello @, This thread 'kicked' me into finishing some long-past task of sending/receiving a large boolean (bitset) filter. We have been using bitsets with solr before, but now I sat down and wrote it as a qparser. The use cases, as you have discussed are: - necessity to send long list of ids as a

Re: Solr large boolean filter

2013-07-02 Thread Roman Chyla
Wrong link to the parser, should be: https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/BitSetQParserPlugin.java On Tue, Jul 2, 2013 at 1:25 PM, Roman Chyla wrote: > Hello @, > > This thread 'kicked' me into finishing so

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
e' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla wrote: > I have auto commit after 40k RECs/1800secs. But I only tested with manual > commit, but I don't see why it should work differe

Re: Solr large boolean filter

2013-07-02 Thread Roman Chyla
y > in solr as a content stream? It makes base64 compression not necessary. > AFAIK url length is limited somehow, anyway. > > > On Tue, Jul 2, 2013 at 9:32 PM, Roman Chyla wrote: > > > Wrong link to the parser, should be: > > > > > https://github.com/romanchyl

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
'native', we use native lockType for both > write and RO instances, and it works fine - no contention. > Which version of Solr are you using? Perhaps there's been a change in > behaviour? > > Peter > > > On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla wrote

Re: Surround query parser not working?

2013-07-03 Thread Roman Chyla
Hi Niran, all, Please look at JIRA LUCENE-5014. There you will find a Lucene parser that does both analysis and span queries, equivalent to combination of lucene+surround, and much more The ticket needs your review. Roman

Re: What are the options for obtaining IDF at interactive speeds?

2013-07-03 Thread Roman Chyla
Hi Kathryn, I wonder if you could index all your terms as separate documents and then construct a new query (2nd pass) q=term:term1 OR term:term2 OR term:term3 and use func to score them idf(other_field,field(term)) the 'term' index cannot be multi-valued, obviously. Other than that, if y

Re: Two instances of solr - the same datadir?

2013-07-03 Thread Roman Chyla
getting updated somehow, otherwise > > how would it know your write instance made any changes? > > Perhaps your write instance notifies the RO instance externally from > Solr? > > (a perfectly valid approach, and one that would allow a 'single' lock to > > wo

Re: SOLR 4.0 frequent admin problem

2013-07-04 Thread Roman Chyla
Yes :-) see SOLR-118, seems an old issue... On 4 Jul 2013 06:43, "David Quarterman" wrote: > Hi, > > About once a week the admin system comes up with SolrCore Initialization > Failures. There's nothing in the logs and SOLR continues to work in the > application it's supporting and in the 'direct

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Roman Chyla
I don't want to sound negative, but I think it is a valid question to consider - for the lack of information and certain mental rigidity may make it sound bad - first of all, it is probably not for few gigabytes of data and I can imagine that building indexes at the side when data lives is much fas

Re: What are the options for obtaining IDF at interactive speeds?

2013-07-08 Thread Roman Chyla
> would never have occurred to me. Thank you too! > > Best, > Katie > > > On Wed, Jul 3, 2013 at 11:35 PM, Roman Chyla > wrote: > > > Hi Kathryn, > > I wonder if you could index all your terms as separate documents and then > > construct a new query

Re: joins in solr cloud - good or bad idea?

2013-07-08 Thread Roman Chyla
Hello, The joins are not the only idea, you may want to write your own function (ValueSource) that can implement your logic. However, I think you should not throw away the regex idea (as being slow), before trying it out - because it can be faster than the joins. Your problem is that the number of

Re: solr way to exclude terms

2013-07-08 Thread Roman Chyla
One approach is to create a new field at index time based on the stopwords (i.e. accept only stopwords :)) - i.e. if the document contains them, you index 1 - and then use q=apple&fq=bad_apple:0. This has many limitations (in terms of flexibility), but it will be superfast roman On Mon, Jul 8, 2013 a

Re: Solr large boolean filter

2013-07-08 Thread Roman Chyla
"server-side named filters". It > matches the feature described at > http://www.elasticsearch.org/blog/terms-filter-lookup/ > > Would be a cool addition, IMHO. > > Otis > -- > Solr & ElasticSearch Support -- http://sematext.com/ > Performance Monitoring -- http

Re: Best way to call asynchronously - Custom data import handler

2013-07-09 Thread Roman Chyla
Other than using futures and callables? Runnables ;-) Other than that you will need async request (ie. client). But in case sb else is looking for an easy-recipe for the server-side async: public void handleRequestBody(.) { if (isBusy()) { rsp.add("message", "Batch processing is already r
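The recipe in this message is cut off, so the following is only a guess at its general shape: a handler that refuses new work while a batch is running and otherwise hands the job to a background thread. It uses java.util.concurrent only; it is not Solr's request handler API, and the class name, method names and status strings are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class BusyGuardedWorker {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Starts the job in the background unless one is already running; returns a status message. */
    String submit(Runnable job) {
        // compareAndSet makes the check-and-mark step atomic across request threads.
        if (!busy.compareAndSet(false, true)) {
            return "already running";
        }
        executor.execute(() -> {
            try {
                job.run();
            } finally {
                busy.set(false);   // always release the flag, even if the job throws
            }
        });
        return "started";
    }

    boolean isBusy() {
        return busy.get();
    }

    /** Stops accepting work and waits briefly for the running job; returns true if it finished. */
    boolean shutdown() {
        executor.shutdown();
        try {
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BusyGuardedWorker worker = new BusyGuardedWorker();
        System.out.println(worker.submit(() -> System.out.println("processing batch")));
        worker.shutdown();
    }
}
```

The caller returns immediately with the status message, and a later request can poll isBusy() to see whether the batch has finished.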

Re: amount of values in a multi value field - is denormalization always the best option?

2013-07-10 Thread Roman Chyla
On Wed, Jul 10, 2013 at 5:37 PM, Marcelo Elias Del Valle wrote: > Hello, > > I have asked a question recently about solr limitations and some about > joins. It comes that this question is about both at the same time. > I am trying to figure how to denormalize my data so I will need just 1

Re: Performance of cross join vs block join

2013-07-12 Thread Roman Chyla
Hi Mikhail, I have commented on your blog, but it seems I have done st wrong, as the comment is not there. Would it be possible to share the test setup (script)? I have found out that the crucial thing with joins is the number of 'joins' [hits returned] and it seems that the experiments I have see

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-15 Thread Roman Chyla
On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca wrote: > Hello Erick, > > > Join performance is most sensitive to the number of values > > in the field being joined on. So if you have lots and lots of > > distinct values in the corpus, join performance will be affected. > Yep, we have a list of uni

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-16 Thread Roman Chyla
JIRA? Somehow I missed it if it did, and this > > would > > be pretty cool > > > > Erick > > > > On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla > > wrote: > > > On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca > > wrote: > > > > > >> He

Re: Range query on a substring.

2013-07-16 Thread Roman Chyla
Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: "lucene 1*" becomes behind the scenes: "lucene (10|11|12|13|14|1abcd)" the issue there is that it is a string range, but it is a range que
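The expansion described here, where a prefix such as 1* becomes a set of concrete terms, can be sketched with a sorted set standing in for the term dictionary. This is an illustration in plain Java rather than Lucene's term enumeration; the class name and the sample terms are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class PrefixExpansion {
    /** Returns every term in the sorted dictionary that starts with the given prefix. */
    static List<String> expand(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; in sorted order all terms that
        // share the prefix are contiguous, so the scan can stop at the first miss.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> dictionary =
                new TreeSet<>(Arrays.asList("10", "11", "12", "1abcd", "2", "lucene"));
        System.out.println(expand(dictionary, "1"));
    }
}
```

Because the scan is a range over string order, the matches are ranked as strings, not as numbers, which is the caveat raised in the message.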

Re: Range query on a substring.

2013-07-16 Thread Roman Chyla
on of different fields is allowed). I'd like to spend > some time on ANTLR and the new way of parsing you mentioned. I will let you > know if it was useful for me. Thanks. > > Kind regards. > > > On 16 July 2013 20:07, Roman Chyla wrote: > > > Well, I think this is

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
Hi all, What I find very 'sad' is that Lucene/SOLR contain all the necessary components for handling multi-token synonyms; the Finite State Automaton works perfectly for matching these items; the biggest problem is IMO the old query parser which split things on spaces and doesn't know to be smarte

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
implementation, but again, this is all a > longer-term future, not a "here and now". Maybe in the 5.0 timeframe? > > I don't want anyone to get the impression that there are off-the-shelf > patches that completely solve the synonym phrase problem. Yes, progress is > be

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
rch for query-time phrase > synonyms, off-the-shelf, today, no patches required.) > > > -- Jack Krupansky > > -Original Message- From: Roman Chyla > Sent: Wednesday, July 17, 2013 11:44 AM > > To: solr-user@lucene.apache.org > Subject: Re: Searching w/expli

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
Hi Dave, On Wed, Jul 17, 2013 at 2:03 PM, dmarini wrote: > Roman, > > As a developer, I understand where you are coming from. My issue is that I > specialize in .NET, haven't done java dev in over 10 years. As an > organization we're new to solr (coming from endeca) and we're looking to > use

Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-17 Thread Roman Chyla
> field in a Solr doc with the value 6 in it. I can then > > form a query like > > {!bitwise field=myfield op=AND source=2} > > and it would match. > > > > You're talking about a much different operation as I > > understand it. > > > > In which ca

Re: Getting a large number of documents by id

2013-07-18 Thread Roman Chyla
Look at the speed of reading the data - likely, it takes a long time to assemble a big response, especially if there are many long fields - you may want to try SSD disks, if you have that option. Also, to gain a better understanding: start your solr, start jvisualvm and attach to your running solr. Start

Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Roman Chyla
Deepak, I think your goal is to gain something in speed, but most likely the function query will be slower than the query without score computation (the filter query) - this stems from the fact how the query is executed, but I may, of course, be wrong. Would you mind sharing measurements you make?

Re: Performance of cross join vs block join

2013-07-22 Thread Roman Chyla
the query, so in that sense, it is not different from pre-computing the citation cache - but it happens for every query/request, and so for 0.5M of edges it must take some time. But I guess I should measure it. I haven't made notes so now I am having hard time backtracking :) roman > It

Re: Processing a lot of results in Solr

2013-07-23 Thread Roman Chyla
Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending results back, it writes them into a file which is then available for streaming (it has its own UUID). I am dumping many GBs of data from solr in few minutes - your query + streaming write

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
you disclosure how that streaming writer works? What does it stream > docList or docSet? > > Thanks > > > On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla > wrote: > > > Hello Matt, > > > > You can consider writing a batch processing handler, which recei

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
performances acceptable (~ within minutes) ? > > Thanks, > Matt > > On 7/23/13 6:57 PM, "Roman Chyla" wrote: > > >Hello Matt, > > > >You can consider writing a batch processing handler, which receives a > >query > >and instead of sending res

Re: How to debug an OutOfMemoryError?

2013-07-24 Thread Roman Chyla
_One_ idea would be to configure your java to dump core on the oom error - you can then load the dump into some analyzers, eg. Eclipse, and that may give you the desired answers (unfortunately I don't remember off the top of my head how to activate the dump, but google will give you the answer) r

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-24 Thread Roman Chyla
This paper contains an excellent algorithm for plagiarism detection, but beware the published version had a mistake in the algorithm - look for corrections - I can't find them now, but I know they have been published (perhaps by one of the co-authors). You could do it with solr, to create an index

Re: Using Solr to search between two Strings without using index

2013-07-25 Thread Roman Chyla
Hi, I think you are pushing it too far - there is no 'string search' without an index. And besides, these things are just better done by a few lines of code - and if your array is too big, then you should create the index... roman On Thu, Jul 25, 2013 at 9:06 AM, Rohit Kumar wrote: > Hi, > >

Re: processing documents in solr

2013-07-27 Thread Roman Chyla
Dear list, I've written a special processor exactly for this kind of operation https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs/src/java/org/apache/solr/handler/batch This is how we use it http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/SearchEngineBatch It is capable of

Re: paging vs streaming. spawn from (Processing a lot of results in Solr)

2013-07-27 Thread Roman Chyla
Mikhail, If your solution gives lazy loading of solr docs /and thus streaming of huge result lists/ it should be big YES! Roman On 27 Jul 2013 07:55, "Mikhail Khludnev" wrote: > Otis, > You gave links to 'deep paging' when I asked about response streaming. > Let me understand. From my POV, deep p

Re: paging vs streaming. spawn from (Processing a lot of results in Solr)

2013-07-27 Thread Roman Chyla
.com/m-khl/solr-patches/compare/streaming#L15R115 > > all other code purposed for distributed search. > > > > On Sat, Jul 27, 2013 at 4:44 PM, Roman Chyla > wrote: > > > Mikhail, > > If your solution gives lazy loading of solr docs /and thus streaming of > > hu

Re: processing documents in solr

2013-07-27 Thread Roman Chyla
On Sat, Jul 27, 2013 at 4:17 PM, Shawn Heisey wrote: > On 7/27/2013 11:38 AM, Joe Zhang wrote: > > I have a constantly growing index, so not updating the index can't be > > practical... > > > > Going back to the beginning of this thread: when we use the vanilla > > "*:*"+pagination approach, woul

Re: Solr-4663 - Alternatives to use same data dir in different cores for optimal cache performance

2013-07-28 Thread Roman Chyla
Hi, Yes, it can be done, if you search the mailing list for 'two solr instances same datadir', you will find a post where I am describing our setup - it works well even with automated deployments. How do you measure performance? I am asking because one reason for us having the same setup is sharing the O

Measuring SOLR performance

2013-07-30 Thread Roman Chyla
Hello, I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark. So yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see

Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
es(options) >> File "solrjmeter.py", line 351, in check_prerequisities >> error('Cannot contact: %s' % options.query_endpoint) >> File "solrjmeter.py", line 66, in error >> traceback.print_stack() >> Cannot contact: http://localhost:8983

Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
o think of default G1 as 'bad', and that these G1 parameters, even if they don't seem G1 specific, have real effect. Thanks, roman On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey wrote: > On 7/30/2013 6:59 PM, Roman Chyla wrote: > > I have been wanting some tools for meas

Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
y be random. So, yes, now I am sure what to > > think of default G1 as 'bad', and that these G1 parameters, even if they > > don't seem G1 specific, have real effect. > > Thanks, > > > > roman > > > > > > On Tue, Jul 30, 2013 at 1

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
ib/python2.7/contextlib.py", line 17, in __enter__ > return self.gen.next() > File "solrjmeter.py", line 229, in changed_dir > os.chdir(new) > OSError: [Errno 20] Not a directory: > '/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries' >

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
storting your measurements. > > > Bernd > > > Am 31.07.2013 05:01, schrieb Shawn Heisey: > > On 7/30/2013 6:59 PM, Roman Chyla wrote: > >> I have been wanting some tools for measuring performance of SOLR, > similar > >> to Mike McCandles' lucene benchmark.

Re: How to uncache a query to debug?

2013-08-01 Thread Roman Chyla
When you set your cache (solrconfig.xml) to size=0, you are not using a cache, so you can debug more easily roman On Thu, Aug 1, 2013 at 1:12 PM, jimtronic wrote: > I have a query that runs slow occasionally. I'm having trouble debugging it > because once it's cached, it runs fast -- under 10

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
Hi, here is a short post describing the results of yesterday's run with added parameters as per Shawn's recommendation, have fun getting confused ;) http://29min.wordpress.com/2013/08/01/measuring-solr-performance-ii/ roman On Wed, Jul 31, 2013 at 12:32 PM, Roman Chyla wrote: > I'

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
On Thu, Aug 1, 2013 at 6:11 PM, Shawn Heisey wrote: > On 8/1/2013 2:08 PM, Roman Chyla wrote: > >> Hi, here is a short post describing the results of the yesterday run with >> added parameters as per Shawn's recommendation, have fun getting confused >> ;) >>

Re: Measuring SOLR performance

2013-08-02 Thread Roman Chyla
' % > (options.serverName, options.serverPort) > > jmx_options = [] > for k, v in options.__dict__.items(): > > > > Dmitry > > > On Thu, Aug 1, 2013 at 6:41 PM, Roman Chyla wrote: > > > Dmitry, > > Can you post the entire invocation line?

Re: Measuring SOLR performance

2013-08-05 Thread Roman Chyla
o JSON object could be decoded: line 1 > column 0 (char 0) > > > The README.md on the github is somehow outdated, it suggests using -q > ./demo/queries/demo.queries, but there is no such path in the fresh > checkout. > > Nice to have the -t param. > > Dmitry > > >

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Roman Chyla
@wunder It is a misconception (well, supported by that wiki description) that the query time synonym filter have these problems. It is actually the default parser, that is causing these problems. Look at this if you still think that index time synonyms are cure for all: https://issues.apache.org/ji

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Roman Chyla
e IDF problem is real; I've run up against it. The most rare variant of > the synonym have the highest score. This probably the opposite of what you > want. For me, it was "TV" and "television". Documents with "TV" had higher > scores than those with "

Re: MoreLikeThis supporting multiple document IDs as input?

2012-12-26 Thread Roman Chyla
Jay Luker has written MoreLikeThese which is probably what you want. You may give it a try, though I am not sure if it works with Solr4.0 at this point (we didn't port it yet) https://github.com/romanchyla/montysolr/blob/MLT/contrib/adsabs/src/java/org/apache/solr/handler/MoreLikeTheseHandler.java
