Re: How to implement a custom boost function

2016-10-03 Thread Lucas Cotta
Hi Walter, unfortunately I use pagination, so that would not be possible. Thanks 2016-10-04 0:51 GMT-03:00 Walter Underwood : > How about sorting them after you get them back from Solr? > > wunder > Walter Underwood > wun...@wunderwood.org >

Re: SOLR Sizing

2016-10-03 Thread Walter Underwood
Dropping ngrams also makes the index 5X smaller on disk. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 3, 2016, at 9:02 PM, Walter Underwood wrote: > > I did not believe the benchmark results the first time, but it

Re: SOLR Sizing

2016-10-03 Thread Walter Underwood
I did not believe the benchmark results the first time, but it seems to hold up. Nobody gets a speedup of over a thousand (unless you are going from that Oracle search thing to Solr). It probably won’t help for most people. We have one service with very, very long queries, up to 1000 words of

Re: How to implement a custom boost function

2016-10-03 Thread Walter Underwood
How about sorting them after you get them back from Solr? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 3, 2016, at 6:45 PM, Lucas Cotta wrote: > > I actually could also use a custom similarity class that always returns
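
A minimal sketch of that client-side re-sort with SolrJ 5/6 (the URL, field name and IDs are illustrative; on SolrJ 4.x the client class is HttpSolrServer instead; and, as Lucas notes above, this only helps when the full result set fits on one page):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ReorderAfterQuery {
    public static void main(String[] args) throws Exception {
        // The IDs in the order the caller wants the documents back.
        List<String> studentIds = Arrays.asList("875141", "873071", "875198");

        SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/students");
        SolrQuery query = new SolrQuery("studentId:(" + String.join(" OR ", studentIds) + ")");
        query.setRows(studentIds.size());
        QueryResponse rsp = solr.query(query);

        // Re-sort the returned documents to match the order of the query's IDs.
        List<SolrDocument> docs = new ArrayList<>(rsp.getResults());
        docs.sort(Comparator.comparingInt(
                (SolrDocument d) -> studentIds.indexOf(String.valueOf(d.getFieldValue("studentId")))));
        solr.close();
    }
}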

Re: firstSearcher per SolrCore or JVM?

2016-10-03 Thread Jihwan Kim
Thanks Erick. firstSearcher and newSearcher events open two separate searchers. For the external file field case at least, the cache created with the firstSearcher is not being used after the newSearcher creates another cache (with the same values). I believe the warming is also per searcher.

Re: Scaling data extractor with Solr

2016-10-03 Thread Erick Erickson
You can have as many clients indexing to Solr (either Cloud or stand-alone) as you want, limited only by the load you put on Solr. I.e. if your indexing throughput is so great that it makes querying too slow then you have to scale back... I know of setups with 100+ separate clients all indexing
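
A rough sketch of one such indexing client, assuming SolrJ (the URL, queue size, thread count and field names are illustrative); the point is simply that any number of these can run in parallel against the same cluster, limited only by the load Solr can absorb:

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OneIndexingClient {
    public static void main(String[] args) throws Exception {
        // Queues documents and streams them to /update from background threads.
        ConcurrentUpdateSolrClient client =
                new ConcurrentUpdateSolrClient("http://localhost:8983/solr/mycollection", 100, 4);
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "client1-doc-" + i);
            doc.addField("title_s", "document " + i);
            client.add(doc);
        }
        client.blockUntilFinished();
        client.commit();
        client.close();
    }
}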

Re: SOLR Sizing

2016-10-03 Thread Erick Erickson
Walter: What did you change? I might like to put that in my bag of tricks ;) Erick On Mon, Oct 3, 2016 at 6:30 PM, Walter Underwood wrote: > That approach doesn’t work very well for estimates. > > Some parts of the index size and speed scale with the vocabulary instead

Re: Upgrading from Solr cloud 4.1 to 6.2

2016-10-03 Thread Erick Erickson
The very easiest way is to re-index. 10M documents shouldn't take very long unless they're no longer available... When you say you tried to use the index upgrader, which one? You'd have to use the one distributed with 5.x to upgrade from 4.x->5.x, then use the one distributed with 6.x to go from
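
For reference, a hedged sketch of that two-hop path using Lucene's IndexUpgrader programmatically (the index path is illustrative; the same step has to be run against every shard/replica index, first with the Lucene 5.x jars on the classpath and then again with the 6.x jars):

import java.nio.file.Paths;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

public class UpgradeOneShard {
    public static void main(String[] args) throws Exception {
        // Run once with Lucene 5.x (4.x -> 5.x), then again with Lucene 6.x (5.x -> 6.x).
        try (FSDirectory dir = FSDirectory.open(
                Paths.get("/var/solr/data/collection1_shard1_replica1/data/index"))) {
            new IndexUpgrader(dir).upgrade();
        }
    }
}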

Re: firstSearcher per SolrCore or JVM?

2016-10-03 Thread Erick Erickson
firstSearcher and newSearcher are definitely per core; they have to be, since they are intended to warm searchers and searchers are per core. I don't particularly see the benefit of firing them both either. Not sure which one makes the most sense though. Best, Erick On Mon, Oct 3, 2016 at 7:10

Upgrading from Solr cloud 4.1 to 6.2

2016-10-03 Thread Neeraj Bhatt
Hello All, We are trying to upgrade our production Solr, with 10 million documents, from Solr Cloud 4.1 (5 shards, 5 nodes, one collection, 3 replicas) to 6.2. How do we upgrade the Lucene index created by Solr? Should I go into the indexes created by each shard and upgrade and replicate them manually? Also

firstSearcher per SolrCore or JVM?

2016-10-03 Thread Jihwan Kim
I am using external file fields with larger external files and I noticed that a Solr core reload loads the external files twice: on the firstSearcher and newSearcher events. Does it mean the core reload triggers both events? What is the benefit/reason of triggering both events at the same time? I see this on V.
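
For context, the external file is normally re-read by event listeners configured in solrconfig.xml, along the lines of the sketch below; with a listener registered on both events, a core reload that fires firstSearcher and newSearcher will read the file once per event:

<listener event="firstSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>
<listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>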

Listing of fields on Block Join Parent Query Parser

2016-10-03 Thread Zheng Lin Edwin Yeo
Hi, I would like to check: how can we list all the fields that are available in the index? I'm using dynamic fields, so the Schema API is not working for me, as it will only list things like *_s, *_f and not the full field names. Also, as I'm using the Block Join Parent Query Parser, it

Re: How to implement a custom boost function

2016-10-03 Thread Lucas Cotta
I actually could also use a custom similarity class that always returns 1.0; then I could use small boost factors such as ^1, ^2, ^3, etc. But I want to do this only in some specific queries (that may contain other fields besides studentId). How could I do this, use the custom similarity class
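
A minimal sketch of such a constant-score similarity for Lucene/Solr 4.x (the class name is illustrative; it still has to be wired in via schema.xml, and applying it only to specific queries rather than globally is exactly the open question here):

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

// Neutralizes tf, idf and length norms so every matching term contributes the
// same score, leaving explicit boosts such as ^1, ^2, ^3 to control ordering.
public class ConstantScoreSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }

    @Override
    public float lengthNorm(FieldInvertState state) {
        return 1.0f;
    }
}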

Re: SOLR Sizing

2016-10-03 Thread Walter Underwood
That approach doesn’t work very well for estimates. Some parts of the index size and speed scale with the vocabulary instead of the number of documents. Vocabulary usually grows at about the square root of the total amount of text in the index. OCR’ed text breaks that estimate badly, with huge

Re: SOLR Sizing

2016-10-03 Thread Susheel Kumar
In short, if you want your estimate to be closer, run some actual ingestion for, say, 1-5% of your total docs and extrapolate, since every search product may have a different schema, different set of fields, different indexed vs. stored fields, copy fields, different analysis chain, etc. If you want
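
As a made-up illustration of that extrapolation: if a 2% sample (200k of 10M documents) produces a 1.5 GB index, the first-order estimate for the full corpus is 1.5 GB x 50 = 75 GB. Per Walter's note above, vocabulary-driven structures grow roughly with the square root of the total text rather than linearly, so a straight scale-up is only a rough bound, and OCR'ed text can break it entirely.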

Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
After some more debugging, I think putting the dataDir in the Map of properties is actually working, but still running into a couple of issues with the setup... I created an example project that demonstrates the scenario:

Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Yea I'll try to put something together and report back. On Mon, Oct 3, 2016 at 6:54 PM, Alan Woodward wrote: > Ah, I see what you mean. Putting the dataDir property into the Map > certainly ought to work - can you write a test case that shows what’s > happening? > > Alan

Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Alan Woodward
Ah, I see what you mean. Putting the dataDir property into the Map certainly ought to work - can you write a test case that shows what’s happening? Alan Woodward www.flax.co.uk > On 3 Oct 2016, at 23:50, Bryan Bende wrote: > > Alan, > > Thanks for the response. I will

Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Alan, Thanks for the response. I will double-check, but I believe that is going to put the data directory for the core under coreHome/coreName. What I am trying to set up (and did a poor job of explaining) is something like the following... - Solr home in src/test/resources/solr - Core home in

How to implement a custom boost function

2016-10-03 Thread Lucas Cotta
Hello, I'm new to Solr (4.7.2) and I was given the following requirement: Given a query such as: studentId:(875141 OR 873071 OR 875198 OR 108142 OR 918841 OR 870688 OR 107920 OR 870637 OR 870636 OR 870635 OR 918792 OR 107721 OR 875078 OR 875166 OR 875151 OR 918829 OR 918808) I want the results
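
A hedged sketch of the boost-per-term idea raised above in this thread, built with SolrJ (the field name and IDs come from the question; the descending boosts are illustrative and only control the ordering if per-term scores are otherwise flat):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class BuildBoostedQuery {
    public static void main(String[] args) {
        List<String> studentIds = Arrays.asList("875141", "873071", "875198", "108142");

        // Give the first ID the largest boost, the next one less, and so on,
        // so that score order follows the order of IDs in the query.
        StringBuilder q = new StringBuilder("studentId:(");
        int boost = studentIds.size();
        for (String id : studentIds) {
            q.append(id).append('^').append(boost--).append(" OR ");
        }
        q.setLength(q.length() - 4);  // drop the trailing " OR "
        q.append(')');

        SolrQuery query = new SolrQuery(q.toString());
        System.out.println(query.getQuery());
        // -> studentId:(875141^4 OR 873071^3 OR 875198^2 OR 108142^1)
    }
}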

Facet+Stats+MinCount: How to use mincount filter when use facet+stats

2016-10-03 Thread Jeffery Yuan
We store some events data such as *accountId, startTime, endTime, timeSpent* and some other searchable fields. We want to get all accountIds that spent more than X hours between startTime and endTime, plus some other criteria which are not important here. We can use a facet and stats query like
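
A sketch of the facet-plus-stats part using the JSON Facet API (assumes Solr 5.1 or later; the field names come from the question, the date range is illustrative, and this still returns every bucket; filtering buckets by the summed value is the open question):

import org.apache.solr.client.solrj.SolrQuery;

public class TimeSpentFacet {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("startTime:[2016-01-01T00:00:00Z TO 2016-02-01T00:00:00Z]");
        q.setRows(0);
        // One bucket per accountId, each with the summed timeSpent; the caller
        // still has to drop buckets whose total falls below the threshold.
        q.set("json.facet",
              "{accounts:{type:terms,field:accountId,limit:-1,"
            + "facet:{totalTime:\"sum(timeSpent)\"}}}");
        System.out.println(q);
    }
}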

Scaling data extractor with Solr

2016-10-03 Thread Steven White
Hi everyone, I'm up to speed on how Solr can be set up to provide high availability (if one Solr server goes down, the backup one takes over). My question is how do I make my custom crawler play "nice" with Solr in this environment. Let us say I set up Solr with 3 servers so that if

Re: CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Rishabh Patel
Thanks Kevin, this worked for me. On Mon, Oct 3, 2016 at 11:48 AM, Kevin Risden wrote: > You need to have the hadoop pieces on the classpath. Like core-site.xml and > hdfs-site.xml. There is an hdfs classpath command that would help but it > may have too many pieces.

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-10-03 Thread Solr User
Below is some further testing. This was done in an environment that had no other queries or updates during testing. We ran through several scenarios so I pasted this with HTML formatting below so you may view this as a table. Sorry if you have to pull this out into a different file for viewing,

Re: Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Ahmet Arslan
Hi Andy, WordDelimiterFilter has a "types" option. There is an example file named wdftypes.txt in the source tree that preserves #hashtags and @mentions. If you follow this path, please use the Whitespace tokenizer. Ahmet On Monday, October 3, 2016 9:52 PM, "Whelan, Andy"
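
A sketch of what that configuration might look like (the field type name and filter options are illustrative; '#' is given as a unicode escape because a literal '#' would be read as a comment line in the types file):

<fieldType name="text_social" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="wdftypes.txt"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And in wdftypes.txt (same conf directory), keep '#' and '@' attached to the following word:

\u0023 => ALPHA
@ => ALPHA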

RE: SOLR Sizing

2016-10-03 Thread Allison, Timothy B.
This doesn't answer your question, but Erick Erickson's blog on this topic is invaluable: https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ -Original Message- From: Vasu Y [mailto:vya...@gmail.com] Sent: Monday, October 3, 2016

Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Whelan, Andy
Hello, I am guessing that what I am looking for is probably going to require extending StandardTokenizerFactory or ClassicTokenizerFactory. But I thought I would ask the group here before attempting this. We are indexing documents from an eclectic set of sources. There is, however, a heavy

Re: CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Kevin Risden
You need to have the hadoop pieces on the classpath. Like core-site.xml and hdfs-site.xml. There is an hdfs classpath command that would help but it may have too many pieces. You may just need core-site and hdfs-site so you don't get conflicting jars. Something like this may work for you: java

CheckHdfsIndex with Kerberos not working

2016-10-03 Thread Rishabh Patel
Hello, My SolrCloud 5.5 installation has Kerberos enabled. The CheckHdfsIndex test fails to run. However, without Kerberos, I am able to run the test with no issues. I ran the following command: java -cp

SOLR Sizing

2016-10-03 Thread Vasu Y
Hi, I am trying to estimate disk space requirements for the documents indexed to SOLR. I went through the LucidWorks blog ( https://lucidworks.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/) and using this as the template. I have a question regarding estimating "Avg. Document

Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Alan Woodward
This should work: SolrCore solrCore = coreContainer.create(coreName, Paths.get(coreHome).resolve(coreName), Collections.emptyMap()); Alan Woodward www.flax.co.uk > On 3 Oct 2016, at 18:41, Bryan Bende wrote: > > Curious if anyone knows how to create an
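
Building on that call, a minimal sketch that also passes dataDir through the properties map, which is what the rest of this thread is after (the data path is illustrative, and coreHome/coreName is assumed to contain the core's conf directory):

// A sketch only: assumes a loaded CoreContainer plus the coreName/coreHome values above.
// CoreDescriptor.CORE_DATADIR is the "dataDir" property key.
Map<String, String> props = new HashMap<>();
props.put(CoreDescriptor.CORE_DATADIR,
        Paths.get("target", "solr-data", coreName).toAbsolutePath().toString());
SolrCore solrCore = coreContainer.create(coreName,
        Paths.get(coreHome).resolve(coreName), props);
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, coreName);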

RE: Multi-level nesting query inconsistency

2016-10-03 Thread Juan Botero
Hi, thank you. 1. So why do I get those back? They are not even 'legitimate' grandchildren. 2. If I do localhost:8983/solr/nested_object_testing/query?debug=query&q=otype:pf&fl=*,[docid],[child parentFilter=otype:pf limit=500 childFilter='otype:(a p ap)'] --> I get other children except name

EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x, with a core where the dataDir is located somewhere outside of where the config is located. I'd like to do this without system properties, and all through Java code. In Solr 5.x I was able to do this with the following code:

Re: Multi-level nesting query inconsistency

2016-10-03 Thread Mikhail Khludnev
Hello, you can strip grandchildren with [child parentFilter=otype:pf limit=500 childFilter='otype:(a p ap)']. If you need to get three-level nesting you might probably check [subquery], but I suppose it's easier to recover the hierarchy from what you have right now. On Mon, Oct 3, 2016 at 7:38 PM,

Multi-level nesting query inconsistency

2016-10-03 Thread Juan Botero
I am fairly new to Solr, so it is possible I am writing the query wrong (I have Solr 4.10). On this data: [{ "id": -1666, "otype": "ao", "parent_id": -1, "parent_type": "root", "name": "JOSHUA N AARON MD PA", "account_number": "002812300", "tax_id": "50042772325", "group_npi": 134630688333,

Re: JSON Facet "allBuckets" behavior

2016-10-03 Thread Karthik Ramachandran
So if I cannot use allBuckets, since it's not filtering, how can I achieve this? On Fri, Sep 30, 2016 at 7:19 PM, Yonik Seeley wrote: > On Tue, Sep 27, 2016 at 12:20 PM, Karthik Ramachandran > wrote: > > While performing json faceting with

Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Joel Bernstein
Ok, I'll test this out. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Oct 3, 2016 at 4:40 AM, Markko Legonkov wrote: > here is the stacktrace > > java.io.IOException: Unable to construct instance of > org.apache.solr.client.solrj.io.stream.ComplementStream >

Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Markko Legonkov
here is the stacktrace java.io.IOException: Unable to construct instance of org.apache.solr.client.solrj.io.stream.ComplementStream at org.apache.solr.client.solrj.io.stream.expr.StreamFactory.createInstance(StreamFactory.java:323) at

Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Markko Legonkov
Thanks for the quick response. Here is what I tried: complement( search( products, qt="/export", q="*:*", fq="product_id_i:15940162", fl="id, product_id_i, product_name_s,sale_price_d", sort="product_id_i asc" ), select( search( products, qt="/export", q="*:*", fq="product_id_i:15940162", fl="id,