bin/solr start - long response on screen

2017-02-21 Thread Uchit Patel
Hi, I have upgraded Solr to 6.4.0 from 5.1.0. When I start Solr, I get the following:
-bash-3.2$ bin/solr start
Archiving 1 old GC log files to /opt/wml/solr-6.4.0/server/logs/archived
Archiving 1 console log files to /opt/wml/solr-6.4.0/server/logs/archived
Rotating solr logs, keeping

Query complexity scorer.

2017-02-21 Thread Modassar Ather
Hi, I am trying to estimate the likely complexity of a query, either heuristically or based on learning, and assign it a score before it is actually sent to Solr for execution. The query may contain wildcards, complex phrases, and phrases with wildcards. The approach is to assign a number to each part of a query
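
A minimal sketch of the per-part scoring idea, assuming a plain-text query string and purely hypothetical weights (nothing here comes from Solr itself; real weights would be tuned or learned). It splits the query into quoted phrases and bare tokens and sums a cost for each part:

```python
import re

# Hypothetical weights; real values would come from tuning or a learned model.
WEIGHTS = {
    "term": 1.0,
    "phrase": 3.0,
    "wildcard": 5.0,
    "phrase_wildcard": 10.0,
    "leading_wildcard": 20.0,
}
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

# A query part is either a double-quoted phrase or a run of non-space characters.
PART_RE = re.compile(r'"[^"]*"|\S+')


def score_part(part: str) -> float:
    """Assign a heuristic cost to one query part."""
    if len(part) >= 2 and part[0] == '"' and part[-1] == '"':
        inner = part[1:-1]
        if "*" in inner or "?" in inner:
            return WEIGHTS["phrase_wildcard"]
        return WEIGHTS["phrase"]
    if part in BOOLEAN_OPERATORS:
        return 0.0  # operators add no matching work on their own
    if part[0] in "*?":
        return WEIGHTS["leading_wildcard"]
    if "*" in part or "?" in part:
        return WEIGHTS["wildcard"]
    return WEIGHTS["term"]


def query_complexity(query: str) -> float:
    """Sum the costs of all parts of a query string."""
    return sum(score_part(p) for p in PART_RE.findall(query))
```

Here `query_complexity('"michael jackson" AND mich*')` adds a phrase (3.0), a free boolean operator and a trailing wildcard (5.0). Leading wildcards get the highest weight because they cannot use a term prefix to narrow the terms that have to be examined.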

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Hendrik Haddorp
Hi Erick, in the non-HDFS case that sounds logical, but in the HDFS case all the index data is in the shared HDFS file system. Even the transaction logs should be in there. So the node that once had the replica should not really have more information than any other node, especially if

Re: Arabic words search in solr

2017-02-21 Thread mohanmca01
Hi Steve, As per your suggestion I added the ICU folding filter and re-indexed the entire Solr data, but I am still unable to find the expected results I highlighted earlier. I have attached an Excel sheet with examples of Arabic names for your investigation and for reproducing the issue. Arabic_Characters2.xlsx
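
To see which spelling variants ought to collapse together, a rough Python approximation of diacritic folding can help when building test cases. This is a stand-in under stated assumptions, not Solr's ICU folding filter: it only decomposes text to Unicode NFD and drops combining marks (category Mn), which removes harakat such as fatha and sukun and the hamza or madda marks that decompose off alef variants. Real ICU folding does more (case folding and other mappings), so results can differ.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    # Compare a spelling with harakat and hamza against the bare-letter spelling.
    with_marks = "\u0623\u064e\u062d\u0652\u0645\u064e\u062f"
    bare = "\u0627\u062d\u0645\u062f"
    print(fold_diacritics(with_marks) == bare)
```

Comparing the folded forms of the names in the spreadsheet against what the Analysis screen shows is one way to narrow down where query and index terms diverge.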

Re: Migrate Documents to Another Collection

2017-02-21 Thread Piyush Kunal
I have also noticed this issue, and it happens while creating the collated result. It is mostly due to a large version mismatch between the server and client. The best idea would be to use the same server and client version. Or else switch off collation (you can still keep spell check on) and do the collation (

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Walter Underwood
Reindexing is exactly why you want the Single Source of Truth to be in a repository outside of Solr. For our slowly-changing data sets, we have an intermediate JSONL batch. That is created from the source repositories and saved in Amazon S3. Then we load it into Solr nightly. That allows us to
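
A minimal sketch of the nightly load step Walter describes, assuming the JSONL file has already been fetched from S3 (not shown), that each line is one complete Solr document, and that a collection is reachable at a hypothetical update URL such as `http://localhost:8983/solr/listings/update`. It streams the file in batches and POSTs each batch as a JSON array:

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def read_jsonl_batches(lines: Iterable[str], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Parse JSON Lines input into lists of at most batch_size documents."""
    batch: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(update_url: str, docs: List[dict]) -> None:
    """POST one batch of documents to a Solr update handler as a JSON array."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

Usage might look like `for batch in read_jsonl_batches(open("listings.jsonl")): post_batch(url, batch)`, followed by a single commit (for example a request with `commit=true`) once the whole file is loaded. Because the file is the source of truth, a full reindex after a schema change is a rerun of the same loop.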

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Erick Erickson
Dave: Oh, I agree that a DB is a perfectly valid place to store the data and you're absolutely right that it allows better interaction than flat files; you can ask questions of an RDBMS that you can't easily ask the disk ;). Storing to disk is an alternative if you're unwilling to deal with a DB

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Dave
Ha, I think I went to one of your training seminars in NYC maybe 4 years ago, Erick. I'm going to have to respectfully disagree about the RDBMS. It's such a well-known data format that you could hire a high school programmer to help with the DB end if you knew how to flatten it to Solr. Besides

Re: java.util.concurrent.TimeoutException: Idle timeout expired: 50001/50000 ms

2017-02-21 Thread Sadheera Vithanage
Cool, Thank you very much Erick and Walter. On Wed, Feb 22, 2017 at 12:32 PM, Walter Underwood wrote: > I’ve run with 8GB for years for moderate data sets (250K to 15M docs). > Faceting can need more space. > > Make -Xms equal to -Xmx. The heap will grow to the max size

Re: java.util.concurrent.TimeoutException: Idle timeout expired: 50001/50000 ms

2017-02-21 Thread Walter Underwood
I’ve run with 8GB for years for moderate data sets (250K to 15M docs). Faceting can need more space. Make -Xms equal to -Xmx. The heap will grow to the max size regardless and you’ll get pauses while it grows. Starting at the max will avoid that pain. Solr uses lots and lots of short-lived
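
Walter's -Xms/-Xmx advice is easy to check mechanically. A small sketch, assuming the heap flags arrive as a single string in the style of the SOLR_JAVA_MEM setting that appears elsewhere in this thread; the helper names are hypothetical and only the k/m/g size suffixes are handled:

```python
import re

UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
HEAP_FLAG_RE = re.compile(r"-(Xms|Xmx)(\d+)([kKmMgG]?)")


def parse_heap_flags(java_mem: str) -> dict:
    """Return {'Xms': bytes, 'Xmx': bytes} for whichever flags are present."""
    return {
        flag: int(number) * UNIT_BYTES[unit.lower()]
        for flag, number, unit in HEAP_FLAG_RE.findall(java_mem)
    }


def heap_is_fixed(java_mem: str) -> bool:
    """True when -Xms and -Xmx are both set and equal, so the heap never has to grow."""
    sizes = parse_heap_flags(java_mem)
    return "Xms" in sizes and sizes.get("Xms") == sizes.get("Xmx")
```

With this check, `"-Xms1g -Xmx4g"` fails and `"-Xms8g -Xmx8g"` passes.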

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Erick Erickson
Hendrik: bq: Not really sure why one replica needs to be up though. I didn't write the code so I'm guessing a bit, but consider the situation where you have no replicas for a shard up and add a new one. Eventually it could become the leader but there would have been no chance for it to check if

Re: How to figure out whether stopwords are being indexed or not

2017-02-21 Thread Erick Erickson
Attach =query to your query and look at the parsed query that's returned. That'll tell you what was searched at least. You can also use the TermsComponent to examine terms in a field directly. Best, Erick On Tue, Feb 21, 2017 at 2:52 PM, Pratik Patel wrote: > I have a
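
One way to act on this from a script: Solr's `debug=query` request parameter returns the parsed query alongside the results, and the TermsComponent lists the terms actually indexed in a field. A minimal sketch that only builds the request URLs, assuming a hypothetical collection base URL and that a `/terms` request handler is configured (it is in some example configsets, but not necessarily in yours):

```python
from urllib.parse import urlencode


def debug_query_url(base: str, q: str) -> str:
    """Select URL that asks Solr to include the parsed query in the response."""
    return f"{base}/select?" + urlencode({"q": q, "debug": "query", "wt": "json"})


def terms_url(base: str, field: str, prefix: str = "", limit: int = 20) -> str:
    """TermsComponent URL listing indexed terms for one field."""
    params = {"terms.fl": field, "terms.limit": limit, "wt": "json"}
    if prefix:
        params["terms.prefix"] = prefix
    return f"{base}/terms?" + urlencode(params)
```

If the exact term `the` appears among the terms returned for `terms_url(base, "title", prefix="the")`, it was indexed; if it is missing from the parsed query in the debug output, it was removed at query time.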

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Walter Underwood
Awesome advice. flat=fast in Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 21, 2017, at 5:17 PM, Dave wrote: > > B is a better option long term. Solr is meant for retrieving flat data, fast, > not

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Erick Erickson
I'll add that I _guarantee_ you'll want to re-index the data as you change your schema and the like. You'll be able to do that much more quickly if the data is stored locally somehow. An RDBMS is not necessary, however. You could simply store the data on disk in some format you could re-read and

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Robert Hume
Thanks for that! I was thinking (B) too, but wanted guidance that I'm using the tool correctly. Am still interested in hearing opinions from others, thanks! rh On Tue, Feb 21, 2017 at 8:17 PM, Dave wrote: > B is a better option long term. Solr is meant for

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread David Hastings
And not to sound redundant, but if you ever need help: database programmers are a dime a dozen, but good luck finding Solr developers who are available freelance for a price you're willing to pay. If you can do the Solr part, anyone else who does web dev can do the SQL. > On Feb 21, 2017, at 8:17 PM,

Re: java.util.concurrent.TimeoutException: Idle timeout expired: 50001/50000 ms

2017-02-21 Thread Erick Erickson
Solr is very memory-intensive. 1g is still a very small heap. For any sizeable data store people often run with at least 4G, often 8G or more. If you facet or group or sort on fields that are _not_ docValues="true" fields you'll use up a lot of JVM memory. The filterCache uses up maxDoc/8 bytes
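
Erick's maxDoc/8 figure turns into a quick back-of-envelope estimate. This sketch assumes the worst case, where every cached filter is held as a full bitset of maxDoc bits (Solr can store small result sets more compactly), and the 15M documents and 512 entries are illustrative numbers, the former echoing the top of Walter's range:

```python
def filter_cache_bytes(max_doc: int, cache_entries: int) -> int:
    """Worst-case filterCache size: one maxDoc-bit bitset per cached filter."""
    bytes_per_entry = (max_doc + 7) // 8  # maxDoc/8, rounded up to whole bytes
    return bytes_per_entry * cache_entries
```

`filter_cache_bytes(15_000_000, 512)` is 960,000,000 bytes, a little under 1 GB for the filterCache alone, which on its own would nearly fill the 1g heap under discussion.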

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Dave
B is a better option long term. Solr is meant for retrieving flat data, fast, not hierarchical data. That's what a database is for, and trust me, you would rather have a real database on the end point. Each tool has a purpose: Solr can never replace a relational database, and a relational database
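
A minimal sketch of what "flattening" can mean for the used-car example: one nested dealer record becomes one flat Solr document per listing, with the dealer's fields copied onto each. The input shape and the field names (`make_s`, `price_i`, etc., relying on the common `*_s`/`*_i` dynamic-field suffixes) are assumptions for illustration:

```python
from typing import List


def flatten_listings(dealer: dict) -> List[dict]:
    """Denormalize a dealer with nested listings into one flat doc per listing."""
    docs = []
    for listing in dealer.get("listings", []):
        docs.append({
            "id": f"{dealer['id']}-{listing['id']}",  # unique per dealer + listing
            "dealer_name_s": dealer["name"],
            "dealer_city_s": dealer.get("city", ""),
            "make_s": listing["make"],
            "model_s": listing["model"],
            "price_i": listing["price"],
        })
    return docs
```

The relational source keeps the dealer/listing relationship; Solr only sees the denormalized rows, in line with the "flat=fast" point made in this thread.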

Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Robert Hume
To learn how to properly use Solr, I'm building a little experimental project with it to search for used car listings. Car listings appear in a variety of places: central places like Craigslist, and also many, many individual used car dealership websites. I am wondering, should I: (a)

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Fuad Efendi
Walter, I use BM25, which is the default in Solr 6.3, and I clearly saw a correlation between the number of hits and response times in the Solr logs; it is almost linear, even with an underloaded system. With “solrmeter” at 10 requests per second, CPU goes to 400% on a 12-core hyperthreaded machine, and with

Re: java.util.concurrent.TimeoutException: Idle timeout expired: 50001/50000 ms

2017-02-21 Thread Sadheera Vithanage
Thanks Erick, It looked like garbage collection was blocking the other processes. I updated SOLR_JAVA_MEM to "-Xms1g -Xmx4g", since it was at the default before and garbage collection seemed to be triggered too frequently. Let's see how it goes now. Thanks again for the support. On Mon, Feb

How to figure out whether stopwords are being indexed or not

2017-02-21 Thread Pratik Patel
I have a field type in the schema to which a stopwords list has been applied. I have verified that the path of the stopwords file is correct and that it is loaded fine in the Solr admin UI. When I analyse these fields using the "Analysis" tab of the Solr admin UI, I can see that stopwords are being filtered out. However,

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Walter Underwood
300 ms seems pretty good for 200 million documents. Is that average? Median? 95th percentile? Why are you sure it is because of the huge number of hits? That would be unusual. The size of the posting lists is a more common cause. Why do you think it is caused by tf.idf? That should be faster than

Re: Arabic words search in solr

2017-02-21 Thread Steve Rowe
Hi Mohan, It looks to me like the example query should match, since the analyzed query terms look like a subset of the analyzed document terms. Did you re-index your documents after you changed your schema? If not, then the indexed documents won’t have the same terms as the ones you see on

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Hendrik Haddorp
Hi, I had opened SOLR-10092 (https://issues.apache.org/jira/browse/SOLR-10092) for this a while ago. I was now able to get this feature working with a very small code change. After a few seconds Solr reassigns the replica to a different Solr instance as long as one replica is still up. Not

Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Pratik Patel
I think I have found something concrete. Reading up more on the .nvd file extension, I found that it is used to store length and boost factors for documents and fields. These are normalization files. Normalization on a field is controlled by the omitNorms attribute. If omitNorms=true then the field
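
A back-of-envelope estimate shows why norms on many dynamic fields can dominate the index. The sketch assumes norms are stored densely, at roughly one byte per document for every field that has norms enabled, even in documents that lack the field; that is an assumption about Lucene 6.x behaviour, not something verified in this thread, and the document and field counts are made up:

```python
def dense_norms_bytes(max_doc: int, fields_with_norms: int, bytes_per_norm: int = 1) -> int:
    """Approximate .nvd size when every norms-enabled field stores one value per document."""
    return max_doc * fields_with_norms * bytes_per_norm
```

Under that assumption, 1,000,000 documents and 2,000 dynamic fields with norms give about 2 GB of norms, regardless of how sparsely each field is populated. Setting omitNorms="true" on those fields avoids that cost, at the price of losing length normalization in scoring.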

Re: Facet query - exlude main query

2017-02-21 Thread Jacques du Rand
Oh, right, right! Sorry, late night :) Thank you, Chris On 21 February 2017 at 20:30, Chris Hostetter wrote: > > : Maybe I'm doing something wrong ? > : /select?q.op=OR=2={!tag=mq}nissan=name%20name_ > raw=json=0=20={!ex=tag_mq}feature_s_1_make > > that url still

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Doug Turnbull
With that many documents, why not start with an AND search and reissue an OR query if there are no results? My strategy is to prefer an AND for large collections (or a higher mm than 1) and prefer closer to an OR for smaller collections. -Doug On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi
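
Doug's AND-first strategy can be sketched as a small wrapper. The `search` callable is hypothetical (anything that sends the params to Solr and returns the matching documents), and the sketch assumes the edismax parser, where `mm=100%` requires every term and `mm=1` requires any one:

```python
from typing import Callable, Dict, List

# Hypothetical: takes Solr request params, returns the matching documents.
SearchFn = Callable[[Dict[str, str]], List[dict]]


def and_then_or(search: SearchFn, q: str) -> List[dict]:
    """Run a strict all-terms query first; fall back to any-term only if it is empty."""
    base = {"q": q, "defType": "edismax"}
    strict = search({**base, "mm": "100%"})
    if strict:
        return strict
    return search({**base, "mm": "1"})
```

The cost is a second round trip for queries with no strict match; the benefit is that the broad OR query, with its very large hit count, only runs when the strict query found nothing.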

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Fuad Efendi
Thank you Ahmet, I will try it; sounds reasonable From: Ahmet Arslan Reply: solr-user@lucene.apache.org , Ahmet Arslan Date: February 21,

Re: Facet query - exlude main query

2017-02-21 Thread Chris Hostetter
: Maybe I'm doing something wrong ? : /select?q.op=OR=2={!tag=mq}nissan=name%20name_raw=json=0=20={!ex=tag_mq}feature_s_1_make that url still contains "ex=tag_mq" .. which is looking for a query with a tag named "tag_mq" .. in your q param you are using a tag named "mq" Use

Re: Facet query - exlude main query

2017-02-21 Thread Jacques du Rand
Maybe I'm doing something wrong ? /select?q.op=OR=2={!tag=mq}nissan=name%20name_raw=json=0=20={!ex=tag_mq}feature_s_1_make Still only getting ONE facet value ? { status: 0, QTime: 5, params: { mm: "2", facet.field: [ "{!ex=tag_mq}feature_s_1_make", "{!ex=tag_model}feature_s_2_model" ], qs:

Re: Facet query - exlude main query

2017-02-21 Thread Chris Hostetter
: facet.field: [ : "{!ex=tag_make,tag_model,tag_mq}feature_s_1_make", : "{!ex=tag_model}feature_s_2_model" ... : q: "{!tag=mq}nissan", You are attempting to exclude tags named "tag_make", "tag_model", and "tag_mq" -- but the name of the tag you are using in the query is "mq" if you
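
The fix Chris describes is that the name inside `{!ex=...}` must be the same string as the name inside `{!tag=...}`. A minimal sketch that builds the parameters from a single tag variable so the two cannot drift apart; the helper name is hypothetical and the field name is taken from the thread:

```python
from typing import List, Tuple


def tagged_facet_params(term: str, tag: str, facet_fields: List[str]) -> List[Tuple[str, str]]:
    """Tag the main query and exclude that same tag from every facet field."""
    params = [("q", f"{{!tag={tag}}}{term}"), ("facet", "true")]
    for field in facet_fields:
        params.append(("facet.field", f"{{!ex={tag}}}{field}"))
    return params
```

`tagged_facet_params("nissan", "mq", ["feature_s_1_make"])` yields `q={!tag=mq}nissan` and `facet.field={!ex=mq}feature_s_1_make`; the list of pairs can be passed to `urllib.parse.urlencode`, which keeps repeated `facet.field` keys.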

Re: Facet query - exlude main query

2017-02-21 Thread Jacques du Rand
Sure... So this is for a car search application. If I change: q={!tag=mq}nissan to q=*:* I get all the makes. VAR ECHO: responseHeader: { status: 0, QTime: 4, params: { mm: "2", facet.field: [ "{!ex=tag_make,tag_model,tag_mq}feature_s_1_make", "{!ex=tag_model}feature_s_2_model" ], qs: "5",

Re: Facet query - exlude main query

2017-02-21 Thread Chris Hostetter
: Solr3.1 Starting with Solr 3.1, the : primary relevance query (i.e. the one normally specified by the *q* parameter) : may also be excluded. : : But doesn't show me how to exclude it ?? same tag + ex local params, as you had in your example... : 2.

Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Pratik Patel
I am using the schema from Solr 5, which does not have any field with docValues enabled. In fact, to ensure that everything is the same as in Solr 5 (except the breaking changes), I am also using the solrconfig.xml from Solr 5, with schemaFactory set to classicSchemaFactory to use schema.xml from Solr 5. On

Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Alexandre Rafalovitch
Did you reuse the schema or rebuild it on top of the latest examples? Because the latest example schema enabled docValues for strings on the fieldType level. I would do a diff of the schemas to see what changed. If they look very different and you are looking for tools to normalize/extract

Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Mike Thomsen
Correct me if I'm wrong, but heavy use of doc values should actually blow up the size of your index considerably if they are in fields that get sent a lot of data. On Tue, Feb 21, 2017 at 10:50 AM, Pratik Patel wrote: > Thanks for the reply. I can see that in solr 6, more

Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Pratik Patel
Thanks for the reply. I can see that in solr 6, more than 50% of the index directory is occupied by ".nvd" file extension. It is something related to norms and doc values. On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch wrote: > Did you look in the data directories

Re: Solr: Return field names that contain search term

2017-02-21 Thread sunayana1991
Hi Rahul, I am working on a similar scenario and was wondering if you found a way to resolve this. Thanks in advance!

Re: Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Alexandre Rafalovitch
Did you look in the data directories to check what index file extensions contribute most to the difference? That could give a hint. Regards, Alex On 21 Feb 2017 9:47 AM, "Pratik Patel" wrote: > Here is the same question in stackOverflow for better format. > >
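
A minimal sketch of Alex's suggestion, assuming a local path to a core's index directory (something like `.../data/index`; the exact location depends on the install). It sums file sizes per extension so that, for example, a dominant `.nvd` total stands out:

```python
import os
from collections import Counter
from typing import Iterable, Tuple


def totals_by_extension(entries: Iterable[Tuple[str, int]]) -> Counter:
    """Sum (file name, size) pairs by extension; extension-less names count as themselves."""
    totals: Counter = Counter()
    for name, size in entries:
        ext = os.path.splitext(name)[1] or name  # e.g. "segments_5" has no extension
        totals[ext] += size
    return totals


def size_by_extension(index_dir: str) -> Counter:
    """Scan one index directory (non-recursively) and total its file sizes by extension."""
    entries = []
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return totals_by_extension(entries)
```

Running it against the Solr 5 and Solr 6 index directories and comparing the `.most_common()` output side by side shows which file types account for the difference.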

Fwd: Solr dynamic field blowing up the index size

2017-02-21 Thread Pratik Patel
Here is the same question on Stack Overflow, in a better format: http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size Recently, I upgraded from Solr 5.0 to Solr 6.4.1. I can run my app fine, but the problem is that the index size with Solr 6 is way too large. In Solr

Facet query - exlude main query

2017-02-21 Thread Jacques du Rand
Hi guys, the Wiki says: "Solr3.1 Starting with Solr 3.1, the primary relevance query (i.e. the one normally specified by the *q* parameter) may also be excluded." But it doesn't show me how to exclude it?? I've tried: 1. ={!ex=tag_man,mainquery}manufacturer

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Ahmet Arslan
Hi, the new default similarity is BM25. Maybe explicitly set the similarity to TF-IDF and see how it goes? Ahmet On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi wrote: Hello, Default TF-IDF performs poorly with the indexed 200 million documents. Query "Michael Jackson" may run