Re: Tokenizer or Filter ?

2015-01-13 Thread Jack Krupansky
Actually, you may be able to get by using PatternReplaceCharFilterFactory: copy the source value to two fields, one that treats <d2>.*</d2> as the delimiter pattern to delete and the other that uses <d1>.*</d1> as the delimiter pattern to delete, so the first field has only d1 and the second has only d2.
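A minimal sketch of what the two char filters would do, simulated with java.util.regex (which PatternReplaceCharFilter also uses). The tag names come from the thread; the sample text, class, and method names are hypothetical. One assumption differs from the message as written: it uses the non-greedy `.*?` with DOTALL, because a greedy `.*` would delete everything from the first matching tag to the last one, including blocks of the other kind in between.

```java
import java.util.regex.Pattern;

/**
 * Simulates two PatternReplaceCharFilter setups applied to copies of one value:
 * each copy deletes the blocks belonging to the other delimiter.
 * Class and method names are hypothetical, for illustration only.
 */
public class DelimiterStrip {
    // Deletes every <tag>...</tag> block. The reluctant ".*?" matches the shortest span,
    // and DOTALL lets a block cross line breaks.
    static String deleteBlocks(String source, String tag) {
        Pattern p = Pattern.compile("<" + tag + ">.*?</" + tag + ">", Pattern.DOTALL);
        return p.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "<d1>alpha</d1> <d2>beta</d2> <d1>gamma</d1>";
        // First field: delete d2 blocks so only d1 content remains.
        System.out.println(deleteBlocks(source, "d2"));
        // Second field: delete d1 blocks so only d2 content remains.
        System.out.println(deleteBlocks(source, "d1"));
    }
}
```

In schema.xml, the same expression would go into the factory's `pattern` attribute with an empty `replacement`, XML-escaped (`&lt;d2&gt;(?s).*?&lt;/d2&gt;`, where `(?s)` is the inline form of DOTALL). The surviving `<d1>` markers would still reach the tokenizer, so they would probably need stripping as well.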

Re: Solr large boolean filter

2015-01-13 Thread rashmy1
Hello, We have a similar requirement where a large list of IDs needs to be sent to Solr in a filter query. Could someone please help me understand whether this feature is now supported in newer versions of Solr? Thanks

Re: Slow faceting performance on a docValues field

2015-01-13 Thread David Smith
Shawn, Thanks for the suggestion, but experimentally, in my case the same query with facet.method=enum returns in almost the same amount of time. Regards David On Tuesday, January 13, 2015 12:02 PM, Shawn Heisey apa...@elyograg.org wrote: On 1/13/2015 10:35 AM, David Smith wrote:

Slow faceting performance on a docValues field

2015-01-13 Thread David Smith
I have a query against a single 50M doc index (175GB) using Solr 4.10.2, that exhibits the following response times (via the debugQuery option in Solr Admin): process: { time: 24709, query: { time: 54 }, facet: { time: 24574 }, The query time of 54ms is great and exactly as expected -- this

Improved suggester question

2015-01-13 Thread Dan Davis
The suggester is not working for me with Solr 4.10.2. Can anyone shed light on why I might be getting the exception below when I build the dictionary? <response> <lst name="responseHeader"> <int name="status">500</int> <int name="QTime">26</int> </lst> <lst name="error"> <str name="msg">len must be <= 32767; got 35680</str>

Re: Logging in Solr's DataImportHandler

2015-01-13 Thread Dan Davis
Mikhail, Thanks - it works now. The script transformer was really not needed; a template transformer is clearer, and the log transformer is now working. On Mon, Dec 8, 2014 at 1:56 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Dan, Usually it works well. Can you describe

Re: Highting whole pharse

2015-01-13 Thread Ahmet Arslan
Hi, hl.usePhraseHighlighter is valid for the standard highlighter. Maybe you are using one of the other highlighters? Maybe you have omitTermFreqAndPositions=true in the definition of the text_general field type? Ahmet On Tuesday, January 13, 2015 5:52 PM, meena.sri...@mathworks.com

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Tom Burton-West
Thanks Michael and Hoss, assuming I've written the subclass of the postings format, I need to tell Solr to use it. Do I just do something like: <fieldType name="ocr" class="solr.TextField" postingsFormat="MySubclass" /> Is there a way to set this for all field types or would that require writing a

Suggester questions

2015-01-13 Thread Dan Davis
I am having some trouble getting the suggester to work. The spell requestHandler is working, but I didn't like the results I was getting from the word breaking dictionary and turned them off. So some basic questions: - How can I check on the status of a dictionary? - How can I see what is

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Shawn Heisey
On 1/13/2015 10:35 AM, David Smith wrote: I have a query against a single 50M doc index (175GB) using Solr 4.10.2, that exhibits the following response times (via the debugQuery option in Solr Admin): process: { time: 24709, query: { time: 54 }, facet: { time: 24574 }, The query time

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Tomás Fernández Löbbe
Range faceting won't use DocValues even if they are set; it translates each gap into a filter. This means it will end up using the FilterCache, which should make follow-up queries faster if you repeat the same gaps (and don't commit). You may also want to try interval faceting, it

Re: Best way to implement Spotlight of certain results

2015-01-13 Thread Dan Davis
Maybe I can use grouping, but my understanding of the feature is not up to figuring that out :) I tried something like http://localhost:8983/solr/collection/select?q=childhood+cancer&group=on&group.query=childhood+cancer Because group.limit=1, I get a single result, and no other results. If I

Re: Occasionally getting error in solr suggester component.

2015-01-13 Thread Michael Sokolov
I think you are probably getting bitten by one of the issues addressed in LUCENE-5889. I would recommend against using buildOnCommit=true: with a large index this can be a performance-killer. Instead, build the index yourself using the Solr spellchecker support (spellcheck.build=true)

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Tomás Fernández Löbbe
Just a side question. In your first example you have dates set with time but in the second (where you set intervals) time is not set. Is this something that can be resolved having a field that only sets date (without time), and then use regular field faceting and facet.sort=index? If that's

Re: Unexplained leader initiated recovery after updates - SolrCmdDistributor no longer retries on RemoteSolrException

2015-01-13 Thread Lindsay Martin
We are experiencing unexpected recovery events when a leader is sending updates to a replica. A "java.net.SocketException: Connection reset" is encountered when updating the replica, which triggers the recovery. In our previous Solr 4.6.1 installation, update errors triggered retry logic in the

Re: Slow faceting performance on a docValues field

2015-01-13 Thread David Smith
Tomás, Thanks for the response -- the performance of my query makes perfect sense in light of your information. I looked at Interval faceting. My required interval is 1 day. I cannot change that requirement. Unless I am misreading the doc, that means to facet a 10-year range, the query

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Tomás Fernández Löbbe
No, you are not misreading; right now there is no automatic way of generating the intervals on the server side, similar to range faceting... I guess it won't work in your case. Maybe you should create a Jira issue to add this feature to interval faceting. Tomás On Tue, Jan 13, 2015 at 10:44 AM, David
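Since interval faceting has no server-side gap option, the daily intervals would have to be generated by the client. A minimal sketch, assuming a date field and the `[start,end)` interval syntax of `f.<field>.facet.interval.set`; the class and method names and the 2005 to 2015 range are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/**
 * Builds one interval-facet value per day for the range [start, endExclusive).
 * Names are hypothetical; each returned string would be sent as a separate
 * f.<field>.facet.interval.set parameter, together with facet.interval=<field>.
 */
public class DailyIntervals {
    static List<String> dailyIntervalSets(LocalDate start, LocalDate endExclusive) {
        List<String> sets = new ArrayList<>();
        for (LocalDate day = start; day.isBefore(endExclusive); day = day.plusDays(1)) {
            // Closed start, open end, so adjacent days never count a document twice.
            sets.add("[" + day + "T00:00:00Z," + day.plusDays(1) + "T00:00:00Z)");
        }
        return sets;
    }

    public static void main(String[] args) {
        List<String> sets = dailyIntervalSets(LocalDate.of(2005, 1, 1), LocalDate.of(2015, 1, 1));
        // 3652 for this range: 10 x 365 days plus the leap days in 2008 and 2012.
        System.out.println(sets.size() + " intervals");
        System.out.println(sets.get(0));
    }
}
```

With SolrJ, each value could be appended with `ModifiableSolrParams.add`. A request carrying several thousand parameters is likely to need POST rather than GET.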

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Chris Hostetter
: assuming I've written the subclass of the postings format, I need to tell : Solr to use it. : : Do I just do something like: : : <fieldType name="ocr" class="solr.TextField" postingsFormat="MySubclass" /> the postingFormat xml tag in schema.xml just refers to the name of the postingFormat in SPI --

Re: Slow faceting performance on a docValues field

2015-01-13 Thread David Smith
What is stumping me is that the search result has 3 hits, yet faceting those 3 hits takes 24 seconds. The documentation for facet.method=fc is quite explicit about how Solr does faceting: fc (stands for Field Cache) The facet counts are calculated by iterating over documents that match the

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Tomás Fernández Löbbe
fc, fcs and enum only apply for field faceting, not range faceting. Tomás On Tue, Jan 13, 2015 at 11:24 AM, David Smith dsmiths...@yahoo.com.invalid wrote: What is stumping me is that the search result has 3 hits, yet faceting those 3 hits takes 24 seconds. The documentation for

Re: Solr grouping problem - need help

2015-01-13 Thread Erick Erickson
bq: My question is for indexed=false, stored=true field..what is optimized way to get unique values in such field. There isn't any. To do this you'll have to read the doc from disk, it'll be decompressed along the way and then the field is read. Note that this happens automatically when you call

RE: Distributed unit tests and SSL doesn't have a valid keystore

2015-01-13 Thread Markus Jelsma
Thanks, we will suppress it for now! M. -Original message- From: Mark Miller markrmil...@gmail.com Sent: Monday 12th January 2015 19:25 To: solr-user@lucene.apache.org Subject: Re: Distributed unit tests and SSL doesn't have a valid keystore I'd have to do some digging. Hossman

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Alexandre Rafalovitch
Could probably write a custom SearchComponent to prepend and expand the query for the required use case. Though if something then has to parse that query back, it would still be an issue. Regards, Alex Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 13 January

Re: Occasionally getting error in solr suggester component.

2015-01-13 Thread Dan Davis
Related question - I see mention of needing to rebuild the spellcheck/suggest dictionary after solr core reload. I see spellcheckIndexDir in both the old wiki entry and the solr reference guide https://cwiki.apache.org/confluence/display/solr/Spell+Checking. If this parameter is provided, it

Re: Frequent deletions

2015-01-13 Thread Shawn Heisey
On 1/13/2015 12:10 AM, ig01 wrote: Unfortunately this is the case, we do have hundreds of millions of documents on one Solr instance/server. All our configs and schema are with default configurations. Our index size is 180G, does that mean that we need at least 180G heap size? If you have

Solr fails to start with log file not found error

2015-01-13 Thread Graeme Pietersz
I get this error when starting Solr using the script in bin/solr tail cannot open `[path]/logs/solr.log’ for reading: No such file or directory It does not happen every time, but it does happen a lot. It sometimes clears up after a while. I have tried creating an empty file, but solr then just

Re: leader split-brain at least once a day - need help

2015-01-13 Thread Thomas Lamy
Hi Mark, we're currently at 4.10.2; the update to 4.10.3 is scheduled for tomorrow. T On 12.01.15 at 17:30, Mark Miller wrote: bq. ClusterState says we are the leader, but locally we don't think so Generally this is due to some bug. One bug that can lead to it was recently fixed in 4.10.3 I

Re: Extending solr analysis in index time

2015-01-13 Thread Ali Nazemian
Dear Markus, Unfortunately I cannot use payloads since I want to return this score to each user as a simple field alongside other fields, and payloads do not provide that. Also, I don't want to change Lucene's default similarity method; I just want to have this field to do the

Re: Tokenizer or Filter ?

2015-01-13 Thread tomas.kalas
Thanks Jack for your advice. Can you please explain a little more how it works? The Apache Wiki is not too clear to me. Can I write some JavaScript code when I want to filter some data? In this case I have <d1>bla bla bla</d1> <d2> bla bla bla </d2> <d1>bla bla bla </d1> and I want to filter d2 bla bla

Getting error while indexing XML files on Hadoop

2015-01-13 Thread celebis
Hi to all from Istanbul, Turkey. I'm a newbie in Solr and Hadoop. I’m trying to index XML files (ipod_other.xml from Lucidworks’ example files, converted into sequence file format) using the SolrXMLIngestMapper jars. I’ve modified the schema.xml file by making the necessary additions

Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-13 Thread Daniel Collins
Is it important where your leader is? If you just want to minimize leadership changes during a rolling restart, then you could restart in the opposite order (S3, S2, S1). That would give only 1 transition, but the end result would be a leader on S2 instead of S1 (not sure if that's important to you

Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-13 Thread Zisis Tachtsidis
Daniel Collins wrote Is it important where your leader is? If you just want to minimize leadership changes during rolling re-start, then you could restart in the opposite order (S3, S2, S1). That would give only 1 transition, but the end result would be a leader on S2 instead of S1 (not sure

Solr grouping problem - need help

2015-01-13 Thread Naresh Yadav
*Schema :* <field name="tenant_pool" type="text" stored="true"/> *Code :* SolrQuery q = new SolrQuery().setQuery("*:*"); q.set(GroupParams.GROUP, true); q.set(GroupParams.GROUP_FIELD, "tenant_pool"); *Data :* tenant_pool : Baroda Farms tenant_pool : Ketty Farms *Output coming :* groupValue=Farms, docs=2

Re: Solr startup script in version 4.10.3

2015-01-13 Thread Dominique Bejean
Thank you for your responses. However, according to my tests, solr 4.10.3 doesn’t use server by default anymore due to the removal of these lines in the bin/solr script. # TODO: see SOLR-3619, need to support server or example # depending on the version of Solr if [ -e $SOLR_TIP/server/start.jar

Re: Solr grouping problem - need help

2015-01-13 Thread Jack Krupansky
That's your job. The easiest way is to do a copyField to a string field. -- Jack Krupansky On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav nyadav@gmail.com wrote: *Schema :* <field name="tenant_pool" type="text" stored="true"/> *Code :* SolrQuery q = new SolrQuery().setQuery("*:*");

Re: Extending solr analysis in index time

2015-01-13 Thread Jack Krupansky
A function query or an update processor to create a separate field are still your best options. -- Jack Krupansky On Tue, Jan 13, 2015 at 4:18 AM, Ali Nazemian alinazem...@gmail.com wrote: Dear Markus, Unfortunately I can not use payload since I want to retrieve this score to each user as a

Re: Solr grouping problem - need help

2015-01-13 Thread Naresh Yadav
Hi Jack, Thanks for replying. I am new to Solr, please guide me on this. I have many such columns in my schema, so copyField will create a lot of duplicate fields; besides, I do not need any search on the original field. My use case is that I do not want any search on the tenant_pool field; that's why I declared it

Occasionally getting error in solr suggester component.

2015-01-13 Thread Dhanesh Radhakrishnan
Hi all, I am experiencing a problem with the Solr SuggestComponent. Occasionally the Solr suggester component throws an error like Solr failed: {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was not built","trace":"java.lang.IllegalStateException: suggester was not built\n\tat

Re: Solr fails to start with log file not found error

2015-01-13 Thread Erick Erickson
By any chance are you trying to start Solr as a different user when this happens? I'm wondering if there's a permissions issue here. Wild guess. On Tue, Jan 13, 2015 at 12:37 AM, Graeme Pietersz gra...@pietersz.net wrote: I get this error when starting Solr using the script in bin/solr

Re: Extending solr analysis in index time

2015-01-13 Thread Ali Nazemian
I decided to go with a function query and to implement one that reads the term frequency for each document from the index. However, I did not find any tutorial that matched my problem well. I would really appreciate it if somebody could provide a useful tutorial or example for this case. Thank you
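Before writing a custom function, it may be worth checking whether Solr's built-in `termfreq(field,term)` function query is enough. It can be returned as a pseudo-field next to the stored fields through the `fl` parameter. A minimal sketch that builds such an `fl` value; the field `content`, the term `solr`, and the alias `tf` are hypothetical.

```java
/**
 * Builds an fl (field list) value that returns all stored fields plus a
 * per-document term frequency pseudo-field computed by termfreq(field,term).
 * Field, term, and alias names are hypothetical.
 */
public class TermFreqFieldList {
    static String fieldList(String alias, String field, String term) {
        // The term is single-quoted; a term containing quotes would need escaping.
        return "*," + alias + ":termfreq(" + field + ",'" + term + "')";
    }

    public static void main(String[] args) {
        System.out.println(fieldList("tf", "content", "solr"));
    }
}
```

termfreq counts the indexed term, so the term should probably be given in its analyzed form (for example, lowercased if the field's analyzer lowercases).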

Highting whole pharse

2015-01-13 Thread meena.sri...@mathworks.com
Highlighting does not highlight the whole phrase; instead, each word gets highlighted. I tried all the suggestions that were given, with no luck. These are the special settings I tried for phrase highlighting: hl.usePhraseHighlighter=true hl.q=query

Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-13 Thread Erick Erickson
SolrCloud is intended to work in the rolling restart case... Index size, segment counts, segment names can (and will) be different on different replicas of the same shard without anything being amiss. Commits (hard) happen at different times across the replicas in a shard. Merging logic kicks in

Re: SpellCheck (AutoComplete) Not Working In Distributed Environment

2015-01-13 Thread Charles Sanders
Still not able to get my autoComplete component to work in a distributed environment. Works fine on a non-distributed system. Also, on the distributed system, if I include distrib=false, it works. I have tried shards.qt and shards parameters, but they make no difference. I should add, I am

Re: Solr limiting number of rows to indexed to 21500 every time.

2015-01-13 Thread Michael Della Bitta
Looks like you have an underlying JDBC problem. The socket representing your database connection seems to be going away. Have you tried running this query outside of Solr and iterating through all the results? How about in a standalone Java program? Do you have a DBA you can consult to see if

Re: Solr grouping problem - need help

2015-01-13 Thread Erick Erickson
Something is very wrong here. Have you perhaps been changing your schema without re-indexing? And I recommend you completely remove your data directory (the one with index and tlog subdirectories) after you change your schema.xml file. Because you're trying to group on a field that is _not_

Re: Tokenizer or Filter ?

2015-01-13 Thread Jack Krupansky
Would it be sufficient for your use case to simply extract all the d1 blocks into one field and all the d2 blocks into another field? If so, the update processor script would be very simple, simply matching all <d1>.*</d1> and copying them to a separate field value, and the same for d2. If you want examples of script
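The matching step of that script can be sketched in plain Java. The script itself would run in an update processor, but the regex logic is the same. This is a minimal sketch under stated assumptions: the tag names come from the thread, the class and method names are hypothetical, and the non-greedy `.*?` is used so each block is captured separately instead of one greedy match spanning from the first tag to the last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Collects the contents of every <tag>...</tag> block so they could be copied
 * into a separate field. Class and method names are hypothetical.
 */
public class BlockExtractor {
    static List<String> extract(String source, String tag) {
        // Reluctant ".*?" with DOTALL: one match per block, even across newlines.
        Pattern p = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL);
        Matcher m = p.matcher(source);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String source = "<d1>bla one</d1> <d2> bla two </d2> <d1>bla three </d1>";
        System.out.println(extract(source, "d1")); // [bla one, bla three]
        System.out.println(extract(source, "d2")); // [bla two]
    }
}
```

Each returned list would become the values of a multi-valued target field, one field per tag.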

Re: leader split-brain at least once a day - need help

2015-01-13 Thread Shawn Heisey
On 1/12/2015 5:34 AM, Thomas Lamy wrote: I found no big/unusual GC pauses in the log (at least manually; I found no free solution to analyze them that worked out of the box on a headless Debian wheezy box). Eventually I tried with -Xmx8G (was 64G before) on one of the nodes, after checking

Re: Solr grouping problem - need help

2015-01-13 Thread Naresh Yadav
Erick, my schema is the same, no change in that. *Schema :* <field name="tenant_pool" type="text" stored="true"/> My guess is that I had not specified indexed true or false, so maybe indexed defaults to true. My question is: for an indexed=false, stored=true field, what is the optimized way to get unique values in

Re: Solr large boolean filter

2015-01-13 Thread Alexandre Rafalovitch
TermsQueryParser I think is somewhat new. Have you tried that one? https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 13 January 2015 at 12:54, rashmy1
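A minimal sketch of building a filter query for the terms query parser from a list of IDs. The field name `id` and the sample values are hypothetical. The parser takes a separator-delimited list (comma by default), so IDs that can contain commas would need a different `separator` local param.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Builds an fq string for Solr's {!terms} query parser from a list of values.
 * The field name and sample IDs are hypothetical.
 */
public class TermsFilter {
    static String termsFilter(String field, List<String> values) {
        // Comma is the parser's default separator.
        return "{!terms f=" + field + "}" + String.join(",", values);
    }

    public static void main(String[] args) {
        System.out.println(termsFilter("id", Arrays.asList("1001", "1002", "1003")));
    }
}
```

With SolrJ, the resulting string could be passed to `SolrQuery.addFilterQuery`. For a very large list, sending the request as POST avoids URL length limits.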

Re: Engage custom hit collector for special search processing

2015-01-13 Thread tedsolr
As insane as it sounds, I need to process all the results. No one document is more or less important than another. Only a few hundred unique docs will be sent to the client at any one time, but the users expect to page through them all. I don't expect sub-second performance for this task. I'm

Re: Solr unit tests intermittently fail with error: java.lang.NoClassDefFoundError: org/eclipse/jetty/util/security/CertificateUtils

2015-01-13 Thread Shawn Heisey
On 1/13/2015 2:50 PM, brian4 wrote: The problem is the jetty-util version included in the Solr build is 6.1.26, but this particular package is from version 7+. Looks like it is a bug in the build files for Solr. I fixed it by downloading jetty 7 separately and manually adding

Engage custom hit collector for special search processing

2015-01-13 Thread tedsolr
I have a complicated problem to solve, and I don't know enough about lucene/solr to phrase the question properly. This is kind of a shot in the dark. My requirement is to return search results always in completely collapsed form, rolling up duplicates with a count. Duplicates are defined by
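The requirement (always return fully collapsed results, rolling duplicates up with a count) can be illustrated in memory. A minimal sketch, assuming duplicates are defined by equal values across a hypothetical list of fields, because the message's own definition is cut off. This is not a Solr collector, and the field names `vendor` and `sku` are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Rolls up documents that share the same values in the chosen key fields and
 * counts each group. All names here are hypothetical.
 */
public class RollUp {
    static Map<List<String>, Integer> rollUp(List<Map<String, String>> docs, List<String> keyFields) {
        // LinkedHashMap keeps groups in first-seen order.
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            // The list of key-field values serves as the composite duplicate key.
            String[] key = new String[keyFields.size()];
            for (int i = 0; i < key.length; i++) {
                key[i] = doc.get(keyFields.get(i));
            }
            counts.merge(Arrays.asList(key), 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, String> doc(String vendor, String sku) {
        Map<String, String> d = new HashMap<>();
        d.put("vendor", vendor);
        d.put("sku", sku);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("acme", "A1"), doc("acme", "A1"), doc("globex", "B2"));
        System.out.println(rollUp(docs, Arrays.asList("vendor", "sku"))); // {[acme, A1]=2, [globex, B2]=1}
    }
}
```

Inside Solr, this multi-field grouping and counting is what a custom collector or extended collapse logic would have to perform.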

Re: Slow faceting performance on a docValues field

2015-01-13 Thread David Smith
Shawn, I've been thinking along your lines, and continued to run tests through the day. The results surprised me. For my index, Solr range faceting time is most closely related to the total number of documents in the index for the range specified. The number of buckets in the range is a

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Chris Hostetter
: ...the nuts & bolts of it is that the PostingFormat baseclass should take : care of all the SPI name registration that you need based on what you : pass to the super() construction ... although now that i think about it, : i'm not sure how you'd go about specifying your own name for the :

Re: Slow faceting performance on a docValues field

2015-01-13 Thread Shawn Heisey
On 1/13/2015 11:44 AM, David Smith wrote: I looked at Interval faceting. My required interval is 1 day. I cannot change that requirement. Unless I am mis-reading the doc, that means to facet a 10 year range, the query needs to specify over 3,600 intervals ?? I am very ignorant of how the

Re: Engage custom hit collector for special search processing

2015-01-13 Thread Jack Krupansky
Do you have a sense of what your typical queries would look like? I mean, maybe you wouldn't actually need to fetch more than a tiny fraction of those million documents. Do you only need to determine the top 10 or 20 or 50 unique field value row sets, or do you need to determine ALL unique row

Re: Engage custom hit collector for special search processing

2015-01-13 Thread Alexandre Rafalovitch
Sounds like: https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results http://heliosearch.org/the-collapsingqparserplugin-solrs-new-high-performance-field-collapsing-postfilter/ The main issue is your multi-field criteria. So you may need to extend/overwrite the comparison

Re: Solr unit tests intermittently fail with error: java.lang.NoClassDefFoundError: org/eclipse/jetty/util/security/CertificateUtils

2015-01-13 Thread brian4
The problem is the jetty-util version included in the Solr build is 6.1.26, but this particular package is from version 7+. Looks like it is a bug in the build files for Solr. I fixed it by downloading jetty 7 separately and manually adding jetty-util-7.6.16.v20140903.jar to the end of my

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Tom Burton-West
Thanks Hoss, This is starting to sound pretty complicated. Are you saying this is not doable with Solr 4.10? ...or at least: that's how it *should* work :) makes me a bit nervous about trying this on my own. Should I open a JIRA issue or am I probably the only person with a use case for

Re: Engage custom hit collector for special search processing

2015-01-13 Thread Joel Bernstein
You may also want to take a look at how AnalyticsQueries can be plugged in. This won't show you how to do the implementation but it will show you how you can plugin a custom collector. http://heliosearch.org/solrs-new-analyticsquery-api/ http://heliosearch.org/solrs-mergestrategy/ Joel Bernstein

Re: How to configure Solr PostingsFormat block size

2015-01-13 Thread Chris Hostetter
: This is starting to sound pretty complicated. Are you saying this is not : doable with Solr 4.10? it should be doable in 4.10, using a wrapper class like the one i mentioned below (delegating to Lucene51PostingsFormat instead of Lucene50PostingsFormat) ... it's just that the 4.10 APIs are