Re: Contrib module for Document Clustering

2016-04-06 Thread davidphilip cherian
Hi Joel, Right now, we are (web) crawling almost 85 million documents, and this number could double. The collection is plainly divided into shards, so every search runs across all shards. If it is possible for a system to distribute documents into shards based on documents
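There is no known contrib module for this. One rough approximation of similarity-based routing is to rely on the compositeId router: documents whose ids share the same prefix before `!` hash to the same shard. The sketch below assumes that an external clustering step (for example, Mahout) has already assigned each document a cluster label. `ClusterRouting`, the labels, and the fallback bucket `unclustered` are hypothetical, and this is not an existing Solr feature.

```java
import java.util.Map;

// Hypothetical sketch: prefix each document id with a cluster label so the
// compositeId router sends all members of one cluster to the same shard.
final class ClusterRouting {

    // Builds a "prefix!id" key; the router hashes the prefix to choose the shard.
    static String routedId(String clusterLabel, String docId) {
        if (clusterLabel.isEmpty() || clusterLabel.contains("!")) {
            throw new IllegalArgumentException("cluster label must be non-empty and contain no '!'");
        }
        return clusterLabel + "!" + docId;
    }

    // Looks up a precomputed cluster assignment, falling back to a default bucket.
    static String routedId(Map<String, String> clusterOf, String docId) {
        return routedId(clusterOf.getOrDefault(docId, "unclustered"), docId);
    }

    public static void main(String[] args) {
        Map<String, String> clusterOf = Map.of("doc-1", "sports", "doc-2", "finance");
        System.out.println(routedId(clusterOf, "doc-1")); // sports!doc-1
        System.out.println(routedId(clusterOf, "doc-3")); // unclustered!doc-3
    }
}
```

At query time, a `_route_=sports!` parameter can then limit a search to the shard that holds that cluster. Be aware that skewed cluster sizes will produce equally skewed shards.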

Re: Multiple data-config.xml in one collection?

2016-04-06 Thread Alexandre Rafalovitch
A million collections is rather drastic, but just as a basic answer, you also have collection aliases (in SolrCloud mode): https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection You can also send the request with its parameters in a POST,
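For reference, an alias is created with a Collections API `CREATEALIAS` call that lists the member collections once. The sketch below only builds that URL. The host, the alias name `all_products`, and the collection names are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Builds a Collections API CREATEALIAS request URL.
// Host, alias, and collection names are placeholders.
final class AliasUrl {

    static String createAlias(String solrBase, String alias, List<String> collections) {
        String joined = String.join(",", collections);
        return solrBase + "/admin/collections?action=CREATEALIAS"
                + "&name=" + URLEncoder.encode(alias, StandardCharsets.UTF_8)
                + "&collections=" + URLEncoder.encode(joined, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(createAlias("http://localhost:8983/solr", "all_products",
                List.of("products_2015", "products_2016")));
    }
}
```

Once the alias exists, a query such as `/solr/all_products/select?q=*:*` searches every member collection, so clients do not need to repeat the list. The alias definition itself still carries the full list, and for very long lists that is where sending the parameters in a POST body may help.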

Re: Multiple data-config.xml in one collection?

2016-04-06 Thread Yangrui Guo
Yes, URL length is also one of my concerns. If, say, I have a million collections, must I specify all the collection names in the request to perform a search across all of them? The reason I want to combine the data configs into a single node is that I feel it is impractical to search large

Re: Solr 5.5.0: SearchHandler: Appending a Join query

2016-04-06 Thread Alexandre Rafalovitch
I think the easiest thing would be then to put 'q' in the invariant part and use parameter substitution to get the user query. Use either https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries#LocalParametersinQueries-ParameterDereferencing or
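Here is a minimal sketch of the pattern being suggested. The handler's invariants fix `q` to a template, and the template pulls the user's query and the join input from ordinary request parameters through `$param` dereferencing. The parameter names `uq` and `juser`, and the `users:` field prefix, are hypothetical. The code only builds the template and the client-side parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of "fixed q in invariants + dereferenced user input".
// Parameter names uq and juser are hypothetical.
final class JoinDerefParams {

    // Would be configured as the q value in the handler's invariants;
    // $uq and $juser are resolved from ordinary request parameters.
    static final String Q_TEMPLATE =
            "_query_:\"{!lucene v=$uq}\" AND _query_:\"{!join from=locatorByUser to=locator v=$juser}\"";

    // Parameters the client still sends; it can no longer override q itself.
    static Map<String, String> clientParams(String userQuery, String user) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("uq", userQuery);
        p.put("juser", "users:" + user);
        return p;
    }

    // URL-encodes parameters as a query string, preserving insertion order.
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println("q (invariant) = " + Q_TEMPLATE);
        System.out.println(encode(clientParams("projectdata:\"top secret data2\"", "joe")));
    }
}
```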

Re: Multiple data-config.xml in one collection?

2016-04-06 Thread Alexandre Rafalovitch
I believe the DIH config is read on every import, so it is entirely possible to have just one handler and pass a parameter specifying which file to use as the configuration. It is also possible to pass the full configuration as a URL parameter, dataConfig. Need to watch
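A sketch of the single-handler approach, assuming the handler honors a request-level `config` value as described above. The core URL and the config file names are hypothetical. The code sets `clean=false` because a full-import cleans the index by default, which would otherwise delete documents that a different config file indexed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: one DIH handler, with the config file chosen per request.
// Core URL and file names are hypothetical.
final class DihImportUrl {

    static String fullImport(String coreUrl, String configFile) {
        // clean=false so importing one source does not wipe documents
        // that were indexed from another config file.
        return coreUrl + "/dataimport?command=full-import&clean=false&config="
                + URLEncoder.encode(configFile, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/mycollection";
        System.out.println(fullImport(core, "db-a-config.xml"));
        System.out.println(fullImport(core, "db-b-config.xml"));
    }
}
```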

Re: Adding configset in SolrCloud via API

2016-04-06 Thread Don Bosco Durai
Shawn, thank you. This was exactly what I was looking for. I am already using SolrJ, so the following two lines did the job: ZkConfigManager configManager = new ZkConfigManager(cloudSolrClient.getZkStateReader().getZkClient()); configManager.uploadConfigDir(Paths.get(configPath), configName);

Re: Adding configset in SolrCloud via API

2016-04-06 Thread Shawn Heisey
On 4/6/2016 3:26 PM, Don Bosco Durai wrote: > I want to automate the entire process from my Java process which is not > running on any of the servers where SolrCloud is running. In short, I don’t > have access to bin/solr or server/scripts/cloud-scripts, etc from my > application. So I was

Re: Adding configset in SolrCloud via API

2016-04-06 Thread John Bickerstaff
Hmmm...Not sure I understand, but it sounds like you've found the best solution for the limitations you're experiencing... On Wed, Apr 6, 2016 at 4:38 PM, Don Bosco Durai wrote: > My challenge is, the server where my application is running doesn’t have > Solr bits installed. >

Re: Adding configset in SolrCloud via API

2016-04-06 Thread Don Bosco Durai
My challenge is, the server where my application is running doesn’t have Solr bits installed. Right now I am asking users to install (just unzip) Solr on any server, and I give them a shell script to run from the command line before starting my application. It is inconvenient, so I

Re: using solr AnalyticsQuery API vs facet API

2016-04-06 Thread sudsport s
Adding Yonik. I almost implemented a custom aggregate function using the new facet API, but then got runtime exceptions because "FacetContext" is not public, so it looks like facet API components can't be created as external plugins. I have succeeded in doing what I want with the AnalyticsQuery API. Yonik

Re: Adding configset in SolrCloud via API

2016-04-06 Thread John Bickerstaff
Therefore, this becomes possible: http://stackoverflow.com/questions/525212/how-to-run-unix-shell-script-from-java-code Hackish, but certainly doable... Given there's no API... On Wed, Apr 6, 2016 at 3:44 PM, John Bickerstaff wrote: > Yup - just tested - that command
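As a sketch of that "hackish but doable" route, the code below runs the `zkcli.sh ... -cmd upconfig` command line from this thread with `ProcessBuilder`. It still requires the unzipped Solr bits on the host running the Java application. The install path, ZooKeeper host string, and config names are placeholders.

```java
import java.io.IOException;
import java.util.List;

// Sketch: shell out to zkcli.sh from Java with the upconfig command line
// quoted in this thread. Paths, hosts, and names are placeholders.
final class UpconfigRunner {

    static List<String> upconfigCommand(String solrInstall, String zkHost,
                                        String confDir, String confName) {
        return List.of(solrInstall + "/server/scripts/cloud-scripts/zkcli.sh",
                "-zkhost", zkHost, "-cmd", "upconfig",
                "-confdir", confDir, "-confname", confName);
    }

    // Runs the command, passing the script's output through; returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = upconfigCommand("/opt/solr", "zk1:2181,zk2:2181,zk3:2181",
                "/path/to/myconf/conf", "myconf");
        System.out.println(String.join(" ", cmd));
        // int exit = run(cmd); // needs the unzipped Solr distribution on this host
    }
}
```

Passing the arguments as a list rather than as one shell string avoids quoting problems with paths that contain spaces.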

Re: Adding configset in SolrCloud via API

2016-04-06 Thread John Bickerstaff
Yup - just tested - that command runs fine with Solr NOT running... On Wed, Apr 6, 2016 at 3:41 PM, John Bickerstaff wrote: > If you can get to the IP addresses from your application, then there's > probably a way... Do you mean you're firewalled off or in some other

Re: Adding configset in SolrCloud via API

2016-04-06 Thread John Bickerstaff
If you can get to the IP addresses from your application, then there's probably a way... Do you mean you're firewalled off or in some other way unable to access the Solr box IPs from your Java application? If you're looking to do "automated build of virtual machines" there are some tools like

Re: Saving Solr filter query.

2016-04-06 Thread John Bickerstaff
Right... You can store that anywhere - but at least consider not storing it in your existing SOLR collection just because it's there... It's not really the same kind of data -- it's application meta-data and/or user-specific data... Getting it out later will be more difficult than if you store

Re: Adding configset in SolrCloud via API

2016-04-06 Thread Don Bosco Durai
I have SolrCloud pre-installed. I need to create a collection, but before that I need to load the config into ZooKeeper. I want to automate the entire process from my Java process, which is not running on any of the servers where SolrCloud is running. In short, I don’t have access to bin/solr

Re: Saving Solr filter query.

2016-04-06 Thread Erick Erickson
That's more of an app-level feature, there's nothing in Solr that does this for you. Some people have used a different Solr collection to store the queries as strings for display, but that's again something you build on top of Solr, not a core feature. Best, Erick On Wed, Apr 6, 2016 at 2:32
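Here is a minimal app-level sketch along those lines. The `q` and `fq` values are serialized to a query string and kept in the application's own storage rather than in the main Solr collection. An in-memory map stands in for a database table. `SavedSearches` and its methods are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical app-level saved searches; the map stands in for a DB table.
final class SavedSearches {

    private final Map<String, String> store = new HashMap<>();

    static String serialize(String q, List<String> filterQueries) {
        StringBuilder sb = new StringBuilder("q=").append(enc(q));
        for (String fq : filterQueries) {
            sb.append("&fq=").append(enc(fq));
        }
        return sb.toString();
    }

    // Parses the saved string back into parameter -> values (fq may repeat).
    static Map<String, List<String>> parse(String saved) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : saved.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // ignore malformed pairs
            }
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    void save(String user, String name, String q, List<String> filterQueries) {
        store.put(user + "/" + name, serialize(q, filterQueries));
    }

    Map<String, List<String>> load(String user, String name) {
        String saved = store.get(user + "/" + name);
        return saved == null ? Map.of() : parse(saved);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SavedSearches searches = new SavedSearches();
        searches.save("pritam", "cheap-laptops", "laptop", List.of("brand:acme", "price:[0 TO 500]"));
        System.out.println(searches.load("pritam", "cheap-laptops"));
    }
}
```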

Re: Update Speed: QTime 1,000 - 5,000

2016-04-06 Thread Erick Erickson
You can mitigate the impact of throwing away caches on soft commits by doing appropriate autowarming, both the newSearcher and cache settings in solrconfig.xml. Be aware that you don't want to go overboard here; I'd start with 20 or so as the autowarm counts for queryResultCache and filterCache.

Re: Adding configset in SolrCloud via API

2016-04-06 Thread Erick Erickson
As of Solr 5.5 the bin/solr script can do this, see: https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference It's still not quite what you're looking for, but uploading arbitrary XML scripts through a browser is a security issue, so it's possible there will never be an API

Re: Adding configset in SolrCloud via API

2016-04-06 Thread Anshum Gupta
As of now, there's no way to do so. There were some efforts on those lines but it's been on hold. -Anshum > On Apr 6, 2016, at 12:21 PM, Don Bosco Durai wrote: > > Is there an equivalent of server/scripts/cloud-scripts/zkcli.sh -zkhost > $zk_host -cmd upconfig -confdir

Re: MLT Query Parser

2016-04-06 Thread Shawn Heisey
On 4/6/2016 11:07 AM, shamik wrote: > Thanks Alessandro, that answers my doubt. In a nutshell, to make the MLT Query > parser work, you need to know the document id. I'm just curious as to why this > constraint has been added. This will not work for the bulk of use cases. For > example, if we are trying to

Re: How to implement Autosuggestion

2016-04-06 Thread chandan khatri
Hi Alessandro, Thanks for replying! Here are my answers inline. 1. "First of all, simple string autosuggestion or document autosuggestion? (with more additional fields to show than the label) Document autosuggestions 2. Are you interested in the analysis for the text to suggest? Fuzzy

Re: How to implement Autosuggestion

2016-04-06 Thread chandan khatri
Hi Alessandro, Thanks for replying! Here are my answers inline. On Mon, Apr 4, 2016 at 6:34 PM, Alessandro Benedetti wrote: > Hi Chandan, > I will answer as my previous answer to a similar topic that got lost : > "First of all, simple string autosuggestion or

Adding configset in SolrCloud via API

2016-04-06 Thread Don Bosco Durai
Is there an equivalent of server/scripts/cloud-scripts/zkcli.sh -zkhost $zk_host -cmd upconfig -confdir $config_folder -confname $config_name using APIs? I want to bootstrap by uploading the configs via API. Once the configs are uploaded, I am now able to do everything else via API. Thanks

Re: Contrib module for Document Clustering

2016-04-06 Thread Joel Bernstein
I don't know of any contrib or module that does this. Can you describe why you'd want to route documents to shards based on similarity? What advantages would you get by using this approach? Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 6, 2016 at 1:36 PM, davidphilip cherian <

Re: Contrib module for Document Clustering

2016-04-06 Thread davidphilip cherian
Any thoughts? On Tue, Apr 5, 2016 at 9:05 PM, davidphilip cherian < davidphilipcher...@gmail.com> wrote: > Hi, > > Is there any contribution (open source contrib module) that routes > documents to shards based on document similarity technique? Or any > suggestions that integrate mahout to solr

Re: search design question

2016-04-06 Thread Reth RM
Why not copy the field values of category, title, features, and spec into a common text field and then search on that field? Otherwise, use the edismax query parser and search with the user's search string across all of the above fields, maybe boosting the title, category, and specs fields in order to get relevant
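Here is a sketch of the second option: edismax with per-field boosts in `qf`. The field names come from the question. The boost values are hypothetical starting points that would need tuning against real queries.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: search the user's text across several fields with per-field boosts.
// Boost values are hypothetical and need tuning.
final class ProductSearchParams {

    static Map<String, String> edismax(String userInput) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "edismax");
        p.put("q", userInput);
        p.put("qf", "title^4 category^3 spec^2 features");
        return p;
    }

    // URL-encodes parameters as a query string, preserving insertion order.
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(encode(edismax("red running shoes")));
    }
}
```

The copyField option moves the same decision to index time. It is simpler to query, but it loses the ability to weight fields differently.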

Re: How to use TZ parameter in a query

2016-04-06 Thread Chris Hostetter
Please note the exact description of the property on the URL you mentioned: "The TZ parameter can be specified to override the default TimeZone (UTC) used for the purposes of adding and rounding in date math" The newer ref guide docs for this param also explain...
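To make the quoted description concrete, the sketch below uses java.time (not Solr code) to reproduce what `/DAY` rounding means under a given TZ. The rounding boundary moves to local midnight, but the result is still an absolute instant in UTC. That is why indexed and returned dates stay in UTC. In a request this would look like `fq=timestamp:[NOW/DAY TO NOW/DAY+1DAY]&TZ=Europe/Berlin`, where the field name `timestamp` is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

// Illustration of TZ-sensitive "/DAY" rounding with java.time; this mimics
// the effect of date math and is not Solr's implementation.
final class DateMathRounding {

    static Instant roundToDay(Instant now, ZoneId tz) {
        // Truncate in local time, then convert back to an absolute instant.
        return now.atZone(tz).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-04-06T10:00:00Z");
        System.out.println(roundToDay(now, ZoneId.of("UTC")));           // 2016-04-06T00:00:00Z
        System.out.println(roundToDay(now, ZoneId.of("Europe/Berlin"))); // 2016-04-05T22:00:00Z
    }
}
```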

Re: MLT Query Parser

2016-04-06 Thread shamik
Thanks Alessandro, that answers my doubt. In a nutshell, to make the MLT query parser work, you need to know the document id. I'm just curious as to why this constraint has been added. This will not work for the bulk of use cases. For example, if we are trying to generate MLT based on a text or a keyword, how

Re: CompositId router

2016-04-06 Thread John Bickerstaff
I think that's how I would approach it. I used the command line instead of the REST API to create the collection, but I think that just generates a REST API command via curl... so that will be no different as far as I can tell - I'm just more comfortable on the command line. Step 8 is the thing I'm not sure

Re: MLT Query Parser

2016-04-06 Thread Alessandro Benedetti
Wait a second, and let's avoid any confusion. We can have different inputs for a More Like This Request Handler (if this is what you were using): 1) the id of the document we want to find similar documents to, 2) a bunch of text. Then you have a lot of parameters that will affect the MLT core.
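Here is a sketch of the two inputs for the MLT request handler: a `q` that matches the source document, or raw text sent as a content stream. It assumes a `solr.MoreLikeThisHandler` registered at `/mlt`. The field names `title` and `body` are hypothetical. Whether `stream.body` is accepted depends on the request parser settings of your Solr version, and posting the text as the request body is the alternative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the two MLT handler inputs. Assumes a MoreLikeThisHandler at /mlt;
// field names are hypothetical.
final class MltRequests {

    private static final String MLT_PARAMS = "&mlt.fl=title,body&mlt.mintf=1&mlt.mindf=1";

    // 1) Documents similar to the document matched by q.
    static String byId(String coreUrl, String id) {
        return coreUrl + "/mlt?q=" + enc("id:" + id) + MLT_PARAMS;
    }

    // 2) Documents similar to a piece of text sent as a content stream.
    static String byText(String coreUrl, String text) {
        return coreUrl + "/mlt?stream.body=" + enc(text) + MLT_PARAMS;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/docs";
        System.out.println(byId(core, "doc-1"));
        System.out.println(byText(core, "solr document clustering"));
    }
}
```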

Re: CompositId router

2016-04-06 Thread John Bickerstaff
I'll agree with Shawn too - munging Zookeeper by hand can lead to VERY unexpected results... My recommendation would be to start fresh with a new 5.x setup and a new /chroot in Zookeeper. (This can be deleted and recreated repeatedly if necessary - I know because I did... a lot... before I got

Re: CompositId router

2016-04-06 Thread John Bickerstaff
I recently upgraded from 4.x to 5.5 -- it was a pain to figure it out, but it turns out to be fairly straightforward... Caveat: Because I run all my data into Kafka first, I was able to easily re-create my collections by running a microservice that pulls from Kafka and dumps into Solr. I have a

RE: BYOPW in security.json

2016-04-06 Thread Davis, Daniel (NIH/NLM) [C]
I'm bordering on development post, but I want to write an Authentication Plugin that uses Proxy Authentication and a White List. So, it will accept a request header such as REMOTE_USER as the username from certain hosts, by default 127.0.0.1, ::1. I also thought about having a whitelist of IPs
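Here is a sketch of only the trust decision described here, not a Solr `AuthenticationPlugin`. A `REMOTE_USER` header is accepted only when the request comes from a whitelisted address, defaulting to 127.0.0.1 and ::1. The simplified header map and the class name are assumptions. Real HTTP header names are case-insensitive, which this sketch ignores.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Decision logic only (not a Solr plugin): trust REMOTE_USER only from
// whitelisted proxy addresses. Header handling is simplified.
final class ProxyAuthDecision {

    static final Set<String> DEFAULT_TRUSTED = Set.of("127.0.0.1", "::1");

    static Optional<String> trustedUser(String remoteAddr, Map<String, String> headers,
                                        Set<String> trustedAddrs) {
        String user = headers.get("REMOTE_USER");
        if (user == null || user.isBlank()) {
            return Optional.empty();
        }
        for (String trusted : trustedAddrs) {
            if (sameAddress(remoteAddr, trusted)) {
                return Optional.of(user);
            }
        }
        return Optional.empty();
    }

    // Compares parsed addresses so "0:0:0:0:0:0:0:1" matches "::1".
    // Pass IP literals only: a host name here would trigger a DNS lookup.
    private static boolean sameAddress(String a, String b) {
        try {
            return InetAddress.getByName(a).equals(InetAddress.getByName(b));
        } catch (UnknownHostException e) {
            return false; // unparsable input is never trusted
        }
    }
}
```

Comparing parsed addresses instead of raw strings matters because servlet containers often report the IPv6 loopback in its long form.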

Re: Can't get phrase field boosting to work using edismax

2016-04-06 Thread Jack Krupansky
I haven't traced through all the code recently, so I can't dispute Jan if he knows a place that checks the output of the pf phrase analysis to see if it is a single term, but... the INPUT to pf is definitely multiple clauses. Regardless of the use of the keyword tokenizer, the query parser sees

Re: Can't get phrase field boosting to work using edismax

2016-04-06 Thread Shawn Heisey
On 4/6/2016 7:13 AM, jimi.hulleg...@svensktnaringsliv.se wrote: > Ah, thanks. It never occurred to me that clicking on the text "Create" would > give me a different result compared to clicking on the arrow. In my mind, > "Create" was simply the label, and the arrow indicating a dropdown option

Re: Solr 5.5.0: SearchHandler: Appending a Join query

2016-04-06 Thread Mikhail Khludnev
I suppose q= is a singular param that doesn't accept multiple values. On Wed, Apr 6, 2016 at 1:01 PM, Anand Chandrashekar wrote: > Greetings. > > 1) A join query creates an array of "q" parameter. For example, the query > > >

RE: Can't get phrase field boosting to work using edismax

2016-04-06 Thread jimi.hullegard
On Wednesday, April 6, 2016 2:50 PM, apa...@elyograg.org wrote: > > If you can only create a service desk request, then you might be clicking the > "Service Desk" menu item, > or maybe you're clicking the little down arrow on the right side of the big > red "Create" button. > Try clicking

RE: BYOPW in security.json

2016-04-06 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Thanks. I googled to look for examples of how to proceed, and noticed that you opened SOLR-8951. Thanks again -----Original Message----- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: Wednesday, April 06, 2016 4:18 AM To: solr-user@lucene.apache.org Subject: Re: BYOPW in security.json

Re: CompositId router

2016-04-06 Thread Shawn Heisey
On 4/5/2016 3:08 PM, Anuj Lal wrote: > I am new to solr. Need some advice from more experienced solr team members > > I am upgrading a 4.4 solr cluster to 5.5 > > One of the steps I am doing for the upgrade is to bootstrap from the existing 4.4 solr > home (after upgrading the solr installation to 5.5)

Re: Can't get phrase field boosting to work using edismax

2016-04-06 Thread Shawn Heisey
On 4/6/2016 2:35 AM, jimi.hulleg...@svensktnaringsliv.se wrote: > I guess I can conclude that this is a bug. But I wasn't able to report it in > Jira. I just got to some servicedesk form > (https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27) that > didn't seem related to

Re: Solr 5.5.0: SearchHandler: Appending a Join query

2016-04-06 Thread Stefan Matheis
Anand, have a look at the example schema, there is a section that explains "invariants" which could be one solution to your question. -Stefan On Wed, Apr 6, 2016 at 12:01 PM, Anand Chandrashekar wrote: > Greetings. > > 1) A join query creates an array of "q" parameter.

Re: Can't get phrase field boosting to work using edismax

2016-04-06 Thread Jan Høydahl
> Oh, hang on... If a phrase is defined as multiple tokens, and pf is used for > phrase boosting, does that mean that even with a regular tokenizer the pf > won't work for fields that only contain one word? For example if the title of > one document is "John", and the user searches for 'John'

Re: Update Speed: QTime 1,000 - 5,000

2016-04-06 Thread Alessandro Benedetti
On Wed, Apr 6, 2016 at 7:53 AM, Robert Brown wrote: > The QTimes are from the updates. > > We don't have the resources right now to switch to SolrJ, but I would > assume only sending updates to the leaders would take some redirects out of > the process, How do you route

Re: How to use TZ parameter in a query

2016-04-06 Thread Alessandro Benedetti
At the moment, the tz parameter is used to calculate the UTC date in the query, based on the tz supplied. In the index the dates are in UTC. To show the dates in the same timezone we query with, we would need to implement a DocTransformer[1]. This DocTransformer will check for all (or a subset) of date
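Here is a sketch of the conversion such a DocTransformer would apply to each stored UTC date value, written as plain java.time code. It does not use the Solr DocTransformer API, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// The per-value conversion a custom DocTransformer could perform:
// a stored UTC date rendered in the requested time zone.
final class DisplayTimeZone {

    static String toZone(String utcDate, String tzId) {
        return Instant.parse(utcDate)
                .atZone(ZoneId.of(tzId))
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(toZone("2016-04-06T10:00:00Z", "Europe/Berlin"));    // 2016-04-06T12:00:00+02:00
        System.out.println(toZone("2016-04-06T10:00:00Z", "America/New_York")); // 2016-04-06T06:00:00-04:00
    }
}
```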

Solr 5.5.0: SearchHandler: Appending a Join query

2016-04-06 Thread Anand Chandrashekar
Greetings. 1) A join query creates an array of "q" parameter. For example, the query http://localhost:8983/solr/gettingstarted/select?q=projectdata%3A%22top+secret+data2%22&q=%7B!join+from=locatorByUser+to=locator%7Dusers=joe creates the following array elements for the "q" parameter. [array
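The array appears because a query string behaves like a multimap, so repeating `q` yields two values for a parameter that Solr treats as singular. The sketch below parses a request with two `q` values and shows one possible rewrite: keep the first `q` and move the extra ones to `fq`. That rewrite is a swapped-in alternative, not the invariants approach suggested in the thread, and filters do not contribute to scoring. The class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Query strings are multimaps: a repeated key yields several values.
// The rewrite keeps the first q and turns any extra q values into fq.
final class QueryParams {

    static Map<String, List<String>> parse(String queryString) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('='); // first '=' separates key and value
            if (eq < 0) {
                continue; // skip malformed pairs in this sketch
            }
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    static Map<String, List<String>> extraQAsFq(Map<String, List<String>> params) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        params.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        List<String> qs = out.getOrDefault("q", List.of());
        if (qs.size() > 1) {
            out.put("q", new ArrayList<>(qs.subList(0, 1)));
            out.computeIfAbsent("fq", k -> new ArrayList<>()).addAll(qs.subList(1, qs.size()));
        }
        return out;
    }
}
```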

Re: How to use TZ parameter in a query

2016-04-06 Thread Bogdan Marinescu
I understand. Would be nice though :) Thanks. On 04/06/2016 11:26 AM, jimi.hulleg...@svensktnaringsliv.se wrote: I think that this parameter is only used to interpret the dates provided in the query, like query filters. At least that is how I interpret the wiki text. Your interpretation

Re: search design question

2016-04-06 Thread Binoy Dalal
I understand. Although I am not exactly sure how to solve this one, this should serve as a helpful starting point: https://lucidworks.com/resources/webinars/natural-language-search-with-solr/ On Wed, 6 Apr 2016, 11:27 Midas A, wrote: > thanks Binoy for replying , > > i am

Saving Solr filter query.

2016-04-06 Thread Pritam Kute
Hi, I have designed a web page on which users can search and filter their data based on some term facets. I am using Apache Solr 5.3.1 for this. It is working perfectly fine. Now my requirement is to save the query which I have executed on Solr, so that in the future, if I need to search the same

RE: How to use TZ parameter in a query

2016-04-06 Thread jimi.hullegard
I think that this parameter is only used to interpret the dates provided in the query, like query filters. At least that is how I interpret the wiki text. Your interpretation makes more sense in general though, it would be nice if it was possible to modify the timezone for both the query and

RE: Can't get phrase field boosting to work using edismax

2016-04-06 Thread jimi.hullegard
OK, well I'm not sure I agree with you. First of all, you ask me to point my "pf" towards a tokenized field, but I already do that (the fact that all text is tokenized into a single token doesn't change that fact). Also, I don't agree with the view that a single term phrase never is

How to use TZ parameter in a query

2016-04-06 Thread Bogdan Marinescu
Hi, According to the wiki https://wiki.apache.org/solr/CoreQueryParameters#TZ I can use the TZ param to specify the timezone. I tried to make a query and put in the raw section TZ=Europe/Berlin or any other found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones but no luck.

Re: Can't get phrase field boosting to work using edismax

2016-04-06 Thread Jan Høydahl
Hi, Phrase match via “pf” requires the target field to contain a phrase. A phrase is defined as multiple tokens. Yours does not contain a phrase since you use the KeywordTokenizer, leaving only one token in the field. eDismax pf will thus never kick in. Please point your “pf” towards a

RE: Can't get phrase field boosting to work using edismax

2016-04-06 Thread jimi.hullegard
I guess I can conclude that this is a bug. But I wasn't able to report it in Jira. I just got to some servicedesk form (https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27) that didn't seem related to solr in any way, (the affects/fix version fields didn't correspond to any

Re: BYOPW in security.json

2016-04-06 Thread Jan Høydahl
Hi Note that storing the user names and passwords in security.json is just one implementation, to easily get started. It uses the Sha256AuthenticationProvider class, which is pluggable. That means that if you require Basic Auth with some form of self-service management, you could/should add
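For the self-service idea, an external tool would need to produce credential entries in the format the provider expects. The sketch below assumes a specific scheme: SHA-256 over the salt followed by the UTF-8 password, hashed a second time, and stored as "base64(hash) base64(salt)". This is my understanding of `Sha256AuthenticationProvider`, so verify it against the Solr source for your version before relying on it. Class and method names are hypothetical. For a handful of users, the Authentication API's `set-user` command avoids computing hashes by hand.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// ASSUMED scheme: sha256(sha256(salt || password)), stored as "hash salt" in Base64.
final class Sha256Credentials {

    static byte[] rawHash(String password, byte[] salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            byte[] first = digest.digest(password.getBytes(StandardCharsets.UTF_8));
            return digest.digest(first); // digest() resets, so this is a fresh hash
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    static String hash(String password, byte[] salt) {
        return Base64.getEncoder().encodeToString(rawHash(password, salt));
    }

    // Produces a "hash salt" credential string for a new user.
    static String newCredential(String password) {
        byte[] salt = new byte[32];
        new SecureRandom().nextBytes(salt);
        return hash(password, salt) + " " + Base64.getEncoder().encodeToString(salt);
    }

    static boolean verify(String password, String credential) {
        String[] parts = credential.split(" ");
        if (parts.length != 2) {
            return false;
        }
        try {
            byte[] expected = Base64.getDecoder().decode(parts[0]);
            byte[] salt = Base64.getDecoder().decode(parts[1]);
            return MessageDigest.isEqual(expected, rawHash(password, salt)); // constant-time compare
        } catch (IllegalArgumentException badBase64) {
            return false;
        }
    }
}
```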

Re: Update Speed: QTime 1,000 - 5,000

2016-04-06 Thread Robert Brown
The QTimes are from the updates. We don't have the resources right now to switch to SolrJ, but I would assume that sending updates only to the leaders would take some redirects out of the process. I can regularly query the collection status to know who's who. I'm now more interested in the