Re: Loading Solr Analyzer from RuntimeLib Blob

2015-09-11 Thread Shalin Shekhar Mangar
That is a current limitation of the blob store API. It can only be used to load plugins in solrconfig.xml. It does not support loading schema plugins such as analyzers or tokenizers. Can you open an issue? On Fri, Sep 11, 2015 at 9:24 AM, Steve Davids wrote: > Accidentally sent

Re: Bug or Operator Error?

2015-09-11 Thread Erick Erickson
Several ideas, all shots in the dark because to analyze this we need the schema definitions and the result of your query with &debug=true added. In particular you'll see the "parsed query" section near the bottom, and often the parsed query isn't quite what you think it is. In particular this is often

Re: Help storing + highlighting search results in PDF newspapers

2015-09-11 Thread Erick Erickson
Yeah, there are a lot of moving parts to connect. Let's see the highlight configuration you're using. Should be in your solrconfig.xml file for the request handler you're using. Are you calling out the field you want highlighted in the hl.fl list? Unfortunately getting specific fields

Re: Duplicate Documents

2015-09-11 Thread Erick Erickson
Are you by any chance using the MERGEINDEXES core admin call? Or using MapReduceIndexerTool? Neither of those deletes duplicates. This is a fundamental part of Solr though, so it's virtually certain that there's some innocent-seeming thing you're doing that's causing this... Best, Erick On

Re: Duplicate Documents

2015-09-11 Thread Erick Erickson
OK, this makes no sense whatsoever, so I'm missing something. commitWithin shouldn't matter at all, there's code to handle multiple updates between commits. I'm _really_ shooting in the dark here, but... > did you perhaps change the definition from the default "id" to "key" without blowing

Re: Detect term occurrences

2015-09-11 Thread Sujit Pal
Hi Francisco, >> I have many drug product leaflets, each corresponding to 1 product. On the other hand we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms for any leaflet document. Take a look at SolrTextTagger for this use case.

Re: Bug or Operator Error?

2015-09-11 Thread Erick Erickson
Oh my. I'll leave it to the DIH guys to suggest whether there's something that can be done with pure DIH, and offer a couple of alternatives: 1> You could put a MappingCharFilterFactory in your analysis chain. In the mapping file you can map things like: "%20" => " " that would work with DIH as

Re: Bug or Operator Error?

2015-09-11 Thread Mark Fenbers
Additional experimenting led me to the discovery that /dataimport does *not* index words with a preceding %20 (a URL-encoded space), or in fact *any* preceding %xx encoding. I can probably replace each %20 with a '+' in each record of my database -- the dataimporter/indexer doesn't sneeze at
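
For anyone hitting the same thing: one option, besides rewriting the database records, is to decode the %xx escapes in client code before the text reaches the indexer. The following is only a minimal sketch under that assumption; the class name PercentDecode and the sample string are made up for illustration and are not part of DIH or Solr. Note that java.net.URLDecoder also turns '+' into a space, so it is only appropriate if a literal '+' never needs to survive.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper (not part of Solr or DIH): decodes %xx escapes
// such as "%20" before the text is handed to the indexer.
final class PercentDecode {

    // Returns the input with %xx escapes decoded as UTF-8.
    // Caveat: URLDecoder follows form-encoding rules, so '+' becomes ' ' too.
    static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // Sample value is invented for illustration.
        System.out.println(decode("flash%20flood%20warning"));
    }
}
```

A malformed escape (a '%' not followed by two hex digits) makes URLDecoder throw IllegalArgumentException, so real data may need a guard around the call.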

Re: Duplicate Documents

2015-09-11 Thread Vivek Pathak
At query time, you could externally roll in the dups when they have the same signature. If you define your use case, it might be easier. On 09/11/2015 11:55 AM, Shawn Heisey wrote: On 9/11/2015 9:10 AM, Mr Havercamp wrote: fieldType def: It is not SolrCloud. As

Re: Search results differs with sorting on pagination.

2015-09-11 Thread Upayavira
Are you getting out of order scores? Or does the score change between requests? Can you show us some results that you are getting so we might see what's going on? Upayavira On Fri, Sep 11, 2015, at 05:07 AM, Modassar Ather wrote: > Thanks Erick and Upayavira for the responses. One thing which I

Solr authentication - Error 401 Unauthorized

2015-09-11 Thread Merlin Morgenstern
I have secured SolrCloud via basic authentication. Now I am having difficulties creating cores and getting status information. Solr keeps telling me that the request is unauthorized. However, I have access to the admin UI after login. How do I configure Solr to use the basic authentication

Re: Detect term occurrences

2015-09-11 Thread Upayavira
It sounds to me like you are wanting to *filter* your document to only include terms within that medical dictionary. Or to have a keyword field based upon those of your 100k terms that appear in that doc. Synonyms are your saviour, if that's the case. Create a synonyms list for your terms, they

Morphline for Indexing Nested Document Structure

2015-09-11 Thread Lewin Joy (TMS)
Hi, I have a huge dataset of about 600 million documents. These documents are relational and I need to maintain the relation in Solr. So, I am indexing them as nested documents. It has nested documents within nested documents. Now, my problem is how to index them. We are on Cloudera Solr 4.4

Re: SolrJ CollectionAdminRequest.Reload fails

2015-09-11 Thread Anshum Gupta
This certainly can be fixed. Can you create a JIRA for the same? There might be other calls which might need fixing on similar lines. On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey wrote: > On 9/11/2015 3:12 PM, Hendrik Haddorp wrote: > > I'm using Solr 5.3.0 and noticed

Re: SolrJ CollectionAdminRequest.Reload fails

2015-09-11 Thread Hendrik Haddorp
the full stack is: [9/11/15 23:36:17:406 CEST] 0216 SystemErr R Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://xxx.xxx.xxx.xxx:10001/solr: Missing required parameter: name [9/11/15 23:36:17:406 CEST] 0216 SystemErr R

Re: Morphline for Indexing Nested Document Structure

2015-09-11 Thread Mikhail Khludnev
Hello Lewin, Block Join support is released in Solr 4.5. On Fri, Sep 11, 2015 at 9:05 PM, Lewin Joy (TMS) wrote: > Hi, > > I have a huge dataset of about 600 million documents. > These documents are relational and I need to maintain the relation in Solr. > > So, I am

Re: SolrJ CollectionAdminRequest.Reload fails

2015-09-11 Thread Hendrik Haddorp
I created https://issues.apache.org/jira/browse/SOLR-8042 On 11/09/15 23:41, Anshum Gupta wrote: > This certainly can be fixed. Can you create a JIRA for the same? There > might be other calls which might need fixing on similar lines. > > On Fri, Sep 11, 2015 at 2:32 PM, Shawn Heisey

SolrJ CollectionAdminRequest.Reload fails

2015-09-11 Thread Hendrik Haddorp
Hi, I'm using Solr 5.3.0 and noticed that the following code does not work with Solr Cloud: CollectionAdminRequest.Reload reloadReq = new CollectionAdminRequest.Reload(); reloadReq.process(client, collection); It complains that the name parameter is required. When adding
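
As a cross-check outside SolrJ: the Collections API RELOAD action takes the collection in a `name` parameter, which matches the "name parameter is required" complaint. The sketch below only builds that request URL with the JDK so it can be tried from a browser or curl; the class name ReloadUrl, the base URL, and the collection name are placeholders chosen for illustration, not values from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Collections API RELOAD URL by hand.
// It does not use SolrJ and does not send the request.
final class ReloadUrl {

    // solrBaseUrl is expected to end in "/solr" with no trailing slash.
    static String reloadUrl(String solrBaseUrl, String collection) {
        try {
            // "name" is the parameter the server reports as missing.
            return solrBaseUrl + "/admin/collections?action=RELOAD&name="
                    + URLEncoder.encode(collection, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host, port, and collection name.
        System.out.println(reloadUrl("http://localhost:8983/solr", "mycollection"));
    }
}
```

If the hand-built URL reloads the collection while the SolrJ call fails, that points at the client not sending `name` rather than at the server.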

RE: Morphline for Indexing Nested Document Structure

2015-09-11 Thread Lewin Joy (TMS)
Oh Yes. We are upgrading Cloudera to get Solr 4.10 just to get this block join feature. But, how do I index a nested document to use for block join for a dataset this huge? I could not find any way to sculpt the morphline file for this use case. Thank you for the reply, Mikhail -Lewin

Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Anshum Gupta
Hi Merlin, Solr 5.2.x only supported Kerberos out of the box and introduced a framework to write your own authentication/authorization plugin. If you don't use Kerberos, the only sensible way forward for you would be to wait for the 5.3.1 release to come out and then move to it. Until then, or

Re: Morphline for Indexing Nested Document Structure

2015-09-11 Thread Mikhail Khludnev
You need to override org.apache.solr.morphlines.solr.LoadSolrBuilder.LoadSolr.doProcess(Record). Now LoadSolrBuilder.LoadSolr.convert(Record) copies all record fields into SolrInputDocuments fields. SolrInputDocument.addChildDocument(SolrInputDocument) nests a doc. On Fri, Sep 11, 2015 at 11:27

Re: Detect term occurrences

2015-09-11 Thread simon
+1 on Sujit's recommendation: we have a similar use case (detecting drug names / disease entities / MeSH terms) and have been using the SolrTextTagger with great success. We run a separate Solr instance as a tagging service and add the detected tags as metadata fields to a document before it is

adding fields to a managed schema using solr cloud

2015-09-11 Thread Hendrik Haddorp
Hi, I have a simple Solr 5.3 cloud setup with two nodes using a managed schema. I'm creating a collection using a schema that initially only contains the id field. When documents get added I'm dynamically adding the required fields. Currently this fails quite consistently as in bug SOLR-7536 but

Re: SolrJ CollectionAdminRequest.Reload fails

2015-09-11 Thread Shawn Heisey
On 9/11/2015 3:12 PM, Hendrik Haddorp wrote: > I'm using Solr 5.3.0 and noticed that the following code does not work > with Solr Cloud: > CollectionAdminRequest.Reload reloadReq = new > CollectionAdminRequest.Reload(); > reloadReq.process(client, collection); > > It complains that the

Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Many thanks pals. I will walk some of those ways (and return with new questions) ;) Best regards, Francisco El vie., 11 de sept. de 2015 a la(s) 5:41 a. m., Upayavira escribió: > It sounds to me like you are wanting to *filter* your document to only > include terms within

RE: Stemmer and stopword Development

2015-09-11 Thread Imtiaz Shakil Siddique
Thank you all for your precious advice. For now I'll just stick with building a stemmer and test the solr search results. Imtiaz Shakil Siddique On Sep 11, 2015 3:20 AM, "Davis, Daniel (NIH/NLM) [C]" wrote: > Stop words for international indexing seem not too useful to me

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
Thanks for the suggestions. No, not using MERGEINDEXES nor MapReduceIndexerTool. I've pasted the XML in case there is something broken there (cut down for brevity, i.e. the "..."): 123456789/3Test SubmissionTest Submission11Test Collectiontest collection|||Test CollectionTest Collectionyoung,

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
I'm wondering if the commitWithin is causing issues. On 11 September 2015 at 18:52, Mr Havercamp wrote: > Thanks for the suggestions. No, not using MERGEINDEXES nor > MapReduceIndexerTool. > > I've pasted the XML in case there is something broken there (cut > down for

Re: Detect term occurrences

2015-09-11 Thread Francisco Andrés Fernández
Thanks! El vie, sep 11, 2015 14:39, Sujit Pal escribió: > Hi Francisco, > > >> I have many drug product leaflets, each corresponding to 1 product. On > the > other hand we have a medical dictionary with about 10^5 terms. > I want to detect all the occurrences of those

Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Merlin Morgenstern
OK, I downgraded to Solr 5.2.x. Unfortunately still no luck. I followed 2 approaches: 1. Secure it the old fashioned way like described here: http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password 2. Using the Basic Authentication Plugin like described here:

RE: Solr Join between two indexes taking too long.

2015-09-11 Thread Russell Taylor
It will take a little while to set-up a 5.3 version, hopefully I'll have some results later next week. From: Mikhail Khludnev [mkhlud...@griddynamics.com] Sent: 11 September 2015 12:59 To: Russell Taylor Subject: Re: Solr Join between two indexes taking too long.

Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Noble Paul
There were some bugs with the 5.3.0 release and 5.3.1 is in the process of getting released. try out the option #2 with the RC here https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/ On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern

RE: Solr Join between two indexes taking too long.

2015-09-11 Thread Russell Taylor
I'll try that Thanks Upayavira. From: Upayavira [u...@odoko.co.uk] Sent: 09 September 2015 19:30 To: solr-user@lucene.apache.org Subject: Re: Solr Join between two indexes taking too long. I've never reviewed that join query debug info - very interesting.

Bug or Operator Error?

2015-09-11 Thread Mark Fenbers
Greetings! So, I've created my first index and am able to search programmatically (through SolrJ) and through the Web interface. (Yay!) I get non-empty results for my searches! My index was built from database records using /dataimport?command=full-import. I have 9936 records in the table

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
Running 4.8.1. I am experiencing the same problem where I get duplicates on index update despite using overwrite=true when adding existing documents. My duplicate ratio is a lot higher with maybe 25 - 50% of records having duplicates (and as the index continues to run the duplicates increase from

Re: Detect term occurrences

2015-09-11 Thread Alexandre Rafalovitch
Assuming the medical dictionary is constant, I would do a copyField of text into a separate field and have that separate field use: http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html with words coming from the dictionary (normalized).
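
To make the keep-words idea concrete, here is a rough plain-Java sketch of the behavior: tokenize, normalize, and keep only tokens found in the dictionary. It is an illustration of the concept only, not Lucene's KeepWordFilter implementation; the class name KeepWordsSketch and the sample terms are invented, and the dictionary is assumed to be lowercased already. It also only handles single-word terms, so multi-word dictionary entries would need a different approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: mimics "keep only dictionary words" in plain Java.
// This is not the Lucene filter, just the idea behind it.
final class KeepWordsSketch {

    // Lowercases the text, splits on anything that is not a letter or digit,
    // and returns the tokens that appear in the (already lowercased) dictionary.
    static List<String> keep(String text, Set<String> dictionary) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && dictionary.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample dictionary and leaflet text are invented for illustration.
        Set<String> dictionary = new HashSet<>(Arrays.asList("ibuprofen", "paracetamol"));
        System.out.println(keep("Contains Ibuprofen and paracetamol.", dictionary));
    }
}
```

In Solr the same effect would come from the analysis chain on the copyField target, so none of this code would run at index time; it is just a way to see what the kept field would contain.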

Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Merlin Morgenstern
Thank you for the info. I have already downgraded to 5.2.x as this is a production setup. Unfortunately I have the same trouble there ... Any suggestions how to fix this? What is the recommended procedure in securing the admin gui on prod setups? 2015-09-11 14:26 GMT+02:00 Noble Paul

Re: Duplicate Documents

2015-09-11 Thread Shawn Heisey
On 9/11/2015 8:25 AM, Mr Havercamp wrote: > Running 4.8.1. I am experiencing the same problem where I get duplicates on > index update despite using overwrite=true when adding existing documents. > My duplicate ratio is a lot higher with maybe 25 - 50% of records having > duplicates (and as the

RE: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Davis, Daniel (NIH/NLM) [C]
The authorization plugin is new in Solr 5.3. It is hard to describe a secure Solr 5.2.1 environment simply - the basics are to protect /solr by placing it behind Apache httpd or nginx, and also a port-based firewall. I am most familiar with Apache httpd and Linux/RedHat family. Within the

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
Hi Shawn Thanks for your response. fieldType def: It is not SolrCloud. Cheers Hayden On 11 September 2015 at 16:35, Shawn Heisey wrote: > On 9/11/2015 8:25 AM, Mr Havercamp wrote: > > Running 4.8.1. I am experiencing the same problem where I get

Help storing + highlighting search results in PDF newspapers

2015-09-11 Thread Colin 't Hart
Hi, I'm having trouble negotiating the steep Solr learning curve... 1. I'm trying to store scanned and OCRed newspapers in PDF format into Solr for full-text searching. I've tried most (all?) of the examples and sample configurations that come with Solr 5.3.0 and I can upload the PDFs. Searching

Re: Duplicate Documents

2015-09-11 Thread Shawn Heisey
On 9/11/2015 9:10 AM, Mr Havercamp wrote: > fieldType def: > > > sortMissingLast="true" /> > > It is not SolrCloud. As long as it's not a distributed index, I can't think of any problem those field/type definitions might cause. Even if it were distributed and you had the same