Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Rick Leir
On 2017-10-13 04:19 PM, Kevin Layer wrote: Amrit Sarkar wrote: Kevin, fileType => md is not a recognizable format in SimplePostTool, anyway, moving on. OK, thanks. Looks like I'll have to abandon using Solr for this project (or find another way to crawl the site). Thank you for all the help,

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Kevin, >> >> fileType => md is not a recognizable format in SimplePostTool, anyway, moving >> on. OK, thanks. Looks like I'll have to abandon using Solr for this project (or find another way to crawl the site). Thank you for all the help, though. I appreciate it. >> The

zero-day exploit security issue

2017-10-13 Thread Xie, Sean
Is there a tracking issue to address this for Solr 6.6.x and 7.x? https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list Sean

Re: book on solr

2017-10-13 Thread Deepak Vohra
Use Docker with Kubernetes, which has autoscaling of Docker containers based on load. Docker image for Solr is https://hub.docker.com/_/solr/ On Thu, 10/12/17, Jay Potharaju wrote: Subject: book on solr To:
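For anyone who wants to try it, a minimal sketch using that official image (container and core names are placeholders):

    docker pull solr
    # run Solr on the default port and create a core to index into
    docker run -d --name my_solr -p 8983:8983 solr
    docker exec -it --user=solr my_solr bin/solr create_core -c mycore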

Re: Strange Behavior When Extracting Features

2017-10-13 Thread Michael Alcorn
I believe I've discovered a workaround. If you use: { "store": "redhat_efi_feature_store", "name": "case_description_issue_tfidf", "class": "org.apache.solr.ltr.feature.SolrFeature", "params": { "q":"{!dismax qf=text_tfidf}${text}" } }
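For anyone following this, a hedged sketch of how a feature store like that is usually uploaded and then exercised with the [features] transformer; the collection name, features.json path and the query text are placeholders, while the store name comes from the snippet above:

    # upload the feature definitions
    curl -XPUT 'http://localhost:8983/solr/yourcollection/schema/feature-store' \
         --data-binary @features.json -H 'Content-type:application/json'
    # extract feature values, passing the external feature info that ${text} resolves to
    curl -G 'http://localhost:8983/solr/yourcollection/query' \
         --data-urlencode 'q=*:*' \
         --data-urlencode "fl=id,score,[features store=redhat_efi_feature_store efi.text='some query text']"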

Re: Parsing of rq queries in LTR

2017-10-13 Thread Michael Alcorn
I believe I've discovered a workaround. If you use: { "store": "redhat_efi_feature_store", "name": "case_description_issue_tfidf", "class": "org.apache.solr.ltr.feature.SolrFeature", "params": { "q":"{!dismax qf=text_tfidf}${text}" } }

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin, fileType => md is not a recognizable format in SimplePostTool, anyway, moving on. The above is a SAXParse runtime exception. Nothing can be done at the Solr end except curating your own data. Some helpful links:

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Kevin, >> >> I am not able to replicate the issue on my system, which is a bit annoying >> for me. Try this out one last time: >> >> docker exec -it --user=solr solr bin/post -c handbook >> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html >> >>

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin, I am not able to replicate the issue on my system, which is a bit annoying for me. Try this out one last time: docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html and have Content-Type: "html" and "text/html",

Re: Concern on solr commit

2017-10-13 Thread Erick Erickson
Emir is spot on. Here's another thing though. You cannot see a document until after the commit happens as you well know. Pay close attention to this part of the error message: "Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later." The commit generating this error
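For context, that limit lives in solrconfig.xml; a sketch of the element is below. Raising it usually just masks the real problem (commits arriving faster than searchers can warm), so reducing commit frequency is the better fix.

    <!-- solrconfig.xml: how many searchers may be warming concurrently
         before a commit fails with the error quoted above -->
    <maxWarmingSearchers>2</maxWarmingSearchers>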

Re: is there a way to remove deleted documents from index without optimize

2017-10-13 Thread Harry Yoo
Thanks for the clarification. I use ${lucene.version} in the solrconfig.xml and pass -Dlucene.version when I launch solr, to keep the versions. > On Oct 12, 2017, at 11:01 PM, Erick Erickson wrote: > > You can use the IndexUpgradeTool that ships with
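A sketch of that arrangement (the 7.0.1 default here is only illustrative); solrconfig.xml supports ${property:default} substitution, and bin/solr passes -D properties through to the JVM:

    <!-- solrconfig.xml -->
    <luceneMatchVersion>${lucene.version:7.0.1}</luceneMatchVersion>

    # at startup
    bin/solr start -Dlucene.version=7.0.1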

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> ah oh, Docker. They are placed under [solr-home]/server/log/solr/log on >> the machine. I haven't played much with Docker; is there any way you can get >> that file from that location? I see these files: /opt/solr/server/logs/archived /opt/solr/server/logs/solr_gc.log.0.current

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
pardon: [solr-home]/server/log/solr.log Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Oct 13, 2017 at 8:10 PM, Amrit Sarkar wrote: > ah oh,

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
ah oh, Docker. They are placed under [solr-home]/server/log/solr/log on the machine. I haven't played much with Docker; is there any way you can get that file from that location? Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn:

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Hi Kevin, >> >> Can you post the Solr log in the mail thread? At first glance at the code, I >> don't think it handled the .md by itself. Note that when I use the admin web interface and click on "Logging" on the left, I just see a spinner that implies it's trying to retrieve

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Hi Kevin, >> >> Can you post the Solr log in the mail thread? At first glance at the code, I >> don't think it handled the .md by itself. How do I extract the log you want? >> >> Amrit Sarkar >> Search Engineer >> Lucidworks, Inc. >> 415-589-9269 >> www.lucidworks.com >>

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Hi Kevin, Can you post the Solr log in the mail thread? At first glance at the code, I don't think it handled the .md by itself. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Kevin, >> >> Just put "html" too and give it a shot. These are the types it is expecting: Same thing. >> >> mimeMap = new HashMap<>(); >> mimeMap.put("xml", "application/xml"); >> mimeMap.put("csv", "text/csv"); >> mimeMap.put("json", "application/json"); >>

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Reference to the code: >> >> . >> >> String rawContentType = conn.getContentType(); >> String type = rawContentType.split(";")[0]; >> if(typeSupported(type) || "*".equals(fileTypes)) { >> String encoding = conn.getContentEncoding(); >> >> . >> >> protected

Re: Appending fields to pre-existed document

2017-10-13 Thread Игорь Абрашин
Hi, Yeah, sure, but what exactly should I utilize? Because as I see it, all of them require using set or add in JSON. How can I perform that from dataimport or when posting a file from curl? Also, we've tried to use the version feature to combine both sources, but got nothing at all. On Oct 13, 2017, 17:49

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Ah! The only supported type is: text/html; encoding=utf-8. I am not confident of this either :) but this should work. See the code snippet below: .. if(res.httpStatus == 200) { // Raw content type of form "text/html; encoding=utf-8" String rawContentType = conn.getContentType(); String

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin, Just put "html" too and give it a shot. These are the types it is expecting: mimeMap = new HashMap<>(); mimeMap.put("xml", "application/xml"); mimeMap.put("csv", "text/csv"); mimeMap.put("json", "application/json"); mimeMap.put("jsonl", "application/json"); mimeMap.put("pdf",

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Reference to the code: . String rawContentType = conn.getContentType(); String type = rawContentType.split(";")[0]; if(typeSupported(type) || "*".equals(fileTypes)) { String encoding = conn.getContentEncoding(); . protected boolean typeSupported(String type) { for(String key :

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Strange, >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's >> Content-Type. Let's see what it says now. Same thing. Verified Content-Type: quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Strange, Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's Content-Type. Let's see what it says now. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
OK, so I hacked markserv to add Content-Type text/html, but now I get SimplePostTool: WARNING: Skipping URL with unsupported type text/html What is it expecting? $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md

Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)

2017-10-13 Thread Rick Leir
Hi all, What is the earliest version which was vulnerable? Thanks -- Rick -- Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote: >> Kevin, >> >> You are getting NPE at: >> >> String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL >> >> // related code >> >> String rawContentType = conn.getContentType(); >> >> public String getContentType() { >> return

Re: Appending fields to pre-existed document

2017-10-13 Thread alessandro.benedetti
Hi, "And all what we got only a overwriting doc came first by new one. Ok just put overwrite=false to params, and dublicating docs appeare." What is exactly the doc you get ? Are the fields originally in the first doc before the atomic update stored ? This is what you need to use :

Re: Solr related questions

2017-10-13 Thread alessandro.benedetti
The only way Solr will fetch documents is through the Data Import Handler. Take a look at the URLDataSource[1] to see if it fits. Possibly you will need to customize it. [1]
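A rough sketch of what a URLDataSource entity in data-config.xml can look like (the URL, xpaths and columns are made up for illustration):

    <!-- data-config.xml -->
    <dataConfig>
      <dataSource type="URLDataSource" encoding="UTF-8"
                  connectionTimeout="5000" readTimeout="10000"/>
      <document>
        <entity name="item"
                url="http://example.com/feed.xml"
                processor="XPathEntityProcessor"
                forEach="/rss/channel/item">
          <field column="id"    xpath="/rss/channel/item/link"/>
          <field column="title" xpath="/rss/channel/item/title"/>
        </entity>
      </document>
    </dataConfig>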

Re: Appending fields to pre-existed document

2017-10-13 Thread Игорь Абрашин
Hi, Rick. Here is what we've got: Solr version 7.0.0. Field definitions in schema.xml for the dataimport datasource (our database), and a batch of identical fields: url_path. Field definitions in schema.xml for the update/extract handler, and another field which is not important in our case. So our goal is to combine

Re: Solr related questions

2017-10-13 Thread startrekfan
Thank you for your answer. To 3.): The file is on server A, my program is on server B, and Solr is on server C. If I use a normal HTTP (REST) post, my program has to fetch the file content from server A to server B and then post it from server B to server C, as there is no open connection between A
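One option that may avoid routing the bytes through B (hedged; it must be enabled explicitly and has security implications, especially given the zero-day thread above): Solr's remote streaming lets server C fetch the file itself via stream.url, assuming enableRemoteStreaming="true" is set on <requestParsers> in solrconfig.xml. A sketch with placeholder host names, core and paths:

    curl "http://serverC:8983/solr/mycore/update/extract?stream.url=http://serverA/docs/file.pdf&literal.id=doc1&commit=true"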

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin, You are getting NPE at: String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL // related code String rawContentType = conn.getContentType(); public String getContentType() { return getHeaderField("content-type"); } HttpURLConnection conn = (HttpURLConnection)

Re: Solr related questions

2017-10-13 Thread alessandro.benedetti
Nabble mutilated my reply: *Comment*: If you remove this field, you must _also_ disable the update log in solrconfig.xml or Solr won't start. _version_ and the update log are required for SolrCloud. *Comment*: _root_ points to the root document of a block of nested documents. Required for
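For readers without the file at hand, the declarations those comments sit next to look roughly like this in a stock managed-schema (exact types and attributes vary by Solr version):

    <field name="_version_" type="plong"  indexed="false" stored="false"/>
    <field name="_root_"    type="string" indexed="true"  stored="false" docValues="false"/>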

Re: Concern on solr commit

2017-10-13 Thread Emir Arnautović
Hi Leo, The errors that you are seeing are related to frequent commits - a new commit is issued before the searcher for the previous commit is opened and warmed. I haven't looked at the indexing code in a while, but assuming that it did not change, commits and writes are mutually exclusive - guarded by the

Geometries distance

2017-10-13 Thread Maruska Melucci
Hi, I need to obtain the distance between a point and a MULTILINESTRING. The MULTILINESTRINGs are stored in Solr as geometry using the JTS library, configured as follows: I'm using geodist to return the distance between a coordinate and the geometries, but the function seems to work incorrectly; it doesn't return the right
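A hedged pointer for this case: geodist() historically does not work with RPT/JTS geometry fields; the documented way to get a distance back from an RPT field is the score=distance local param on the geofilt (or bbox) parser. A sketch with placeholder field name, point and radius:

    curl -G 'http://localhost:8983/solr/mycoll/select' \
         --data-urlencode 'q={!geofilt score=distance filter=false sfield=geom pt=45.15,-93.85 d=50}' \
         --data-urlencode 'fl=id,score' \
         --data-urlencode 'sort=score asc'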

Re: Solr related questions

2017-10-13 Thread alessandro.benedetti
1) "_version_" is not "unecessary", actually the contrary, it is fundamendal for Solr to work. The same for types you use across your field definitions. There was a time you could see these comments in the schema.xml (doesn't seem the case anymore): 2)

Re: Solr related questions

2017-10-13 Thread Rick Leir
1/ the _version_ field is necessary. 2/ there is a Solr API for editing the managed schema 3/ not having used SolrNet, I suspect you can bypass it and use the Solr REST API directly. Cheers -- Rick On October 13, 2017 5:40:26 AM EDT, startrekfan wrote: >Hello, > >I
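Regarding 2/, a minimal sketch of the Schema API call (core and field names are placeholders):

    curl -X POST -H 'Content-type:application/json' \
         --data-binary '{"add-field":{"name":"my_field","type":"text_general","stored":true}}' \
         http://localhost:8983/solr/mycore/schema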

Re: Concern on solr commit

2017-10-13 Thread Leo Prince
Hi Emir, Thanks for the response. We have specific near-real-time search requirements; that is why we are explicitly invoking Solr commits. However, we will try to reduce the commits issued from the application. In the meantime, the errors/warnings I mentioned in my previous mail: are they

Re: Appending fields to pre-existed document

2017-10-13 Thread Rick Leir
Hi Show us the Solr version, field types, the handler definition, and the query you send. Any log entries? Cheers -- Rick On October 13, 2017 5:57:16 AM EDT, "Игорь Абрашин" wrote: >Hello, Solr community. >We are struggling with updating already existing docs. For

Re: Solr related questions

2017-10-13 Thread Amrit Sarkar
Hi, 1.) I created a core and tried to simplify the managed-schema file. But if > I remove all "unnecessary" fields/fieldtypes, I get errors like: field > "_version_" is missing, type "boolean" is missing and so on. Why do I have > to define these types/fields? Which fields/fieldtypes are required?

Appending fields to pre-existed document

2017-10-13 Thread Игорь Абрашин
Hello, Solr community. We are struggling with updating already existing docs. For instance, we indexed one jpg with the Tika parser and got a batch of attributes. Then we want to index a database datasource and append those fields to our document with the same uniqueKey, as declared in schema.xml.

Solr related questions

2017-10-13 Thread startrekfan
Hello, I have some Solr-related questions: 1.) I created a core and tried to simplify the managed-schema file. But if I remove all "unnecessary" fields/fieldtypes, I get errors like: field "_version_" is missing, type "boolean" is missing and so on. Why do I have to define these types/fields?

Re: Concern on solr commit

2017-10-13 Thread Emir Arnautović
Hi Leo, It is considered a bad practice to commit from your application. You should let Solr handle commits. There is a great article about soft and hard commits: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
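For reference, the usual shape of that setup in solrconfig.xml; the intervals below are only illustrative and should be tuned to the application's visibility needs:

    <autoCommit>
      <maxTime>60000</maxTime>          <!-- hard commit for durability; does not open a searcher -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>5000</maxTime>           <!-- soft commit controls how soon new docs become visible -->
    </autoSoftCommit>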

Concern on solr commit

2017-10-13 Thread Leo Prince
Hi, I am new to the community; thank you for letting me in. Let me get into my concern real quick. Please find my OS and Solr versions below: Ubuntu 14.04.4 LTS solr-spec 4.10.2 solr-impl 4.10.2 1634293 - mike - 2014-10-26 05:56:21 lucene-spec 4.10.2 lucene-impl 4.10.2 1634293 - mike - 2014-10-26

Accent insensitive search for greek characters

2017-10-13 Thread Chitra
Hi, I want to search Greek characters (accent-insensitively) by removing or replacing accent marks with similar characters. E.g., when searching an accented Greek word, say *πῬοἲὅν*, we expect an accent-insensitive search, i.e. we need the unaccented Greek equivalent like *προιον*. Moreover, I am not having
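One hedged possibility: a text field whose analysis chain lowercases Greek and folds diacritics, e.g. via ICUFoldingFilterFactory (which lives in the analysis-extras contrib, so its jars must be added to the classpath). A sketch:

    <fieldType name="text_el_folded" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.GreekLowerCaseFilterFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>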

Re: book on solr

2017-10-13 Thread Rick Leir
Jay, get info on this with a search: https://www.google.ca/search?q=solr+shard+size Cheers -- Rick On 2017-10-13 01:42 AM, Jay Potharaju wrote: Any blog or documentation that would provide some basic rules or guidelines for scaling would also be great. Thanks Jay Potharaju