Toby,
Your mention of "-recursive" causing a problem reminded me of a simple
crawl (of the 7.0 Ref Guide) using bin/post I was trying to get to
work the other day and couldn't.
The order of the parameters seems to make a difference with what error
you get (this is using 7.1):
1. "./bin/post -c
Amrit Sarkar wrote
> The above is SAXParse, runtime exception. Nothing can be done on the Solr end
> except curating your own data.
I'm trying to replace a solr-4.6.0 system (which has been working
brilliantly for 3 years!) with solr-7.1.0. I'm running into this exact same
problem.
I do not believe
On 2017-10-13 04:19 PM, Kevin Layer wrote:
Amrit Sarkar wrote:
Kevin,
fileType => md is not a recognizable format in SimplePostTool; anyway, moving
on.
OK, thanks. Looks like I'll have to abandon using solr for this
project (or find another way to crawl the site).
Thank you for all the help,
Amrit Sarkar wrote:
>> Kevin,
>>
>> fileType => md is not a recognizable format in SimplePostTool; anyway, moving
>> on.
OK, thanks. Looks like I'll have to abandon using solr for this
project (or find another way to crawl the site).
Thank you for all the help, though. I appreciate it.
>> The
Kevin,
fileType => md is not a recognizable format in SimplePostTool; anyway, moving
on.
The above is SAXParse, runtime exception. Nothing can be done on the Solr end
except curating your own data.
Some helpful links:
Amrit Sarkar wrote:
>> Kevin,
>>
>> I am not able to replicate the issue on my system, which is a bit annoying
>> for me. Try this one last time:
>>
>> docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
>>
>>
Kevin,
I am not able to replicate the issue on my system, which is a bit annoying
for me. Try this one last time:
docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
and have Content-Type: "html" and "text/html",
Amrit Sarkar wrote:
>> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log in
>> the machine. I haven't played much with docker, any way you can get that
>> file from that location.
I see these files:
/opt/solr/server/logs/archived
/opt/solr/server/logs/solr_gc.log.0.current
pardon: [solr-home]/server/log/solr.log
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
On Fri, Oct 13, 2017 at 8:10 PM, Amrit Sarkar
wrote:
> ah oh,
ah oh, dockers. They are placed under [solr-home]/server/log/solr/log in
the machine. I haven't played much with docker, any way you can get that
file from that location.
Amrit Sarkar wrote:
>> Hi Kevin,
>>
>> Can you post the Solr log in the mail thread? I don't think it handled the
>> .md by itself, judging from a first glance at the code.
Note that when I use the admin web interface, and click on "Logging"
on the left, I just see a spinner that implies it's trying to retrieve
Amrit Sarkar wrote:
>> Hi Kevin,
>>
>> Can you post the Solr log in the mail thread? I don't think it handled the
>> .md by itself, judging from a first glance at the code.
How do I extract the log you want?
Hi Kevin,
Can you post the Solr log in the mail thread? I don't think it handled the
.md by itself, judging from a first glance at the code.
On
Amrit Sarkar wrote:
>> Kevin,
>>
>> Just put "html" too and give it a shot. These are the types it is expecting:
Same thing.
>>
>> mimeMap = new HashMap<>();
>> mimeMap.put("xml", "application/xml");
>> mimeMap.put("csv", "text/csv");
>> mimeMap.put("json", "application/json");
>>
Amrit Sarkar wrote:
>> Reference to the code:
>>
>> .
>>
>> String rawContentType = conn.getContentType();
>> String type = rawContentType.split(";")[0];
>> if(typeSupported(type) || "*".equals(fileTypes)) {
>> String encoding = conn.getContentEncoding();
>>
>> .
>>
>> protected
Ah!
Only supported type is: text/html; encoding=utf-8
I am not confident of this either :) but this should work.
See the code-snippet below:
..
if(res.httpStatus == 200) {
// Raw content type of form "text/html; encoding=utf-8"
String rawContentType = conn.getContentType();
String
Kevin,
Just put "html" too and give it a shot. These are the types it is expecting:
mimeMap = new HashMap<>();
mimeMap.put("xml", "application/xml");
mimeMap.put("csv", "text/csv");
mimeMap.put("json", "application/json");
mimeMap.put("jsonl", "application/json");
mimeMap.put("pdf",
Reference to the code:
.
String rawContentType = conn.getContentType();
String type = rawContentType.split(";")[0];
if (typeSupported(type) || "*".equals(fileTypes)) {
    String encoding = conn.getContentEncoding();
.
protected boolean typeSupported(String type) {
    for(String key :
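One way a check like the one quoted above could work, offered as a hedged sketch and not as the actual SimplePostTool code (the excerpt is cut off before the method body). It assumes the base type from the Content-Type header is compared against the values in mimeMap, and that the matching extension key must also appear in the -filetypes list. The names `TypeFilter`, `accepts` and `listed` are hypothetical, and the `html` to `text/html` entry is an assumption, since the map quoted in the thread is truncated before it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; the names here are illustrative, not SimplePostTool's API. */
class TypeFilter {
    private final Map<String, String> mimeMap = new HashMap<>();
    private final String fileTypes; // value passed to -filetypes, e.g. "html" or "*"

    TypeFilter(String fileTypes) {
        this.fileTypes = fileTypes;
        // Entries quoted in the thread.
        mimeMap.put("xml", "application/xml");
        mimeMap.put("csv", "text/csv");
        mimeMap.put("json", "application/json");
        mimeMap.put("jsonl", "application/json");
        // Assumed entry: the excerpt in the thread is truncated before this point.
        mimeMap.put("html", "text/html");
    }

    /** True if baseType maps back to an extension that is listed in fileTypes. */
    boolean accepts(String baseType) {
        if ("*".equals(fileTypes)) {
            return true; // wildcard accepts every type
        }
        for (Map.Entry<String, String> e : mimeMap.entrySet()) {
            if (e.getValue().equals(baseType) && listed(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    /** Checks whether ext is one of the comma-separated entries in fileTypes. */
    private boolean listed(String ext) {
        for (String t : fileTypes.split(",")) {
            if (t.trim().equals(ext)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, `new TypeFilter("md").accepts("text/html")` is false, because no map entry ties `md` to `text/html`. That would be consistent with a "Skipping URL with unsupported type text/html" warning when `-filetypes md` is used, though it does not by itself explain a rejection when `html` is listed.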
Amrit Sarkar wrote:
>> Strange,
>>
>> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> Content-Type. Let's see what it says now.
Same thing. Verified Content-Type:
quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
Strange,
Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
Content-Type. Let's see what it says now.
On
OK, so I hacked markserv to add Content-Type text/html, but now I get
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
What is it expecting?
$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
Amrit Sarkar wrote:
>> Kevin,
>>
>> You are getting NPE at:
>>
>> String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL
>>
>> // related code
>>
>> String rawContentType = conn.getContentType();
>>
>> public String getContentType() {
>> return
Kevin,
You are getting NPE at:
String type = rawContentType.split(";")[0]; // HERE - rawContentType is NULL
// related code
String rawContentType = conn.getContentType();
public String getContentType() {
    return getHeaderField("content-type");
}
HttpURLConnection conn = (HttpURLConnection)
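A minimal sketch of a null-safe variant of the split shown above, assuming the NPE comes from a response that carries no Content-Type header (in which case `getContentType()` returns null). The class and method names (`ContentTypes`, `baseType`) are hypothetical and not part of SimplePostTool; this only illustrates the guard, it is not a patch.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of SimplePostTool. */
class ContentTypes {
    /**
     * Returns the media type without parameters, lowercased,
     * or null when the header is missing or has no type part.
     */
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // server sent no Content-Type header
        }
        // Cut at the first ';' so parameters such as charset are dropped.
        int semi = rawContentType.indexOf(';');
        String type = (semi >= 0 ? rawContentType.substring(0, semi) : rawContentType).trim();
        return type.isEmpty() ? null : type.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html; charset=utf-8")); // prints "text/html"
        System.out.println(baseType(null));                       // prints "null"
    }
}
```

A caller could then skip the URL with a warning when `baseType` returns null, instead of failing with an exception.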
I want to use Solr to index a Markdown website. The files
are in native Markdown, but they are served as HTML (by markserv).
Here's what I did:
docker run --name solr -d -p 8983:8983 -t solr
docker exec -it --user=solr solr bin/solr create_core -c handbook
Then, to crawl the site: