Hi,
I need to add parse results to my index manually using
ParseResult.put().
Everything works fine and the result shows up in my Solr index afterwards,
except when the URL (which I use as the key) contains characters such as
#, ? or &.
At first I thought crawl-urlfilter.txt could be the issue, since it ignores
URLs that do not match the filters, but I already removed the -[?*!@=] rule
with no success.
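To check whether a rule like -[?*!@=] could be the culprit, here is a minimal, hypothetical Java sketch of how a regex-based URL filter file of this kind is commonly evaluated: each line is + or - followed by a regex, the first pattern found anywhere in the URL decides, and a URL that matches no rule is dropped. The class UrlFilterSketch and these exact semantics are assumptions for illustration, not Nutch's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regex URL filter (not Nutch's implementation).
// Each rule is '+' (accept) or '-' (reject) followed by a regex; the first rule
// whose pattern is found anywhere in the URL decides; no match means reject.
public class UrlFilterSketch {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') continue; // skip comments and other lines
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // Returns true if the URL would be kept under the assumed semantics.
    public boolean accepts(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) return r.accept;
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch withRule = new UrlFilterSketch(List.of("-[?*!@=]", "+."));
        UrlFilterSketch withoutRule = new UrlFilterSketch(List.of("+."));
        String url = "http://www.host.com/?param=val";
        System.out.println(withRule.accepts(url));    // false
        System.out.println(withoutRule.accepts(url)); // true
    }
}
```

Under these assumed semantics the character class [?*!@=] catches ? and = but neither # nor &, so that rule alone would not explain a URL rejected only for containing # or &.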
Looking at the source code of ParseResult.java, I cannot see why some
results would be rejected: the entries are simply put into a
HashMap<Text, Parse> with a Text containing my URL as the key.
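The point about the HashMap can be checked in isolation. This is a minimal sketch, assuming ParseResult behaves like a plain HashMap at this step; String stands in for both Hadoop's Text and Parse so the example has no dependencies, and MapKeySketch is a hypothetical name. Keys containing ?, # and & are stored and found again like any other key.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a plain HashMap keyed by URL strings, standing in (as an assumption)
// for the HashMap<Text, Parse> inside ParseResult. String replaces Text and Parse.
public class MapKeySketch {

    // Stores one placeholder parse text per URL and returns the map.
    public static Map<String, String> store(String... urls) {
        Map<String, String> results = new HashMap<>();
        for (String url : urls) {
            results.put(url, "myParseText");
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> m = store(
                "http://www.host.com/plain",
                "http://www.host.com/?param=val",
                "http://www.host.com/page#frag",
                "http://www.host.com/?a=1&b=2");
        System.out.println(m.size());                                       // 4
        System.out.println(m.containsKey("http://www.host.com/?a=1&b=2")); // true
    }
}
```

This is consistent with the reading of ParseResult.java above: the map itself treats these characters like any others.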
The URL has a form like this: www.host.com?param=val.
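Since #, ? and & all have structural meaning in a URL, it may help to see where each one falls. A minimal sketch using java.net.URI follows; the http:// scheme, the / path, the second parameter and the #section fragment are illustrative additions to the example above, and UrlPartsSketch is a hypothetical name. Note that without a scheme, URI parses www.host.com as a path rather than a host.

```java
import java.net.URI;

// Sketch: split a URL into host, query and fragment with java.net.URI.
// '?' starts the query, '&' separates parameters inside it, '#' starts the fragment.
public class UrlPartsSketch {

    // Returns {host, raw query, raw fragment}; entries are null when absent.
    // URI.create throws IllegalArgumentException for syntactically invalid input.
    public static String[] parts(String url) {
        URI uri = URI.create(url);
        return new String[] { uri.getHost(), uri.getRawQuery(), uri.getRawFragment() };
    }

    public static void main(String[] args) {
        String[] p = parts("http://www.host.com/?param=val&other=1#section");
        System.out.println(p[0]); // www.host.com
        System.out.println(p[1]); // param=val&other=1
        System.out.println(p[2]); // section
    }
}
```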
What setting could cause such an issue, and how can I force such URLs
into the index?
In the filter method of my HtmlParseFilter implementation, I add a result
like this:
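    // Register an extra parse entry keyed by URL, reusing the metadata
    // of the Content object passed to filter()
    parseResult.put(URL, new ParseText("myParseText"),
        new ParseData(new ParseStatus(ParseStatus.SUCCESS), "aTitle",
            new Outlink[0], content.getMetadata()));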
If URL contains none of #, ? or &, everything works fine.
Any ideas?
Thanks for any help.