On Friday 12 August 2011 13:36:54 Max Stricker wrote:
> Hi,
>
> I need to add the result of parsing manually to my index using
> ParseResult.put(). Everything works fine and the result shows up in my
> Solr index afterwards, except when the URL (which I use as the key) includes
> characters like #, ? or &.
> At first I thought crawl-urlfilter.txt might be the issue, dropping the
> URLs that do not pass the filters, but I have already removed the
> -[?*!@=] rule with no success.
Did you do a complete recrawl?
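
As a rough check of what that rule covers, here is a minimal standalone sketch. It assumes an exclusion rule fires when its pattern is found anywhere in the URL string; the class and method names are hypothetical and not part of Nutch, and the sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // Character class copied from the "-[?*!@=]" rule quoted above.
    // Everything else in this class is a hypothetical illustration.
    private static final Pattern SKIP_CHARS = Pattern.compile("[?*!@=]");

    /** Returns true if the exclusion pattern is found anywhere in the URL. */
    public static boolean wouldBeRejected(String url) {
        return SKIP_CHARS.matcher(url).find();
    }

    public static void main(String[] args) {
        // Made-up sample URLs covering the characters mentioned in the question.
        String[] urls = {
            "http://www.host.com/page.html",
            "http://www.host.com/?param=val",
            "http://www.host.com/page.html#section",
            "http://www.host.com/a&b"
        };
        for (String url : urls) {
            System.out.println(url + " -> rejected: " + wouldBeRejected(url));
        }
    }
}
```

Note that the character class contains ? and = but neither # nor &, so under this assumption the rule alone would not account for URLs containing only # or & going missing.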
> Looking at the source code of ParseResult.java,
> I cannot see why some results would be rejected, because the entry is
> simply put in a HashMap<Text,Parse> where the Text key contains my URL. The
> URL has a form like this: www.host.com?param=val.
> What setting would cause such an issue, and how can I force such
> URLs into the index?
>
> In the filter method of my HtmlParseFilter implementation I add a result
> like this:
>
> parseResult.put(URL, new ParseText("myParseText"), new ParseData(
>     new ParseStatus(ParseStatus.SUCCESS), "aTitle", new Outlink[0],
>     content.getMetadata()));
>
> and if the URL contains none of #, ? or &, everything works fine.
>
> Any ideas?
> Thanks for any help.
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350