Thanks Shawn. I had actually tried changing &load= to &amp;load=, but still got the error. It sounds like addDocuments is worth a try, though.

On 9/11/2013 4:37 PM, Shawn Heisey wrote:
On 9/11/2013 2:17 PM, Brian Robinson wrote:
I'm in the process of creating my index using a series of
SolrClient::request commands in PHP. I ran into a problem when some of
the fields that I had as "text_general" fieldType contained "&load=" in
a URL, triggering an error because the HTML entity "load" wasn't
recognized. I realized that I should have made my URL fields of type
"string" instead, so that they would be taken as is (they're not being
indexed, just stored), so I removed all docs from my index, updated
schema.xml, and restarted Solr, but I'm still getting the same error. Do
I need to delete the index itself and then restart to get this to work?
Am I correct that changing those fields to "string" type should fix the
issue?

Changing the field type is not going to affect this issue. Because you are not indexing the field, the choice of string or text_general is not really going to matter, but string will probably be more efficient.

What is happening here is an XML issue with the update request itself. The PHP client is sending an XML update request to Solr, and the request includes the URL text as-is in the XML request. It is not properly XML encoded. For an XML update request, that snippet of your text would need to be encoded as "&amp;load=" to work properly.

XML has a much smaller list of valid entities than HTML, but "load" is not a valid entity in either XML or HTML.

I was going to call this a bug in the PHP client library, but then I got a look at what SolrClient::request actually does:

http://php.net/manual/en/solrclient.request.php

It expects you to create the XML yourself, which means you have to do all the encoding of characters which have special meaning to XML.
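If you do stay with hand-built XML, a minimal sketch of the escaping might look like the following. It assumes PHP 7 or later and uses only the built-in htmlspecialchars() with the ENT_XML1 flag. The helper name buildAddXml, the field names, and the example URL are made up for illustration and are not part of the Solr extension.

```php
<?php
// Hypothetical helper (not part of the Solr extension): builds an <add>
// update request in which every field name and value is XML-escaped.
function buildAddXml(array $fields): string
{
    $xml = '<add><doc>';
    foreach ($fields as $name => $value) {
        $xml .= sprintf(
            '<field name="%s">%s</field>',
            // ENT_XML1 | ENT_QUOTES escapes &, <, >, " and ' for XML.
            htmlspecialchars((string) $name, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8')
        );
    }
    return $xml . '</doc></add>';
}

// Field names and URL below are placeholders for illustration only.
$xml = buildAddXml([
    'id'  => 'doc1',
    'url' => 'http://example.com/page?id=5&load=full',
]);

echo $xml, PHP_EOL;
```

The string in $xml, with the URL now carrying "&amp;load=", is the kind of payload that would then be handed to SolrClient::request.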

If you have no desire to figure out proper XML encoding, you should probably be using SolrClient::addDocument or SolrClient::addDocuments instead.
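As a rough sketch of that second route, assuming the PECL solr extension is installed: the two functions below only wrap SolrInputDocument::addField, SolrClient::addDocument and SolrClient::commit. The function names and the field names are hypothetical, and the raw URL is passed in without any manual escaping, on the expectation that the extension takes care of serializing it.

```php
<?php
// Sketch only: calling these functions requires the PECL solr extension
// and a reachable Solr server. Function and field names are hypothetical.

// Build a document from raw values; no manual XML escaping is done here.
function buildDocument(string $id, string $url): SolrInputDocument
{
    $doc = new SolrInputDocument();
    $doc->addField('id', $id);
    $doc->addField('url', $url);
    return $doc;
}

// Send one document to Solr and commit it.
function indexUrl(SolrClient $client, string $id, string $url): void
{
    $client->addDocument(buildDocument($id, $url));
    $client->commit();
}
```

Usage would be something like creating a client with new SolrClient(['hostname' => 'localhost', 'port' => 8983, 'path' => 'solr/collection1']), where all three option values are placeholders for your own setup, and then calling indexUrl($client, 'doc1', 'http://example.com/page?id=5&load=full').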

Thanks,
Shawn


