Hi Alexandre, 
Thanks very much for responding to my post. Please find my responses inline:

1) For your email address fields, you are escaping the brackets, right? Not just
"solr solr <[hidden email]>" as you show, but with the < and > escaped, right?
Otherwise, those email addresses become part of the XML markup and mess it all up.

[Neeraj]: Yes, you are right. I used CDATA sections to escape < and > and any other
special XML characters.
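For reference, here is a minimal JDK-only sketch of both approaches: entity escaping and a CDATA section. The field name `from` and the address `user@example.com` are placeholders, not taken from the real data. The check is simply that each version parses back to the original text, while the unescaped value does not parse at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Two ways to put a "Name <address>" value into an XML <field> so that the
// angle brackets are read as text rather than as markup.
class EmailFieldEscaping {

    // Option 1: replace XML special characters with entity references.
    // '&' is handled first so the entities added afterwards are not re-escaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Option 2: wrap the raw value in a CDATA section.
    // (Only safe if the value itself never contains "]]>".)
    static String cdata(String s) {
        return "<![CDATA[" + s + "]]>";
    }

    // Parses a one-element XML string and returns its text content.
    static String parseFieldText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String value = "solr solr <user@example.com>"; // placeholder address
        String escaped = "<field name=\"from\">" + escapeXml(value) + "</field>";
        String wrapped = "<field name=\"from\">" + cdata(value) + "</field>";
        System.out.println(parseFieldText(escaped).equals(value)); // true
        System.out.println(parseFieldText(wrapped).equals(value)); // true
        // Unescaped, "<user@example.com>" is read as a tag and parsing fails.
        try {
            parseFieldText("<field name=\"from\">" + value + "</field>");
        } catch (IllegalArgumentException e) {
            System.out.println("raw value rejected");
        }
    }
}
```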

2) Your binary content is encoded in some way inside XML, right? Not just 
random binary, which would make it invalid XML? Like base64 or something? 
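A small JDK-only sketch of the base64 option, assuming a field named `attachment` (hypothetical). The bytes are encoded with `java.util.Base64`, placed in a `<field>` element, and decoded again after parsing. Base64 output uses only letters, digits, `+`, `/` and `=`, so nothing in it needs XML escaping, even when the original bytes include NUL or `]]>`.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Carries arbitrary bytes in an XML <field> by base64-encoding them.
class Base64Field {

    // Encodes the bytes and wraps them in a <field> element.
    static String toField(String name, byte[] data) {
        return "<field name=\"" + name + "\">"
                + Base64.getEncoder().encodeToString(data) + "</field>";
    }

    // Parses the <field> element and decodes its text back to bytes.
    static byte[] fromField(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors
            String text = builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
            return Base64.getDecoder().decode(text.trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read field: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Bytes that could not appear directly in XML: NUL, 0x01, 0xFF, "<]]>".
        byte[] raw = {0x00, 0x01, (byte) 0xFF, '<', ']', ']', '>'};
        String xml = toField("attachment", raw); // hypothetical field name
        System.out.println(xml);
        System.out.println(Arrays.equals(raw, fromField(xml))); // true
    }
}
```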

[Neeraj]: I want to put random binary (*not base64 encoded*) into some of the XML
fields, inside CDATA sections, so that the XML will not become invalid. I hope I
can do this.
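One caveat worth checking before relying on this: CDATA only turns off markup recognition; it does not relax the XML 1.0 character rules. The allowed characters are tab, LF, CR and U+0020 upward (minus surrogates, U+FFFE and U+FFFF), so control bytes such as 0x00 are rejected even inside CDATA, and any `]]>` sequence in the data ends the section early. Byte sequences that are not valid in the document's encoding (e.g. UTF-8) are a further problem. The JDK-only sketch below, with a hypothetical field named `data`, shows the first point:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Checks whether a value placed inside a CDATA section still yields
// well-formed XML according to the JDK's built-in parser.
class CdataBinaryCheck {

    static boolean parsesAsXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Wraps a value in a CDATA section inside a hypothetical <field name="data">.
    static String cdataField(String value) {
        return "<field name=\"data\"><![CDATA[" + value + "]]></field>";
    }

    public static void main(String[] args) {
        System.out.println(parsesAsXml(cdataField("plain text"))); // true
        System.out.println(parsesAsXml(cdataField("a\u0000b")));   // false: NUL is never legal XML
        System.out.println(parsesAsXml(cdataField("a\u0001b")));   // false: neither is 0x01
    }
}
```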

3) You would need something that decodes the base64 as a first step and then feeds
it through whatever you actually want to process the binary with as a second
step. So it might be a custom URP (UpdateRequestProcessor) with functionality
similar to ExtractingRequestHandler, the difference being that you already have a
document object and you are mapping one (binary) field in it into a bunch of
other fields, with some conventions on names, overrides, etc.
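A rough, Solr-free sketch of that two-step mapping follows. It assumes the document is a plain `Map` and the extraction step is passed in as a function. The method name `mapBinaryField`, the `prefix` naming convention and the "existing field wins" override rule are hypothetical choices for illustration. In a real URP this logic would sit inside `processAdd(AddUpdateCommand)` and operate on the `SolrInputDocument`, with something like Tika doing the extraction.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: decode one base64 field, run an extractor on the bytes,
// and map the results into new fields named "<prefix><key>".
class BinaryFieldMapper {

    static Map<String, Object> mapBinaryField(Map<String, Object> doc,
                                              String sourceField,
                                              String prefix,
                                              Function<byte[], Map<String, String>> extractor) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object encoded = out.remove(sourceField); // drop the raw base64 field
        if (encoded == null) {
            return out; // nothing to map
        }
        // Step 1: decode the base64 payload back to the original bytes.
        byte[] bytes = Base64.getDecoder().decode(encoded.toString().trim());
        // Step 2: extract and copy each result under "<prefix><key>".
        // putIfAbsent means a value already in the document wins (override rule).
        for (Map.Entry<String, String> e : extractor.apply(bytes).entrySet()) {
            out.putIfAbsent(prefix + e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical extractor: treats the bytes as UTF-8 text and reports the size.
        Function<byte[], Map<String, String>> extractor = bytes -> {
            Map<String, String> m = new LinkedHashMap<>();
            m.put("content", new String(bytes, StandardCharsets.UTF_8));
            m.put("size", Integer.toString(bytes.length));
            return m;
        };
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        doc.put("attachment", Base64.getEncoder()
                .encodeToString("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(mapBinaryField(doc, "attachment", "attachment_", extractor));
        // {id=doc1, attachment_content=hello, attachment_size=5}
    }
}
```

Passing the extractor in as a function keeps the decode/rename logic separate from whichever parser ends up doing the actual extraction.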

[Neeraj]: Currently, my XML document contains some fields in plain text and
some fields in random binary format.

I tried to use ExtractingUpdateProcessor but soon learned that it has not been
released as of Solr 4.5.
I am not sure how to use ExtractingRequestHandler for an XML document with
some fields in plain text and some in random binary format. It seems to me
that ExtractingRequestHandler is meant to extract text from a binary file
input, but my input document is XML, not binary.
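If the binary part can be sent as its own request body instead of being embedded in the XML, ExtractingRequestHandler does accept plain-text fields alongside it as `literal.<fieldname>` request parameters. Below is a small sketch of building such a URL. The base URL (host, port, core name `collection1`, `/update/extract` path) and the field names are assumptions for illustration, and `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds an ExtractingRequestHandler URL that carries plain-text fields as
// literal.<field> parameters; the binary file itself goes in the POST body.
class ExtractRequestUrl {

    static String buildUrl(String baseUrl, Map<String, String> plainFields) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : plainFields.entrySet()) {
            // URL-encode each value so spaces, '&', etc. survive the query string.
            query.add("literal." + e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "doc1");                   // hypothetical field values
        fields.put("subject", "Quarterly report");
        System.out.println(buildUrl(
                "http://localhost:8983/solr/collection1/update/extract", fields));
        // http://localhost:8983/solr/collection1/update/extract?literal.id=doc1&literal.subject=Quarterly+report
    }
}
```

The binary file would then be POSTed to that URL as the request body (for example with curl's `--data-binary`), so the plain fields and the binary content arrive in one HTTP request without any XML wrapping.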

I am new to Solr, so I would appreciate your suggestions.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105706.html
Sent from the Solr - User mailing list archive at Nabble.com.