Re: Indexing on plain text and binary data in a single HTTP POST request

neerajp Mon, 09 Dec 2013 08:21:30 -0800

Hi Alexandre, 
Thanks very much for responding to my post. Please find my responses inline:

1) For your email address fields, you are escaping the brackets, right?
Not just "solr solr <[hidden email]>" as you show, but with the < and >
escaped, right? Otherwise, those email addresses become part of the XML
markup and mess it all up.

[Neeraj]: Yes, you are right. I used CDATA to escape < and > and any other
special characters in the XML.

2) Your binary content is encoded in some way inside XML, right? Not just
random binary, which would make it invalid XML? Like base64 or something?

[Neeraj]: I want to put raw binary (*not base64 encoded*) into some of the
XML fields, wrapped in a CDATA section so that the XML does not become
invalid. I hope this is possible.
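
For reference, here is a minimal Python sketch of the base64 option raised in
question 2. The field names (id, from, attachment) and both helper functions
are hypothetical and only illustrate the idea; the <add><doc><field> layout is
the usual Solr XML update message. The second helper simply checks whether raw
bytes wrapped in CDATA still parse. XML 1.0 disallows most control characters
even inside a CDATA section, so arbitrary binary generally does not parse.

```python
import base64
import xml.etree.ElementTree as ET


def build_add_xml(text_fields, binary_fields):
    """Build a Solr <add><doc> message; binary values are base64-encoded."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in text_fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        # ElementTree escapes <, > and & when the tree is serialized.
        field.text = value
    for name, raw in binary_fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        # base64 output is plain ASCII, so it is always safe inside XML.
        field.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(add, encoding="unicode")


def raw_bytes_in_cdata_parse(raw):
    """Return True if raw bytes wrapped in CDATA parse as XML, else False."""
    payload = (
        b"<add><doc><field name='attachment'><![CDATA["
        + raw
        + b"]]></field></doc></add>"
    )
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False
```

Whoever reads the field afterwards would have to base64-decode it before
processing the binary, which is the first step described in question 3.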

3) Decode the base64 as a first step, then, as a second step, feed the
result through whatever you actually want to process the binary with. So it
might be a custom URP, with functionality similar to
ExtractingRequestHandler, the difference being that you already have a
document object and you are mapping one (binary) field in it into a bunch of
other fields, with some conventions on names, overrides, etc.

[Neeraj]: My XML document now contains some fields in plain text and some
fields in raw binary format.

I tried to use ExtractingUpdateProcessor but soon found out that it is not
available in Solr 4.5.
I am not sure how to use ExtractingRequestHandler for an XML document that
has some fields in plain text and some in raw binary format. As far as I can
tell, ExtractingRequestHandler is used to extract text from a binary file
input, but my input document is XML, not binary.
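
To illustrate the ExtractingRequestHandler usage described above, here is a
rough Python sketch, assuming the handler is registered at /update/extract
(this depends on solrconfig.xml) and that each request carries one binary
document. The plain-text fields travel as literal.<fieldname> request
parameters alongside the binary body. The function name, core URL and field
names are hypothetical, and the sketch only builds the request without
sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_core_url, binary_body, literals, commit=True):
    """Build (but do not send) a POST for Solr's extracting handler."""
    # Plain-text field values are passed as literal.<field> parameters.
    params = {"literal." + name: value for name, value in literals.items()}
    if commit:
        params["commit"] = "true"
    # Assumption: the handler is mapped to /update/extract in solrconfig.xml.
    url = solr_core_url.rstrip("/") + "/update/extract?" + urlencode(params)
    return Request(
        url,
        data=binary_body,  # the raw binary goes in the request body as-is
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. This
covers one binary payload plus its plain-text fields per POST; it does not
handle an XML document with several binary fields embedded in it.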

I am new to Solr, so I would appreciate your suggestions.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105706.html
Sent from the Solr - User mailing list archive at Nabble.com.
