The general answer is to add an UpdateRequestProcessor chain. That gives
you a lot of post-processing flexibility.
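
A minimal sketch of such a chain for solrconfig.xml, not tested against your
setup. TrimFieldUpdateProcessorFactory and RegexReplaceProcessorFactory are
stock Solr processors; the chain name "trim-fields", the handler path
"/dataimport" and the file name "data-config.xml" are placeholders I made up,
so substitute whatever your config actually uses.

```xml
<!-- Hypothetical chain name; pick any name you like. -->
<updateRequestProcessorChain name="trim-fields">
  <!-- Collapse runs of whitespace (including newlines) in title to one space. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">title</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <!-- Trim leading and trailing whitespace; with no selector it applies to all fields. -->
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Assumed DIH handler definition; only the update.chain default is the point here. -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">trim-fields</str>
  </lst>
</requestHandler>
```

The regex step is only there because your title value has embedded newlines;
if trailing spaces are the only problem, the trim processor alone should be
enough. You would need to reindex for the change to show up.
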
But you can also try having the xpath specify ..../text(); maybe that will
deal with the trailing space specifically. I have not tested it myself,
though, so it is just a thought.
Regards,
Alex
On Mon., Sep. 6, 2021, 11:10 p.m. Scott Derrick, <[email protected]> wrote:
> I'm indexing .xml documents and using the XPathEntityProcessor for data
> importing. Here is a snippet of my conf file:
>
> <entity name="meta"
>         dataSource="myfilereader"
>         processor="XPathEntityProcessor"
>         url="${jcurrent.fileAbsolutePath}"
>         stream="false"
>         forEach="/TEI/teiHeader/fileDesc"
>         xsl="xslt/meta.xsl">
>   <field column="title" xpath="/TEI/teiHeader//title" flatten="true" />
>   <field column="author" xpath="/TEI/teiHeader//author" />
>   <field column="publisher" xpath="/TEI/teiHeader//publisher" />
>   <field column="accession" xpath="/TEI/teiHeader//idno" />
>   <field column="date" xpath="/TEI/teiHeader//date" flatten="true" />
>   <field column="origin" xpath="/TEI/teiHeader//origin" />
>   <field column="origPlace" xpath="/TEI/teiHeader//origPlace" />
>   <field column="origGeo" xpath="/TEI/teiHeader//origGeo" />
>   <field column="settlement" xpath="/TEI/teiHeader//settlement" />
>   <field column="region" xpath="/TEI/teiHeader//region" />
>   <field column="country" xpath="/TEI/teiHeader//country" />
>   <field column="when" xpath="/TEI/teiHeader//when" />
>   <field column="when-custom" xpath="/TEI/teiHeader//when-custom" />
>   <field column="notAfter" xpath="/TEI/teiHeader//notAfter" />
>   <field column="notBefore" xpath="/TEI/teiHeader//notBefore" />
>   <field column="note" xpath="/TEI/teiHeader//note" flatten="true" />
>   <field column="annotator" xpath="/TEI/teiHeader//annotator" />
>   <field column="scribe" xpath="/TEI/teiHeader//scribe" />
>   <field column="recipient" xpath="/TEI/teiHeader//recipient" />
> </entity>
>
> I noticed spaces at the ends of my elements when exporting a result as
> JSON or XML.
>
> I thought it was my JavaScript fetch call that was appending to the string,
> but looking at the query page on the Solr admin site I can clearly see a
> trailing space. It doesn't matter whether the field type is string or
> text_general; the result is the same.
>
> Here is a snippet of the query response:
>
> {
>   "date":"1884-09-09 September 9, 1884 ",
>   "note":"Handwritten by Mary on a postcard from Boston, Massachusetts. ",
>   "country":"USA ",
>   "origGeo":"42.3584308 -71.0597732 ",
>   "author":"Mary ",
>   "authorString":"Mary ",
>   "origin":"1884-09-09 ",
>   "originSort":"1884-09-09 ",
>   "accession":"639P3.65.026 ",
>   "accessionSort":"639P3.65.026 ",
>   "title":"\n Mary to Mary Baker Eddy, \n September 9, 1884 \n \n ",
>   "titleSort":"\n Mary to Mary Baker Eddy, \n September 9, 1884 \n \n ",
>   "when":"1884-09-09 ",
>   "settlement":"Boston ",
>   "recipient":"Mary Baker Eddy",
>   "recipientString":"Mary Baker Eddy",
>   "publisher":"The Mary Baker Eddy Library ",
>   "origPlace":"places.xml#boston_ma ",
>   "region":"MA ",
>   "type":"incoming_correspondence",
>   "places":"Boston ",
>   "placesString":"Boston ",
>   "people":"Mary ",
>   "peopleString":"Mary ",
>   "body":"Paper rec received Thanks, Just looked it over, good . Have moved at last! Will find me at cor: Shawmut Ave. & Pleasant St. a few doors from 66 S. Ave, further downtown. Hope you will find time to come in. Not yet settled, but like much better. Hope you are prospering. Wanted to see you last Sabbath eve but too tired In love Mary – ",
>   "closer":"Boston Sept 9. 1884 . ",
>   "id":"3272bf21-e6c2-4053-85ef-db3ec5a7f0ae",
>   "_version_":1710182653070671872
> },
>
>
>
> I'm guessing it's the XPathEntityProcessor that is doing it, but I'm
> certainly open to pilot error!
>
> Any ideas how I can get rid of the trailing space?
>
> thanks,
>
> Scott