For anyone interested, my issue was (I think) caused by specifying the
url field as a multivalued field. I wasn't able to create a test case
that reproduced my problem, so this guess is based on gradually
fiddling with my configs.
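To illustrate the guess above, here is a toy resolver that substitutes ${entity.field} placeholders using String.valueOf. It is a hypothetical stand-in, not DIH's own variable resolution, but if a multivalued field is held as a java.util.List, this is the kind of substitution that would turn the value into "[...]".

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy resolver for ${entity.field} placeholders; NOT DIH's
// VariableResolver. Each placeholder is replaced with String.valueOf(value),
// so a List value is rendered by List.toString(), i.e. wrapped in "[...]".
public class ToyResolver {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String template, Map<String, Object> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            // quoteReplacement keeps '$' or '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://publicdomain.ca/content/Sample.pdf";
        // Single-valued: the bare URL
        System.out.println(resolve("${x.url}", Map.of("x.url", url)));
        // Multivalued (a one-element List): the URL wrapped in square brackets
        System.out.println(resolve("${x.url}", Map.of("x.url", List.of(url))));
    }
}
```

The first call prints the bare URL; the second prints it wrapped in square brackets, matching what ${x.url} resolved to.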
My concern is no longer pressing, but I do have a couple of questions
for the devs to think about:
1. How should a multivalued field be treated in a child entity? The
use case is the one I presented, where I intend url to be
multivalued. I'm thinking a for-each type construct should apply.
2. How should a multivalued field be formatted, or custom formatted,
when you intend to use its content in another field, possibly a
nested one?
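The two options in these questions can be sketched in plain Java: fan a multivalued value out into one value per element (the for-each idea), or format it explicitly with a chosen separator. Both helpers, eachValue and joinValues, are hypothetical names and not part of the DataImportHandler; the a.pdf and b.pdf URLs are made-up sample data.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helpers illustrating the two questions; not DIH API.
public class MultiValued {
    // Option 1 ("for-each"): treat a multivalued value as one value per
    // element, e.g. so a child entity could run once per URL.
    static List<String> eachValue(Object value) {
        if (value == null) {
            return List.of();
        }
        if (value instanceof Collection) {
            return ((Collection<?>) value).stream()
                    .map(v -> String.valueOf(v))
                    .collect(Collectors.toList());
        }
        // A single value becomes a one-element list
        return List.of(String.valueOf(value));
    }

    // Option 2 (custom formatting): join the values with an explicit
    // separator instead of relying on toString()'s "[a, b]" form.
    static String joinValues(Object value, String separator) {
        return String.join(separator, eachValue(value));
    }

    public static void main(String[] args) {
        Object urls = List.of("http://publicdomain.ca/content/a.pdf",
                              "http://publicdomain.ca/content/b.pdf");
        for (String url : eachValue(urls)) {
            System.out.println("child entity would fetch: " + url);
        }
        System.out.println(joinValues(urls, " "));
    }
}
```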
Tricia Williams wrote:
Hi All,
The DataImportHandler is the most fantastic thing that has recently
come to Solr. Thank you.
I'm noticing that when I use variables in nested entities, the
variable values are wrapped in square brackets. For example, ${x.url}
used in the "tika" entity below resolves to
[http://publicdomain.ca/content/Sample.pdf] (note the square
brackets), so I get this error in my log:
SEVERE: Exception thrown while getting data
java.net.MalformedURLException: no protocol: [http://publicdomain.ca/content/Sample.pdf]
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:78)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
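The error can be reproduced without Solr. java.net.URL reads the scheme from the text before the first ':', and "[http" is not a valid scheme because it does not start with a letter, so the bracketed string is rejected with the same "no protocol" message seen in the log. The class name and the tryParse helper are illustrative only.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Reproduces the "no protocol" error outside Solr.
public class NoProtocolDemo {
    // The URL(String) constructor is deprecated in recent JDKs but still works;
    // it parses the string without opening a connection.
    @SuppressWarnings("deprecation")
    static String tryParse(String spec) {
        try {
            return "ok: " + new URL(spec);
        } catch (MalformedURLException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String plain = "http://publicdomain.ca/content/Sample.pdf";
        System.out.println(tryParse(plain));              // parses fine
        System.out.println(tryParse("[" + plain + "]"));  // "[http" is not a valid scheme
    }
}
```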
I encountered this previously when I tried to concatenate fields
from different entities into one field; I worked around it by
gathering the fields with an XSL stylesheet. Not being able to resolve
the URL for Tika is a little more problematic.
*Is this a bug? If not, how do I remove the brackets so that I can
use my variable as intended?*
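One possible workaround, assuming a multivalued field arrives as a java.util.List in the entity's row, is to collapse single-element lists to their only element before the value is used. DIH transformers operate on a Map<String, Object> row, so logic like this could presumably be moved into a custom transformer; the sketch below does not depend on Solr, and its names are hypothetical. Lists with more than one value are left alone, since that is the for-each case.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Solr-independent sketch: replace one-element lists in a row
// with their only element so a ${entity.field} reference sees a plain value.
public class UnwrapSingleValues {
    static Map<String, Object> unwrap(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        for (Map.Entry<String, Object> e : out.entrySet()) {
            if (e.getValue() instanceof List) {
                List<?> values = (List<?>) e.getValue();
                if (values.size() == 1) {
                    // "[http://...]" would now resolve as "http://..."
                    e.setValue(values.get(0));
                }
                // Lists with 0 or 2+ values are kept as-is
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("url", List.of("http://publicdomain.ca/content/Sample.pdf"));
        System.out.println(unwrap(row).get("url"));
    }
}
```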
<dataConfig>
  <dataSource type="BinURLDataSource" name="bin"/>
  <dataSource type="FileDataSource" name="fileReader"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/home/pgwillia/content" dataSource="null" fileName=".*xml"
            rootEntity="false">
      <entity name="x" processor="XPathEntityProcessor"
              dataSource="fileReader"
              transformer="TemplateTransformer,RegexTransformer"
              forEach="/RDF/Description" url="${f.fileAbsolutePath}">
        ...
        <field column="url" xpath="/RDF/Description/identifier"
               regex="http://privatedomain:8080/content/"
               replaceWith="http://publicdomain.ca/content/"/>
        <entity name="tika" processor="TikaEntityProcessor"
                url="${x.url}" dataSource="bin" format="text">
          <field column="fulltext" name="text"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
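For reference, here is a plain-Java approximation of what the regex/replaceWith pair on the url field is meant to do, applied to each identifier separately so that each rewritten URL can be handed to the child entity on its own rather than as a bracketed list. This is not RegexTransformer's code, and the privatedomain input is a hypothetical value inferred from the resolved URL in the log.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java approximation (not RegexTransformer itself) of the
// regex/replaceWith attributes on the "url" field, applied per value.
public class RewriteUrls {
    static final String REGEX = "http://privatedomain:8080/content/";
    static final String REPLACE_WITH = "http://publicdomain.ca/content/";

    static List<String> rewrite(List<String> identifiers) {
        return identifiers.stream()
                .map(id -> id.replaceAll(REGEX, REPLACE_WITH))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical input, inferred from the resolved URL in the log
        List<String> ids = List.of("http://privatedomain:8080/content/Sample.pdf");
        // Print each rewritten URL individually instead of the list's toString()
        rewrite(ids).forEach(System.out::println);
    }
}
```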
Many thanks,
Tricia