On 05.08.2011 13:50, Julien Nioche wrote:
Simply change your solr schema and make the field title multivalued
Thank you Julien. Perfect first aid! :-)
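For readers landing on this thread, the schema change Julien suggests might look roughly like the fragment below. It is only a sketch: the field type name "text" and the stored/indexed settings are assumptions and should be copied from the existing title entry in your own schema.xml. The only part that matters here is the multiValued="true" attribute.

```xml
<!-- schema.xml, inside the <fields> section -->
<!-- Assumption: type/stored/indexed mirror whatever your current title field uses. -->
<field name="title" type="text" stored="true" indexed="true" multiValued="true"/>
```

Solr has to reload the schema (for example through a restart) before the change takes effect, and the affected documents need to be indexed again.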
On 5 August 2011 12:38, Marek Bachmann<[email protected]> wrote:
Hey ho,
I have a problem with a URL that seems to point to a vcf document.
Let me explain:
When I try to build a Solr index, this URL is responsible for the following
error message:
SEVERE: org.apache.solr.common.SolrException: ERROR: [http://cms.uni-kassel.de/asl/en/fb/staff.html?tx_wtdirectory_pi1%5BvCard%5D=10] multiple values encountered for non multiValued field title: [Universität Kassel, Fachbereich 6 ASL: Faculty Members, Lolita_Hörnlein.vcf]
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:242)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:843)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
The url is:
http://cms.uni-kassel.de/asl/en/fb/staff.html?tx_wtdirectory_pi1%5BvCard%5D=10
When I download it separately, it returns the following response:
Status=OK - 200
Date=Fri, 05 Aug 2011 11:09:12 GMT
Server=Apache/2.2.3 (Debian) mod_ssl/2.2.3 OpenSSL/0.9.8c
X-Powered-By=PHP/5.2.0-8+etch16
Content-Disposition=attachment; filename=Lolita_Hörnlein.vcf
Pragma=public
Content-Type=text/directory
Set-Cookie=fe_typo_user=316c4c91100f95fb57c5e8d39d32f99d; path=/asl/
Via=1.1 cms.uni-kassel.de
Vary=Accept-Encoding
Content-Encoding=gzip
Content-Length=5043
Keep-Alive=timeout=15, max=99
Connection=Keep-Alive
I have inspected the file and found that it is corrupted: besides the
proper vcf data, it contains generated HTML code. This seems to be a
misbehaviour of some plugin in the CMS.
My question is how to handle such files. It looks like the parser puts too
many values into the title field, so Solr can't handle it.
For a quick solution, it would be best if I could configure Tika so that it
doesn't parse the vcf file at all, but I don't know how to do that.
Any suggestions for this problem?
Thank you very much.