Simply change your Solr schema and make the field title multiValued.

On 5 August 2011 12:38, Marek Bachmann <[email protected]> wrote:
> Hey ho,
>
> I have a problem with a URL that seems to be a vcf document.
> Let me explain:
>
> When I try to build a Solr index, this URL is responsible for this error
> message:
>
> SEVERE: org.apache.solr.common.SolrException: ERROR:
> [http://cms.uni-kassel.de/asl/en/fb/staff.html?tx_wtdirectory_pi1%5BvCard%5D=10]
> multiple values encountered for non multiValued field title: [Universität
> Kassel, Fachbereich 6 ASL: Faculty Members, Lolita_Hörnlein.vcf]
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:242)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:843)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> The URL is:
>
> http://cms.uni-kassel.de/asl/en/fb/staff.html?tx_wtdirectory_pi1%5BvCard%5D=10
>
> When I download it separately, it delivers the following response:
>
> Status=OK - 200
> Date=Fri, 05 Aug 2011 11:09:12 GMT
> Server=Apache/2.2.3 (Debian) mod_ssl/2.2.3 OpenSSL/0.9.8c
> X-Powered-By=PHP/5.2.0-8+etch16
> Content-Disposition=attachment; filename=Lolita_Hörnlein.vcf
> Pragma=public
> Content-Type=text/directory
> Set-Cookie=fe_typo_user=316c4c91100f95fb57c5e8d39d32f99d; path=/asl/
> Via=1.1 cms.uni-kassel.de
> Vary=Accept-Encoding
> Content-Encoding=gzip
> Content-Length=5043
> Keep-Alive=timeout=15, max=99
> Connection=Keep-Alive
>
> I have inspected this file and found that it is corrupted: besides the
> proper vcf data, it also contains generated HTML code. This seems to be
> a misbehaviour of some plugin in the CMS.
>
> My question is how to handle such files. It looks like the parser puts too
> many values in the title field, so Solr can't handle it.
>
> For a quick solution, it would be best if I could configure Tika so that it
> won't parse the vcf, but I don't know how to do that.
>
> Any suggestions for this problem?
>
> Thank you very much.

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
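
For reference, the schema change suggested at the top of this reply might look roughly like the fragment below. This is only a sketch, assuming the title field is declared in the Solr schema.xml in the usual way: the type, indexed and stored values shown here are assumptions, so keep whatever your existing title definition uses and add only multiValued="true".

```xml
<!-- Sketch of a title field declaration in Solr's schema.xml.
     type/indexed/stored are assumed values; keep the ones already
     present in your schema. The only intended change is
     multiValued="true", which lets a document carry several titles. -->
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
```

Solr reads the schema when the core is loaded, so the change should only take effect after a restart or core reload, and it is probably safest to rebuild the index afterwards.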

