Thanks so much for your help, Jan Høydahl. Have a great weekend!
Xiaohui
-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan....@cominvent.com]
Sent: Friday, September 03, 2010 3:46 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

You did not supply your actual query. Try adding a &q=foobar parameter. Also, you don't need a & before shards, since you already have the ?.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1. sep. 2010, at 20.14, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

> Thank you, Jan. Unfortunately I got the following exception when I used
> http://localhost:8983/solr/aapublic/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
>
> *********************************
> Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:33)
>     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
>     at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:131)
>     at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>     at org.mortbay.jetty.Server.handle(Server.java:285)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>     at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> *********************************
>
> -----Original Message-----
> From: Jan Høydahl / Cominvent [mailto:jan....@cominvent.com]
> Sent: Tuesday, August 31, 2010 2:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to deal with virtual collection in solr?
>
> Hi,
>
> If you have multiple cores defined in your solr.xml, you need to send your
> queries to one of the cores. The URL below seems to be missing the core name.
> Try instead:
>
> http://localhost:8983/solr/aapublic/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
>
> And as Lance pointed out, make sure your XML files conform to the Solr XML
> format (http://wiki.apache.org/solr/UpdateXmlMessages).
>
> On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
>
>> Thank you, Jan Høydahl.
>>
>> I used
>> http://localhost:8983/solr/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
>> and got the error "Missing solr core name in path". I have aapublic and
>> aaprivate cores. I also got an error when I used
>> http://localhost:8983/solr/aapublic/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
>> namely a null pointer exception, "java.lang.NullPointerException".
>>
>> My collections are XML files. Please let me know if I can use the following
>> approach you suggested:
>> curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F "fi...@myfile.xml"
>>
>> Thanks so much as always!
>> Xiaohui
>>
>>
>> -----Original Message-----
>> From: Jan Høydahl / Cominvent [mailto:jan....@cominvent.com]
>> Sent: Friday, August 27, 2010 7:42 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: how to deal with virtual collection in solr?
>>
>> Hi,
>>
>> Version 1.4.1 does not support SolrCloud-style sharding. In 1.4.1,
>> please use this style:
>> &shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
>>
>> However, since the schema is the same, I'd opt for one index with a
>> "collection" field as the filter.
>>
>> You can add that field to your schema, and then inject it as metadata on the
>> ExtractingRequestHandler call:
>>
>> curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F "fi...@myfile.pdf"
>>
>> On 26. aug. 2010, at 20.41, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
>>
>>> Thanks so much for your help! I will try it.
>>>
>>>
>>> -----Original Message-----
>>> From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com]
>>> Sent: Thursday, August 26, 2010 2:36 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: how to deal with virtual collection in solr?
>>>
>>> I don't know about the shards, etc.
>>>
>>> However, I recently encountered that exception while indexing PDFs as well.
>>> I resolved it by upgrading to a nightly build of Solr (you can find them at
>>> https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/).
>>>
>>> The problem is that Solr 1.4.1 ships a very old version of Tika, which uses
>>> an old version of PDFBox to do its parsing. (You might be able to fix the
>>> problem just by replacing the Tika jars... however, I don't know whether
>>> there have been any API changes, so I can't really recommend that.)
>>>
>>> We didn't upgrade to trunk for that functionality, but it was nice that it
>>> started working. (The PDFs we'll be indexing won't be of the later
>>> versions, but a test file was.)
>>>
>>> On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C]
>>> <xiao...@mail.nlm.nih.gov> wrote:
>>>
>>>> Thanks so much for your help, Jan Høydahl!
>>>>
>>>> I made multiple cores (aa public, aa private, bb public and bb private). I
>>>> know how to query them individually. Please tell me whether I can now
>>>> query combinations of them through the shards parameter. I tried appending
>>>> &shards=aapub,bbpub to the query string; unfortunately it didn't work.
>>>>
>>>> Actually, all of the content is the same kind of data. I don't have a
>>>> "collection" field in the XML files. Please tell me how I can set up a
>>>> "collection" field in the schema and simply select a collection through a
>>>> filter.
>>>>
>>>> I used curl to index PDF files with Solr 1.4.1. I got the following error
>>>> when indexing PDFs of version 1.5 and 1.6.
>>>>
>>>> *************************************
>>>> <html>
>>>> <head>
>>>> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
>>>> <title>Error 500 </title>
>>>> </head>
>>>> <body><h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
>>>>
>>>> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
>>>>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>>>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>>>>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>>>>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>>>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>>>>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>>>>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>>>>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>>>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>>>>     at org.mortbay.jetty.Server.handle(Server.java:285)
>>>>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>>>>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>>>>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>>>>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>>>>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>>>>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>>>>     at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>>>> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
>>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>>>>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>>>>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>>>>     ... 22 more
>>>> Caused by: java.lang.NullPointerException
>>>>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>>>>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>>>>     at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
>>>>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>>>>     at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
>>>>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
>>>>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
>>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>>>>     ... 24 more
>>>> </pre>
>>>> <p>RequestURI=/solr/lhcpdf/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>>>> <br/>
>>>> ***************************************
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Jan Høydahl / Cominvent [mailto:jan....@cominvent.com]
>>>> Sent: Wednesday, August 25, 2010 4:34 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: how to deal with virtual collection in solr?
>>>>
>>>>> 1. Currently we use Verity and have more than 20 collections; each
>>>>> collection has an index for public items and an index for private items.
>>>>> So there are virtual collections which point to each collection, and a
>>>>> virtual collection which points to all of them. For example, we have AA
>>>>> and BB collections.
>>>>>
>>>>> AA virtual collection --> (AA index for public items and AA index for private items).
>>>>> BB virtual collection --> (BB index for public items and BB index for private items).
>>>>> All virtual collection --> (AA index for public items and AA index for
>>>>> private items, BB index for public items and BB index for private items).
>>>>>
>>>>> Would you please tell me what I should do for this if I use Solr?
>>>>
>>>> There are multiple ways to solve this, depending on the nature of your
>>>> collections. If they have somewhat different schemas, a natural choice
>>>> would be to make multiple cores: AA-private, AA-public, BB-private,
>>>> BB-public. Now you can query them individually or in combinations through
>>>> the shards parameter. From the next Solr version on, you can use virtual
>>>> collections in the shards parameter, e.g. &shards=AA,BB etc. (See
>>>> http://wiki.apache.org/solr/SolrCloud#Distributed_Requests)
>>>>
>>>> If all your content is (roughly) the same kind of data, you could also
>>>> solve your virtual collection issue through a "collection" field in your
>>>> schema, and simply select collections through filters: &fq=collection:AA.
>>>> You could even write a Search Component which translates a &collection=
>>>> parameter in the request into the correct filters, if you want to hide this
>>>> implementation from the front ends.
>>>>
>>>>> 2. Our project has files in different formats that I need to index, for
>>>>> example XML files, PDF files and text files. Is it possible for Solr to
>>>>> return a search result from all of them?
>>>>
>>>> Sure. PDF and text files can be indexed through the
>>>> ExtractingRequestHandler. XML can be indexed through the XMLUpdateHandler
>>>> or the DataImportHandler. Solr uses Apache Tika internally to extract text
>>>> from PDFs and other rich document formats.
>>>>
>>>>> 3. I got an error when I indexed PDF files of version 1.5 or 1.6.
>>>>> Would you please tell me if there is a patch to fix it?
>>>>
>>>> How did you try to index these PDFs? What version of Solr are you using?
>>>> Exactly what error message did you get?
>>>>
>>>>
>>
>
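Putting Jan's last two replies together: on a multicore 1.4.1 install the request has to go to a core, and it needs an explicit q parameter, because the NullPointerException in the thread comes from the query parser being handed a null query string. Below is a minimal bash sketch that builds such a URL. It assumes the aapublic/aaprivate cores on localhost:8983 from the thread; the function name build_shard_query is hypothetical, and the query is inserted without URL-encoding.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_shard_query CORE QUERY SHARD...   (hypothetical helper)
# Prints a Solr 1.4.1 distributed-search URL: the request goes to CORE on
# localhost:8983 and is fanned out to every SHARD listed after QUERY.
# QUERY is inserted as-is, so it must already be URL-encoded.
build_shard_query() {
  local core="$1" query="$2"
  shift 2
  local shards
  shards=$(IFS=,; printf '%s' "$*")  # join the shard addresses with commas
  # q is mandatory: leaving it out is what produced the null query string
  # in the NullPointerException trace. No stray & after the ?.
  printf 'http://localhost:8983/solr/%s/select?q=%s&shards=%s\n' \
    "$core" "$query" "$shards"
}

build_shard_query aapublic foobar \
  localhost:8983/solr/aaprivate localhost:8983/solr/aapublic
```

The printed URL can be handed straight to curl, for example curl "$(build_shard_query aapublic foobar localhost:8983/solr/aaprivate localhost:8983/solr/aapublic)". A query containing spaces or other special characters would have to be URL-encoded first, which this sketch deliberately leaves out.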
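Under the single-index approach Jan recommends, each XML file has to be in Solr's own update format (the UpdateXmlMessages page linked in the thread) and carry the new collection field. The sketch below emits one such add message. It assumes the schema has been extended with a field named collection and defines id and title fields; title is only an illustrative placeholder, and the helpers xml_escape and solr_add_doc are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Escape the characters that are special in XML text and attribute values.
# In a sed replacement a bare & means "the matched text", so it is written as \&.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                         -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# solr_add_doc ID COLLECTION TITLE   (hypothetical helper)
# Prints a Solr XML <add> message holding one document. "collection" is the
# field Jan suggests adding to schema.xml; "title" is a placeholder field.
solr_add_doc() {
  local id collection title
  id=$(xml_escape "$1")
  collection=$(xml_escape "$2")
  title=$(xml_escape "$3")
  cat <<EOF
<add>
  <doc>
    <field name="id">$id</field>
    <field name="collection">$collection</field>
    <field name="title">$title</field>
  </doc>
</add>
EOF
}

solr_add_doc doc1 aaprivate "Quarterly R&D report"
```

To index the message, it could be piped to the XML update handler with curl's --data-binary @- (read the body from stdin), for example solr_add_doc doc1 aaprivate "Report" | curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary @-. Then post <commit/> the same way. On a multicore install the core name goes into the path, as Jan points out in the thread.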
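Jan also suggests hiding the filter behind a &collection= parameter by writing a custom Search Component. The sketch below is a lighter, client-side stand-in for that component, not the component itself. It maps the Verity-style virtual collections from the first message to fq values. The mapping and the collection_fq helper are hypothetical: they assume documents were indexed with collection values aapublic, aaprivate, bbpublic and bbprivate (following the literal.collection=aaprivate example). Associative arrays require bash 4 or later.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical mapping from a virtual collection name to the values of the
# "collection" field it covers (one index, filtered per request).
declare -A VIRTUAL_COLLECTIONS=(
  [AA]="aapublic aaprivate"
  [BB]="bbpublic bbprivate"
  [ALL]="aapublic aaprivate bbpublic bbprivate"
)

# collection_fq NAME   (hypothetical helper)
# Prints a filter query such as: collection:(aapublic OR aaprivate)
# Returns 1, with a message on stderr, for an empty or unknown name.
collection_fq() {
  local name="$1"
  if [[ -z "$name" || -z "${VIRTUAL_COLLECTIONS[$name]+set}" ]]; then
    echo "unknown virtual collection: $name" >&2
    return 1
  fi
  local -a values
  read -r -a values <<< "${VIRTUAL_COLLECTIONS[$name]}"
  local joined
  joined=$(printf ' OR %s' "${values[@]}")     # " OR aapublic OR aaprivate"
  printf 'collection:(%s)\n' "${joined# OR }"  # drop the leading " OR "
}

collection_fq AA
```

The output would be sent URL-encoded as fq. With curl, -G plus --data-urlencode handles the encoding, for example curl -G http://localhost:8983/solr/select --data-urlencode 'q=foobar' --data-urlencode "fq=$(collection_fq AA)" on a single-core install (add the core name to the path on a multicore one). Keeping the mapping in one place means front ends can ask for "AA" without knowing how the index is partitioned. A Search Component would perform the same translation on the server side instead.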