Re: SolrCell and indexing HTML

Greg Walters Fri, 21 Mar 2014 10:09:39 -0700

I've never tried indexing via Groovy or using SolrCell, but I think you might 
be working at too low a level in SolrJ if you're just adding documents. You 
might try checking out https://wiki.apache.org/solr/Solrj#Adding_Data_to_Solr, 
though I might be way off base :)
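
If you do stay with the extract handler, the two things I'd look at first are 
the extractOnly flag and whether a commit ever happens, since documents that 
were added but not committed won't show up in the admin tools. Below is a rough 
sketch of the parameter set I'd try. It only builds the query string with the 
plain JDK so it can run without SolrJ on the classpath. The class and method 
names (ExtractParams, buildQuery) are made up for illustration, and 
commit=true is my guess at what's missing, not something I've verified against 
your setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the parameters for a request to /update/extract.
public class ExtractParams {

    // Returns the URL-encoded query string for one document id.
    static String buildQuery(String id) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.id", id);        // same literal field as in the original snippet
        params.put("extractOnly", "false");  // false = index the content instead of just returning it
        params.put("commit", "true");        // assumption: make the document visible right away

        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should never happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("/update/extract?" + buildQuery("doc 1"));
    }
}
```

In your SolrJ code the same idea would be req.setParam("extractOnly","false") 
plus req.setParam("commit","true") on the request you already have.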


Thanks,
Greg

On Mar 21, 2014, at 11:56 AM, Liz Sommers <lizswo...@gmail.com> wrote:

> I am trying to write a POC about indexing URLs with Solr using SolrJ and
> SolrCell.  (The code is written in Groovy.)
> 
> The relevant code is here
> 
> ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
> 
>        req.setParam("literal.id",p.id.toString())
>        req.setParam("extractOnly","true")
>        URL url = new URL(p.url)
>        ContentStream stream = new ContentStreamBase.URLStream(url)
>        req.addContentStream(stream)
> 
>        def result = server.request(req)
>        println "result: ${result}"
> 
> 
> When I set extractOnly to true I get everything in the URL: all the tags,
> all the stylesheets.  When I set it to false I get a response that has
> nothing in it except
> 
> result: {responseHeader={status=0,QTime=19}}
> 
> When I test it with the admin tools, nothing in the URL has been indexed as
> far as I can tell.
> I know I am doing something wrong with the params, but I haven't figured
> out what.  Can somebody please help me?
> 
> Thanks
> Liz Sommers
> lizzy...@gmail.com
> lizswo...@gmail.com
