Re: indexing comments with Apache Solr

Alexandre Rafalovitch Wed, 06 Aug 2014 05:48:33 -0700

You can index the comments as child documents of the page. The Solr
schema has to cover both the parent and the child fields, and each page
must be indexed together with its comments as one block. After that,
look up the block join query syntax for nested documents. Also, the
latest Solr (4.9) has some extra functionality that lets you find all
matching parent pages and then expand them with their matching children.


E.g.: http://heliosearch.org/expand-block-join/ seems relevant
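
To make the idea concrete, here is a minimal Python sketch of what I mean.
It only builds the XML <add> message with nested <doc> elements and the
query strings; it does not talk to a Solr server. The field names (type_s,
author_s, posted_dt, text_t, url_s) are my own assumptions and rely on
dynamic fields such as *_s and *_dt being defined in your schema, and the
helper names are made up for this example. Adjust both to your setup.

```python
import xml.etree.ElementTree as ET

# Assumption: every parent carries type_s:page and no child does.
# The block join parent filter must match all parents and only parents.
PARENT_FILTER = "type_s:page"


def field(doc, name, value):
    """Append <field name="...">value</field> to a <doc> element."""
    el = ET.SubElement(doc, "field", name=name)
    el.text = value
    return el


def page_block(page_id, url, comments):
    """Build one page <doc> with one nested <doc> per comment."""
    page = ET.Element("doc")
    field(page, "id", page_id)
    field(page, "type_s", "page")
    field(page, "url_s", url)
    for i, comment in enumerate(comments):
        child = ET.SubElement(page, "doc")  # nested doc = child document
        field(child, "id", f"{page_id}-c{i}")
        field(child, "type_s", "comment")
        field(child, "author_s", comment["author"])
        field(child, "posted_dt", comment["posted"])
        field(child, "text_t", comment["text"])
    return page


def add_command(pages):
    """Wrap page blocks in an <add> message for the XML update handler."""
    add = ET.Element("add")
    add.extend(pages)
    return ET.tostring(add, encoding="unicode")


def comments_between(start, end):
    """Query that returns the comment documents themselves, by date range."""
    return f"type_s:comment AND posted_dt:[{start} TO {end}]"


def pages_with_comment_by(author):
    """Block join query that returns pages having a comment by the author."""
    return f'{{!parent which="{PARENT_FILTER}"}}author_s:{author}'


if __name__ == "__main__":
    block = page_block(
        "page-1",
        "http://example.com/news/1",
        [{"author": "alice", "posted": "2014-06-30T10:00:00Z", "text": "Nice article"}],
    )
    print(add_command([block]))
    print(comments_between("2014-06-24T00:00:00Z", "2014-07-24T23:59:59Z"))
    print(pages_with_comment_by("alice"))
```

Because each comment is a document of its own, the date range and the
author queries you asked about work directly on the comments, and the
{!parent ...} form gets you back to the pages when you need them.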

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Wed, Aug 6, 2014 at 11:18 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
> Dear Gora,
> I think you misunderstood my problem. I am already using Nutch to crawl
> websites; my problem is on the indexing side, not the crawling side.
> Suppose a page has been fetched and parsed by Nutch, and all the comments,
> along with their dates and sources, have been identified by the parser.
> Now how should I index these comments? What is the document granularity?
> Best regards.
>
>
> On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:
>
>> On 6 August 2014 14:13, Ali Nazemian <alinazem...@gmail.com> wrote:
>> >
>> > Dear all,
>> > Hi,
>> > I was wondering how I can manage to index comments in Solr. Suppose I am
>> > going to index a web page that contains a news article and some comments
>> > posted by people at the end of the page. How can I index these comments
>> > in Solr, given that I am going to do some analysis on them? For example,
>> > I want enough query flexibility to retrieve all comments posted between
>> > 24 June 2014 and 24 July 2014, or all comments posted by a specific
>> > person. Defining these comments as a multi-valued field would therefore
>> > not be a solution, since it does not allow that kind of query
>> > flexibility. So what is your suggestion about document granularity in
>> > this case? Can I treat these comments as new documents inside the main
>> > document (a tree-based structure)? What is your suggestion for this
>> > case? I think it is a common case when indexing web pages these days, so
>> > I am probably not the only one thinking about this situation. Please
>> > share your thoughts, and perhaps your experiences with this situation,
>> > with me. Thank you very much.
>>
>> Parsing a web page and breaking it up into different fields for indexing
>> is outside the scope of Solr. You might want to look at Apache Nutch, which
>> can index into Solr, and/or other web crawlers/scrapers.
>>
>> Regards,
>> Gora
>>
>
>
>
> --
> A.Nazemian
