Dear Alexandre,

Hi,
Thank you very much. I think nested documents are what I need. Do you have more information about how I can define such a thing in the Solr schema? The blog post you mentioned was all about retrieving nested docs.

Best regards.

On Wed, Aug 6, 2014 at 5:16 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> You can index comments as child records. The structure of the Solr
> document should be able to incorporate both parents and children
> fields and you need to index them all together. Then, just search for
> JOIN syntax for nested documents. Also, latest Solr (4.9) has some
> extra functionality that allows you to find all parent pages and then
> expand children pages to match.
>
> E.g.: http://heliosearch.org/expand-block-join/ seems relevant
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On Wed, Aug 6, 2014 at 11:18 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
>
> > Dear Gora,
> > I think you misunderstood my problem. Actually, I used Nutch for crawling
> > websites, and my problem is on the indexing side, not the crawling side.
> > Suppose a page is fetched and parsed by Nutch, and all comments, along
> > with their dates and sources, are identified by parsing. Now what can I
> > do to index these comments? What is the document granularity?
> > Best regards.
> >
> >
> > On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:
> >
> >> On 6 August 2014 14:13, Ali Nazemian <alinazem...@gmail.com> wrote:
> >> >
> >> > Dear all,
> >> > Hi,
> >> > I was wondering how I can manage to index comments in Solr? Suppose I
> >> > am going to index a web page that has news content and some comments
> >> > that are presented by people at the end of the page. How can I index
> >> > these comments in Solr? Consider the fact that I am going to do some
> >> > analysis on these comments. For example, I want to have the query
> >> > flexibility to retrieve all comments that were presented between
> >> > 24 June 2014 and 24 July 2014, or all the comments that were presented
> >> > by a specific person. Therefore defining these comments as a
> >> > multi-valued field would not be the solution, since in that case such
> >> > query flexibility is not feasible. So what is your suggestion about
> >> > document granularity in this case? Can I consider all of these
> >> > comments as new documents inside the main document (a tree-based
> >> > structure)? What is your suggestion for this case? I think it is a
> >> > common case of indexing web pages these days, so probably I am not the
> >> > only one thinking about this situation. Please share your thoughts and
> >> > perhaps your experiences in this situation with me. Thank you very much.
> >>
> >> Parsing a web page, and breaking parts up for indexing into different
> >> fields is out of the scope of Solr. You might want to look at Apache
> >> Nutch which can index into Solr, and/or other web crawlers/scrapers.
> >>
> >> Regards,
> >> Gora
> >>
> >
> >
> >
> > --
> > A.Nazemian
>

--
A.Nazemian
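
For illustration, the approach Alexandre describes (indexing comments as child records together with their parent page, then querying with the block join syntax) might look roughly like the sketch below. It only builds the XML update message and the query strings; it does not talk to a Solr server. All field names (`id`, `doc_type`, `content`, `comment_author`, `comment_date`) are hypothetical and not taken from this thread. As far as I know, Solr 4.x has no separate schema construct for nested documents: parent and child fields are all declared in the same schema.xml, and the stock example schema also carries a `_root_` field for block indexing, but please verify this against the version you run.

```python
import xml.etree.ElementTree as ET


def add_field(doc, name, value):
    """Append <field name="...">value</field> to a <doc> element."""
    field = ET.SubElement(doc, "field", name=name)
    field.text = value


def build_add_request(page, comments):
    """Build a Solr XML <add> message with comments nested inside the page.

    Each comment becomes a child <doc> inside the parent <doc>, so the
    parent and its children are sent (and indexed) as one block.
    """
    add = ET.Element("add")
    parent = ET.SubElement(add, "doc")
    for name, value in page.items():
        add_field(parent, name, value)
    for comment in comments:
        child = ET.SubElement(parent, "doc")
        for name, value in comment.items():
            add_field(child, name, value)
    return ET.tostring(add, encoding="unicode")


def comments_between(start, end):
    """Query string matching comment documents in a date range."""
    return "doc_type:comment AND comment_date:[{} TO {}]".format(start, end)


def pages_with_comment_by(author):
    """Block join query: parent pages having a comment by the given author."""
    return '{{!parent which="doc_type:page"}}comment_author:"{}"'.format(author)


# Hypothetical example data: one news page with two comments.
page = {"id": "page-1", "doc_type": "page", "content": "News article text"}
comments = [
    {"id": "page-1-c1", "doc_type": "comment", "comment_author": "alice",
     "comment_date": "2014-06-30T10:00:00Z", "content": "First comment"},
    {"id": "page-1-c2", "doc_type": "comment", "comment_author": "bob",
     "comment_date": "2014-07-15T08:30:00Z", "content": "Second comment"},
]

payload = build_add_request(page, comments)
print(payload)
print(comments_between("2014-06-24T00:00:00Z", "2014-07-24T23:59:59Z"))
print(pages_with_comment_by("alice"))
```

The printed XML would be posted to the core's update handler, and the two strings used as the `q` parameter of a search. The `doc_type` field is one assumed way to tell parents from children; the `which` filter in the block join query has to match all parent documents and nothing else.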