Indexing decisions should always be based primarily on query and access requirements. So please tell us what your query and access requirements are. For example, what query terms might a user enter, and what exactly would they want to see in the results?
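To make "query requirements" concrete: the two example needs from the original question (comments posted within a date range, and comments by a specific person) could be expressed as Solr filter queries roughly like this. This is a minimal sketch that assumes each comment is indexed as its own document. The field names `comment_date` (a date field) and `comment_author` (a string field) are hypothetical and not taken from any existing schema.

```python
from urllib.parse import urlencode


def comment_query_params(start=None, end=None, author=None):
    """Build Solr select parameters for the example comment queries.

    Assumes hypothetical fields: comment_date (date) and comment_author (string).
    """
    params = [("q", "*:*")]  # match all docs; the filters below narrow them down
    if start and end:
        # Solr range syntax; dates are ISO 8601 in UTC with a trailing "Z".
        params.append(("fq", f"comment_date:[{start} TO {end}]"))
    if author:
        # Escape embedded quotes, then quote the value for an exact match
        # on a string field (names may contain spaces).
        escaped = author.replace('"', '\\"')
        params.append(("fq", f'comment_author:"{escaped}"'))
    return params


# Example: comments posted between 24 June 2014 and 24 July 2014.
query_string = urlencode(
    comment_query_params("2014-06-24T00:00:00Z", "2014-07-24T23:59:59Z")
)
```

Filter queries (`fq`) are used here because such constraints restrict the result set without affecting relevance scoring. The important point is that neither query is possible unless each comment's date and author sit in their own fields on their own document.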

I mean, on the indexing side you can index almost anything you want, but it's up to you to decide what you want to index. IOW, it is your obligation to come up with a data model, and that data model should be driven in large part by the query and access requirements mentioned above.
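One data model that fits per-comment queries is to flatten each crawled page into one document for the page itself plus one document per comment, with each comment document pointing back to its page. The sketch below is illustrative only. It assumes the crawler/parser has already extracted the comments, and the `ParsedPage`/`Comment` classes and all field names (`doc_type`, `page_id`, `comment_author`, `comment_date`, `comment_text`) are hypothetical, not part of Nutch or Solr.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    date: str  # ISO 8601 UTC, e.g. "2014-06-30T12:00:00Z"
    text: str


@dataclass
class ParsedPage:
    url: str
    title: str
    body: str
    comments: List[Comment] = field(default_factory=list)


def to_solr_docs(page: ParsedPage) -> list:
    """Flatten a page into one page doc plus one doc per comment."""
    docs = [{
        "id": page.url,
        "doc_type": "page",
        "title": page.title,
        "content": page.body,
    }]
    for i, c in enumerate(page.comments):
        docs.append({
            "id": f"{page.url}#comment-{i}",  # unique key per comment
            "doc_type": "comment",
            "page_id": page.url,              # link back to the parent page
            "comment_author": c.author,
            "comment_date": c.date,
            "comment_text": c.text,
        })
    return docs
```

The resulting list could be serialized with `json.dumps` and sent to Solr as a JSON update. The flat layout keeps date-range and author queries simple. Solr also supports nested (block-join) child documents, which is closer to the tree-based structure asked about below; which one fits better depends on whether results should be comments or pages.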

-- Jack Krupansky

-----Original Message----- From: Ali Nazemian
Sent: Wednesday, August 6, 2014 5:18 AM
To: solr-user@lucene.apache.org
Subject: Re: indexing comments with Apache Solr

Dear Gora,
I think you misunderstood my problem. Actually, I used Nutch for crawling
websites, and my problem is on the indexing side, not the crawling side.
Suppose a page is fetched and parsed by Nutch, and all the comments, along
with their dates and sources, are identified by parsing. Now how should I
index these comments? What should the document granularity be?
Best regards.


On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:

On 6 August 2014 14:13, Ali Nazemian <alinazem...@gmail.com> wrote:
>
> Dear all,
> Hi,
> I was wondering how I can manage to index comments in Solr. Suppose I am
> going to index a web page that has news content and some comments posted
> by people at the end of the page. How can I index these comments in Solr?
> Consider that I am going to do some analysis on these comments. For
> example, I want the query flexibility to retrieve all comments posted
> between 24 June 2014 and 24 July 2014, or all the comments posted by a
> specific person. Therefore, defining these comments as a multi-valued field
> would not be the solution, since in that case such query flexibility is
> not feasible. So what is your suggestion about document granularity in
> this case? Can I treat each of these comments as a new document inside the
> main document (a tree-based structure)? What is your suggestion for this
> case? I think it is a common case when indexing web pages these days, so
> probably I am not the only one thinking about this situation. Please share
> your thoughts and perhaps your experiences with this situation. Thank you
> very much.

Parsing a web page and breaking it up into parts for indexing into different
fields is out of the scope of Solr. You might want to look at Apache Nutch,
which can index into Solr, and/or other web crawlers/scrapers.

Regards,
Gora




--
A.Nazemian
