RE: Crawling an SCM to update a Solr index

Van Tassell, Kristian
Sun, 22 Apr 2012 08:32:02 -0700

Otis,

Thanks for the input! Were it not for the metadata I need to extract and the 
slight possibility that a sync error, file system error, or inconsistency could 
occur, I would take that same route.
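
To illustrate the kind of metadata extraction meant here, below is a minimal 
sketch, not a working crawler. It assumes the tagged text output of 
`p4 -ztag fstat` (lines of the form `... key value`, with records separated by 
blank lines) has already been captured as a string, and it turns each record 
into a dictionary that could later be posted to Solr. The function names and 
the Solr field names (`id`, `revision_i`, `changelist_i`) are hypothetical and 
would have to match the real schema.xml.

```python
def parse_ztag(text):
    """Split `p4 -ztag fstat`-style text into one dict per file record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("... "):
            # Tagged lines look like "... depotFile //depot/path/file.txt".
            key, _, value = line[4:].partition(" ")
            current[key] = value
        elif not line.strip() and current:
            # A blank line closes the current record.
            records.append(current)
            current = {}
    if current:
        records.append(current)
    return records


def to_solr_docs(records):
    """Map file records to Solr-style documents (field names are assumed)."""
    docs = []
    for rec in records:
        # Skip files whose head revision is a delete ("delete", "move/delete").
        if rec.get("headAction", "").endswith("delete"):
            continue
        docs.append({
            "id": rec["depotFile"],
            "revision_i": int(rec["headRev"]),
            "changelist_i": int(rec["headChange"]),
        })
    return docs


# Made-up sample input for illustration only.
SAMPLE = (
    "... depotFile //depot/docs/a.txt\n"
    "... headAction edit\n"
    "... headRev 3\n"
    "... headChange 120\n"
    "\n"
    "... depotFile //depot/docs/b.txt\n"
    "... headAction delete\n"
    "... headRev 2\n"
    "... headChange 121\n"
)

docs = to_solr_docs(parse_ztag(SAMPLE))
```

Keeping the parsing separate from any network call means the same functions 
could serve both a periodic consistency check and a full reindex.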


-Kristian

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, April 20, 2012 10:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Crawling an SCM to update a Solr index

Kristian,

For what it's worth, for http://search-lucene.com and http://search-hadoop.com 
we simply check out the source code from the SCM and index from the file 
system.  It works reasonably well.  The only issue I can recall us having is 
with the source code organization under SCM: modules get moved around, and 
sometimes this requires us to update things on our end to match those changes.

Otis
----
Performance Monitoring for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



>________________________________
> From: "Van Tassell, Kristian" <kristian.vantass...@siemens.com>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org> 
>Sent: Friday, April 20, 2012 3:26 PM
>Subject: Crawling an SCM to update a Solr index
> 
>Hello everyone,
>
>I'm in the process of pulling together requirements for an SCM (source code 
>manager) crawling mechanism for our Solr index. I probably don't need to argue 
>the need for a crawler, but to be specific, we have an index which receives 
>its updates from a custom-built application. I would, however, like to 
>periodically crawl the SCM to ensure the index is up to date. In addition, if 
>updates are made which require a complete reindex (such as schema.xml 
>modifications), I could use this crawler to update everything or only specific 
>areas.
>
>I'm wondering if there are any initiatives, tools (like Nutch), or whitepapers 
>out there for crawling an SCM. More specifically, I'm looking for a Perforce 
>solution. I'm guessing that there is nothing specific, and I'm prepared to 
>design to our specific requirements, but I wanted to check with the Solr 
>community before getting too far in.
>
>I'm most likely going to build the solution to interact with the SCM directly 
>(via its API) rather than syncing the SCM repository to the filesystem and 
>crawling it there, since there could be filesystem problems when syncing the 
>data and because there may be relevant metadata that can be retrieved from the 
>SCM.
>
>Thanks in advance for any information you may have,
>Kristian
>
>
>
