Some time ago I faced a roughly similar challenge. After many
trials and tests I ended up writing my own programs to fetch files,
select which ones are allowed to be indexed, and feed them into Solr
(POST style). This work is open source and can be found at
https://netlab1.net/, in the web page section titled "Presentations of
long term utility", item "Solr/Lucene Search Service". It is a set of
docs, three small PHP programs, and a bundle of Solr schema and related
files, all within one downloadable zip file.
To filter the files it finds, my solution uses a list of regular
expressions, which are simple to state and to process. The docs discuss
the rules. Luckily, the code that handles the rules and does the
filtering is very short and simple; see crawler.php for convertfilter()
and filterbyname(). You may wish to consider them, or equivalents,
for inclusion in your system, whatever that may be.
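To illustrate the idea, here is a minimal Python sketch of filtering by
a list of regular expressions. It is not the code from crawler.php; the
pattern list, the sample paths, and the names compile_rules() and
is_allowed() are made up for this example, and it assumes the rules are
exclusions (a path is indexed unless some rule matches it).

```python
import re

# Hypothetical exclusion rules: a path matching any of these is skipped.
EXCLUDE_PATTERNS = [
    r"^/private/",      # everything under the top-level /private/ directory
    r"/drafts/",        # any directory named "drafts", at any depth
    r"\.(tmp|bak)$",    # temporary and backup files
]


def compile_rules(patterns):
    """Compile the rule strings once so they can be reused for every path."""
    return [re.compile(p) for p in patterns]


def is_allowed(path, rules):
    """Return True if no exclusion rule matches the given path."""
    return not any(rule.search(path) for rule in rules)


rules = compile_rules(EXCLUDE_PATTERNS)

# Hypothetical candidate paths produced by a crawl.
candidates = [
    "/index.html",
    "/private/report.pdf",
    "/docs/drafts/plan.html",
    "/docs/guide.html",
    "/docs/guide.html.bak",
]

# Only the paths that survive the filter would be sent on to Solr.
to_index = [p for p in candidates if is_allowed(p, rules)]
print(to_index)
```

Excluding a whole folder is then just one more pattern in the list, and
the filter runs before anything is posted to Solr, so Solr itself needs
no special configuration for it.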
Thanks,
Joe D.
On 27/08/2020 20:32, Alexandre Rafalovitch wrote:
If you are indexing from Drupal into Solr, that's a question for
Drupal's Solr module. If you are doing it some other way, which way
are you doing it? The bin/post command?
Most likely this is not a Solr question, but a question about whatever
you have feeding data into Solr.
Regards,
Alex.
On Thu, 27 Aug 2020 at 15:21, Staley, Phil R - DCF
<phil.sta...@wisconsin.gov> wrote:
Can you, and if so how do you, exclude a specific folder/directory from
indexing in Solr version 7.x or 8.x? Also, our CMS is Drupal 8.
Thanks,
Phil Staley
DCF Webmaster
608 422-6569
phil.sta...@wisconsin.gov