Hello all,

I'm new to Solr.  From what little I have seen, Solr has made great strides
in open source search, but is lacking some significant features that would
really allow it to become a viable alternative to things like FAST and
Autonomy for enterprise search.  I am sure these issues have been discussed
on the list before, but I would like to help push them forward if I can:

1) Crawling--ShareHound does Windows shares, but it ignores document-level
permissions.  A modular approach to crawling file systems, websites,
intranet sites, etc., would be huge.  Also, I realize Nutch has a crawler, but
Solr looks much more full-featured in terms of things like faceted search,
so I'd rather help push Solr forward.

2) ACLs and document-level security--The lack of doc-level security is a
real deal-breaker in terms of indexing enterprise fileshares.  I could
envision this type of functionality being embedded in the various crawlers
above, on an OS-dependent or web app-dependent basis.  For example, when
indexing a file from a share, the ACL should be indexed as well, so that a
results list can be brought back without the permissions needing to be
re-checked against the original file server.  This also implies that ACL
changes need to be monitored and re-indexed, just as file content changes are.
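
As a rough illustration of what I mean, here is a toy Python sketch (not Solr
code).  The IndexedDoc class, the search function, and the sample paths and
group names are all made up for the example; the only point is that the ACL
is stored next to the content and results are filtered against it at query
time:

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """Hypothetical index entry: content plus the ACL captured at crawl time."""
    path: str
    text: str
    # Users/groups that had read access when the file was crawled.
    allowed: frozenset = field(default_factory=frozenset)


def search(index, term, principals):
    """Return paths of docs matching `term` that any of `principals` may read.

    Permissions are checked against the indexed ACL only, so the original
    file server is never contacted at query time.
    """
    principals = set(principals)
    return [
        doc.path
        for doc in index
        if term.lower() in doc.text.lower() and doc.allowed & principals
    ]


# Two sample documents with different (made-up) ACLs.
index = [
    IndexedDoc("//share/hr/salaries.doc", "salary review", frozenset({"hr-staff"})),
    IndexedDoc("//share/pub/handbook.doc", "salary bands and benefits", frozenset({"everyone"})),
]

# A user in "everyone" but not "hr-staff" only gets the public document back.
visible = search(index, "salary", ["dave", "everyone"])
```

A real implementation would presumably store the ACL as a multi-valued field
and apply it as a query filter rather than looping over documents, and it
would have to re-index a document whenever its ACL changes.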

There are other differences, obviously, between the leading commercial
products and Solr, but those two features alone would make a huge difference
in the power of Solr, in my opinion. I have little Java experience, but I
could easily prototype this functionality in other languages and work with
others to integrate them into the code base in Java.  Also, I headed up an
enterprise search request for information for a large pharmaceutical company
in the past, so I am familiar with the feature sets of FAST and Autonomy,
and I could help manage the project in terms of competing feature sets.

Best,
Dave
