On 9/13/2016 5:42 PM, Aaron Greenspan wrote:
> I get this on digest mode (and wasn’t even sure my initial message
> went through to the list), so please forgive the delay in responding.

I've added you as BCC so you'll get this as soon as I send it. I wrote most of it last night, and left it to complete in the morning -- and now I see that Jan has replied with similar information.

> I think the various reactions to my post suggest that a sizable number
> of users (and by "users" I mean those who are not affiliated with
> Apache and who are not core contributors) find Solr difficult to use.
> For me, this was confirmed many months ago when a family friend—a
> non-technical CEO twice my age of a company recently acquired for a
> very sizable sum—came over for dinner and without any prompting from
> anyone began complaining about this impossible program at work called
> Solr that none of his engineers could get to work. By his telling, he
> had several experienced engineers working on it.

I've been using Solr for about six years now. When I first got started, I spent a HUGE amount of time figuring out the most basic things, and I asked plenty of dumb questions right here on this list. I think it took me about three days to get from that initial download of the 1.4.0 archive to a working server that had something besides "collection1" on it. It took another month or so beyond that before I could demonstrate anything usable to my team, and after that I had to start writing tools that would actually create the index without manual intervention. One of those tools was an init script. Now Solr will install an init script on Unix-like operating systems.

My active production indexes are running on a couple of different 4.x versions. I have production 5.x indexes on servers serving a hot standby role, but they have not been fully vetted, so the primaries remain on older versions. It'll be a while before I get around to 6.x.

> I’m aware that issues with Java are not Solr’s fault. But most
> programs still manage to gracefully fail when they are missing a
> dependency, and then clearly report what’s missing.
> If you’re not
> actually a Java programmer, which I am not, "major.minor 52.0" (for
> example) is meaningless gibberish. "Please download and install JRE
> 1.8 to run this software" would be considerably clearer. How is it
> that Solr can search through millions of files, but it can’t do that?

I know that in the 5.x days, we had Java version detection in the start script, so that the start would complain if certain buggy versions of Java 7 were detected. I think it would even refuse to start if the version wasn't new enough. If we have lost that with 6.x, that needs to go back in, and we will look at that problem immediately.

On password security: I hear you. Part of the issue is that Solr can't *directly* do security. It's sitting behind another piece of software that handles the network and HTTP -- Jetty. Until recently, Solr really didn't touch the servlet container, allowing it to do its thing according to its config files. Part of this was due to the fact that before 5.0, we did not know what container was being used -- the user had the option of deploying in several different containers, and none of them handled security in quite the same way. Since 5.0, the only officially supported container is the Jetty that Solr includes, so we CAN put container-specific code into Solr. This is why 5.3 and later have good support for authentication.

TL;DR info: When you password protect Solr, the admin UI actually doesn't get protected. It is nothing more than static HTML, CSS, JavaScript, and images. The admin UI actually runs in your browser, not on the server. What gets password protection is the HTTP API used for information, queries, and updates.

You're absolutely right that our documentation and error messages are completely inadequate for a novice user. The error messages sometimes aren't even adequate for an experienced Java developer to know what went wrong, at least not without examining the source code.
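
To illustrate the "major.minor 52.0" point: the number in that message is the class file major version, which is the Java release number plus 44 (so 52 means Java 8). The sketch below is only an illustration of how such a message could be translated into the wording Aaron suggested. It is not what the Solr start script does, and the function names are made up for this example.

```python
import re

# Class file major version = Java release + 44 (e.g. 51 -> Java 7, 52 -> Java 8).
CLASS_VERSION_OFFSET = 44


def required_java_release(error_text):
    """Return the Java release implied by a 'major.minor NN.N' message, or None."""
    match = re.search(r"major\.minor(?: version)? (\d+)\.(\d+)", error_text)
    if match is None:
        return None
    return int(match.group(1)) - CLASS_VERSION_OFFSET


def friendly_message(error_text):
    """Translate the raw error into a plain-language hint (hypothetical helper)."""
    release = required_java_release(error_text)
    if release is None:
        # Not a class-version error: pass the original text through unchanged.
        return error_text
    # Releases up to 8 were commonly labelled "1.x"; later ones use the bare number.
    label = f"1.{release}" if release <= 8 else str(release)
    return f"Please download and install JRE {label} or newer to run this software."


if __name__ == "__main__":
    print(friendly_message("Unsupported major.minor version 52.0"))
```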

> As for Bram Van Dam’s question about how a settings database would
> work, I don’t think it’s worth getting too specific here, but my
> general response would be, if you need a good model for how to widely
> deploy software—not a perfect model, but a good one—look at WordPress.
> A lot of people use WordPress. Like any software, it has its flaws.
> But average people are able to sign in, with a password (!), change
> their admin settings, and save those settings I’m pretty certain to a
> MySQL schema. I’d love to be able to do that with Solr.

I concur with what Alexandre said about WordPress compared to Solr. The target audience and deployment method are quite different ... but I take your point too -- we can learn a lot from projects like WordPress, which has had to address "first contact" issues in its documentation.

The addition of ZooKeeper capability to Solr in version 4.0 created SolrCloud, which automates the job of using multiple Solr machines as a scalable and redundant cluster. Unfortunately, ZooKeeper is not trivial to set up, so we traded a super-hard problem for a different and slightly less challenging problem. Managing ZooKeeper is easier than setting up all of the infrastructure required for a cluster when SolrCloud is NOT used.

Solr has a running mode where it embeds a ZooKeeper server inside of Solr itself. The port number is usually 9983. The startup script will set that port number to the Solr port plus 1000, unless a specific port number is configured. The cloud example (bin/solr start -e cloud) starts an embedded ZooKeeper on the first Solr instance. This arrangement is suitable for a demo or proof of concept, but a different setup is highly recommended for production: an external ensemble of three or more ZooKeeper instances running on separate physical boxes. Three servers are required for ZooKeeper redundancy, and running them outside Solr is recommended so that the entire ZK ensemble stays up even when a Solr process is stopped.
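
As a rough sketch of the two numbers above: the embedded ZooKeeper port is the Solr port plus 1000, and a ZooKeeper ensemble only keeps working while a majority of its servers are up, which is why three is the smallest size that survives losing one. The function names here are hypothetical and exist only for this illustration.

```python
def embedded_zk_port(solr_port=8983):
    """Default embedded ZooKeeper port: the Solr port plus 1000."""
    return solr_port + 1000


def tolerated_failures(ensemble_size):
    """How many ZooKeeper servers can be lost while a majority remains."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    print("embedded ZK port for Solr on 8983:", embedded_zk_port(8983))
    for size in (1, 2, 3, 5):
        print(f"{size} server(s): survives {tolerated_failures(size)} failure(s)")
```

Two servers tolerate no more failures than one does, so going from one to two adds no redundancy.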

ZooKeeper does NOT need to run on separate hardware from Solr, though a large and busy cluster will perform better if ZooKeeper uses a different storage volume than Solr does.

I hate that you've had a bad experience with Solr. Your feedback has given us some pointers about specific things we can improve. I hope you'll be willing to continue providing guidance.

Thanks,
Shawn