Re: Flow Chart of Solr

Alexandre Rafalovitch Tue, 02 Apr 2013 09:25:57 -0700

I think there is a gap in the support for one's path of learning Solr. I'll
try to describe it based on my own experience. Hopefully, it is helpful.

At first, there is a "Solr is a black box" stage, where the person may not
know Java and is just using out-of-the-box components. The Wiki is
reasonably helpful there and there are other resources (blogs, etc). At this
point, Lucene is a black box within the black box and is something that can
safely be ignored.

At the second stage, one hits the period where he/she understands what is
going on in their basic scenario and is trying to get into more advanced
cases. This could be putting together a complex analyzer chain, trying to
use Update Request Processors, optimizing slow/OOM imports or doing
complex queries. Suddenly, they are pointed directly at Javadocs and have
to figure out their way around Java-based instructions. A Java programmer
can bridge that gap and get over the curve, but I suspect others get lost
very quickly and get stuck, even when the task should not require them to be
good programmers. An example in my mind would be something like
RegexReplaceProcessor. One has to climb up and down the inheritance chain of
the Javadoc to figure out what can be done and what the parameters are. And
the parameter syntax is Java regular expressions rather than the wildcard
syntax used in copyField, so they need to jump over and figure that out. So,
it is fairly hard to envisage those pieces and how they can combine
together. Similarly, some of the stuff is described in Jira requests, but
also in a way that requires a programmer's mind-set to parse it out. I think
a lot of people drop out at this stage and fall back to the 'black-box' view
of Solr. Most of the questions I see on Stack Overflow are conceptual
troubles at this stage.
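
To make the regular expression hurdle concrete, here is a minimal sketch in
plain Java (no Solr classes involved) of how a pattern/replacement pair
behaves under java.util.regex. The class name RegexReplaceSketch and the
helper applyRegex are made up for illustration and are not part of Solr; the
only assumption is that the pattern follows ordinary Java regex rules, as
described above.

```java
import java.util.regex.Pattern;

public class RegexReplaceSketch {

    // Hypothetical helper (not a Solr API): applies a Java regex
    // pattern/replacement pair to a single field value.
    public static String applyRegex(String pattern, String replacement, String value) {
        return Pattern.compile(pattern).matcher(value).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // \s+ matches a run of whitespace; the backslash is doubled
        // only because this is a Java string literal.
        System.out.println(applyRegex("\\s+", " ", "Apache   Solr \t rocks"));

        // In a Java regex, '*' is a quantifier, not a wildcard:
        // "anything ending in _txt" is written ".*_txt", not "*_txt".
        System.out.println(Pattern.matches(".*_txt", "title_txt"));
    }
}
```

Someone used to writing "*_txt" style wildcards has to learn that the regex
equivalent is ".*_txt", which is exactly the kind of jump that trips people
up at this stage.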

And then, those who get to the third stage jump to the advanced level,
where one can just read the source code to figure out what is going on. I
found www.grepcode.com to be useful (though it is quite slow now and is a
bit behind for Solr). Somewhere around here, one also starts to realize the
fuzzy relation between the Lucene and Solr code, and it becomes somewhat
clearer what Solr's benefits actually are (as opposed to bare Lucene's).
This also generates its own frustration and confusion of course, because
suddenly one starts to wish for Lucene features that Solr does not use
(e.g. split/sync analyzer chains, some alternative facet implementation
features, etc).

And finally (at the end of the beginning....), you become a contributor
and become very familiar with subversion/ant/etc. Though, I suspect,
contributors become more specialized and actually understand less about
other parts of the system (e.g. does anyone still fully understand DIH?).

I am not blaming anyone with this story for the lack of support. I think
Solr is - in many ways - better documented than many other open source
projects. And the new manual being contributed to replace the Wiki will
(soon?) make this even better. And, of course, this mailing list is
indescribably awesome. I am just trying to provide a fresh view of what
I went through and where I see people getting stuck.

I think a bit more effort in documenting that second stage would bring more
people to the community. I am trying to do my share through Wiki updates,
questions here, Jira issues, my upcoming book and some other little things.
I see others do the same. Perhaps the diagram is something that we should
explicitly try to do. Though, I think it would be more fun to do it in a
Scrollorama "Inception Explained" style
(http://www.inception-explained.com/). :-)

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)

On Tue, Apr 2, 2013 at 11:22 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> You are right to mention developer docs and user docs. Users are divided
> on this. Some of them use Solr for indexing and monitoring via the admin
> interface and that is quite enough for them; however, some people want to
> modify it, so it would be nice if there were some documentation for the
> developer side too.
>
>
> 2013/4/2 Yago Riveiro <yago.rive...@gmail.com>
>
> > For beginners it is complicated to understand the complexity of Solr /
> > Lucene. I'm trying to develop a custom search component and it's too
> > hard to keep in mind the flow, inheritance and interaction between
> > classes. I think that there is a gap between software docs and user
> > docs, or maybe I haven't searched enough T_T. The Javadoc is not always
> > clear.
> >
> > The fact that I'm a beginner in the Solr world doesn't help.
> >
> > Either way, this thread was very helpful, I found some very good
> > resources here :)
> >
> > Regards
> >
> > --
> > Yago Riveiro
> >
> >
> > On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:
> >
> > > Actually, maybe one of the most important core things is the Analysis
> > > part in the last diagram, but there is nothing about it (i.e. stemming,
> > > lemmatizing etc.) in any of them.
> > >
> > >
> > > 2013/4/2 Andre Bois-Crettez <andre.b...@kelkoo.com>
> > >
> > > >
> > > > On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
> > > >
> > > > > (13/04/02 21:45), Furkan KAMACI wrote:
> > > > >
> > > > > > > Is there any documentation like a flow chart of Solr? I.e.,
> > > > > > > documents come into Solr (maybe indicating which classes
> > > > > > > receive the documents), go through the parsing process (i.e.
> > > > > > > stemming etc.), then the inverted indexes are built, and so on
> > > > > > > and so forth?
> > > > > >
> > > > > > There is an interesting ticket:
> > > > >
> > > > > Architecture Diagrams needed for Lucene, Solr and Nutch
> > > > > > https://issues.apache.org/jira/browse/LUCENE-2412
> > > > >
> > > > > koji
> > > >
> > > > I like this one, it is a bit more detailed :
> > > >
> > > > http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
> > > >
> > > > --
> > > > André Bois-Crettez
> > > >
> > > > Search technology, Kelkoo
> > > > http://www.kelkoo.com/
> > > >
> > > >
> > > > Kelkoo SAS
> > > > Société par Actions Simplifiée
> > > > Share capital of € 4.168.964,30
> > > > Registered office: 8, rue du Sentier 75002 Paris
> > > > 425 093 069 RCS Paris
> > > >
> > > > This message and its attachments are confidential and intended
> > > > exclusively for their addressees. If you are not the intended
> > > > recipient of this message, please destroy it and notify the
> > > > sender.
> > > >
> > >
> > >
> > >
> >
> >
> >
>
