Re: Flow Chart of Solr

Furkan KAMACI Fri, 05 Apr 2013 13:18:59 -0700

I have read books and wikis on Solr and Lucene, and I had to debug the code
to find out how the parts relate to each other. I will tidy up my notes and
share the big-picture flow and the detailed one. After that I will ask you for
your opinions. Thanks.



2013/4/5 Erick Erickson <erickerick...@gmail.com>

> Then there's my lazy method. Fire up the IDE and find a test case that
> looks close to something you want to understand further. Step through
> it all in the debugger. I admit there'll be some fumbling at the start
> to _find_ the test case, but they're pretty well named. In IntelliJ,
> all you have to do is right-click on the test case and the context
> menu says "debug blahbalbhabl".... You can chart the class
> relationships you actually wind up in as you go. This seems tedious,
> but it saves me getting lost in the class hierarchy.
>
> Also, there are some convenient tools in the IDE that will show you
> class hierarchies as you need.
>
> Or attach your debugger to a running Solr, which is actually very
> easy. In IntelliJ (and Eclipse has something very similar), create a
> "remote" project. That'll specify some parameters you start up with,
> e.g.:
> java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900
> -jar start.jar
>
> Now start up the remote debugging session you just created in the IDE
> and you are attached to a live Solr instance and able to step through
> any code you want.
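
A small sketch to go with the command above, assuming you want to confirm from
inside a JVM that it really was started with a debug agent before trying to
attach. The class name JdwpCheck and its helper methods are made up for
illustration. It recognises the -Xrunjdwp: form quoted above and the
-agentlib:jdwp= form that newer JVMs accept for the same options.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: inspects JVM startup flags for a JDWP debug agent. */
class JdwpCheck {

    private static final String LEGACY_PREFIX = "-Xrunjdwp:";
    private static final String AGENTLIB_PREFIX = "-agentlib:jdwp=";

    /** Returns true if any argument enables JDWP in either flag form. */
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(LEGACY_PREFIX) || arg.startsWith(AGENTLIB_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the value of the "address" option of a JDWP flag, or null if there is none. */
    static String jdwpAddress(String jvmArg) {
        String options;
        if (jvmArg.startsWith(LEGACY_PREFIX)) {
            options = jvmArg.substring(LEGACY_PREFIX.length());
        } else if (jvmArg.startsWith(AGENTLIB_PREFIX)) {
            options = jvmArg.substring(AGENTLIB_PREFIX.length());
        } else {
            return null;
        }
        // JDWP options are a comma-separated list of key=value pairs.
        for (String option : options.split(",")) {
            if (option.startsWith("address=")) {
                return option.substring("address=".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Flags this JVM was actually launched with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
        for (String arg : jvmArgs) {
            String address = jdwpAddress(arg);
            if (address != null) {
                System.out.println("Debugger should attach to: " + address);
            }
        }
    }
}
```

The address it prints is the one the IDE's remote configuration has to match
(5900 in the command above).
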
>
> Either way, you can make the IDE work for you!
>
> FWIW,
> Erick
>
> On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
> > We're using the 4.x branch code as the basis for our writing. So,
> > effectively it will be for at least 4.3 when the book comes out in the
> > summer.
> >
> > Early access will be in about a month or so. O'Reilly will be showing a
> > galley proof for 200 pages of the book next week at Big Data TechCon in
> > Boston.
> >
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Jack Park
> > Sent: Wednesday, April 03, 2013 12:56 PM
> >
> > To: solr-user@lucene.apache.org
> > Subject: Re: Flow Chart of Solr
> >
> > Jack,
> >
> > Is that new book up to the 4.+ series?
> >
> > Thanks
> > The other Jack
> >
> > On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky <j...@basetechnology.com>
> > wrote:
> >>
> >> And another one on the way:
> >>
> >>
> >> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
> >>
> >> Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples.
> >>
> >> -- Jack Krupansky
> >>
> >> -----Original Message----- From: Jack Park
> >> Sent: Wednesday, April 03, 2013 11:25 AM
> >>
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Flow Chart of Solr
> >>
> >> There are three books on Solr, two with that in the title, and one,
> >> Taming Text, each of which has been very valuable in understanding
> >> Solr.
> >>
> >> Jack
> >>
> >> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky <j...@basetechnology.com>
> >> wrote:
> >>>
> >>>
> >>> Sure, yes. But... it comes down to what level of detail you want and
> >>> need for a specific task. In other words, there are probably a dozen or
> >>> more levels of detail. The reality is that if you are going to work at
> >>> the Solr code level, that is very, very different than being a "user" of
> >>> Solr, and at that point your first step is to become familiar with the
> >>> code itself.
> >>>
> >>> When you talk about "parsing" and "stemming", you are really talking
> >>> about the user level, not the Solr code level. Maybe what you really
> >>> need is a cheat sheet that maps a user-visible feature to the main Solr
> >>> code component that implements that user feature.
> >>>
> >>> There are a number of different forms of "parsing" in Solr - parsing of
> >>> what? Queries? Requests? Solr documents? Function queries?
> >>>
> >>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
> >>> that. Lucene does all of the "token filtering". Are you asking for
> >>> details on how Lucene works? Maybe you meant to ask how "term analysis"
> >>> works, which is split between Solr and Lucene. Or maybe you simply
> >>> wanted to know when and where term analysis is done. Tell us your
> >>> specific problem or specific question and we can probably quickly give
> >>> you an answer.
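
To make "token filtering" concrete, here is a toy sketch of the general shape
of term analysis: a tokenizer splits the text, then each token passes through
a chain of filters in order. This is plain Java and deliberately does NOT use
Lucene's Analyzer or TokenFilter classes. The class name ToyAnalysisChain and
all of its methods are invented for illustration, and the "stemmer" is a crude
suffix stripper, not the Porter algorithm or anything Lucene ships.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Toy illustration of a tokenizer followed by a chain of token filters. */
class ToyAnalysisChain {

    /** Tokenizer: splits text on runs of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Filter 1: lowercases a token. */
    static String lowercase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    /** Filter 2: strips one common English suffix if at least 3 characters remain. */
    static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Runs every token through the given filters, in order. */
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            String term = token;
            for (UnaryOperator<String> filter : filters) {
                term = filter.apply(term);
            }
            terms.add(term);
        }
        return terms;
    }

    /** Convenience overload: lowercase first, then the toy stemmer. */
    static List<String> analyze(String text) {
        List<UnaryOperator<String>> filters = new ArrayList<>();
        filters.add(ToyAnalysisChain::lowercase);
        filters.add(ToyAnalysisChain::toyStem);
        return analyze(text, filters);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Indexing crawled Documents"));
    }
}
```

Filter order matters: the toy stemmer only knows lowercase suffixes, so it has
to run after the lowercasing step.
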
> >>>
> >>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
> >>> user-level
> >>> diagrams, but not down to the code level.
> >>>
> >>> If you could focus on specific questions, we could give you specific
> >>> answers.
> >>>
> >>> "Main steps"? That depends on what level you are working at. Tell us
> >>> what problem you are trying to solve and we can point you to the
> >>> relevant areas.
> >>>
> >>> In truth, if you become generally familiar with Solr at the user level
> >>> (study the wikis), you will already know what the "main steps" are.
> >>>
> >>> So, it is not "main steps of Solr", but main steps of some specific
> >>> "request" of Solr, and for a specified level of detail, and for a
> >>> specified
> >>> area of Solr if greater detail is needed. Be more specific, and then we
> >>> can
> >>> be more specific.
> >>>
> >>> For now, the general advice for people who need or want to go far
> >>> beyond the user level is to "get familiar with the code" - just LOOK at
> >>> it - a lot of the package and class names are OBVIOUS, really, and
> >>> follow the class hierarchy and code flow using the standard features of
> >>> any modern Java IDE. If you are wondering where to start for some
> >>> specific user-level feature, please ask specifically about that
> >>> feature. But... make a diligent effort to discover and learn on your
> >>> own before asking open-ended questions.
> >>>
> >>> Sure, there are lots of things in Lucene and Solr that are rather
> >>> complex and seemingly convoluted, and not obvious, but people are more
> >>> than willing to help you out if you simply ask a specific question. I
> >>> mean, not everybody needs to know the fine detail of query parsing,
> >>> analysis, building a Lucene-level stemmer, etc. If we tried to put all
> >>> of that in a diagram, most people would be more confused than
> >>> enlightened.
> >>>
> >>> At which step are scores calculated? That's more of a Lucene question.
> >>> Or,
> >>> are you really asking what code in Solr invokes Lucene search methods
> >>> that
> >>> calculate basic scores?
> >>>
> >>> In short, you need to be more specific. Don't force us to guess what
> >>> problem
> >>> you are trying to solve.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -----Original Message----- From: Furkan KAMACI
> >>> Sent: Wednesday, April 03, 2013 6:52 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Flow Chart of Solr
> >>>
> >>>
> >>> So, all in all, is there anybody who can write down just the main steps
> >>> of Solr (including parsing, stemming etc.)?
> >>>
> >>>
> >>> 2013/4/2 Furkan KAMACI <furkankam...@gmail.com>
> >>>
> >>>> Take myself as an example. I started researching Solr only a few
> >>>> weeks ago. I have learned about Solr and its related projects. My next
> >>>> step is writing down the main steps of Solr. We have separated the
> >>>> learning curve of Solr into two main categories. The first is for
> >>>> those who use it as out-of-the-box components. The second is the
> >>>> developer side.
> >>>>
> >>>> Actually, the developer side branches in two ways.
> >>>>
> >>>> The first is the general steps, i.e. a document comes into Solr (e.g.
> >>>> crawled data from Nutch): which analysis processes are going to be
> >>>> done (stemming, hamming etc.), and what will be done after parsing,
> >>>> step by step. When a search query happens, what happens step by step,
> >>>> at which step scores are calculated, and so on and so forth.
> >>>> The second is more code specific, i.e. which handlers take in the data
> >>>> that is going to be indexed (no need to explain every handler at this
> >>>> step), which are the analyzer and tokenizer classes and what is the
> >>>> flow between them, how response handlers work and what they are.
> >>>>
> >>>> Also, explaining the cloud side is separate work.
> >>>>
> >>>> Some of the explanations are currently present in the wiki (but some
> >>>> of them are at very deep places in the wiki and it is not easy to find
> >>>> their parent topic; maybe starting the wiki from a top page and
> >>>> branching all other topics from it as far as possible would be
> >>>> better).
> >>>>
> >>>> If we could show the big picture, and beside it the smaller pictures
> >>>> within it, it would be great (if you know the main parts it will be
> >>>> easy to go deep into the code, i.e. you don't need to explain every
> >>>> handler; if you show the way to the developer, he/she could debug and
> >>>> find what is needed).
> >>>>
> >>>> Taking myself as an example, I have to write down the steps of Solr in
> >>>> some detail. Even though I have read many wiki pages and a book about
> >>>> it, I see that it is not easy even to write down the big picture of
> >>>> the developer side.
> >>>>
> >>>>
> >>>> 2013/4/2 Alexandre Rafalovitch <arafa...@gmail.com>
> >>>>
> >>>>> Yago,
> >>>>>
> >>>>> My point - perhaps lost in too much text - was that Solr is presented
> >>>>> - and can function - as a black box. Which makes it different from
> >>>>> more traditional open-source projects. So, stage 2 happens exactly
> >>>>> when the non-programmers have to cross the boundary from the black box
> >>>>> into a code-first approach, and the hand-off is not particularly
> >>>>> smooth. Or even when - say - a PHP or .NET programmer tries to get
> >>>>> beyond the basic operations of their client library and has to
> >>>>> understand the server-side aspects of Solr.
> >>>>>
> >>>>> Regards,
> >>>>>    Alex.
> >>>>>
> >>>>> On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro <yago.rive...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> > Alexandre,
> >>>>> >
> >>>>> > You describe the normal path when a beginner tries to use source
> >>>>> > code that he doesn't understand: black box, reading code, hacking,
> >>>>> > ok, now I know 10% of the project, with luck :p.
> >>>>> >
> >>>>>
> >>>>>
> >>>>> Personal blog: http://blog.outerthoughts.com/
> >>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >>>>> - Time is the quality of nature that keeps events from happening all
> >>>>> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> >>>>> book)
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
