I have read books and wikis on Solr and Lucene, and I had to debug the code to work out where each part comes from. I will tidy up my notes and share both the big-picture flow and the detailed one. After that I will ask for your opinions. Thanks.

2013/4/5 Erick Erickson <erickerick...@gmail.com>

> Then there's my lazy method. Fire up the IDE and find a test case that
> looks close to something you want to understand further. Step through
> it all in the debugger. I admit there'll be some fumbling at the start
> to _find_ the test case, but they're pretty well named. In IntelliJ,
> all you have to do is right-click on the test case and the context
> menu says "debug blahbalbhabl".... You can chart the class
> relationships you actually wind up in as you go. This seems tedious,
> but it saves me getting lost in the class hierarchy.
>
> Also, there are some convenient tools in the IDE that will show you
> class hierarchies as you need.
>
> Or attach your debugger to a running Solr, which is actually very
> easy. In IntelliJ (and Eclipse has something very similar), create a
> "remote" project. That'll specify some parameters you start up with,
> e.g.:
> java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900 -jar start.jar
>
> Now start up the remote debugging session you just created in the IDE
> and you are attached to a live solr instance and able to step through
> any code you want.
>
> Either way, you can make the IDE work for you!
>
> FWIW,
> Erick
>
> On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> > We're using the 4.x branch code as the basis for our writing. So,
> > effectively it will be for at least 4.3 when the book comes out in the
> > summer.
> >
> > Early access will be in about a month or so. O'Reilly will be showing a
> > galley proof for 200 pages of the book next week at Big Data TechCon next
> > week in Boston.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Jack Park
> > Sent: Wednesday, April 03, 2013 12:56 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Flow Chart of Solr
> >
> > Jack,
> >
> > Is that new book up to the 4.+ series?
> >
> > Thanks
> > The other Jack
> >
> > On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> >> And another one on the way:
> >>
> >> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
> >>
> >> Hopefully that help a lot as well. Plenty of diagrams. Lots of examples.
> >>
> >> -- Jack Krupansky
> >>
> >> -----Original Message----- From: Jack Park
> >> Sent: Wednesday, April 03, 2013 11:25 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Flow Chart of Solr
> >>
> >> There are three books on Solr, two with that in the title, and one,
> >> Taming Text, each of which have been very valuable in understanding
> >> Solr.
> >>
> >> Jack
> >>
> >> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> >>> Sure, yes. But... it comes down to what level of detail you want and
> >>> need for a specific task. In other words, there are probably a dozen
> >>> or more levels of detail. The reality is that if you are going to work
> >>> at the Solr code level, that is very, very different than being a
> >>> "user" of Solr, and at that point your first step is to become
> >>> familiar with the code itself.
> >>>
> >>> When you talk about "parsing" and "stemming", you are really talking
> >>> about the user-level, not the Solr code level. Maybe what you really
> >>> need is a cheat sheet that maps a user-visible feature to the main
> >>> Solr code component for that implements that user feature.
> >>>
> >>> There are a number of different forms of "parsing" in Solr - parsing of
> >>> what? Queries? Requests? Solr documents? Function queries?
> >>>
> >>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
> >>> that. Lucene does all of the "token filtering". Are you asking for
> >>> details on how Lucene works?
> >>> Maybe you meant to ask how "term analysis" works, which is split
> >>> between Solr and Lucene. Or maybe you simply wanted to know when and
> >>> where term analysis is done. Tell us your specific problem or specific
> >>> question and we can probably quickly give you an answer.
> >>>
> >>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
> >>> user-level diagrams, but not down to the code level.
> >>>
> >>> If you could focus on specific questions, we could give you specific
> >>> answers.
> >>>
> >>> "Main steps"? That depends on what level you are working at. Tell us
> >>> what problem you are trying to solve and we can point you to the
> >>> relevant areas.
> >>>
> >>> In truth, if you become generally familiar with Solr at the user level
> >>> (study the wikis), you will already know what the "main steps" are.
> >>>
> >>> So, it is not "main steps of Solr", but main steps of some specific
> >>> "request" of Solr, and for a specified level of detail, and for a
> >>> specified area of Solr if greater detail is needed. Be more specific,
> >>> and then we can be more specific.
> >>>
> >>> For now, the general advice for people who need or want to go far
> >>> beyond the user level is to "get familiar with the code" - just LOOK
> >>> at it - a lot of the package and class names are OBVIOUS, really, and
> >>> follow the class hierarchy and code flow using the standard features
> >>> of any modern Java IDE. If you are wondering where to start for some
> >>> specific user-level feature, please ask specifically about that
> >>> feature. But... make a diligent effort to discover and learn on your
> >>> own before asking open-ended questions.
> >>>
> >>> Sure, there are lots of things in Lucene and Solr that are rather
> >>> complex and seemingly convoluted, and not obvious, but people are more
> >>> than willing to help you out if you simply ask a specific question.
> >>> I mean, not everybody needs to know the fine detail of query parsing,
> >>> analysis, building a Lucene-level stemmer, etc. If we tried to put all
> >>> of that in a diagram, most people would be more confused than
> >>> enlightened.
> >>>
> >>> At which step are scores calculated? That's more of a Lucene question.
> >>> Or, are you really asking what code in Solr invokes Lucene search
> >>> methods that calculate basic scores?
> >>>
> >>> In short, you need to be more specific. Don't force us to guess what
> >>> problem you are trying to solve.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -----Original Message----- From: Furkan KAMACI
> >>> Sent: Wednesday, April 03, 2013 6:52 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Flow Chart of Solr
> >>>
> >>> So, all in all, is there anybody who can write down just main steps of
> >>> Solr(including parsing, stemming etc.)?
> >>>
> >>> 2013/4/2 Furkan KAMACI <furkankam...@gmail.com>
> >>>
> >>>> I think about myself as an example. I have started to make research
> >>>> about Solr just for some weeks. I have learned Solr and its related
> >>>> projects. My next step writing down the main steps Solr. We have
> >>>> separated learning curve of Solr into two main categories.
> >>>> First one is who are using it as out of the box components. Second
> >>>> one is developer side.
> >>>>
> >>>> Actually developer side branches into two way.
> >>>>
> >>>> First one is general steps of it. i.e. document comes into Solr (i.e.
> >>>> crawled data of Nutch). which analyzing processes are going to done
> >>>> (stamming, hamming etc.), what will be doing after parsing step by
> >>>> step. When a search query happens what happens step by step, at which
> >>>> step scores are calculated so on so forth.
> >>>> Second one is more code specific i.e.
> >>>> which handlers takes into account data that will going to be
> >>>> indexed(no need the explain every handler at this step) . Which are
> >>>> the analyzer, tokenizer classes and what are the flow between them.
> >>>> How response handlers works and what are they.
> >>>>
> >>>> Also explaining about cloud side is other work.
> >>>>
> >>>> Some of explanations are currently presents at wiki (but some of them
> >>>> are at very deep places at wiki and it is not easy to find the parent
> >>>> topic of it, maybe starting wiki from a top age and branching all
> >>>> other topics as possible as from it could be better)
> >>>>
> >>>> If we could show the big picture, and beside of it the smaller
> >>>> pictures within it, it would be great (if you know the main parts it
> >>>> will be easy to go deep into the code i.e. you don't need to explain
> >>>> every handler, if you show the way to the developer he/she could
> >>>> debug and find the needs)
> >>>>
> >>>> When I think about myself as an example, I have to write down the
> >>>> steps of Solr a bit detail even I read many pages at wiki and a book
> >>>> about it, I see that it is not easy even writing down the big picture
> >>>> of developer side.
> >>>>
> >>>> 2013/4/2 Alexandre Rafalovitch <arafa...@gmail.com>
> >>>>
> >>>>> Yago,
> >>>>>
> >>>>> My point - perhaps lost in too much text - was that Solr is
> >>>>> presented - and can function - as a black-box. Which makes it
> >>>>> different from more traditional open-source project. So, the stage-2
> >>>>> happens exactly when the non-programmers have to cross the boundary
> >>>>> from the black-box into code-first approach and the hand-off is not
> >>>>> particularly smooth. Or even when - say - php or .Net programmer
> >>>>> tries to get beyond the basic operations their client library and
> >>>>> has the understand the server-side aspects of Solr.
> >>>>>
> >>>>> Regards,
> >>>>> Alex.
> >>>>>
> >>>>> On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> >>>>>
> >>>>> > Alexandre,
> >>>>> >
> >>>>> > You describe the normal path when a beginner try to use a source
> >>>>> > of code that doesn't understand, black-box, reading code, hacking,
> >>>>> > ok now I know 10% of the project, with lucky :p.
> >>>>>
> >>>>> Personal blog: http://blog.outerthoughts.com/
> >>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >>>>> - Time is the quality of nature that keeps events from happening all
> >>>>> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> >>>>> book)