Re: Flow Chart of Solr

Lance Norskog Sun, 07 Apr 2013 18:37:27 -0700

Seconded. Single-stepping really is the best way to follow the logic chains and see how the data mutates.


On 04/05/2013 06:36 AM, Erick Erickson wrote:

Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
it all in the debugger. I admit there'll be some fumbling at the start
to _find_ the test case, but they're pretty well named. In IntelliJ,
all you have to do is right-click on the test case and the context
menu says "debug blahbalbhabl".... You can chart the class
relationships you actually wind up in as you go. This seems tedious,
but it saves me getting lost in the class hierarchy.


Also, there are some convenient tools in the IDE that will show you
class hierarchies as you need.

Or attach your debugger to a running Solr, which is actually very
easy. In IntelliJ (and Eclipse has something very similar), create a
"remote" project. That'll specify some parameters you start up with,
e.g.:
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900
-jar start.jar

Now start up the remote debugging session you just created in the IDE
and you are attached to a live solr instance and able to step through
any code you want.
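
If the IDE refuses to attach, it can help to confirm that the JVM really was started with the debug agent. The following is only a minimal sketch: the class name JdwpCheck is hypothetical and is not part of Solr. It uses the standard java.lang.management API to list the JVM's startup arguments and looks for either the legacy -Xrunjdwp form shown above or the newer -agentlib:jdwp form, which takes the same transport/server/suspend/address options.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper; the name JdwpCheck is an assumption, not a Solr class.
class JdwpCheck {

    // Returns true if any JVM argument enables the JDWP debug agent,
    // in either the legacy (-Xrunjdwp:) or the newer (-agentlib:jdwp) form.
    static boolean isJdwpEnabled(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp:") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the arguments of main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + isJdwpEnabled(jvmArgs));
    }
}
```

Run with the same JVM flags you pass to start.jar, this should report whether the agent options were picked up.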

Either way, you can make the IDE work for you!

FWIW,
Erick

On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky <j...@basetechnology.com> wrote:

We're using the 4.x branch code as the basis for our writing. So,
effectively it will be for at least 4.3 when the book comes out in the
summer.

Early access will be in about a month or so. O'Reilly will be showing a
galley proof for 200 pages of the book next week at Big Data TechCon in
Boston.


-- Jack Krupansky

-----Original Message----- From: Jack Park
Sent: Wednesday, April 03, 2013 12:56 PM

To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

Jack,

Is that new book up to the 4.+ series?

Thanks
The other Jack

On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky <j...@basetechnology.com>
wrote:

And another one on the way:

http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957

Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples.

-- Jack Krupansky

-----Original Message----- From: Jack Park
Sent: Wednesday, April 03, 2013 11:25 AM

To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

There are three books on Solr, two with that in the title, and one,
Taming Text, each of which has been very valuable in understanding
Solr.

Jack

On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky <j...@basetechnology.com>
wrote:


Sure, yes. But... it comes down to what level of detail you want and
need for a specific task. In other words, there are probably a dozen or
more levels of detail. The reality is that if you are going to work at
the Solr code level, that is very, very different than being a "user" of
Solr, and at that point your first step is to become familiar with the
code itself.

When you talk about "parsing" and "stemming", you are really talking
about the user level, not the Solr code level. Maybe what you really
need is a cheat sheet that maps a user-visible feature to the main Solr
code component that implements that user feature.

There are a number of different forms of "parsing" in Solr - parsing of
what? Queries? Requests? Solr documents? Function queries?

Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
that. Lucene does all of the "token filtering". Are you asking for
details on how Lucene works? Maybe you meant to ask how "term analysis"
works, which is split between Solr and Lucene. Or maybe you simply
wanted to know when and where term analysis is done. Tell us your
specific problem or specific question and we can probably quickly give
you an answer.
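
To make "term analysis" a little more concrete, here is a deliberately simplified sketch of the general shape of an analysis chain: a tokenizer followed by token filters. Everything in it is an assumption made for illustration. The class and method names are hypothetical, nothing here is the Lucene or Solr API, and the suffix-stripping "stemmer" is a toy, not a real stemming algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only: hypothetical names, NOT Lucene or Solr classes.
class ToyAnalysis {

    // "Tokenizer" step: split text on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // "Token filter" step 1: lowercase each token.
    static String lowercase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // "Token filter" step 2: naive suffix stripping standing in for a stemmer.
    // Only strips when at least three characters would remain.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // The whole chain: tokenize, then run every token through the filters in order.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(stem(lowercase(token)));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Searching indexed documents"));
    }
}
```

The point is only the ordering: text is split into tokens first, and each token then passes through a sequence of filters before it becomes an indexed or queried term.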

In truth, NOBODY uses "flow charts" anymore. Sure, there are some
user-level diagrams, but not down to the code level.

If you could focus on specific questions, we could give you specific
answers.

"Main steps"? That depends on what level you are working at. Tell us
what problem you are trying to solve and we can point you to the
relevant areas.

In truth, if you become generally familiar with Solr at the user level
(study the wikis), you will already know what the "main steps" are.

So, it is not "main steps of Solr", but main steps of some specific
"request" of Solr, and for a specified level of detail, and for a
specified area of Solr if greater detail is needed. Be more specific,
and then we can be more specific.

For now, the general advice for people who need or want to go far
beyond the user level is to "get familiar with the code" - just LOOK at
it - a lot of the package and class names are OBVIOUS, really, and
follow the class hierarchy and code flow using the standard features of
any modern Java IDE. If you are wondering where to start for some
specific user-level feature, please ask specifically about that
feature. But... make a diligent effort to discover and learn on your
own before asking open-ended questions.

Sure, there are lots of things in Lucene and Solr that are rather
complex and seemingly convoluted, and not obvious, but people are more
than willing to help you out if you simply ask a specific question. I
mean, not everybody needs to know the fine detail of query parsing,
analysis, building a Lucene-level stemmer, etc. If we tried to put all
of that in a diagram, most people would be more confused than
enlightened.

At which step are scores calculated? That's more of a Lucene question.
Or, are you really asking what code in Solr invokes Lucene search
methods that calculate basic scores?

In short, you need to be more specific. Don't force us to guess what
problem you are trying to solve.

-- Jack Krupansky

-----Original Message----- From: Furkan KAMACI
Sent: Wednesday, April 03, 2013 6:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr


So, all in all, is there anybody who can write down just the main steps
of Solr (including parsing, stemming, etc.)?


2013/4/2 Furkan KAMACI <furkankam...@gmail.com>

Take me as an example. I started researching Solr only a few weeks
ago. I have learned about Solr and its related projects. My next step
is writing down the main steps of Solr. We can separate the learning
curve of Solr into two main categories. The first is for those who use
it as out-of-the-box components. The second is the developer side.

The developer side actually branches into two paths.

The first is the general steps: a document comes into Solr (e.g.
crawled data from Nutch); which analysis processes are going to be done
(stemming, hamming, etc.), and what is done after parsing, step by step.
When a search query arrives, what happens step by step, at which step
scores are calculated, and so on.
The second is more code-specific: which handlers take in the data that
is going to be indexed (no need to explain every handler at this step),
which are the analyzer and tokenizer classes and what the flow between
them is, and how response handlers work and what they are.

Explaining the cloud side is separate work.

Some of the explanations are currently present in the wiki (but some of
them are in very deep places in the wiki and it is not easy to find
their parent topic; maybe starting the wiki from a top page and
branching all other topics from it as far as possible would be better).

If we could show the big picture, and beside it the smaller pictures
within it, it would be great (if you know the main parts, it will be
easy to go deep into the code; i.e. you don't need to explain every
handler: if you show the way, the developer can debug and find what
he/she needs).

Taking myself as an example again: I have to write down the steps of
Solr in some detail, and even though I have read many wiki pages and a
book about it, I see that it is not easy even to write down the big
picture of the developer side.


2013/4/2 Alexandre Rafalovitch <arafa...@gmail.com>

Yago,

My point - perhaps lost in too much text - was that Solr is presented -
and can function - as a black box, which makes it different from a more
traditional open-source project. So, stage 2 happens exactly when the
non-programmers have to cross the boundary from the black box into a
code-first approach, and the hand-off is not particularly smooth. Or
even when - say - a PHP or .NET programmer tries to get beyond the basic
operations of their client library and has to understand the
server-side aspects of Solr.

Regards,
    Alex.

On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro <yago.rive...@gmail.com>
wrote:

Alexandre,

You describe the normal path when a beginner tries to use source code
that he or she doesn't understand: black box, reading code, hacking,
OK, now I know 10% of the project, with luck :p.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
