Hello all,

Thanks for your generous help.

I think I now know everything. (What I want to do is build a web crawler
and index the documents it finds.) I will start with the setup Ephraim
suggested (several sharded masters, each with at least one slave for reads,
plus some aggregators for querying). This is only a prototype to learn more...

And the Google PDF from Walter is very interesting; that is something I can
try if I hit the limits of the setup above. But before that, I have to learn
much more about all this indexing / index building and Solr/Lucene stuff.

Thanks again for your help!!
best regards
jens



2011/4/7 Walter Underwood <wun...@wunderwood.org>

> On Apr 6, 2011, at 10:29 PM, Jens Mueller wrote:
>
> > Walter, thanks for the advice: Well you are right, mentioning google. My
> > question was also to understand how such large systems like
> google/facebook
> > are actually working. So my numbers are just theoretical and made up. My
> > system will be smaller,  but I would be very happy to understand how such
> > large systems are built and I think the approach Ephraim showed should be
> > working quite well at large scale.
>
> Understanding what Google does will NOT help you build your engine. Just
> like understanding an F1 race car does not help you build a Toyota Camry. One
> is built for performance only, and requires LOTS of support; the other for
> supportability and stability. Very different engineering goals and designs.
>
> Here is one view of Google's search setup:
> http://www.linesave.co.uk/google_search_engine.html
>
> This talk gives a lot more detail. Summary in the blog post, slides in the
> PDF. Google's search is entirely in-memory. They load off disk and run.
>
> http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html
> http://research.google.com/people/jeff/WSDM09-keynote.pdf
>
> How big will your system be? Does it require real-time updates?
>
> wunder
> --
> Walter Underwood
> Lead Engineer, MarkLogic
>
>
