This is brilliant. I love it!

Is a computer game a document? How about each level, each room, each player?

If you want some fancy linguistics besides stemming, try compounding or what I 
call "one word or two?" English loves to glom words together.

schoolroom or school room?
babysitter, baby-sitter, or baby sitter?
Ghost Busters or Ghostbusters? 

Note: the poster and the movie title for Ghostbusters disagree; I have
screenshots of that.
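
A toy sketch of the "one word or two?" problem, in Python. The helper name
compound_key is made up for illustration, and squashing out every space and
hyphen is my simplifying assumption, not how any real engine handles
compounds. It only shows that the variants above can be made to land on the
same key.

```python
import re


def compound_key(text):
    """Collapse spaces and hyphens so spelling variants share one key."""
    return re.sub(r"[\s\-]+", "", text.lower())


variants = ["babysitter", "baby-sitter", "baby sitter"]

# All three spellings reduce to a single key.
keys = {compound_key(v) for v in variants}
```

The catch, and a good discussion point for the kids: this also glues together
words that were never meant to be one word, so it trades one kind of miss for
another.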

wunder

On Mar 24, 2010, at 9:53 AM, Erick Erickson wrote:

> Erik:
> 
> In a former incarnation, I thought I was going to teach 6th graders. Until I
> found out I can't deal with 25 kids for 6 hours at a stretch for years on
> end....
> 
> My thoughts, presented in a "feel free to ignore but this is what I'd do"
> spirit.
> There are some random thoughts below, but here's what I'd think about...
> 
> Do a bit of an intro to the game. 10 minutes tops.
> 
> Make a game of sorts out of it. Some teams are the "indexers" and some are
> the "searchers". Give them some simple rules to follow, perhaps different
> ones for different pairs. Make sure some get surprising results (e.g. have
> one indexing team stem, the paired search team not stem). The searchers
> should rank the documents; you'll get some really surprising results.
> Emphasize that the game isn't pass/fail, it's to show the kinds of things we
> have to deal with.
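
The indexer/searcher mismatch is easy to show in a few lines. This is a sketch
under loud assumptions: toy_stem is a crude suffix stripper I invented for the
example (not a real stemmer), and the "index" is just a dict from term to
document ids.

```python
def toy_stem(word):
    """Very crude suffix stripper, for classroom illustration only."""
    for suffix in ("ning", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, analyze):
    """Map each analyzed token to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


def search(index, query, analyze):
    """Look up a single-word query after running it through analyze."""
    return index.get(analyze(query.lower()), set())


docs = {1: "The dog is running"}

# The indexing team stems, so the index holds "run", not "running".
stemmed_index = build_index(docs, toy_stem)

# A search team that does not stem looks for "running" and finds nothing.
mismatched = search(stemmed_index, "running", lambda w: w)

# A search team that stems the same way finds the document.
matched = search(stemmed_index, "running", toy_stem)
```

Same document, same query, different answer, purely because the two teams
followed different rules.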
> 
> Find some random near age-mates and try it once or twice before you present;
> you'll undoubtedly change something. Maybe run it by a teacher or two.
> 
> Use that as a basis to discuss the fact that people who write the programs
> that index/search have to cope with all the stuff they did, and the rules
> are imperfect. And each decision is made to serve a need, and when the user
> needs something *else*, it probably isn't a good match. And how horrible
> things happen when one part of the team assumes something different than the
> other part. And how end users don't care about all the internal stuff, they
> just care about how well their needs were served....
> 
> ***here're my random musings, they may even be useful***
> Outline what you want to cover. Then cut out 75% of it. Really. Forget
> running Solr, the kids don't care. Think about questions like "what's a
> word?" "How is a stupid computer going to figure out what *you* want?"
> "what's a document?"
> 
> Certainly do the exercise of presenting sentences and asking what they'd
> expect, e.g.
> "The dog is running", would you expect "run" to be a hit? ran? the? You can
> work tokenizing in here, perhaps under the guise of "what's important when
> searching?" Maybe even before the game above if you decide to do that.
> 
> Why or why not? Perhaps ask/talk about how a really stupid computer program
> is supposed to figure stuff like this out.
> 
> Back up and tell them what a document is. How hard that is to define. Chris
> M. is right on when he talks about hooking what they're interested in.
> 
> Maybe come up with some examples of really surprising results from searches,
> and do a really *simple* explanation of how it got that way.
> 
> If you decide to go into scoring, stick with simplicity. Like "the more
> times a word appears in a document, the more relevant it is". Can you even
> guarantee that they'd understand phrasing this in terms of percentages?
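
The "more times a word appears, the more relevant" rule fits in a few lines.
This is only that rule, a raw count per document, and not the scoring any real
engine uses. The function name and the sample documents are made up for the
example.

```python
from collections import Counter


def term_frequency_scores(docs, term):
    """Rank documents by how many times term appears, highest first."""
    term = term.lower()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = counts[term]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {"a": "dog dog cat", "b": "dog cat cat bird", "c": "bird"}
ranking = term_frequency_scores(docs, "dog")
```

Counting avoids percentages entirely. Dividing each count by the document's
length would give the percentage version, if the class turns out to be ready
for it.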
> 
> FWIW
> Erick
> 
> On Wed, Mar 24, 2010 at 10:40 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> 
>> I've got a couple of questions for the community...
>> 
>> * what's the simplest way to get Solr up and running with a relatively
>> richly schema'd index of a Wikipedia dump?
>> 
>> What I'm looking for is something as easy as this:
>> 
>> java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
>> 
>> cat wikipedia.bz2 | wikipedia_solr_indexer
>> 
>> My goal is to index wikipedia in order to demonstrate search to a class of
>> middle school kids that I've volunteered to teach for a couple of hours.
>> Which brings me to my next question...
>> 
>> * anyone have ideas on some basic hands-on ways of teaching search engine
>> fundamentals?
>> 
>> One idea I have is to bring some actual "documents", say a poster board
>> with a sentence written large on it, have the students physically
>> *tokenize* the document by cutting it up and lexicographically building the
>> term dictionary.  Thoughts on taking it further welcome!
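
For comparison with the scissors version, here is roughly the same exercise as
a sketch in Python. The assumptions are mine: a "token" is any run of letters
or digits, everything is lowercased, and the two poster sentences are made up.

```python
import re


def build_term_dictionary(documents):
    """Cut each document into tokens and list the terms alphabetically,
    each with the sorted ids of the documents it appears in."""
    postings = {}
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            postings.setdefault(token, set()).add(doc_id)
    return {term: sorted(postings[term]) for term in sorted(postings)}


posters = {1: "The dog is running.", 2: "The cat is sleeping."}
term_dictionary = build_term_dictionary(posters)
```

Each cut-out word is a key, and the poster numbers pinned next to it are the
value. Looking up "the" gives both posters, and looking up "dog" gives only
the first.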
>> 
>> Thanks all.
>> 
>>       Erik
>> 



