This is brilliant. I love it! Is a computer game a document? How about each level, each room, each player?
If you want some fancy linguistics besides stemming, try compounding or what I call "one word or two?" English loves to glom words together. schoolroom or school room? babysitter, baby-sitter, or baby sitter? Ghost Busters or Ghostbusters? Note: the poster and movie titles for Ghostbusters disagree, I have screenshots of that.

wunder

On Mar 24, 2010, at 9:53 AM, Erick Erickson wrote:

> Erik:
>
> In a former incarnation, I thought I was going to teach 6th graders. Until I
> found out I can't deal with 25 kids for 6 hours at a stretch for years on
> end....
>
> My thoughts, presented in a "feel free to ignore but this is what I'd do"
> spirit.
> There are some random thoughts below, but here's what I'd think about...
>
> Do a bit of an intro to the game. 10 minutes tops.
>
> Make a game of sorts out of it. Some teams are the "indexers" and some are
> the "searchers". Give them some simple rules to follow, perhaps different
> ones for different pairs. Make sure some get surprising results (e.g. have
> one indexing team stem, the paired search team not stem). The searchers
> should rank the documents, you'll get some really surprising results.
> Emphasize that the game isn't pass/fail, it's to show the kinds of things we
> have to deal with.
>
> Find some random near age-mates and try it once or twice before you present,
> you'll undoubtedly change something. Maybe run it by a teacher or two.
>
> Use that as a basis to discuss the fact that people who write the programs
> that index/search have to cope with all the stuff they did, and the rules
> are imperfect. And each decision is made to serve a need, and when the user
> needs something *else*, it probably isn't a good match. And how horrible
> things happen when one part of the team assumes something different than the
> other part. And how end users don't care about all the internal stuff, they
> just care about how well their needs were served....
>
> ***here're my random musings, they may even be useful***
> Outline what you want to cover. Then cut out 75% of it. Really. Forget
> running SOLR, the kids don't care. Think about questions like "what's a
> word?" "How is a stupid computer going to figure out what *you* want?"
> "what's a document?"
>
> Certainly do the exercise of presenting sentences and asking what they'd
> expect, e.g.
> "The dog is running", would you expect "run" to be a hit? ran? the? You can
> work tokenizing in here, perhaps under the guise of "what's important when
> searching?" Maybe even before the game above if you decide to do that.
>
> Why or why not? Perhaps ask/talk about how a really stupid computer program
> is supposed to figure stuff like this out.
>
> Back up and tell them what a document is. How hard that is to define. Chris
> M. is right on when he talks about hooking what they're interested in.
>
> Maybe come up with some examples of really surprising results from searches,
> and do a really *simple* explanation of how it got that way.
>
> If you decide to go into scoring, stick with simplicity. Like "the more
> times a word appears in a document, the more relevant it is". Can you even
> guarantee that they'd understand phrasing this in terms of percentages?
>
> FWIW
> Erick
>
> On Wed, Mar 24, 2010 at 10:40 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>
>> I've got a couple of questions for the community...
>>
>> * what's the simplest way to get Solr up and running with a relatively
>> richly schema'd index of a Wikipedia dump?
>>
>> What I'm looking for is something as easy as something along these lines:
>>
>>   java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
>>
>>   cat wikipedia.bz2 | wikipedia_solr_indexer
>>
>> My goal is to index wikipedia in order to demonstrate search to a class of
>> middle school kids that I've volunteered to teach for a couple of hours.
>> Which brings me to my next question...
>>
>> * anyone have ideas on some basic hands-on ways of teaching search engine
>> fundamentals?
>>
>> One idea I have is to bring some actual "documents", say a poster board
>> with a sentence written largely on it, have the students physically
>> *tokenize* the document by cutting it up and lexicographically building the
>> term dictionary. Thoughts on taking it further welcome!
>>
>> Thanks all.
>>
>> Erik
>>
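
For anyone who wants to see the poster-board exercise in code, here is a rough Python sketch of the ideas in this thread: cutting a sentence into tokens, building a sorted term dictionary, the "more times a word appears, the more relevant" scoring rule, and the indexers-stem-but-searchers-don't mismatch. Everything in it is an assumption made up for illustration: the two sample posters, the function names, and especially naive_stem, which is a toy suffix stripper and not a real stemming algorithm. It has nothing to do with how Solr does any of this internally.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and cut it into word tokens (the scissors step)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(token):
    """Toy stemmer for illustration only: strip a few common suffixes."""
    for suffix in ("ning", "ing", "s"):
        # Keep at least three characters so short words like "is" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(docs, stem=False):
    """Map each term to a Counter of {doc_id: times the term appears}."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            term = naive_stem(token) if stem else token
            index.setdefault(term, Counter())[doc_id] += 1
    return index

def search(index, query, stem=False):
    """Rank documents by how many times the query terms appear in them."""
    scores = Counter()
    for token in tokenize(query):
        term = naive_stem(token) if stem else token
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return [doc_id for doc_id, _ in scores.most_common()]

# Two hypothetical "poster board" documents.
docs = {
    "poster1": "The dog is running",
    "poster2": "Dogs run and run",
}

# The indexing team stems; sorting the keys gives the term dictionary.
stemmed_index = build_index(docs, stem=True)
term_dictionary = sorted(stemmed_index)

# A search team that follows the same rules finds both posters.
matched = search(stemmed_index, "running", stem=True)

# A search team that does not stem gets the surprising result: no hits.
mismatched = search(stemmed_index, "running", stem=False)

print(term_dictionary)
print(matched)
print(mismatched)
```

The last two searches use the same index and the same query; the only difference is whether the searchers apply the rule the indexers used, which is the surprise the game is meant to produce.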