Rachel McConnell wrote:
Our Solr use consists of several rather different data types, some of
which have one-to-many relationships with other types.  We don't need
to do any searching of quite the kind you describe, but I have an idea
about it, depending on what you need to do with the book data.  It is
rather hacky, but maybe you can improve it.

coolio, thanks :)

[snip]


If your 'authors' 'write' 'books' with great frequency, you'd need to
update a lot...

yeah, unfortunately that's the case :)

I was using the book analogy because I figured it was simple to explain, not necessarily because I was trying to be vague :)

Another possibility is to do two searches, with this kind of
structure, which sort of mimics an RDBMS:
* everything in Solr has a field, type (book, author, library, etc).
these can be filtered on a search by search basis
* books have a field, authorId, uniquely referencing the author
* your first search will be restricted to just authors, from which you
will extract the IDs.
* your second search will be restricted to just books, whose authorId
field is exactly one of the IDs from the first search
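
For what it's worth, the two-search structure above might be sketched roughly like this. It only builds the request parameters (Solr's standard q, fq and fl) and does not talk to a server. The field names type and authorId come from the description above; the id and name fields, the example IDs, and both function names are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def author_query_params(author_terms):
    """First search: restricted to author documents, returning only their IDs."""
    return [
        ("q", author_terms),
        ("fq", "type:author"),  # filter on the shared 'type' field
        ("fl", "id"),           # 'id' as the unique key is an assumption
    ]


def book_query_params(author_ids):
    """Second search: books whose authorId is one of the IDs from the first search."""
    if not author_ids:
        raise ValueError("no author IDs to search for")
    # Quote each ID so it is matched as a single term. IDs that themselves
    # contain double quotes are not escaped in this sketch.
    id_clause = " OR ".join('"%s"' % author_id for author_id in author_ids)
    return [
        ("q", "*:*"),
        ("fq", "type:book"),
        ("fq", "authorId:(%s)" % id_clause),
    ]


# Hypothetical usage: 'name' is an assumed field, 'a17' and 'a42' are made-up IDs.
first = urlencode(author_query_params("name:tolkien"))
second = urlencode(book_query_params(["a17", "a42"]))
```

Each string can then be appended to the select handler URL. With a large number of author IDs the second filter gets long, so it may need to be sent in batches.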

I think this approach solves the mindset issues I was having - I didn't want to be left with a schema like this

  authorId
  bookID1
  bookID2
  ...

but since lucene allows for all kinds of slots to exist and be empty, it seems I can simplify that to

  authorId
  bookId

and use multiple queries to satisfy the display needs. it's probably more a duh! moment for the majority, but lucene is sufficiently different from what I'm used to that it's taking me a bit of time :)
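
One reading of that simplified layout, sketched in memory rather than against Solr: every document may carry authorId and bookId, author documents simply leave bookId out, and a second lookup collects the books for display. The type and id fields, the sample documents, and the function name are assumptions for illustration only.

```python
# Made-up documents. Author documents have no bookId slot at all.
docs = [
    {"id": "a1", "type": "author", "authorId": "a1"},
    {"id": "b1", "type": "book", "authorId": "a1", "bookId": "b1"},
    {"id": "b2", "type": "book", "authorId": "a1", "bookId": "b2"},
    {"id": "b3", "type": "book", "authorId": "a2", "bookId": "b3"},
]


def books_for_author(documents, author_id):
    """Stand-in for the second query: book IDs that reference the given author."""
    return [
        doc["bookId"]
        for doc in documents
        if doc.get("type") == "book" and doc.get("authorId") == author_id
    ]


books = books_for_author(docs, "a1")
```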


As you have noticed, Lucene is not an RDBMS.  Searching through all
the text of all the books is more the use it was designed around; of
course the analogy might not be THAT strong with your need!

I think the fulltext search capabilities will serve us well for some aspects of our search needs. the stemming, language, and other filters will definitely be a help to just about everything we do.

speaking of language, this is my last question for now...

what's the idiomatic way to represent multiple languages? left to my own devices I'd probably do something like

   name_en-us
   name_es-us

anyway, thanks so much for your help.

--Geoff
