Hello all,

    I am currently using SolrCloud and I have the following need:

   - My queries focus on counting how many users meet some criteria.
   So my main document is "user" (the parent table).
   - Each user can access several web pages (a child table) and each web
   page might have several attributes.
   - I need to look up users who have accessed at least one page that
   matches a whole set of attributes. For example, I have two scenarios:
      1. If a user accessed a web page WP1 with a URL that starts with
      "www." and with a title that includes "solr", then the user is a match.
      2. However, if the user accessed a web page WP1 with such a URL and
      ANOTHER web page WP2 that includes "solr" in the title, this is not a
      match.


    If I were modeling this in a relational DB, user would be one table and
web page would be another. However, as I am using Solr, my first option
would be to denormalize. Simply storing all the web page fields in the user
document wouldn't work, as it would wrongly match users in scenario 2.
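
To make the two scenarios concrete, here is a minimal Python sketch. It is
not Solr code; it only mimics the matching logic in memory. The sample users,
URLs, titles and function names are all made up for illustration.

```python
# Hypothetical sample data: each user has a list of accessed web pages.
users = {
    # Scenario 1: one page satisfies both conditions.
    "u1": [{"url": "www.example.com/solr", "title": "Intro to solr"}],
    # Scenario 2: the conditions are satisfied by two different pages.
    "u2": [
        {"url": "www.example.com/home", "title": "Home"},
        {"url": "blog.example.com/post", "title": "Why solr scales"},
    ],
}


def page_matches(page):
    """Both conditions must hold on the SAME page."""
    return page["url"].startswith("www.") and "solr" in page["title"].lower()


def match_per_page(pages):
    """The behaviour I want: at least one page matches all conditions."""
    return any(page_matches(p) for p in pages)


def match_flattened(pages):
    """Mimics copying every page field into the user document:
    the link between a URL and its own title is lost."""
    urls = [p["url"] for p in pages]
    titles = [p["title"] for p in pages]
    return (any(u.startswith("www.") for u in urls)
            and any("solr" in t.lower() for t in titles))


per_page = {uid: match_per_page(pages) for uid, pages in users.items()}
flattened = {uid: match_flattened(pages) for uid, pages in users.items()}
print(per_page)
print(flattened)
```

With the flattened layout, u2 is reported as a match even though no single
page satisfies both conditions, which is exactly the scenario 2 problem.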
     I thought of two solutions for this:

   - Using the idea of an inverted index: having several kinds of
   documents (user, web page, entity 3, entity 4, etc.) where each child
   entity (web page, for instance) would have a field holding the user id.
   Then, using a cross-document join in Solr to get the results where there
   was a match on the user (parent table) and also on each child entity (in
   other words, merging the results of several queries that each return
   user ids). The drawback is that this relies on a join.
   - Having just a user document and storing each web page as a single
   field value (like a JSON string). To search, that one value would need to
   match a regular expression that includes both conditions. This would make
   my search slower, and I would not be able to apply the same technique if
   the child tables also had children.
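
As a rough illustration of the two options above, here is an in-memory
Python sketch. Again, this is not Solr code: the first function only imitates
what a join from page documents to user documents would return, and the
second imitates a regular expression applied to one serialized value per
page. The document layout, the field names (type, user_id, url, title) and
the regular expression are assumptions made for the example.

```python
import json
import re

# Option 1: separate document types; each page document points to its user.
user_docs = [{"id": "u1", "type": "user"}, {"id": "u2", "type": "user"}]
page_docs = [
    {"id": "p1", "type": "page", "user_id": "u1",
     "url": "www.example.com/solr", "title": "Intro to solr"},
    {"id": "p2", "type": "page", "user_id": "u2",
     "url": "www.example.com/home", "title": "Home"},
    {"id": "p3", "type": "page", "user_id": "u2",
     "url": "blog.example.com/post", "title": "Why solr scales"},
]


def users_via_join(users, pages):
    """Find matching pages first, then keep the users they point to."""
    matching_user_ids = {
        p["user_id"] for p in pages
        if p["url"].startswith("www.") and "solr" in p["title"].lower()
    }
    return sorted(u["id"] for u in users if u["id"] in matching_user_ids)


# Option 2: a single user document; each page becomes one serialized value.
def serialize(page):
    """Key order (url, then title) is fixed so the regex can rely on it."""
    return json.dumps({"url": page["url"], "title": page["title"]})


flat_users = {}
for p in page_docs:
    flat_users.setdefault(p["user_id"], []).append(serialize(p))

# Both conditions are packed into one pattern applied to one value.
PAGE_PATTERN = re.compile(
    r'^\{"url": "www\.[^"]*", "title": "[^"]*solr[^"]*"\}$',
    re.IGNORECASE,
)


def users_via_regex(users):
    """A user matches if any single serialized page matches the pattern."""
    return sorted(
        uid for uid, values in users.items()
        if any(PAGE_PATTERN.match(v) for v in values)
    )


print(users_via_join(user_docs, page_docs))
print(users_via_regex(flat_users))
```

Both functions return only u1 for this sample data. The sketch also shows
why the second option is fragile: the pattern depends on the exact
serialization, and it gets harder to write once pages have children of
their own.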

    Am I missing any obvious solution here? I would love to receive criticism
on this, as I am probably not the only one who has this problem. I
would like more ideas on how to denormalize data in this case. Is the join
my best option here?

Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
