-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Just so we're all aware, we do have an initial Index implementation
from James Westby that handles 'lists':

https://code.launchpad.net/~james-w/u1db/index-transformations/+merge/81069

On 11/17/2011 11:01 PM, John Rowland Lenton wrote:
> On Thu, 17 Nov 2011 20:06:29 +0000, Stuart Langridge
> <[email protected]> wrote:
>>
>> how would I do an index on people who have a work phone number?
>> create_index("worknums", [ "phones.name" ]) ? That feels weird;
>> the indexer would act differently depending on whether the value
>> of "phones" is a dict or a list of dicts. Then again, maybe
>> that's the answer; if a part of an index expression resolves to a
>> list, then we do the remainder of the index expression for *each
>> item in the list*. This would also cope with the above colours
>> example, ignoring my reservations about it feeling weird. To me
>> that makes a certain amount of sense. Thoughts?
>
> More questions: do you want to be able to create an index on the
> names of an object? Do we want partial indexes? If we have an index
> expression that transforms a string into a list of strings, do we
> need to explicitly say that we want each of those added separately
> to the index, rather than the list itself?

1) I think you mean by this something like:

  create_index('names', ['names()'])
  create_doc('{"a": 1, "b": 2}')
  get_from_index('names', ["a"])

Would then return the document. I can see where that could be useful,
though if there are only a small number of names that you care about,
then you can create an index for each one.

2) I'm not 100% sure what you mean by partial indexes here. If part of
an index evaluates to 'null', then that document is not put into the
index. Maybe you are taking it a step further and having an equality
check?

  create_index('john', ['equal(name, "john")'])
or
  create_index('john', ['name == "john"'])

The former fits into our current syntax ok, the latter would be a
possible transformation, but I imagine the syntax parser gets crazy
when you start layering them.

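To make point 2 a bit more concrete, here is a rough sketch in plain
Python of how "a null part keeps the document out of the index" already
gives you a partial index once an equality transformation exists. This
is not the u1db code; get_field, equal, and build_index are made-up
names for illustration only, and the index is just an in-memory dict.

```python
import json


def get_field(doc, dotted_name):
    """Walk a dotted field name such as 'address.city'; None if missing."""
    value = doc
    for part in dotted_name.split('.'):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def equal(field, expected):
    """Hypothetical transformation: yield the value only when it matches."""
    def evaluate(doc):
        value = get_field(doc, field)
        return value if value == expected else None
    return evaluate


def build_index(expressions, docs):
    """Map key tuples to doc ids, skipping docs where any part is None."""
    index = {}
    for doc_id, content in docs.items():
        doc = json.loads(content)
        key = tuple(expr(doc) for expr in expressions)
        if any(part is None for part in key):
            continue  # a null part keeps the document out of the index
        index.setdefault(key, []).append(doc_id)
    return index


docs = {
    'doc-1': '{"name": "john", "age": 30}',
    'doc-2': '{"name": "stuart"}',
    'doc-3': '{"age": 41}',
}
john_index = build_index([equal('name', 'john')], docs)
print(john_index)  # {('john',): ['doc-1']}
```

Only doc-1 ends up in the index: doc-2 fails the equality check and
doc-3 has no name at all, so both evaluate to None and are skipped
rather than raising an error.
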
3) I think here you mean do we want something like:

  create_index('favcolor', ["any(colour)"])

rather than just writing it as:

  create_index('favcolor', ["colour"])

And if the 'colour' field is a list, we just evaluate each item of the
list.

I think I agree that 'any()' seems superfluous. The question that
remains is if we want an 'all()' function (flatten a list into a single
item). As an example:

  create_index('all_colour', ['all(colours)'])
  get_from_index('all_colour', ['green'])
    returns Samuel
  get_from_index('all_colour', ['red'])
    returns [], nobody likes *just* red.
  get_from_index('all_colour', ['red|blue'])
    returns Stuart

I don't think we want all() because its semantics are probably those of
a set operation: is (red,blue) the same as (blue,red)? And I think
users can approximate it in user-space with:

  create_index('colour', ['colours'])
  docs = get_from_index('colour', ['red', 'blue'])
  for doc in docs:
    if 'red' not in doc.colours or 'blue' not in doc.colours:
      # doesn't like both
      continue
    ...

>
> I think the answer to those is no, yes, and no: I think the rule
> for index expressions should be that they either resolve to a
> single "scalar" value (one of string|number|true|false|null), which
> is added to the index, or to a list, which scalar elements are
> added sequentially to the index, and that if neither of those
> happens it's not an error, it simply isn't added (I'm on the fence
> as to whether lists that have list elements should have the
> elements of the list element added recursively; having to explain
> that makes my head hurt a little. man perllol). That we should
> provide no index functions to address individual items of a list;
> if you need to treat the second item differently from the first,
> then it should be an object, not a list. That
> "name.split().lower()" (or "lower(split(name))", or
> name|split|lower, or whatever) should result in the same values
> added to the index as "name.lower().split()".
> And that we should continue to enforce the semantics (in the same
> way i said "you shouldn't care about the nth element of the list")
> by saying that you shouldn't get into the situation where you have
> to create an index on the keys.
>
> I also think that after describing what we want for the indexing
> language, we need to look at what is the minimal thing we can do
> that is useful, and do that first. That we shouldn't spend too much
> time worrying about how we'd create an index of an object with 3
> layers of nested dicts and lists of lists; we can put hard limits
> to the complexity of the expressions we admit, especially at
> first.
>
> We're going to want to throw away the indexing language in a few
> years (WHAT WERE WE THINKING?!? *hair pull*) and rewrite it, and
> still admit the old expressions for backwards compatibility, so the
> smaller it is (while still being useful) the less we'll have to
> hack it up later. Yes? (probably preaching to the choir by now).

I think you have some good points here. Something simple that is
functional enough to get work done, and then iterate to find a better
solution.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7GF6oACgkQJdeBCYSNAAMRuwCfdtS2ihPUr0aeYqZWUZZAG9Do
jIsAnjxpAlUei1lyMuglI3CgiMFrC5o7
=Q54u
-----END PGP SIGNATURE-----

--
Mailing list: https://launchpad.net/~u1db-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~u1db-discuss
More help   : https://help.launchpad.net/ListHelp

