Another message that I sent this morning, and it didn't make it through... until now.
Thanks MarkLogic for opening up the blockade. I guess the MarkLogic lawyers needed a little bit of time to scratch their heads about what to do... (and BTW, silencing me isn't a solution... I lived in a communist country for 22 years... they've tried that... didn't work)

But the following message is a serious discussion about the state of affairs in the query languages universe for NoSQL databases.

> On May 28, 2015, at 2:20 PM, daniela florescu <[email protected]> wrote:
>
> The NoSQL industry is extremely successful, used everywhere, and considered
> by many the child prodigy of the database industry.
>
> They are proud of themselves because they satisfy user needs, aka: they
> store data:
> (a) which is not in 1st normal form (aka nested, pre-aggregated)
> (b) without schema
>
> …to the practical benefit of:
> (a) the application getting the data out of the database exactly as the
> application needs it, and not altered through a normalization phase.
> (b) the lack of fixed schema helps with data flexibility… things change
> extremely quickly inside an application these days (fields being added,
> deleted, changed, etc.)
>
> So far so good, and I think until here they are all right.
>
> [[ One may think that this looks a little bit like … XML, but hey, they don’t
> like XML. Fine.]]
>
> The problems come when they try to QUERY this data.
>
> The NoSQL industry is re-inventing the wheel from scratch, and in a very
> chaotic and ad-hoc manner.
>
> Just look at the sad state of affairs in terms of query languages and their
> semantics.
>
> I am just looking at the ones who claim that they can store nested and
> schema-less data (JSON-like, or XML-like)
>
> (1) MongoDB
> http://docs.mongodb.org/manual/tutorial/query-documents/
>
> Note: pure JSON. Couldn’t find a simple sort, for example. Etc. Etc.
>
> (2) Cassandra/DataStax
> http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf
>
> Note: not even an OR, or a NOT. And what does it mean to sort on
> schema-less data?
>
> (3) Spark/Databricks
> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
>
> Note: sounds more like an import/export facility… but they call it a JSON
> query language
>
> (4) Elasticsearch
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
>
> Note: very sophisticated full text, but no structured search of any serious
> kind. Just some simple aggregates (sum, etc.)
>
> (5) MuleSoft
> https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo
>
> Note: not only do they seem to have their own JSON query language, but even
> their own XML query language, it seems. Couldn’t find more details.
>
> (6) Hive
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
>
> Note: multiple languages (XPath, some JSON, some SQL, glued together somehow
> chaotically)
>
> I can fill tons of pages with YET-ANOTHER-LANGUAGE-LIKE-THIS.
>
> (7) MarkLogic
> https://docs.marklogic.com/8.0/guide/app-dev/json
>
> ==============
>
> Now I can spot several mistakes here:
>
> 1. None of those query languages has a clearly designed, mathematical data
> model.
> In the absence of such a data model, one that describes the input, the output
> and the intermediate results of a query, how can we define a clean semantics?
>
> 2. All of them have a hacky semantics: a “let’s run it and we’ll see what the
> result is” kind of thing. The semantics in most corner cases (and by
> definition semi-structured data is ONLY corner cases) is not defined.
>
> 3. Some try to piggyback on the SQL semantics, ignoring the fact that
> SQL was designed to work on relations, and JSON (or in general, nested data)
> has nothing to do with relations. SQL semantics cannot be “ported”… just
> because we reuse the same keywords.
>
> 4. None attempted to define a type system (even a basic one for atomic types
> like dates, and arithmetic on them...) and a schema language.
>
> ==============
>
> Now maybe it’s clear why I am so sad that the XQuery community, instead of
> trying to help the younger and naive NoSQL community (which still believes
> that SQL is “good enough”, and that using the SELECT-FROM-WHERE keywords is
> the magic bullet to define the semantics of any kind of query language),
> is still looking at its own navel, and marveling, like the well known CEO:
> "we can handle flexible data" !!!
>
> Just compare those languages I listed above with the work that has been done
> in the past 16 years in XQuery, and the correctness and the complexity of the
> result vs. the hacky solutions above.
>
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq,
> which was designed with the dual goal in mind:
> (a) reuse 100% of the experience of design and implementation of XQuery and
> (b) provide a query language that is syntactically and semantically acceptable
> for the JSON community.
>
> Whether we succeeded or not, that’s another story, but I am not aware of any
> other solution that even comes CLOSE to that goal.
>
> Best regards
> Dana
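
As a rough illustration of point 2 in the quoted message, and of the question about what it means to sort on schema-less data, here is a minimal Python sketch. The documents, the field name `age`, and the ordering rule (missing values first, then numbers, then strings) are all made up for illustration; they are not taken from any of the products listed above, and the rule is only one of many a language could choose.

```python
import json

# Hypothetical schema-less collection: "age" is a number, a string, or absent.
docs = [json.loads(s) for s in (
    '{"name": "a", "age": 30}',
    '{"name": "b", "age": "thirty"}',
    '{"name": "c"}',
    '{"name": "d", "age": 7}',
)]


def naive_sort(documents):
    """Sort on "age" with no stated semantics for mixed or missing values."""
    return sorted(documents, key=lambda d: d.get("age"))


def explicit_sort(documents):
    """Sort on "age" under an assumed rule: missing < numbers < strings."""
    def key(d):
        v = d.get("age")
        if v is None:
            return (0, 0, "")       # missing field sorts first
        if isinstance(v, (int, float)):
            return (1, v, "")       # numbers ordered numerically
        return (2, 0, str(v))       # everything else ordered as a string
    return sorted(documents, key=key)


# Without a defined semantics the sort is simply not well defined:
# Python refuses to compare a string with a number.
try:
    naive_sort(docs)
    naive_failed = False
except TypeError:
    naive_failed = True

# With an explicit rule the result is deterministic.
ordered_names = [d["name"] for d in explicit_sort(docs)]
```

The point is not this particular rule, but that some rule has to be written down: a data model and a type system are what make the result of the sort predictable instead of "run it and see".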
_______________________________________________
[email protected]
http://x-query.com/mailman/listinfo/talk
