Re: [xquery-talk] the sad state of query languages for semi-structured data in the NoSQL industry

daniela florescu Thu, 28 May 2015 22:37:36 -0700

Another message that I sent this morning, and it didn't make it 
through... until now.


Thanks, MarkLogic, for lifting the blockade.

I guess the MarkLogic lawyers needed a little bit of time to scratch their 
heads about what to do... (and BTW,
silencing me isn't a solution... I lived in a communist country for 22 years... 
they've tried that... didn't work)

But the following message is a serious discussion about the state of affairs in 
the query languages universe for NoSQL
databases.


> On May 28, 2015, at 2:20 PM, daniela florescu <[email protected]> wrote:
> 
> The NoSQL industry is extremely successful, used everywhere, and considered 
> by many the child prodigy of the database industry.
> 
> 
> They are proud of themselves because they satisfy user needs, namely: they 
> store data
> (a) which is not in 1st normal form (i.e. nested, pre-aggregated)
> (b) without schema
> 
> …to the practical benefit of:
> (a) the application getting the data out of the database exactly as the 
> application needs it, and not altered through a normalization phase.
> (b) the lack of a fixed schema helps with data flexibility… things change 
> extremely quickly inside an application these days (fields being added, 
> deleted, changed, etc.)
> 
> 
> So far so good, and I think up to this point they are right.
> 
> [[ One may think that this looks a little bit like … XML, but hey, they don’t 
> like XML. Fine.]]
> 
> The problems come when they try to QUERY this data.
> 
> 
> The NoSQL industry is re-inventing the wheel from scratch, and in a very 
> chaotic and ad-hoc manner.
> 
> Just look at the sad state of affairs in terms of query languages and their 
> semantics.
> 
> I am just looking at the ones that claim they can store nested and 
> schema-less data (JSON-like, or XML-like).
> 
> (1) MongoDB
> http://docs.mongodb.org/manual/tutorial/query-documents/
> 
> Note: pure JSON. Couldn’t find a simple sort, for example. Etc. Etc.
> 
> (2) Cassandra/DataStax
> http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf
> 
> Note: not even an OR, or a NOT. And what does it mean to sort on schema-less 
> data?
> 
> (3) Spark/DataBricks
> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
> 
> Note: sounds more like an import/export facility… but they call it a JSON 
> Query language
> 
> (4) Elasticsearch
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
> 
> Note: very sophisticated full text, but no structured search of any serious 
> kind. Just some simple aggregates (sum, etc.)
> 
> 
> (5) Mulesoft
> https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo
> 
> Note: not only do they seem to have their own JSON query language, but even 
> their own XML query language, it seems. Couldn’t find more details.
> 
> (6) Hive
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
> 
> Note: multiple languages (XPath, some JSON, some SQL, glued together 
> somewhat chaotically)
> 
> I could fill tons of pages with YET-ANOTHER-LANGUAGE-LIKE-THIS.
> 
> (7) MarkLogic
> 
> https://docs.marklogic.com/8.0/guide/app-dev/json
> 
> 
> 
> ==============
> 
> Now I can spot several mistakes here:
> 
> 1. None of those query languages has a clearly designed, mathematical data 
> model. In the absence of such a data model, one that describes the input, 
> the output and the intermediate results of a query, how can we define a 
> clean semantics?
> 
> 2. All of them have a hacky semantics, a “let’s run it and we’ll see what 
> the result is” kind of thing. The semantics in most corner cases (and by 
> definition semi-structured data is ONLY corner cases) is not defined.
> 
> 3. Some try to piggyback on the SQL semantics, ignoring the fact that SQL 
> was designed to work on relations, and JSON (or in general, nested data) 
> has nothing to do with relations. SQL semantics cannot be “ported”… just 
> because we reuse the same keywords.
> 
> 4. None attempted to define a type system (even a basic one for atomic types 
> like dates, and arithmetic on them) or a schema language.
> 
> ==============
> 
> 
> Now maybe it’s clear why I am so sad. The younger and naive NoSQL community 
> still believes that SQL is “good enough”, and that using the 
> SELECT-FROM-WHERE keywords is the magic bullet for defining the semantics 
> of any kind of query language. Instead of trying to help them, the XQuery 
> community is still looking at its own navel, and marveling, like the 
> well-known CEO: "we can handle flexible data"!!!
> 
> Just compare those languages I listed above with the work that has been done 
> in the past 16 years in XQuery, and the correctness and the complexity of the 
> result vs. the hacky solutions above.
> 
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq, 
> which was designed with a dual goal in mind:
> (a) reuse 100% of the experience of design and implementation of XQuery and
> (b) provide a query language that is syntactically and semantically 
> acceptable for the JSON community.
> 
> Whether we succeeded or not is another story, but I am not aware of any 
> other solution that even comes CLOSE to that goal.
> 
> 
> Best regards
> Dana
>
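
A small illustration of point 2 above and of the question of what sorting
means on schema-less data (the Cassandra note). This is only a sketch in
Python, not taken from any of the products listed: the documents, the "age"
field, and the cross-type ordering in TYPE_RANK are all made up for the
example. The point is that a naive sort has no defined answer once the same
field holds different types, so a language has to spell the ordering out.

```python
import json

# Hypothetical documents: "age" is a number, a string, null, or absent.
DOCS = [json.loads(s) for s in (
    '{"name": "a", "age": 31}',
    '{"name": "b", "age": "27"}',
    '{"name": "c", "age": null}',
    '{"name": "d"}',
    '{"name": "e", "age": 4.5}',
)]

# One possible (assumed) total order across JSON types. A query language has
# to specify something like this instead of leaving it to the engine.
TYPE_RANK = {bool: 1, int: 2, float: 2, str: 3}

MISSING = object()  # sentinel to tell "absent" apart from JSON null


def sort_key(doc, field):
    """Map a possibly missing, possibly heterogeneous value to a sortable key."""
    value = doc.get(field, MISSING)
    if value is MISSING:
        return (-1, 0)  # absent fields sort first (an arbitrary choice)
    if value is None:
        return (0, 0)   # JSON null sorts after absent, before everything else
    return (TYPE_RANK[type(value)], value)


def naive_sort_fails(docs, field):
    """Return True if sorting on the raw values is undefined in Python."""
    try:
        sorted(docs, key=lambda d: d.get(field))
    except TypeError:
        return True
    return False


ordered = [d["name"] for d in sorted(DOCS, key=lambda d: sort_key(d, "age"))]
```

With this assumed ordering, `ordered` comes out as d, c, e, a, b, while
`naive_sort_fails(DOCS, "age")` is True. A different but equally defensible
TYPE_RANK would give a different result, which is exactly the problem when
the semantics is left unspecified.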
