Is the schema consistent or do you have arrays, etc that are missing or empty?
Functions such as flatten will help to navigate, but you may want to use
predicates to filter out some unwanted records.
Example in the case of twitter data, where some records may not have hashtags,
but you are only interested in the ones that do. (Simple example to give you
the idea)
select t.id, t.entities.hashtags[0].text from `/twitter` t where
t.entities.hashtags[0].text is not null limit 10;
+---------------------+-------------+
| id | EXPR$1 |
+---------------------+-------------+
| 674336417162641408 | ETF |
| 674336421516214272 | AAPL |
| 674336424984903680 | mergers |
| 674336429250490368 | mergers |
| 674336432689823744 | mergers |
| 674336438352281601 | Yahoo |
| 674336460217040896 | MLB |
| 674336462406475776 | MLB |
| 674336469700317184 | BB8 |
| 674336495650611200 | SiliStocks |
+---------------------+-------------+
Flatten gives you a better result, but not always used if you are working on
deeper levels.
select t.id, flatten(t.entities.hashtags) from `/twitter` t limit 10;
+---------------------+-----------------------------------------+
| id | EXPR$1 |
+---------------------+-----------------------------------------+
| 674336417162641408 | {"indices":[91,95],"text":"ETF"} |
| 674336417162641408 | {"indices":[96,101],"text":"ETFs"} |
| 674336417162641408 | {"indices":[102,109],"text":"Stocks"} |
| 674336421516214272 | {"indices":[95,100],"text":"AAPL"} |
| 674336424984903680 | {"indices":[79,87],"text":"mergers"} |
| 674336429250490368 | {"indices":[79,87],"text":"mergers"} |
| 674336432689823744 | {"indices":[79,87],"text":"mergers"} |
| 674336438352281601 | {"indices":[68,74],"text":"Yahoo"} |
| 674336438352281601 | {"indices":[75,81],"text":"Multi"} |
| 674336438352281601 | {"indices":[82,92],"text":"Messenger"} |
+---------------------+-----------------------------------------+
In later versions of Drill you can use typeof to actually see the data type,
but what you will notice is that Drill identifies it as a MAP that you can't
cast as VARCHAR.
select t.id, typeof(flatten(t.entities.hashtags)) from `/twitter` t limit 10;
+---------------------+---------+
| id | EXPR$1 |
+---------------------+---------+
| 674336417162641408 | MAP |
| 674336417162641408 | MAP |
| 674336417162641408 | MAP |
You are better of to use dot notation and nested data functions to navigate the
nested MAPS and arrays; and leverage predicates as needed.
https://drill.apache.org/docs/nested-data-functions/
<https://drill.apache.org/docs/nested-data-functions/>
sqlline and Drill explorer are good tools as they display the MAPs as strings,
squirrel tends to just show <Unknown Type>. You may want to look into other
tools that potentially will work better.
Only the MapR Drill package include security on the WebUI/REST API to provide
the ability to enter user/pass, but you need to configure it.
http://doc.mapr.com/display/MapR/Configuring+Web+Console+and+REST+API+Security
<http://doc.mapr.com/display/MapR/Configuring+Web+Console+and+REST+API+Security>
I believe the REST API will return the JSON elements. sqlline uses JDBC, your
issue is likely the tools you are using that can't interpret MAPS correctly.
--Andries
> On Jan 6, 2016, at 3:45 PM, Eric Pederson <[email protected]> wrote:
>
> Dear all:
>
> I'm a new Drill user using version 1.0 (we are going to be upgrading to the
> latest MapR version, just not sure when).
>
> We are querying JSON data that is not too deep but has a few levels. My
> questions:
>
> When I query one of the top level elements using sqlline the output
> contains a string with the JSON nested under that element name. When I use
> SQuirrel to do the same queries I get "<Error>" for any nested fields or
> arrays. I was able to query elements that are arrays using
> flatten(element). Is there a similar function to convert nested
> JSON objects to strings? I tried using CAST(element as STRING) and
> CAST(element as VARCHAR) and got this error:
>
> Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException:
> Failure while trying to materialize incoming schema. Errors:
>
> Error in expression at index -1. Error: Missing function implementation:
> [castVARCHAR(MAP-REQUIRED, BIGINT-REQUIRED)]. Full expression: --UNKNOWN
> EXPRESSION--..
>
>
> I also tried CONVERT_FROM/CONVERT TO and got these.
>
> Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException:
> Failure while trying to materialize incoming schema. Errors:
>
> Error in expression at index -1. Error: Missing function implementation:
> [convert_fromjson(MAP-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--..
>
>
> Is it possible to convert from JSON to string using SQuirrel? I haven't
> tried with raw JDBC Java code but that's next on my list.
>
> I tried to use the JDBC driver in DB Visualizer and got this error. No
> queries worked.
>
> 18:33:22 [ERROR ExecutorRunner-pool-3-thread-1 C.?] Internal error while
> executing: select b.wcmo_deal.`INTEX$TRAN...
>
> java.lang.UnsupportedOperationException
>
> at
> org.apache.drill.exec.vector.accessor.GenericAccessor.isNull(GenericAccessor.java:33)
>
> at
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.isNull(BoundCheckingAccessor.java:49)
>
> at
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.isNull(TypeConvertingSqlAccessor.java:51)
>
> at
> org.apache.drill.jdbc.AvaticaDrillSqlAccessor.wasNull(AvaticaDrillSqlAccessor.java:81)
>
> at
> org.apache.drill.jdbc.DrillAccessorList.wasNull(DrillAccessorList.java:59)
>
> at
> org.apache.drill.jdbc.DrillCursor.wasNull(DrillCursor.java:210)
>
> at
> net.hydromatic.avatica.AvaticaResultSet.wasNull(AvaticaResultSet.java:201)
>
> at com.onseven.dbvis.J.B.U.?(Z:399)
>
>
> Are the newer JDBC drivers backwards compatible with the older servers?
>
> A couple other questions. We have impersonation enabled. When I log in to
> sqlline or SQuirrel with an account that as permission to read the
> directory and files everything works ok. Otherwise I get an error "Table
> xxxx not found". That is fine. How do I supply a username to the Drill
> web interface? All queries using the web interface fail with the "Table
> xxxx not found" error. I also ran across the REST interface information in
> the user mailing list archives and tried that. I can query the
> INFORMATION_SCHEMA but not our tables. I tried passing the username as
> http://user:password@host:8047/query.json but that didn't work. How do I
> supply a username to the REST API?
>
> Finally, speaking of the REST API, once we have the authentication part
> resolved, will it return actual nested JSON elements in the results if that
> is what is in the file? (That is, not as a string, but as nested elements
> in the result?). The JDBC interface seems to be strictly tabular (which
> makes sense).
>
> Thanks!
>
>
>
>
> --
> Sent from Gmail Mobile