Is it possible to run a single-node drillbit without ZooKeeper, as a
"service" without the need for coordination across multiple nodes?
`zk.connect: "local"` is not accepted as the equivalent of "zk=local" with
drill-embedded.
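For reference, a minimal sketch of what embedded mode amounts to, assuming a
stock Drill tarball; `drill-embedded` is a wrapper that launches sqlline with
`zk=local`:
~~~
# start an embedded drillbit, no ZooKeeper required
bin/drill-embedded

# the equivalent explicit connection string
bin/sqlline -u "jdbc:drill:zk=local"
~~~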
Perhaps an "awesome-drill" repo on GitHub would be a place to backfill the
book, and serve as a central location for things like the list you supplied:
https://github.com/topics/awesome
On Tue, Nov 5, 2019 at 9:14 AM Charles Givre wrote:
> One more thing: I've found code for storage plugins
It works using the 9.4 Java 8 version. Thanks!
On Thu, Jul 25, 2019 at 12:07 PM wrote:
> Hi Matt, I tried with the 9.4 jt400.jar and it works for me
> with these parameters:
>
> {
> "type": "jdbc",
> "driver": "com.ibm.as400.access.AS400JDBCDriver
Is anyone successfully using the jt400 jdbc driver with Drill? I am trying
to add a storage plugin but when I go to create it in the web gui I'm
getting an error:
Please retry: Error while creating / updating storage :
java.sql.SQLException: Cannot create PoolableConnectionFactory (The
Have 4 nodes running drillbits version 1.14 for queries over JSON files in
the regular filesystem (not HDFS).
Each node has an identical directory structure, but not all file names
exist on all nodes, and any query in the form of "SELECT ... FROM
dfs.logs.`logs*.json.gz`" fails with:
Error:
https://issues.apache.org/jira/browse/DRILL-6723
On Mon, Aug 27, 2018 at 12:27 PM Matt wrote:
> I have a Kafka topic with some non-JSON test messages in it, resulting in
> errors "Error: DATA_READ ERROR: Failure while reading messages from kafka.
> Recordreader was at record:
I have a Kafka topic with some non-JSON test messages in it, resulting in
errors "Error: DATA_READ ERROR: Failure while reading messages from kafka.
Recordreader was at record: 1"
I don't seem to be able to bypass these topic messages with
"store.json.reader.skip_invalid_records" or even an
I note there are some old Jira issues about Cassandra storage, and have
an idea as to why it could be very valuable for Drill. Can anyone
support or refute the idea?
Cassandra is an excellent engine for high volume ingest, but support for
aggregations and scans is very limited. Would a Drill
A counterpoint: I would be concerned that Drill would be overshadowed by
more “popular” or more entrenched platforms.
Drill is an excellent and somewhat unique tech that needs more exposure to
grow. An event that focuses purely on Drill may have better success at that.
The caveat may be that a
Using a calendar table with monthly start and end dates, I am attempting
to count records in another table that has cycle start and end dates.
In PostgreSQL I would either use a date range type, or in standard SQL
do something like:
```
SELECT m.startdate as monthdate, COUNT(distinct
```
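A sketch of how the truncated query might continue, using an overlap join
between the calendar months and the cycle dates; table and column names are
assumed:
~~~
SELECT m.startdate AS monthdate, COUNT(DISTINCT c.id) AS cycle_count
FROM months m
JOIN cycles c
  ON c.cycle_start <= m.enddate
 AND c.cycle_end >= m.startdate
GROUP BY m.startdate;
~~~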
Drill is not SQL Server, and not expected to work identically.
Using the upper() and lower() functions is a common approach, unless you find
options to set the collation sort order in the Drill docs.
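A sketch of the upper()/lower() approach, with an assumed file and value:
~~~
-- columns[0] is how Drill addresses CSV fields without extractHeader
SELECT * FROM dfs.`/data/customers.csv` t
WHERE upper(t.columns[0]) = upper('smith');
~~~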
> On Feb 8, 2017, at 1:13 PM, Dechang Gu wrote:
>
> Sanjiv,
>
> Can you
I have JSON data with a nested list and am using FLATTEN to extract
two of three list elements as:
~~~
SELECT id, FLATTEN(data)[0] AS dttm, FLATTEN(data)[1] AS result FROM ...
~~~
This works, but each FLATTEN seems to slow the query down dramatically,
3x slower with the second flatten.
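One workaround sketch: flatten once in a subquery and index the result, so
the flatten is evaluated a single time; the alias names are assumptions:
~~~
SELECT id, f.d[0] AS dttm, f.d[1] AS result
FROM (SELECT id, FLATTEN(data) AS d FROM ...) f;
~~~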
r, similar to what you reported. I had to set the "skipFirstLine" option
to true for it to work.
Strangely, for subsequent queries it works even after removing / disabling
the "skipFirstLine" option. This could be a bug, but I'm not able to
reproduce it right now. Will file a JIRA once
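For context, "skipFirstLine" is a text format option in the storage plugin
config; a sketch of where it lives (extensions and delimiter are
assumptions):
~~~
"formats": {
  "csv": {
    "type": "text",
    "extensions": ["csv"],
    "delimiter": ",",
    "skipFirstLine": true
  }
}
~~~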
With files in the local filesystem, and an embedded drillbit from the
download on drill.apache.org, I can successfully query csv data by
column name with the extractHeader option on, as in SELECT customer_id
FROM `file`;
But in a MapR cluster (v. 5.1.0.37549.GA) with the data in MapR-FS, the
ion solved the
problem for me:
CAST(COALESCE(t_total, 0.0) AS double)
On Fri, Mar 11, 2016 at 12:45 AM, Matt <bsg...@gmail.com> wrote:
~~~
00-01 Project(date_tm=[CAST($23):TIMESTAMP(0)],
id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
"ISO-8859-1$en_US$prima
}, {
  "ref" : "`b_1250`",
  "expr" : "cast( ( ( if (isnotnull(`b_1250`) ) then (`b_1250` ) else (0 ) end ) ) as BIGINT )"
}, {
  "ref" : "`t_1250`",
  "expr" : "cast( ( ( if (isnotnull(`t_1250`) )
~~~
(which may be empty string) out of a CSV file. You should instead write
out a full case statement that checks for empty string and provides your
default value of 0 in that case.
- Jason
Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer
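A sketch of the full case statement Jason describes, with an assumed column
name and file path:
~~~
SELECT CASE WHEN t_total IS NULL OR t_total = '' THEN 0.0
            ELSE CAST(t_total AS DOUBLE)
       END AS t_total
FROM dfs.`/csv/customer.csv`;
~~~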
On Thu, Mar 10, 2016 at 2:32 PM, Matt <
PM, Matt <bsg...@gmail.com> wrote:
The CTAS fails with:
~~~
Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
Fragment 1:2
[Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
(java.lang.IllegalArgumentException) length: -260 (expec
~~~
Getting some errors when attempting to create Parquet files from CSV
data, and trying to determine if it is due to the format of the source
data.
It's a fairly simple format of
"datetime,key,key,key,numeric,numeric,numeric, ..." with 32 of those
numeric columns in total.
The source data
sqlline -u ... -q 'SELECT * FROM dfs.`/path/to/files/file.csv` LIMIT 10'
seems to emit a list of files in the local path (pwd), along with a
parsing error.
Putting the query in a file and passing that file name to sqlline or
using an explicit column list runs the query as expected.
Is this
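A sketch of the query-in-a-file workaround, which sidesteps shell quoting
issues; `--run` is standard sqlline, and the connection string is left
elided as in the original:
~~~
echo "SELECT * FROM dfs.\`/path/to/files/file.csv\` LIMIT 10" > q.sql
sqlline -u ... --run=q.sql
~~~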
On 26 Jan 2016, at 12:55, Abdel Hakim Deneche wrote:
Does a select * on the same data also fail ?
On Tue, Jan 26, 2016 at 9:44 AM, Matt <bsg...@gmail.com> wrote:
Getting some errors when attempting to create Parquet files from CSV
data, and trying to determine if it is due to the format of
Can you try enabling verbose errors and run the query again? This should
provide us with more details about the error.
You can enable verbose errors by running the following before the select *:
alter session set `exec.errors.verbose`=true;
thanks
On Tue, Jan 26, 2016 at 11:01 AM, Matt <bsg...@
Running a CTAS from csv files in a 4 node HDFS cluster into a Parquet
file, and I note the physical plan in the Drill UI references scans of
all the csv sources on a single node.
collectl implies read and write IO on all 4 nodes - does this imply that
the full cluster is used for scanning the
Converting CSV files to Parquet with CTAS, and getting errors on some
larger files:
With a source file of 16.34GB (as reported in the HDFS explorer):
~~~
create table `/parquet/customer_20151017` partition by (date_tm) AS
select * from `/csv/customer/customer_20151017.csv`;
Error: SYSTEM
~~~
I think Tableau also uses the first query to fetch the structure /
metadata of the expected result set.
We have often eliminated performance issues using Tableau by hiding
the structure of queries by putting them in database views. Could that
be a possible solution here?
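A sketch of the view approach in Drill, assuming a writable dfs.tmp
workspace and placeholder column names:
~~~
CREATE VIEW dfs.tmp.`customer_v` AS
SELECT customer_id, order_date, total
FROM dfs.`/csv/customer.csv`;
~~~
Tableau can then query dfs.tmp.customer_v without seeing the structure of
the underlying query.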
On 17 Aug 2015, at
On 23 Jul 2015, at 10:53, Abdel Hakim Deneche wrote:
When you try to read schema-less data, Drill will first investigate the
first 1000 rows to figure out a schema for your data, then it will use
this schema for the remainder of the query.
To clarify, if the JSON schema changes on the 1001st or 1MMth
I have seen some discussions on the Parquet storage format suggesting
that sorting time series data on the time key prior to converting to the
Parquet format will improve range query efficiency via min/max values on
column chunks - perhaps analogous to skip indexes?
Is this a recommended
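A sketch of that as a CTAS, with assumed names; sorting clusters the time
values so parquet row-group min/max statistics become selective for range
predicates:
~~~
CREATE TABLE dfs.tmp.`ts_sorted` AS
SELECT * FROM dfs.`/csv/timeseries.csv`
ORDER BY event_time;
~~~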
10:57 AM, Matt wrote:
Did you check the log files for any errors?
No messages related to this query containing errors or warnings, nor
anything mentioning memory or heap. Querying now to determine what is
missing in the parquet destination.
drillbit.out on the master shows no error messages
memory per node.
DRILL_HEAP is for the heap size per node.
More info here
http://drill.apache.org/docs/configuring-drill-memory/
—Andries
On May 28, 2015, at 11:09 AM, Matt bsg...@gmail.com wrote:
Referencing http://drill.apache.org/docs/configuring-drill-memory/
Is DRILL_MAX_DIRECT_MEMORY
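Both settings live in conf/drill-env.sh; a sketch with placeholder sizes
(the values are assumptions, not recommendations):
~~~
export DRILL_MAX_DIRECT_MEMORY="8G"
export DRILL_HEAP="4G"
~~~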
, at 13:42, Andries Engelbrecht wrote:
It should execute multi-threaded; need to check on text files.
Did you check the log files for any errors?
On May 28, 2015, at 10:36 AM, Matt bsg...@gmail.com wrote:
The time seems pretty long for that file size. What type of file is it?
Tab delimited UTF-8
for the query. I
believe writing parquet may still be the most heap-intensive operation in
Drill, despite our efforts to refactor the write path to use direct memory
instead of on-heap for large buffers needed in the process of creating
parquet files.
On Thu, May 28, 2015 at 8:43 AM, Matt bsg
On May 28, 2015, at 8:43 AM, Matt bsg...@gmail.com wrote:
Is 300MM records too much to do in a single CTAS statement?
After almost 23 hours I killed the query (^c) and it returned:
~~~
+------------+----------------------------+
| Fragment   | Number of records written
bits.
How large is the data set you are working with, and your
cluster/nodes?
—Andries
On May 28, 2015, at 9:17 AM, Matt bsg...@gmail.com wrote:
To make sure I am adjusting the correct config, these are heap
parameters within the Drill configure path, not for Hadoop or
Zookeeper?
On May
FS source as long as it is consistent across all nodes in the cluster,
but keep in mind that Drill can process a lot of data quickly, and for
best performance and consistency you will likely find that the sooner
you get the data to the DFS the better.
On May 26, 2015, at 5:58 PM, Matt bsg
wrote:
Perhaps I’m missing something here.
Why not create a DFS plugin for HDFS and put the file in HDFS?
On May 26, 2015, at 4:54 PM, Matt bsg...@gmail.com wrote:
New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes; it appears text
files need to be on all nodes in a cluster
mechanisms from remote systems, you can look at using NFS. MapR has a
really robust NFS integration, and you can use it with the community
edition.
On May 26, 2015, at 5:11 PM, Matt bsg...@gmail.com wrote:
That might be the end goal, but currently I don't have an HDFS ingest
mechanism
involved:
http://tshiran.github.io/drill/docs/querying-plain-text-files/#example-of-querying-a-tsv-file
Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn
On Sun, May 24, 2015 at 1:56 PM, Matt bsg...@gmail.com wrote:
I have used a single node install (unzip and run) to query local text
Is each file a single JSON array object?
If so, would converting the files to a format with one line per record be
a potential solution?
Example using jq (http://stedolan.github.io/jq/): jq -c '.[]'
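A sketch of that jq invocation applied to a file (the file name is assumed):
~~~
# emit one compact JSON object per line from a top-level array
jq -c '.[]' input.json > input_flat.json
~~~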
On 19 Mar 2015, at 12:22, Jim Bates wrote:
I constantly, constantly, constantly hit this.
I