Re: Issue faced in Apache drill

2019-04-09 Thread rahul challapalli
My solution above makes an implicit assumption that we return null if even a single value in column b is null. However, you can modify the query to replace nulls with 0s if that is what you want to do. On Tue, Apr 9, 2019 at 4:41 PM rahul challapalli wrote: > I haven't tried
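A minimal sketch of that nulls-to-zero variant, assuming the same dfs.`sample.json` file with columns a and b discussed in this thread:

    select a,
           sum(coalesce(b, 0)) as total_b   -- coalesce maps null values of b to 0 before summing
    from dfs.`sample.json`
    group by a;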

Re: Issue faced in Apache drill

2019-04-09 Thread rahul challapalli
I haven't tried it myself, but something like the below workaround should be helpful: select a, case when exists (select 1 from dfs.`sample.json` where b is null) then null else sum(b) end from dfs.`sample.json` group by a - Rahul On Tue, Apr 9, 2019 at 4:32 PM Gayathri Selvaraj < gay

Re: Apache Drill issue

2018-06-04 Thread rahul challapalli
In addition to what Padma said, it would be helpful if you could post the query that you are trying to execute. Also, as a sanity check, can you list the tables present in hive? Run the below commands: use hive; show tables; On Mon, Jun 4, 2018 at 8:05 AM, Padma Penumarthy wrote: > Did you verify t

Re: question about views

2018-03-19 Thread rahul challapalli
First, I would suggest ignoring the view and trying out a query which has the required filters as part of the subqueries on both sides of the union (for both the database and the partitioned parquet data). The plan for such a query should have the answers to your question. If both the subqueries independe

Re: [MongoDB] How does drill aggregate data

2017-10-02 Thread rahul challapalli
This will largely depend on the implementation of the Mongo DB storage plugin. Based on my glimpse at the plugin code [1], it looks like we read all the data from Mongo DB and then perform the aggregation in drill [1] https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/main/java/org/apache/d

Re: Query Optimization

2017-08-17 Thread rahul challapalli
Could you be running into https://issues.apache.org/jira/browse/DRILL-3846 ? - Rahul On Thu, Aug 17, 2017 at 9:13 PM, Padma Penumarthy wrote: > It is supposed to work like you expected. May be you are running into a > bug. > Why is it reading all files after metadata refresh ? That is difficult

Re: append data to already existing table saved in parquet format

2017-07-25 Thread rahul challapalli
I am not aware of any clean way to do this. However, if your data is partitioned based on directories, then you can use the below hack, which leverages temporary tables [1] (sketched below). Essentially, you back up your partition to a temp table, then overwrite it by taking the union of new partition data and existing
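A rough sketch of that hack, assuming Drill 1.10+ temporary tables and purely hypothetical workspace names and partition paths:

    -- back up the existing partition into a session-scoped temporary table
    create temporary table old_part as
      select * from dfs.data.`sales/2017-07`;
    -- remove the old partition directory
    drop table dfs.data.`sales/2017-07`;
    -- rebuild the partition as the union of the backup and the newly arrived data
    create table dfs.data.`sales/2017-07` as
      select * from old_part
      union all
      select * from dfs.staging.`sales_2017_07_new`;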

Re: Index out of bounds for SELECT * from 'directory'

2017-07-13 Thread rahul challapalli
With the amount of information provided it's hard to guide you. Index-out-of-bounds errors are generally bad, as they indicate some accounting corruption. I suggest that you go ahead and file a jira with the below information: 1. Query 2. Drill version 3. Data sets used 4. Logs and profiles 5. files

Re: Reading Parquet files with array or list columns

2017-06-30 Thread rahul challapalli
crashed. The Spark script to do it > runs in 14 minutes successfully. > > - Dave > > On Fri, Jun 30, 2017 at 1:38 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Like I suggested in the comment for DRILL-5183, can you try using a view > as > > a

Re: Reading Parquet files with array or list columns

2017-06-30 Thread rahul challapalli
Like I suggested in the comment for DRILL-5183, can you try using a view as a workaround until the issue gets resolved? On Fri, Jun 30, 2017 at 10:41 AM, David Kincaid wrote: > As far as I was able to discern it is not possible to actually use this > column as an array in Drill at all. It just d

Re: Pushing down Joins, Aggregates and filters, and data distribution questions

2017-06-01 Thread rahul challapalli
I would first recommend you spend some time reading the execution flow inside drill [1]. Try to understand specifically what major/minor fragments are and that different major fragments can have different levels of parallelism. Let us take a simple query which runs on a 2 node cluster select * fr

Re: Partitioning for parquet

2017-05-31 Thread rahul challapalli
-11' and '2017-01-23'; > > Is that correct? > > On Wed, May 31, 2017 at 4:49 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > How to partition data is dependent on how you want to access your data. > If > > you can foresee that

Re: Partitioning for parquet

2017-05-31 Thread rahul challapalli
How to partition data is dependent on how you want to access your data. If you can foresee that most of the queries use year and month, then go ahead and partition the data on those 2 columns. You can do that like below: create table partitioned_data partition by (yr, mnth) as select extract(year f
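A hedged completion of that CTAS; the workspace, source path, and column names are assumptions:

    create table dfs.tmp.partitioned_data
    partition by (yr, mnth) as
    select extract(year from o.order_ts)  as yr,
           extract(month from o.order_ts) as mnth,
           o.order_id,
           o.amount
    from dfs.`/data/orders` o;

    -- queries that filter on yr and mnth can then prune to the matching files
    select * from dfs.tmp.partitioned_data where yr = 2017 and mnth = 1;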

Re: Apache Drill takes 5-6 secs in fetching 1000 records from PostgreSQL table

2017-05-30 Thread rahul challapalli
5-6 seconds is a lot of time for the query and dataset size you mentioned. Did you check the profile to see where the time is being spent? On Tue, May 30, 2017 at 2:53 AM, wrote: > Hi, > > I am creating an UNLOGGED table in PostgreSQL and reading it using Apache > Drill. Table contains just one

Re: External Sort - Unable to Allocate Buffer error

2017-05-02 Thread rahul challapalli
This is clearly a bug and, like Zelaine suggested, the new sort is still a work in progress. We have a few similar bugs open for the new sort. I could have pointed to the JIRAs, but unfortunately JIRA is not working for me due to firewall issues. Another suggestion is to build drill from the latest maste

Re: Apache Drill Query Planning Performance

2017-04-26 Thread rahul challapalli
If your hive metastore contains a lot of metadata (many databases, tables, columns etc.), then drill might spend a significant amount of time fetching the metadata the first time. It caches the metadata, so subsequent runs should be faster. The fact that other queries are run in between the first and secon

Re: Support for ORC files

2017-04-13 Thread rahul challapalli
What you need is a format plugin. You can take a look at the Text format plugin while reading Paul's documentation, which Abhishek already shared. Don't look at parquet as it is more complicated. A short summary of what you need (maybe too short to be of any use :) ): 1. A group of classes which m

Re: Support for ORC files

2017-04-13 Thread rahul challapalli
Drill indirectly supports reading ORC files through the hive plugin. Apart from that, I am not aware of any community efforts toward a format plugin for ORC. Rahul On Apr 13, 2017 2:19 PM, "Manoj Murumkar" wrote: > Hi! > > I am wondering if someone is actively working on ORC

Re: Quoting queries

2017-03-30 Thread rahul challapalli
Hmm... strange. It works for me on drill 1.9.0 from the sqlline client. Can you try running it from sqlline just so that we can eliminate other tools trying to do some validation and failing? 0: jdbc:drill:zk=x.x.x.x:5181> select * from `a/b/c.json`; +-----+ | id | +-----+ | 1 |

Re: JDBC disconnections over remote networks

2017-03-30 Thread rahul challapalli
ar less severe? > > Wes > > > On Tue, Mar 28, 2017 at 1:42 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Also how much memory did you configure your client to use? If the client > > does not have sufficient memory to run, then garbage

Re: Apache Drill Clarification on Reading Parquet files

2017-03-29 Thread rahul challapalli
Welcome to the community and we are glad you are considering drill for your use-case. 1. There are a few ways in which you can make drill avoid reading all the files. Take a look at the below items a) Partition your data and store the partition information in the parquet footer. Documentatio

Re: JDBC disconnections over remote networks

2017-03-28 Thread rahul challapalli
sometimes works and sometimes doesn't. Is that useful to know? I can do > a little binary searching on values if that would help. > > Wes > > > On Mon, Mar 27, 2017 at 4:13 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Do you think th

Re: JDBC disconnections over remote networks

2017-03-27 Thread rahul challapalli
Do you think that the error you are seeing is related to DRILL-4708? If not, kindly provide more information about the error (message, stack trace, etc.). Also, does the connection error happen consistently after returning X number of records, or i

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-07 Thread rahul challapalli
i am not sure where planning time of the order of 30 secs is consumed. > > Please help > > Regards, > Projjwal > > > > > > > > On Mon, Mar 6, 2017 at 11:23 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > >> You can try the below things. F

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread rahul challapalli
You can try the below things. For each of them, check the planning time individually: 1. Run explain plan for a simple "select * from `/scratch/localdisk/drill/testdata/Cust_1G_tsv`" 2. Replace the '*' in your query with explicit column names 3. Remove the extract header option from your storage plug

Re: Explain Plan for Parquet data is taking a lot of timre

2017-03-06 Thread rahul challapalli
> > > > > >> On Feb 24, 2017, at 7:26 AM, Andries Engelbrecht <aengelbre...@mapr.com> wrote: > > >> > > >> Looks like the metadata cache is being used "usedMetadataFile=true, ". > But to be sure did you perfor

Re: Metadata Caching

2017-03-06 Thread rahul challapalli
There is no need to refresh the metadata for every query. You only need to generate the metadata cache once for each folder. Now if your data gets updated, then any subsequent query you submit will automatically refresh the metadata cache. Again you need not run the "refresh table metadata " comman
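For reference, the one-time cache generation per folder looks like the following (the path is hypothetical):

    refresh table metadata dfs.`/data/parquet/orders`;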

Re: Drill 1.9 Null pointer Exception

2017-03-03 Thread rahul challapalli
It looks like you are trying to query a hive table (backed by a hbase table) from drill. Can you try querying the same table from hive itself? I would also login to hbase and check whether the underlying table exists or not On Thu, Mar 2, 2017 at 2:14 AM, Khurram Faraaz wrote: > Can you please s

Re: RetriesExhaustedException in drill

2017-03-02 Thread rahul challapalli
It would be helpful if you provide more context. 1. What sort of query are you running 2. Where is the data stored and in what format 3. What is the size of the data 4. Full stack trace of the exception - Rahul On Thu, Mar 2, 2017 at 3:11 AM, prasanna lakshmi wrote: > Hi All, > > In regu

Re: Explain Plan for Parquet data is taking a lot of timre

2017-02-23 Thread rahul challapalli
You said there are 2144 parquet files, but the plan suggests that you only have a single parquet file. In any case it's a long time to plan the query. Did you try the metadata caching feature [1]? Also, how many row groups and columns are present in the parquet file? [1] https://drill.apache.org/docs

Re: Issue with drill query

2017-02-07 Thread rahul challapalli
Your query is the longest query I have heard of :) In any case, let's try the below steps: 1. Can you first try your query directly on hive? If hive reports an error during its metadata operations, then you can expect drill to fail as well during planning. 2. Increase the heap memory: Since the

Re: Storage Plugin for accessing Hive ORC Table from Drill

2017-01-22 Thread rahul challapalli
As chunhui mentioned this could very well be a compatibility issue of drill with hive 2.0. Since drill has never been tested against hive 2.0, this is not a total surprise. Can you try the below 2 things 1. Make sure you can read the table with hive. 2. Create a very simple hive orc table with a s

Re: Array or list type attributes from MongoDB

2017-01-18 Thread rahul challapalli
I suggest that you try using "flatten" [1] along with the mongo db storage plugin. I did not understand what you meant by "entities could be offered to QlikSense as if it were different tables". Below is an example of flatten usage select d.id, flatten(d.doc.files), d.doc.requestId from ( sel
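A hedged sketch of the full query the snippet cuts off; the database, collection, and inner column list are assumptions:

    select d.id,
           flatten(d.doc.files) as file,   -- one output row per element of the files array
           d.doc.requestId
    from (
      select m.id, m.doc
      from mongo.mydb.mycollection m
    ) d;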

Re: Stored Procedure & Function in Apache

2017-01-18 Thread rahul challapalli
I believe you have to rewrite your SQL Server procedures leveraging the CTAS/DROP commands of drill, keeping in mind that drill does not support INSERT/UPDATE commands. [1] https://drill.apache.org/docs/create-table-as-ctas/ [2] https://drill.apache.org/docs/drop-table/ - Rahul On Wed, Jan 18, 20

Re: WARC files

2017-01-17 Thread rahul challapalli
I believe what you need is a format plugin. Once you manage to read a file and populate drill's internal data structures (value vectors), the format of the file no longer comes into the picture. So from here on you can use any sql operators (filter, join, etc.) or UDFs. To my knowledge there is

Re: Directory Based Partition Pruning Documentation

2016-11-16 Thread rahul challapalli
lease file a JIRA indicating the doc gap, > reference any related JIRAs, and assign it to me. > > Thanks, > Bridget > > On Wed, Nov 16, 2016 at 11:33 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Folks, > > > > After a quick glance thro

Directory Based Partition Pruning Documentation

2016-11-16 Thread rahul challapalli
Folks, After a quick glance through our documentation, I couldn't find much about directory based partition pruning feature in drill. All I could find was [1]. Can someone point me to the relevant docs on this feature? [1] https://drill.apache.org/docs/partition-pruning/ - Rahul

Re: Drill where clause vs Hive on non-partition column

2016-11-16 Thread rahul challapalli
are partitioned by? > > I think option 3 is probably the way to go. Is there a ticket tracking > work on this? > > Thanks again > > On Tue, Nov 15, 2016 at 10:25 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Robert's suggestion is with

Re: MySQL CONNECTION_ID() equivalent in Drill

2016-11-15 Thread rahul challapalli
I couldn't find any documented functions which do what you are describing. On Mon, Nov 14, 2016 at 2:01 AM, Nagarajan Chinnasamy < nagarajanchinnas...@gmail.com> wrote: > Hi, > > I would like to know if there is a function or column in system tables that > is equivalent to MySQL's CONNECTION_ID f

Re: Drill where clause vs Hive on non-partition column

2016-11-15 Thread rahul challapalli
; Might > > > be a hive question but all we do is 'STORED AS parquet' and then during > > > insert set the parquet.* properties. I'm just trying to see if #2 is > an > > > option for us to utilize filter pushdown via dfs > > > > > > On

Re: Drill where clause vs Hive on non-partition column

2016-11-14 Thread rahul challapalli
rt filter pushdown for > #1? Do you know if we run analyze stats through hive on a parquet file if > that will have enough info to do the pushdown? > > Thanks again. > > On Mon, Nov 14, 2016 at 9:50 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > >

Re: INTERVAL date arithmetic

2016-11-14 Thread rahul challapalli
0:00.000Z" > > ​thanks > ​ > > On 10 November 2016 at 16:50, rahul challapalli < > challapallira...@gmail.com> > wrote: > > > Can you try the below query? > > > > select extract(day from cast(p.post.published_at as interval day)) >

Re: Drill where clause vs Hive on non-partition column

2016-11-14 Thread rahul challapalli
Sonny, If the underlying data in the hive table is in parquet format, there are 3 ways to query it from drill: 1. Using the hive plugin: this does not support filter pushdown for any format (ORC, Parquet, Text, etc.) 2. Directly querying the folder in maprfs/hdfs which contains the parquet files (see the sketch below)
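A hedged illustration of what option 2 looks like syntactically; the warehouse path and the filter column are assumptions, and whether the filter is actually pushed down is subject to the caveats discussed in this thread:

    select *
    from dfs.`/user/hive/warehouse/mydb.db/sales`   -- the directory backing the hive table
    where `year` = 2016;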

Re: SYSTEM ERROR: CompileException

2016-11-11 Thread rahul challapalli
This is a bug, and it's weird that changing the literal in the condition makes it work. Can you go ahead and raise a jira for the same? On Thu, Nov 10, 2016 at 10:58 AM, Josson Paul wrote: > Hi, > > My query is below > > select MIN(case when (CMP__acIds like '%6%') then A__ln__intrRt else > null

Re: INTERVAL date arithmetic

2016-11-10 Thread rahul challapalli
Can you try the below query? select extract(day from cast(p.post.published_at as interval day)) from dfs.data.ghost_posts_rm p; - Rahul On Thu, Nov 10, 2016 at 3:01 AM, Robin Moffatt < robin.moff...@rittmanmead.com> wrote: > Hi, > I have a date in a table, that I want to calculate how many days

Re: Parquet Date Format Problem

2016-11-01 Thread rahul challapalli
The fix will be available with the Drill 1.9 release unless you want to build from source yourself. On Tue, Nov 1, 2016 at 11:24 AM, Lee, David wrote: > Nevermind.. Found the problem.. > > https://issues.apache.org/jira/browse/DRILL-4203 > > > David Lee > Vice President | BlackRock > Phone: +1.4

Re: Drill issue - Reading DATE & TIME data type from Parquet

2016-10-17 Thread rahul challapalli
Amarnath, This is a known issue and the work is already in progress. You can track it using [1] [1] https://issues.apache.org/jira/browse/DRILL-4203 - Rahul On Mon, Oct 17, 2016 at 9:53 AM, Amarnath Vibhute < amarnath.vibh...@gmail.com> wrote: > Dear Team, > > I have started using Drill recent

Re: Reading column from parquet file saved using spark.1.6

2016-10-17 Thread rahul challapalli
This is tracked by https://issues.apache.org/jira/browse/DRILL-4203 On Mon, Oct 17, 2016 at 10:14 AM, Tushar Pathare wrote: > > Hello Team, > I am getting wrong values for date > columns(START_DT,END_DT)timestamp while querying. > Please see the attached screenshot.I am usin

Re: Query hangs on planning

2016-09-01 Thread rahul challapalli
While planning we use heap memory. 2GB of heap should be sufficient for what you mentioned. This looks like a bug to me. Can you raise a jira for the same? And it would be super helpful if you can also attach the data set used. Rahul On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante wrote: > Sure,

Re: Querying Delimited Sequence file

2016-08-31 Thread rahul challapalli
57-01-node-01.moffatt.me:> SELECT > split_part(CONVERT_FROM(binary_value, 'UTF8'),chr(1),2) from > `/user/oracle/seq/pdb.soe.logon` limit 5; > Error: SYSTEM ERROR: IllegalArgumentException: length: -6 (expected: >= 0) > > Fragment 0:0 > > [Error Id: b4f18223-

Re: Drill Queries Timing Out

2016-08-31 Thread rahul challapalli
Scott, Can you post the drill profiles for the runs with 40 nodes and 60 nodes? I am assuming that you are using the same version of drill in both scenarios. If not, let us know. Rahul On Aug 31, 2016 8:28 AM, "scott" wrote: > > Hello, > I'm having some performance issues testing Drill on a large

Re: Querying Delimited Sequence file

2016-08-30 Thread rahul challapalli
Also you can refer to [1] for the list of string functions implemented. [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java On Tue, Aug 30, 2016 at 11:06 AM, rahul challapalli < challapallira...@gmail.com>

Re: Querying Delimited Sequence file

2016-08-30 Thread rahul challapalli
You should be able to use the split_part function (I haven't tried it myself... but it is supported). With this function you can extract individual columns. Unfortunately, I couldn't find the documentation for this function either, but it should be similar to how other databases implement this function.
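A hedged illustration of the function's behavior (the literal input is made up; for the Ctrl-A delimited sequence-file values discussed in this thread the delimiter would be chr(1)):

    select split_part('one,two,three', ',', 2);   -- returns 'two'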

Re: Partition reading problem (like operator) while using hive partition table in drill

2016-08-03 Thread rahul challapalli
DRILL-4665 has been fixed. Can you try it out with the latest master and see if it works for you now? - Rahul On Wed, Aug 3, 2016 at 10:28 AM, Shankar Mane wrote: > has any 1 started working on this ? > > On Wed, Jun 1, 2016 at 8:27 PM, Zelaine Fong wrote: > > > Shankar, > > > > Work on this i

Re: How drill works internally

2016-07-25 Thread rahul challapalli
You can start with the high level architecture [1]. Then the community might help you if you have any specific questions. [1] https://drill.apache.org/architecture/ On Sun, Jul 24, 2016 at 11:36 PM, Sanjiv Kumar wrote: > How drill runs query internally. I want to know how drill execute query fo

Re: Performance with multiple FLATTENs

2016-07-19 Thread rahul challapalli
Matt, Having multiple flattens in your query leads to a cross-join between the outputs of each flatten, so a performance hit is expected with the addition of each flatten. There could also be a genuine performance bug in this scenario. To be sure it is a bug we need more information, as Abhishek

Re: Best way to set schema to handle different json structures

2016-07-11 Thread rahul challapalli
Did you try creating a view with the merged schema? Then you can try running all your queries on top of that view. - Rahul On Mon, Jul 11, 2016 at 3:23 PM, Scott Kinney wrote: > We have several different json structures we want to run queries across. I > can take a sample of each and merge the
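A minimal sketch of such a view, assuming hypothetical field names and a writable dfs.tmp workspace; each cast pins the column to one type across the differing json structures:

    create view dfs.tmp.merged_events as
    select cast(t.id as int)              as id,
           cast(t.`type` as varchar(100)) as event_type,
           cast(t.amount as double)       as amount
    from dfs.`/data/events_json` t;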

Re: CHAR data type

2016-07-11 Thread rahul challapalli
gt; > > On Mon, Jul 4, 2016 at 12:06 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Can you point us to where you are looking? The documentation should only > > say that "CHAR" datatype in hive is supported from Drill 1.7 onward. > > &

Re: Looking for workaround to Schema detection problems

2016-07-08 Thread rahul challapalli
In the past, setting the below parameter still did not fix this kind of issue, but it is still worth a try: ALTER SESSION SET `store.json.all_text_mode` = true; You might also want to try explicitly casting this specific column to varchar. On Fri, Jul 8, 2016 at 8:14 AM, Zelaine Fong wrote: > Have you tried

Re: What are the JDBC/ODBC requirements to connect to Drill?

2016-07-06 Thread rahul challapalli
Few Answers inline On Tue, Jul 5, 2016 at 3:40 AM, Juan Diego Ruiz Perea wrote: > Hello, > > We want to test connecting Oracle BI (OBI) to Apache Drill. We saw the > JDBC/ODBC drivers option and have the following questions: > >- Do you know if someone has already tested connecting OBI to Ap

Re: Help with the Optimizer of Apache Drill

2016-07-05 Thread rahul challapalli
For a start, below is the relevant piece from the documentation [1]. You can also prepend any query with "explain plan for" to view the exact plan generated by Drill's Optimizer. • Optimizer: Drill uses various standard database optimizations such as rule based/cost based, as well as data locality
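For example (the table path is hypothetical):

    explain plan for
    select o_custkey, count(*)
    from dfs.`/data/tpch/orders.parquet`
    group by o_custkey;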

Re: Initial Feed Back on 1.7.0 Release

2016-07-05 Thread rahul challapalli
John, Once you add/update data in one of your sub-folders, the immediate next query should update the metadata cache automatically, and all subsequent queries should fetch metadata from the cache. If this is not the case, it's a bug. Can you confirm your findings? - Rahul On Tue, Jul 5, 2016 at 9:

Re: CHAR data type

2016-07-04 Thread rahul challapalli
Can you point us to where you are looking? The documentation should only say that "CHAR" datatype in hive is supported from Drill 1.7 onward. - Rahul On Mon, Jul 4, 2016 at 9:53 AM, Santosh Kulkarni < santoshskulkarn...@gmail.com> wrote: > Thanks Shankar. I was looking in Drill documentation but

Re: Querying Parquet: Filtering on a sorted column

2016-07-01 Thread rahul challapalli
This is something which is not currently supported. The "parquet filter pushdown" feature should be able to achieve this; it's still under development. - Rahul On Fri, Jul 1, 2016 at 12:10 PM, Dan Wild wrote: > Hi, > > I'm attempting to query a directory of parquet files that are partitioned > o

Re: Drill with mapreduce

2016-06-28 Thread rahul challapalli
This looks like a bug in the JDBC driver packaging. Can you raise a JIRA for the same? On Tue, Jun 28, 2016 at 9:10 PM, GameboyNO1 <7304...@qq.com> wrote: > Hi, > I'm trying to use drill with mapreduce. > Details are: > I put a list of drill queries in a file as mapper's input, some to query > hb

Re: Is this normal view behavior?

2016-06-23 Thread rahul challapalli
p_day from dfs.tmp.myview; > ++ > | p_day | > ++ > | 1990 | > > > select dir0 from dfs.tmp.myview; > > Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 11: > Column 'dir0' not found in any table > > > >

Re: Is this normal view behavior?

2016-06-23 Thread rahul challapalli
This looks like a bug. If you renamed the dir0 column as p_day, then you should see that in sqlline as well. And I have never seen "_DEFAULT_COL_TO_READ_" before. Can you file a jira? - Rahul On Thu, Jun 23, 2016 at 12:33 PM, John Omernik wrote: > I have a table that is a directory of parquet f

Re: Apache Drill vs PrestoDB

2016-06-08 Thread rahul challapalli
The post on Quora gives a good overview. It would be helpful if you could provide some insight into what you are trying to achieve. A few questions to that end: 1. Who will be the users of your application 2. Where does your data live and in what format 3. What is the scale of data you want to t

Re: HiveMetastore HA with Drill

2016-06-02 Thread rahul challapalli
Not sure if our hive storage plugin supports this feature. Even if the feature is available, we haven't tested it. - Rahul On Tue, May 31, 2016 at 12:01 PM, Veera Naranammalpuram < vnaranammalpu...@maprtech.com> wrote: > Anyone has any insights into how the Hive storage plug-in can handle Hive >

Re: Error with flatten function on MongoDB documents that contain array of key-value pairs

2016-05-25 Thread rahul challapalli
Just to be sure, can you run the below query, which does not contain flatten? If this query also fails, then it could be bad data in the "Pnl" column (maybe an empty string?) SELECT x.DateValueCollection FROM `mongo`.`db_name`.` some.random.collection.name` AS x; On Wed, May 25, 2016 at 10:32 AM,

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2016-05-25 Thread rahul challapalli
Congratulations Parth! Thank You Jacques for your leadership over the last few years. On Wed, May 25, 2016 at 10:26 AM, Gautam Parai wrote: > Congratulations Parth! > > On Wed, May 25, 2016 at 9:02 AM, Jinfeng Ni wrote: > > > Big congratulations, Parth! > > > > Thank you, Jacques, for your con

Re: File size limit for CTAS?

2016-01-21 Thread rahul challapalli
Ignoring the CTAS part, can you try running the select query and see if it completes? My suspicion is that some record/field in your large file is causing drill to break. Also, it would be helpful if you can give more information from the drillbit.log when this error happens (Search for da53d687-a8d5

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-17 Thread rahul challapalli
16 at 12:11 PM, Stefán Baxter > > wrote: > > > Hi Jacques, > > > > Thank you for taking the time, it's appreciated. > > > > I'm trying to contribute to the Lucene reader for Drill (Started by Rahul > > Challapalli). We would like to use it for st

Re: Lucene Plugin :: Join Filter and pushdown

2016-01-14 Thread rahul challapalli
Use Case : In the case of a left join between a non-index table and a lucene index, it is more efficient to read the join keys from the non-index table and push them into the LuceneGroupScan. This way we can avoid reading the whole index. I was suggesting converting the plan for Q1 into a plan simi

Re: Classpath scanning & udfs

2016-01-12 Thread rahul challapalli
hub.com/apache/drill/blob/master/exec/java-exec/src/main/resources/drill-module.conf > > > > - Jason > > On Mon, Jan 11, 2016 at 11:24 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Sure! > > > > On Mon, Jan 11, 2016 at 11:0

Re: Classpath scanning & udfs

2016-01-11 Thread rahul challapalli
es to scan when > you add the empty file. > > Do you want to jump on slack to chat about this? > > - Jason > > On Mon, Jan 11, 2016 at 10:52 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Julien, > > > > I have an empty drill-mod

Re: Classpath scanning & udfs

2016-01-11 Thread rahul challapalli
ot of the udf jar > - add the package to drill.classpath.scanning.packages in the drill conf > (possibly using drill-override.conf) > > However if you are adding the drill-module.conf file to the jar, you might > as well add the package in it. (unless there's some other reason) > > On Mon, Jan 11, 2016 a

Re: Classpath scanning & udfs

2016-01-11 Thread rahul challapalli
rill-module.conf will not get scanned. > > > > On Mon, Jan 11, 2016 at 10:17 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Thanks for your reply Jason. > > > > If we cannot override the global configuration file, then for existing >

Re: Classpath scanning & udfs

2016-01-11 Thread rahul challapalli
an scanning all of the jars for all > paths given in the override file). > > On Fri, Jan 8, 2016 at 4:32 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Before 1.2, my udfs project contained an empty drill-override.conf file > and > > I used to

Classpath scanning & udfs

2016-01-08 Thread rahul challapalli
Before 1.2, my udfs project contained an empty drill-override.conf file and I used to update the drill-override.conf on all the drillbits to specify the package of my UDF. This is no longer working for me. I tried a few things and below is how my drill-override.conf file looks now drill.classpath.

Re: Announcing new committer: Kristine Hahn

2015-12-04 Thread rahul challapalli
Congratulations Kristine :) On Fri, Dec 4, 2015 at 9:43 AM, Abdel Hakim Deneche wrote: > Congrats Kristine :D > > On Fri, Dec 4, 2015 at 9:36 AM, Sudheesh Katkam > wrote: > > > Congratulations and welcome, Kris! > > > > > On Dec 4, 2015, at 9:19 AM, Jacques Nadeau wrote: > > > > > > The Apache

Re: Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
toString. If we changed > the toString behavior, it could be a problem. Maybe do a ctas to a json > file to confirm. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Nov 6, 2015 at 5:40 PM, rahul challapalli < > challapallira...@gmail.com> wrote: &g

Re: Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
string > versus an actual data return? > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Nov 6, 2015 at 5:31 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Jason, > > > > You were partly correct. We are not dropping

Re: Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
lineitem data set. I will raise a jira and I think this should be treated critical. Thoughts? - Rahul On Fri, Nov 6, 2015 at 4:30 PM, rahul challapalli < challapallira...@gmail.com> wrote: > Jason, > > I missed that. Let me check whether we are dropping any records. I would > b

Re: Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
unit or > regression tests. > > Thanks, > Jason > > On Fri, Nov 6, 2015 at 4:13 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Thanks for your replies. The file is private and I will try to construct > a > > file without sensitive dat

Re: Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
data out if you read the whole file, just > > in a different order? > > > > On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli < > > challapallira...@gmail.com> wrote: > > > >> parquet-meta command suggests that there is only one row group > >>

Re: Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
parquet-meta command suggests that there is only one row group On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau wrote: > How many row groups? > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli < > challa

Order of records read in a parquet file

2015-11-06 Thread rahul challapalli
Drillers, With the new parquet library update, can someone throw some light on the order in which the records are read from a single parquet file? With the older library, when I run the below query on a single parquet file, I used to get a set of records. Now after the parquet library update, I a

Semantics for boolean expressions in order by clause

2015-11-02 Thread rahul challapalli
Drillers, What are the semantics for the below query ? Should this syntax even be supported? select * from hive.dest2 order by key+1 = 497; - Rahul

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread rahul challapalli
Chris, It's not sufficient just to specify which column is the partition column; the data should also be organized accordingly. Below is a high-level description of how partition pruning works with parquet files: 1. Use CTAS with the partition by clause: Here drill creates a single (or more) file for

Re: How is dir0 inferred from the directory path

2015-10-20 Thread rahul challapalli
PM, Jacques Nadeau > wrote: > > > The first variable directory gets treated as a dirX starting point I > > believe. > > > > Doesn't seem like a bug to me. > > On Oct 19, 2015 9:56 AM, "rahul challapalli" > > > wrote: > > > > >

How is dir0 inferred from the directory path

2015-10-19 Thread rahul challapalli
Drillers, The result below suggests that 'dir0' is inferred by treating '/drill/testdata/audits' as the root in the below query. Is it by design that the first '*' gets treated as dir0? select * from dfs.`/drill/testdata/audits/*/audit/*.json` limit 1; +++--

RE: CSV with windows carriage return causes issues

2015-09-30 Thread rahul challapalli
Looks like a bug to me. Can you raise a jira for this if you haven't done so already? On Sep 30, 2015 8:04 AM, wrote: > I've seen that issue too... ;) > > My personal opinion is that Drill (and sqlline) should treat Windows > end-of-line characters the same as Unix end-of-line characters. It does

Re: :querying avro data stored in Hbase through drill

2015-09-29 Thread rahul challapalli
Once you have serialized your avro data into hbase, avro no longer comes into the picture: your table is just a normal hbase table. You can refer to the below documentation on querying hbase tables https://drill.apache.org/docs/querying-hbase/ - Rahul On Tue, Sep 29, 2015 at 12:14 AM, Aman
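A hedged example of the basic query pattern from that documentation; the table name and column family are assumptions:

    select convert_from(row_key, 'UTF8') as rk,      -- row keys come back as bytes; decode as needed
           t.`cf`.`payload`              as payload  -- returned as the raw bytes stored in hbase
    from hbase.`my_table` t
    limit 10;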

Re: Making parquet data available to Tableau

2015-09-28 Thread rahul challapalli
Cheers — Chris > > > On 28 Sep 2015, at 16:50, rahul challapalli > wrote: > > > > Your observation is right. We need to create a view on top of any > > file/folder for it to be available in Tableau or any reporting tool. This > > makes sense with text and even j

Re: Making parquet data available to Tableau

2015-09-28 Thread rahul challapalli
Your observation is right. We need to create a view on top of any file/folder for it to be available in Tableau or any other reporting tool. This makes sense for text and even json formats, as drill does not know the data types of the fields until it executes the queries. With parquet, however, drill coul

Re: Regarding drill jdbc with big file

2015-08-28 Thread rahul challapalli
ocs/alter-system/ < > > https://drill.apache.org/docs/alter-system/> > > https://drill.apache.org/docs/configuration-options-introduction/ < > > https://drill.apache.org/docs/configuration-options-introduction/> > > > > —Andries > > > > > >

Re: Regarding drill jdbc with big file

2015-08-28 Thread rahul challapalli
Can you search for the error id in the logs and post the stack trace? It looks like an overflow bug to me. - Rahul On Aug 28, 2015 6:47 AM, "Kunal Ghosh" wrote: > Hi, > > I am new to apache drill. I have configured apache drill on machine with > centos. > > "DRILL_MAX_DIRECT_MEMORY" = 25g > "DR

No of files created by CTAS auto partition feature

2015-08-26 Thread rahul challapalli
Drillers, I executed the below query on TPCH SF100 with drill and it took ~2 hrs to complete on a 2-node cluster. alter session set `planner.width.max_per_node` = 4; alter session set `planner.memory.max_query_memory_per_node` = 8147483648; create table lineitem partition by (l_shipdate, l_receipt

Re: Issue in using drill JDBC jar in Java code for Hive storage

2015-08-14 Thread rahul challapalli
I believe this has nothing to do with JDBC in particular. Your hive storage plugin info seems to be corrupted on your workstation. From the error message it looks like the drillbit itself failed to start. Can you back up "/tmp/drill/sys.storage_plugins/hive.sys.drill"? Now delete the "/tmp/drill/sy
