Re: Aggregation OutOfMemoryException

2016-03-11 Thread Abdel Hakim Deneche
Disabling hash aggregation will default to streaming aggregation + sort. This will allow you to handle larger data and spill to disk if necessary. As stated in the documentation, starting from Drill 1.5 the default memory limit for sort may not be enough to process large data, but you can bump it
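A minimal sketch of the two settings this reply mentions, written as a Python helper that only builds the SQL text. It assumes the Drill options `planner.enable_hashagg` and `planner.memory.max_query_memory_per_node` are the ones meant; the function name and the 8 GB figure are illustrative assumptions, not values from the thread.

```python
def streaming_agg_statements(sort_memory_bytes):
    """Build ALTER SESSION statements that disable hash aggregation and
    raise the per-node query memory limit (assumed to be the limit the sort uses)."""
    if sort_memory_bytes <= 0:
        raise ValueError("sort_memory_bytes must be positive")
    return [
        # Planner falls back to streaming aggregation + sort.
        "ALTER SESSION SET `planner.enable_hashagg` = false",
        # Hypothetical new limit; pick a value that fits the node's direct memory.
        "ALTER SESSION SET `planner.memory.max_query_memory_per_node` = "
        f"{int(sort_memory_bytes)}",
    ]


if __name__ == "__main__":
    for stmt in streaming_agg_statements(8 * 1024**3):
        print(stmt + ";")
```

The statements are returned as plain strings so they can be pasted into sqlline or sent through any client before the aggregation query runs in the same session.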

[GitHub] drill pull request: DRILL-4411: hash join should limit batch based...

2016-03-11 Thread adeneche
GitHub user adeneche commented on a diff in the pull request: https://github.com/apache/drill/pull/381#discussion_r55917078 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java --- @@ -532,6 +542,9 @@ public void close() { if

[GitHub] drill pull request: DRILL-4411: hash join should limit batch based...

2016-03-11 Thread minji-kim
GitHub user minji-kim commented on the pull request: https://github.com/apache/drill/pull/381#issuecomment-195647307 I made it so that it doesn't adjust the batch size only once, but keeps the minimum batch size (in number of records) at 1 or more. --- If your project is set

[GitHub] drill pull request: DRILL-4455: Depend on Apache Arrow

2016-03-11 Thread jaltekruse
GitHub user jaltekruse commented on the pull request: https://github.com/apache/drill/pull/398#issuecomment-195623864 @StevenMPhillips Can you separate out the package renames into their own commit? It would be useful to see any meaningful code changes isolated. The diff is so big tha

[GitHub] drill pull request: Add Sudheesh to Drill Committers' List

2016-03-11 Thread sudheeshkatkam
GitHub user sudheeshkatkam closed the pull request at: https://github.com/apache/drill/pull/268

[GitHub] drill pull request: DRILL-4499: Remove 17 unused classes

2016-03-11 Thread sudheeshkatkam
GitHub user sudheeshkatkam opened a pull request: https://github.com/apache/drill/pull/426 DRILL-4499: Remove 17 unused classes You can merge this pull request into a Git repository by running: $ git pull https://github.com/sudheeshkatkam/drill DRILL-4499 Alternatively you ca

[jira] [Created] (DRILL-4500) Remove unused classes

2016-03-11 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-4500: -- Summary: Remove unused classes Key: DRILL-4500 URL: https://issues.apache.org/jira/browse/DRILL-4500 Project: Apache Drill Issue Type: Task Com

[jira] [Created] (DRILL-4499) Remove unused classes

2016-03-11 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-4499: -- Summary: Remove unused classes Key: DRILL-4499 URL: https://issues.apache.org/jira/browse/DRILL-4499 Project: Apache Drill Issue Type: Task R

Re: Aggregation OutOfMemoryException

2016-03-11 Thread John Omernik
I've had some luck disabling multi-phase aggregations on some queries where memory was an issue. https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/ After I try that, then I typically look at the hash aggregation as you have done: https://drill.apache.org/docs/sort-based-and-has
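The order of attempts described in this reply could be sketched as follows. This is an illustration only: it assumes the options involved are `planner.enable_multiphase_agg` and `planner.enable_hashagg`, and the names `TUNING_STEPS` and `tuning_statement` are hypothetical.

```python
# Options to try, in the order the reply describes (assumed option names).
TUNING_STEPS = [
    ("planner.enable_multiphase_agg", False),  # first: single-phase aggregation
    ("planner.enable_hashagg", False),         # then: sort-based (streaming) aggregation
]


def tuning_statement(step):
    """Return the ALTER SESSION statement for the given step index (0 or 1)."""
    name, value = TUNING_STEPS[step]
    return f"ALTER SESSION SET `{name}` = {str(value).lower()}"


if __name__ == "__main__":
    for i in range(len(TUNING_STEPS)):
        print(tuning_statement(i) + ";")
```

Applying one step at a time and re-running the query makes it easier to tell which setting actually relieved the memory pressure.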

Re: [VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-11 Thread Parth Chandra
Amending for Daylight Saving Time: the vote will be open for the next ~72 hours ending at *8*:10 AM PDT, March 14, 2016. Parth On Fri, Mar 11, 2016 at 7:09 AM, Parth Chandra wrote: > Hello all, > > I'd like to propose the zeroth release candidate (rc0) of Apache Drill, > version 1.6.0. > It cov

Aggregation OutOfMemoryException

2016-03-11 Thread François Méthot
Hi, Using version 1.5, DirectMemory is currently set at 32GB, heap is at 8GB. We have been trying to perform multiple aggregations in one query (see below) on 40 billion+ rows stored on 13 nodes. We are using the Parquet format. We keep getting OutOfMemoryException: Failure allocating buffer.. on

[GitHub] drill pull request: DRILL-1328: Support table statistics

2016-03-11 Thread vkorukanti
GitHub user vkorukanti opened a pull request: https://github.com/apache/drill/pull/425 DRILL-1328: Support table statistics The patch attached to the JIRA seems to be useful for generating table stats and using them for query planning. I rebased the patch onto the latest master, fixed a few

[jira] [Created] (DRILL-4498) Projecting a map key within an array produces incorrect results

2016-03-11 Thread Jiang Wu (JIRA)
Jiang Wu created DRILL-4498: --- Summary: Projecting a map key within an array produces incorrect results Key: DRILL-4498 URL: https://issues.apache.org/jira/browse/DRILL-4498 Project: Apache Drill I

Re: Apache Drill Send Query Through Rest API

2016-03-11 Thread John Omernik
For a simple "use" query, what I have done is enable authentication in Drill via https://drill.apache.org/docs/configuring-user-authentication/. With that, you can now create sessions in the REST API. Since you mention HTTP Requests, basically, you create a requests.session() object, authenticate,
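A rough sketch of the session approach described in this reply. It assumes Drill's form-based login endpoint is `/j_security_check` (fields `j_username`, `j_password`) and that queries are posted to `/query.json`; the helper names, host, credentials, and schema are placeholders. The function takes any object with a `requests.Session`-style `post` method, so the block itself needs no third-party import.

```python
def login_form(username, password):
    """Form fields for the assumed form-based login endpoint."""
    return {"j_username": username, "j_password": password}


def query_payload(sql):
    """JSON body for a SQL query submitted to /query.json."""
    return {"queryType": "SQL", "query": sql}


def run_in_one_session(session, base_url, username, password, statements):
    """Authenticate once, then send every statement through the same session
    object so the login cookie is reused for each request."""
    session.post(f"{base_url}/j_security_check",
                 data=login_form(username, password))
    results = []
    for sql in statements:
        resp = session.post(f"{base_url}/query.json", json=query_payload(sql))
        results.append(resp.json())
    return results
```

With the `requests` package installed, a call such as `run_in_one_session(requests.Session(), "http://localhost:8047", "user", "secret", ["USE dfs.tmp", "SHOW TABLES"])` would send both statements over one authenticated session; whether the `USE` carries over to the second request depends on authentication being enabled, as the reply notes.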

[VOTE] Release Apache Drill 1.6.0 - rc0

2016-03-11 Thread Parth Chandra
Hello all, I'd like to propose the zeroth release candidate (rc0) of Apache Drill, version 1.6.0. It covers a total of 44 resolved JIRAs [1]. Thanks to everyone who contributed to this release. The tarball artifacts are hosted at [2] and the maven artifacts are hosted at [3]. This release candid

[jira] [Created] (DRILL-4497) Casting strings with leading/trailing spaces to integers does not work

2016-03-11 Thread Ian Hellstrom (JIRA)
Ian Hellstrom created DRILL-4497: Summary: Casting strings with leading/trailing spaces to integers does not work Key: DRILL-4497 URL: https://issues.apache.org/jira/browse/DRILL-4497 Project: Apache

Apache Drill Send Query Through Rest API

2016-03-11 Thread Muhammad Fahreza
Hi All, first of all, thanks for making Apache Drill available to us; it is really great software in my honest opinion. Since I am new to developing with Drill, *I was wondering how to execute two queries at once via the REST API.* 1. First Business Case For example, I have a PostgreSQL *Storage* named *p

[jira] [Created] (DRILL-4496) Improve ParquetGroupScan.initFromMetadataCache() to read subfolders' cache files in parallel

2016-03-11 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4496: --- Summary: Improve ParquetGroupScan.initFromMetadataCache() to read subfolders' cache files in parallel Key: DRILL-4496 URL: https://issues.apache.org/jira/browse/DRILL-4496