Repository: drill
Updated Branches:
  refs/heads/gh-pages 2672c3669 -> c8a79a519
DRILL-2720

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/c8a79a51
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/c8a79a51
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/c8a79a51

Branch: refs/heads/gh-pages
Commit: c8a79a5193155b7081e16a6b807db0d8f3746820
Parents: 2672c36
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Wed Apr 8 15:51:59 2015 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Mon Apr 13 13:16:37 2015 -0700

----------------------------------------------------------------------
 _docs/connect/009-mapr-db-plugin.md       |  2 +-
 _docs/manage/conf/001-mem-alloc.md        | 88 ++++++++++----------------
 _docs/sql-ref/004-functions.md            |  2 +-
 _docs/sql-ref/functions/001-math.md       |  2 +-
 _docs/sql-ref/functions/002-conversion.md | 14 ++--
 5 files changed, 43 insertions(+), 65 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/drill/blob/c8a79a51/_docs/connect/009-mapr-db-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect/009-mapr-db-plugin.md b/_docs/connect/009-mapr-db-plugin.md
index bc06144..66d2a81 100644
--- a/_docs/connect/009-mapr-db-plugin.md
+++ b/_docs/connect/009-mapr-db-plugin.md
@@ -2,7 +2,7 @@
 title: "MapR-DB Format"
 parent: "Connect to a Data Source"
 ---
-Drill includes a `maprdb` format plugin for handling MapR-DB and HBase data. The Drill Sandbox also includes the following `maprdb` format plugin on a MapR node:
+Drill includes a `maprdb` format plugin for accessing data stored in MapR-DB.
The Drill Sandbox also includes the following `maprdb` format plugin on a MapR node:
 {
 "type": "hbase",

http://git-wip-us.apache.org/repos/asf/drill/blob/c8a79a51/_docs/manage/conf/001-mem-alloc.md
----------------------------------------------------------------------
diff --git a/_docs/manage/conf/001-mem-alloc.md b/_docs/manage/conf/001-mem-alloc.md
index 5d99015..df60b7f 100644
--- a/_docs/manage/conf/001-mem-alloc.md
+++ b/_docs/manage/conf/001-mem-alloc.md
@@ -2,7 +2,7 @@
 title: "Overview"
 parent: "Configuration Options"
 ---
-The sys.options table in Drill contains information about boot and system options described in the following tables. You configure some of the options to tune performance. You can configure the options using the ALTER SESSION or ALTER SYSTEM command.
+The sys.options table in Drill contains information about boot and system options listed in the following tables. To tune performance, you adjust some of the options to suit your application. Configure the options using the ALTER SESSION or ALTER SYSTEM command.
 ## Boot Options
@@ -10,7 +10,7 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <th>Name</th>
  <th>Default</th>
- <th>Description</th>
+ <th>Comments</th>
  </tr>
  <tr>
  <td>drill.exec.buffer.impl</td>
@@ -128,9 +128,9 @@ The sys.options table in Drill contains information about boot and system option
  <table>
  <tr>
- <th>name</th>
+ <th>Name</th>
  <th>Default</th>
- <th>Description</th>
+ <th>Comments</th>
  </tr>
  <tr>
  <td>drill.exec.functions.cast_empty_string_to_null</td>
@@ -140,12 +140,7 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>drill.exec.storage.file.partition.column.label</td>
  <td>dir</td>
- <td></td>
- </tr>
- <tr>
- <td>drill.exec.testing.exception-injections</td>
- <td></td>
- <td></td>
+ <td>Accepts a string input.</td>
  </tr>
  <tr>
  <td>exec.errors.verbose</td>
@@ -155,27 +150,27 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>exec.java_compiler</td>
  <td>DEFAULT</td>
- <td></td>
+ <td>Switches between DEFAULT, JDK, and JANINO mode for the current session. Uses Janino by default for generated source code of less than exec.java_compiler_janino_maxsize; otherwise, switches to the JDK compiler.</td>
  </tr>
  <tr>
  <td>exec.java_compiler_debug</td>
  <td>TRUE</td>
- <td></td>
+ <td>Toggles the output of debug-level compiler error messages in runtime generated code.</td>
  </tr>
  <tr>
  <td>exec.java_compiler_janino_maxsize</td>
  <td>262144</td>
- <td></td>
+ <td>See the exec.java_compiler option comment. Accepts inputs of type LONG.</td>
  </tr>
  <tr>
  <td>exec.max_hash_table_size</td>
  <td>1073741824</td>
- <td>Starting size for hash tables. Increase according to available memory to improve performance.</td>
+ <td>Ending size for hash tables. Range: 0 - 1073741824</td>
  </tr>
  <tr>
  <td>exec.min_hash_table_size</td>
  <td>65536</td>
- <td></td>
+ <td>Starting size for hash tables. Increase according to available memory to improve performance.
Range: 0 - 1073741824</td>
  </tr>
  <tr>
  <td>exec.queue.enable</td>
@@ -185,27 +180,22 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>exec.queue.large</td>
  <td>10</td>
- <td></td>
+ <td>Range: 0-1000</td>
  </tr>
  <tr>
  <td>exec.queue.small</td>
  <td>100</td>
- <td></td>
+ <td>Range: 0-1001</td>
  </tr>
  <tr>
  <td>exec.queue.threshold</td>
  <td>30000000</td>
- <td></td>
+ <td>Range: 0-9223372036854775807</td>
  </tr>
  <tr>
  <td>exec.queue.timeout_millis</td>
  <td>300000</td>
- <td></td>
- </tr>
- <tr>
- <td>org.apache.drill.exec.compile.ClassTransformer.scalar_replacement</td>
- <td>try</td>
- <td></td>
+ <td>Range: 0-9223372036854775807</td>
  </tr>
  <tr>
  <td>planner.add_producer_consumer</td>
@@ -215,7 +205,7 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.affinity_factor</td>
  <td>1.2</td>
- <td></td>
+ <td>Accepts inputs of type DOUBLE.</td>
  </tr>
  <tr>
  <td>planner.broadcast_factor</td>
@@ -225,22 +215,22 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.broadcast_threshold</td>
  <td>10000000</td>
- <td></td>
+ <td>Threshold in number of rows that triggers a broadcast join for a query if the right side of the join contains fewer rows than the threshold. Avoids broadcasting too many rows to join. Range: 0-2147483647</td>
  </tr>
  <tr>
  <td>planner.disable_exchanges</td>
  <td>FALSE</td>
- <td></td>
+ <td>Toggles the state of hashing to a random exchange.</td>
  </tr>
  <tr>
  <td>planner.enable_broadcast_join</td>
  <td>TRUE</td>
- <td></td>
+ <td>Changes the state of aggregation and join operators.
Do not disable.</td>
  </tr>
  <tr>
  <td>planner.enable_demux_exchange</td>
  <td>FALSE</td>
- <td></td>
+ <td>Toggles the state of hashing to a demultiplexed exchange.</td>
  </tr>
  <tr>
  <td>planner.enable_hash_single_key</td>
@@ -250,12 +240,12 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.enable_hashagg</td>
  <td>TRUE</td>
- <td></td>
+ <td>Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enable is recommended.</td>
  </tr>
  <tr>
  <td>planner.enable_hashjoin</td>
  <td>TRUE</td>
- <td></td>
+ <td>Enable the memory hungry hash join. Does not write to disk.</td>
  </tr>
  <tr>
  <td>planner.enable_hashjoin_swap</td>
@@ -265,7 +255,7 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.enable_mergejoin</td>
  <td>TRUE</td>
- <td></td>
+ <td>Sort-based operation. Writes to disk.</td>
  </tr>
  <tr>
  <td>planner.enable_multiphase_agg</td>
@@ -275,12 +265,12 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.enable_mux_exchange</td>
  <td>TRUE</td>
- <td></td>
+ <td>Toggles the state of hashing to a multiplexed exchange.</td>
  </tr>
  <tr>
  <td>planner.enable_streamagg</td>
  <td>TRUE</td>
- <td></td>
+ <td>Sort-based operation.
Writes to disk.</td>
  </tr>
  <tr>
  <td>planner.identifier_max_length</td>
@@ -325,7 +315,7 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.memory.non_blocking_operators_memory</td>
  <td>64</td>
- <td></td>
+ <td>Range: 0-2048</td>
  </tr>
  <tr>
  <td>planner.partitioner_sender_max_threads</td>
@@ -345,27 +335,27 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>planner.producer_consumer_queue_size</td>
  <td>10</td>
- <td></td>
+ <td>How much data to prefetch from disk (in record batches) out of band of query execution</td>
  </tr>
  <tr>
  <td>planner.slice_target</td>
  <td>100000</td>
- <td></td>
+ <td>The number of records manipulated within a fragment before Drill parallelizes operations.</td>
  </tr>
  <tr>
  <td>planner.width.max_per_node</td>
  <td>3</td>
- <td></td>
+ <td>The maximum degree of distribution of a query across cores and cluster nodes.</td>
  </tr>
  <tr>
  <td>planner.width.max_per_query</td>
  <td>1000</td>
- <td></td>
+ <td>Same as planner but applies to the query as executed by the entire cluster.</td>
  </tr>
  <tr>
  <td>store.format</td>
  <td>parquet</td>
- <td></td>
+ <td>Output format for data written to tables with the CREATE TABLE AS (CTAS) command. Allowed values are parquet, json, or text. Allowed values: 0, -1, 1000000</td>
  </tr>
  <tr>
  <td>store.json.all_text_mode</td>
@@ -375,17 +365,17 @@ The sys.options table in Drill contains information about boot and system option
  <tr>
  <td>store.mongo.all_text_mode</td>
  <td>FALSE</td>
- <td></td>
+ <td>Similar to store.json.all_text_mode for MongoDB.</td>
  </tr>
  <tr>
  <td>store.parquet.block-size</td>
  <td>536870912</td>
- <td></td>
+ <td>Sets the size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system.</td>
  </tr>
  <tr>
  <td>store.parquet.compression</td>
  <td>snappy</td>
- <td></td>
+ <td>Compression type for storing Parquet output.
Allowed values: snappy, gzip, none</td>
  </tr>
  <tr>
  <td>store.parquet.enable_dictionary_encoding</td>
@@ -398,22 +388,14 @@ The sys.options table in Drill contains information about boot and system option
  <td></td>
  </tr>
  <tr>
- <td>store.parquet.vector_fill_check_threshold</td>
- <td>10</td>
- <td></td>
- </tr>
- <tr>
- <td>store.parquet.vector_fill_threshold</td>
- <td>85</td>
- <td></td>
- </tr>
- <tr>
  <td>window.enable</td>
  <td>FALSE</td>
  <td></td>
  </tr>
  </table>
+## Memory Allocation
+
 You can configure the amount of direct memory allocated to a Drillbit for query processing. The default limit is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct memory that a Drillbit

http://git-wip-us.apache.org/repos/asf/drill/blob/c8a79a51/_docs/sql-ref/004-functions.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/004-functions.md b/_docs/sql-ref/004-functions.md
index a076920..2f1ee0b 100644
--- a/_docs/sql-ref/004-functions.md
+++ b/_docs/sql-ref/004-functions.md
@@ -12,4 +12,4 @@ You can use the following types of functions in your Drill queries:
 * [Nested Data](/docs/nested-data-functions/)
 * [Functions for Handling Nulls](/docs/functions-for-handling-nulls)
-
+You need to use a FROM clause in Drill queries. Examples in this documentation often use `FROM sys.version` in the query for example purposes.

http://git-wip-us.apache.org/repos/asf/drill/blob/c8a79a51/_docs/sql-ref/functions/001-math.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/functions/001-math.md b/_docs/sql-ref/functions/001-math.md
index 718998a..4695e32 100644
--- a/_docs/sql-ref/functions/001-math.md
+++ b/_docs/sql-ref/functions/001-math.md
@@ -158,7 +158,7 @@ Exceptions are the LSHIFT and RSHIFT functions, which take all types except the
 Examples in this section use the `input2.json` file.
Download the `input2.json` file from the [Drill source code](https://github.com/apache/drill/tree/master/exec/java-exec/src/test/resources/jsoninput) page.
-You need to use a FROM clause in Drill queries. This document often uses the sys.version table in the FROM clause of the query for example purposes.
+You need to use a FROM clause in Drill queries. In addition to using `input2.json`, examples in this documentation often use `FROM sys.version` in the query for example purposes.
 #### ABS Example
 Get the absolute value of the integer key in `input2.json`. The following snippet of input2.json shows the relevant integer content:

http://git-wip-us.apache.org/repos/asf/drill/blob/c8a79a51/_docs/sql-ref/functions/002-conversion.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/functions/002-conversion.md b/_docs/sql-ref/functions/002-conversion.md
index 875de69..780b397 100644
--- a/_docs/sql-ref/functions/002-conversion.md
+++ b/_docs/sql-ref/functions/002-conversion.md
@@ -10,7 +10,7 @@ Drill supports the following functions for casting and converting data types:
 ## CAST
-The CAST function converts an entity having a single data value, such as a column name, from one type to another.
+The CAST function converts an expression from one type to another.
 ### Syntax
@@ -18,7 +18,7 @@ The CAST function converts an entity having a single data value, such as a colum
 *expression*
-An entity that evaluates to one or more values, such as a column name or literal
+A combination of one or more values, operators, and SQL functions that evaluate to a value
 *data type*
@@ -381,13 +381,9 @@ Currently Drill does not support conversion of a date, time, or timestamp from o
 +------------+
 1 row selected (1.199 seconds)
-2. Configure the default time zone format in the drill-override.conf. For example:
+2.
Configure the default time zone format in <drill installation directory>/conf/drill-env.sh by adding `-Duser.timezone=UTC` to DRILL_JAVA_OPTS. For example:
- drill.exec: {
- cluster-id: “xyz",
- zk.connect: “abc:5181",
- user.timezone: "UTC"
- }
+ export DRILL_JAVA_OPTS="-Xms1G -Xmx$DRILL_MAX_HEAP -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=1G -ea -Duser.timezone=UTC"
 3. Restart sqlline.
@@ -416,7 +412,7 @@ TO_NUMBER(text, format)| numeric
 TO_TIMESTAMP(text, format)| timestamp
 TO_TIMESTAMP(double precision)| timestamp
-Use the ‘z’ option to identify the time zone in TO_TIMESTAMP to make sure the timestamp has the timezone in it. Also, use the ‘z’ option to identify the time zone in a timestamp using the TO_CHAR function. For example:
+You can use the ‘z’ option to identify the time zone in TO_TIMESTAMP to make sure the timestamp has the timezone in it. Also, use the ‘z’ option to identify the time zone in a timestamp using the TO_CHAR function. For example:
 SELECT TO_TIMESTAMP('2015-03-30 20:49:59.0 UTC', 'YYYY-MM-dd HH:mm:ss.s z') AS Original,
 TO_CHAR(TO_TIMESTAMP('2015-03-30 20:49:59.0 UTC', 'YYYY-MM-dd HH:mm:ss.s z'), 'z') AS TimeZone