This is an automated email from the ASF dual-hosted git repository.

kunalkapoor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
     new c179195  [DOCUMENTATION] Document update for new configurations.
c179195 is described below

commit c179195f715132dc407347c828cd29ad4f697649
Author: manishnalla1994 <manish.nalla1...@gmail.com>
AuthorDate: Tue Jul 2 11:29:19 2019 +0530

    [DOCUMENTATION] Document update for new configurations.
    
    Added documentation for the new configurations.
    
    This closes #3314
---
 docs/configuration-parameters.md | 4 ++++
 docs/ddl-of-carbondata.md        | 2 +-
 docs/dml-of-carbondata.md        | 6 ++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 7b31413..808d507 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -48,6 +48,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.invisible.segments.preserve.count | 200 | CarbonData maintains each data load entry in the tablestatus file. The entries in this file are not deleted for segments that are compacted or dropped, but are made invisible. If the number of data loads is very high, the size and number of entries in the tablestatus file can grow large, causing unnecessary reading of all data. This configuration specifies the number of segment entries to be maintained after they are compacted or dro [...]
 | carbon.lock.retries | 3 | CarbonData ensures consistency of operations by blocking certain operations from running in parallel. In order to block operations from running in parallel, a lock is obtained on the table. This configuration specifies the maximum number of retries to obtain the lock for any operation other than load. **NOTE:** Data manipulation operations like Compaction,UPDATE,DELETE or LOADING,UPDATE,DELETE are not allowed to run in parallel. However data loading can h [...]
 | carbon.lock.retry.timeout.sec | 5 | Specifies the interval between the retries to obtain the lock for any operation other than load. **NOTE:** Refer to ***carbon.lock.retries*** for understanding why CarbonData uses locks for operations. |
+| carbon.fs.custom.file.provider | None | Specifies a custom CarbonFile implementation (via FileTypeInterface) so that CarbonData can work with a custom FileSystem. |
 
 ## Data Loading Configuration
 
@@ -93,6 +94,8 @@ This section provides the details of all the configurations required for the Car
 | carbon.options.serialization.null.format | \N | Based on the business scenarios, some columns might need to be loaded with null values. As null values cannot be written in csv files, some special characters might be adopted to specify null values. This configuration can be used to specify the null value format in the data being loaded. |
 | carbon.column.compressor | snappy | CarbonData will compress the column values using the compressor specified by this configuration. Currently CarbonData supports 'snappy', 'zstd' and 'gzip' compressors. |
 | carbon.minmax.allowed.byte.count | 200 | CarbonData will write the min max values for string/varchar type columns using the byte count specified by this configuration. Max value is 1000 bytes (500 characters) and Min value is 10 bytes (5 characters). **NOTE:** This property is useful for reducing the store size, thereby improving the query performance, but can lead to query degradation if the value is not configured properly. |
+| carbon.merge.index.failure.throw.exception | true | Configures whether a failure to merge the index files should also fail the data load. |
+| carbon.binary.decoder | None | Configures the decoder used for binary columns during data loading. Two decoders are supported: base64 and hex. |
 
 ## Compaction Configuration
 
@@ -112,6 +115,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.concurrent.compaction | true | Compaction of different tables can be executed concurrently. This configuration determines whether to compact all qualifying tables in parallel or not. **NOTE:** Compacting concurrently is a resource-demanding operation and needs more resources, thereby affecting the query performance also. This configuration is **deprecated** and might be removed in future releases. |
 | carbon.compaction.prefetch.enable | false | Compaction operation is similar to Query + data load, wherein data from qualifying segments is queried and data loading is performed to generate a new single segment. This configuration determines whether to query ahead data from segments and feed it for data loading. **NOTE:** This configuration is disabled by default as it needs extra resources for querying extra data. Based on the memory availability on the cluster, user can enable it to imp [...]
 | carbon.merge.index.in.segment | true | Each CarbonData file has a companion CarbonIndex file which maintains the metadata about the data. These CarbonIndex files are read and loaded into the driver and are used subsequently for pruning of data during queries. These CarbonIndex files are very small in size (few KB) and are many. Reading many small files from HDFS is not efficient and leads to slow IO performance. Hence these CarbonIndex files belonging to a segment can be combined into a sin [...]
+| carbon.enable.range.compaction | true | Configures whether range-based compaction is used for RANGE_COLUMN. If true, the data remains organized into ranges after compaction as well. |
 
 ## Query Configuration
 
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index 2495bf6..7ab0e5f 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -165,7 +165,7 @@ CarbonData DDL statements are documented here, which includes:
 
    | Properties | Default value | Description |
    | ---------- | ------------- | ----------- |
-   | carbon.local.dictionary.enable | false | By default, Local Dictionary will be disabled for the carbondata table. |
+   | carbon.local.dictionary.enable | true | By default, Local Dictionary will be enabled for the carbondata table. |
    | carbon.local.dictionary.decoder.fallback | true | Page Level data will not be maintained for the blocklet. During fallback, actual data will be retrieved from the encoded page data using the local dictionary. **NOTE:** Memory footprint decreases significantly as compared to when this property is set to false. |
 
    Local Dictionary can be configured using the following properties during create table command:
diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 3e2a22d..84c629c 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -70,6 +70,7 @@ CarbonData DML statements are documented here, which includes:
 | [IS_EMPTY_DATA_BAD_RECORD](#bad-records-handling) | Whether empty data of a column is to be considered as bad record or not |
 | [GLOBAL_SORT_PARTITIONS](#global_sort_partitions) | Number of partitions to use for shuffling of data during sorting |
 | [SCALE_FACTOR](#scale_factor) | Control the partition size for RANGE_COLUMN feature |
+| [CARBON_OPTIONS_BINARY_DECODER] | Configurable decoder for binary columns when loading from CSV |
 
 - You can use the following options to load data:
 
@@ -307,6 +308,11 @@ CarbonData DML statements are documented here, which includes:
   * If both GLOBAL_SORT_PARTITIONS and SCALE_FACTOR are used at the same time, only GLOBAL_SORT_PARTITIONS is valid.
   * The compaction on RANGE_COLUMN will use LOCAL_SORT by default.
+  - ##### CARBON_ENABLE_RANGE_COMPACTION
+
+    Configures whether range-based compaction is used for RANGE_COLUMN.
+    The default value is 'true'.
+
 
 ### INSERT DATA INTO CARBONDATA TABLE
 
 This command inserts data into a CarbonData table; it is defined as a combination of two queries, Insert and Select, respectively.
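The new load-time binary decoder documented in this patch could be exercised roughly as follows. This is only a sketch: the table name, HDFS path, and the lowercase load-option spelling `binary_decoder` are assumptions inferred from the `CARBON_OPTIONS_BINARY_DECODER` entry above, not taken from this commit.

```sql
-- Sketch only: table name, path, and the 'binary_decoder' option spelling
-- are assumptions based on the CARBON_OPTIONS_BINARY_DECODER entry above.
CREATE TABLE binary_demo (id INT, image BINARY) STORED AS carbondata;

-- Load a CSV whose binary column was exported as base64 text;
-- per the docs above, 'hex' is the other supported decoder.
LOAD DATA INPATH 'hdfs://localhost:9000/demo/binary_data.csv'
INTO TABLE binary_demo
OPTIONS('binary_decoder'='base64');
```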
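For the system-wide properties added in this patch, a `carbon.properties` fragment might look like the sketch below. The keys are the ones documented above; the uncommented values are the documented defaults, so this fragment only illustrates the syntax.

```properties
# Sketch of a carbon.properties fragment using the keys documented in this patch.
# Uncommented values are the documented defaults; carbon.binary.decoder has none.
carbon.merge.index.failure.throw.exception=true
carbon.enable.range.compaction=true
# carbon.binary.decoder=base64    # or: hex
# carbon.fs.custom.file.provider=<fully qualified class implementing FileTypeInterface>
```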