[Documentation] Editorial review comment fixed

Minor issues fixed (spelling, syntax, and missing info)

This closes #2603


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/12725b75
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/12725b75
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/12725b75

Branch: refs/heads/external-format
Commit: 12725b75c7133971cc8a29d343def55ebd273c85
Parents: 9336924
Author: sgururajshetty <sgururajshe...@gmail.com>
Authored: Thu Aug 2 19:57:31 2018 +0530
Committer: kunal642 <kunalkapoor...@gmail.com>
Committed: Fri Aug 3 18:50:23 2018 +0530

----------------------------------------------------------------------
 docs/configuration-parameters.md          |  2 +-
 docs/data-management-on-carbondata.md     | 39 ++++++++++++++------------
 docs/datamap/bloomfilter-datamap-guide.md | 12 ++++----
 docs/datamap/lucene-datamap-guide.md      |  2 +-
 docs/datamap/timeseries-datamap-guide.md  |  2 +-
 docs/sdk-guide.md                         |  8 +++---
 6 files changed, 34 insertions(+), 31 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/12725b75/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 6e4dea5..77cf230 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -140,7 +140,7 @@ This section provides the details of all the configurations 
required for CarbonD
 | carbon.enableMinMax | true | Min max is a feature added to enhance query performance. To disable this feature, set it to false. |
 | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executors to be active. The minimum value is 5 sec and the maximum value is 15 sec. |
 | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates that 80% of the requested resources are allocated for starting block distribution. The minimum value is 0.1 and the maximum value is 1.0. |
-| carbon.search.enabled | false | If set to true, it will use CarbonReader to 
do distributed scan directly instead of using compute framework like spark, 
thus avoiding limitation of compute framework like SQL optimizer and task 
scheduling overhead. |
+| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do a distributed scan directly instead of using a compute framework like Spark, thus avoiding limitations of the compute framework such as the SQL optimizer and task scheduling overhead. |
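For readers wiring these options up, a hedged illustration of how the entries above might appear in carbon.properties (the values shown are simply the documented defaults, not recommendations):

```
# carbon.properties -- illustrative entries only; defaults taken from the table above
carbon.enableMinMax=true
carbon.dynamicallocation.schedulertimeout=5
carbon.scheduler.minregisteredresourcesratio=0.8
# Alpha feature: distributed scan via CarbonReader instead of the compute framework
carbon.search.enabled=false
```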
 
 * **Global Dictionary Configurations**
   

http://git-wip-us.apache.org/repos/asf/carbondata/blob/12725b75/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md 
b/docs/data-management-on-carbondata.md
index 836fff9..41fd513 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -87,6 +87,25 @@ This tutorial is going to introduce all commands and data 
operations on CarbonDa
      * BATCH_SORT: It increases the load performance but decreases the query 
performance if identified blocks > parallelism.
     * GLOBAL_SORT: It increases the query performance, especially for high-concurrency point queries.
       It is also preferable if you care about strict isolation of loading resources, because the system uses Spark GroupBy to sort the data and the resources can be controlled by Spark.
+        
+       ### Example:
+
+   ```
+    CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
+                                   productNumber INT,
+                                   productName STRING,
+                                   storeCity STRING,
+                                   storeProvince STRING,
+                                   productCategory STRING,
+                                   productBatch STRING,
+                                   saleQuantity INT,
+                                   revenue INT)
+    STORED BY 'carbondata'
+    TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
+                   'SORT_SCOPE'='NO_SORT')
+   ```
+   
+   **NOTE:** CarbonData also supports "using carbondata". Find example code at 
[SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala)
 in the CarbonData repo.
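   The "using carbondata" form mentioned in the note is the Spark SQL datasource syntax; a minimal hedged sketch follows (the table name is hypothetical, and passing properties through this path is not covered by this excerpt):

   ```
    CREATE TABLE IF NOT EXISTS productSchema.productSalesTable_ds (
                                   productNumber INT,
                                   productName STRING,
                                   storeCity STRING,
                                   saleQuantity INT,
                                   revenue INT)
    USING carbondata
   ```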
  
    - **Table Block Size Configuration**
 
@@ -170,23 +189,6 @@ This tutorial is going to introduce all commands and data 
operations on CarbonDa
      
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000',
      'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
    ```
-### Example:
-
-   ```
-    CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
-                                   productNumber INT,
-                                   productName STRING,
-                                   storeCity STRING,
-                                   storeProvince STRING,
-                                   productCategory STRING,
-                                   productBatch STRING,
-                                   saleQuantity INT,
-                                   revenue INT)
-    STORED BY 'carbondata'
-    TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
-                   'SORT_SCOPE'='NO_SORT')
-   ```
-  **NOTE:** CarbonData also supports "using carbondata". Find example code at 
[SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala)
 in the CarbonData repo.
    
    - **Caching Min/Max Value for Required Columns**
      By default, CarbonData caches min and max values of all the columns in 
schema.  As the load increases, the memory required to hold the min and max 
values increases considerably. This feature enables you to configure min and 
max values only for the required columns, resulting in optimized memory usage. 
@@ -210,7 +212,7 @@ This tutorial is going to introduce all commands and data 
operations on CarbonDa
         COLUMN_META_CACHE=’col1,col2,col3,…’
         ```
         
-        Columns to be cached can be specifies either while creating tale or 
after creation of the table.
+        Columns to be cached can be specified either while creating the table or after the creation of the table.
        During the create table operation, specify the columns to be cached in the table properties.
         
         Syntax:
@@ -574,6 +576,7 @@ Users can specify which columns to include and exclude for 
local dictionary gene
   ```
   REFRESH TABLE dbcarbon.productSalesTable
   ```
+  
   **NOTE:** 
  * The new database name and the old database name should be the same.
  * Before executing this command, the old table schema and data should be copied into the new database location.
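  One of the hunks above notes that cached columns can also be specified after the table is created; the full syntax block is truncated in this excerpt, so the following is only a hedged sketch, assuming the property can be set through CarbonData's ALTER TABLE ... SET TBLPROPERTIES form (table and column names are reused from the earlier example):

  ```
  ALTER TABLE productSchema.productSalesTable SET TBLPROPERTIES ('COLUMN_META_CACHE'='productName,storeCity')
  ```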

http://git-wip-us.apache.org/repos/asf/carbondata/blob/12725b75/docs/datamap/bloomfilter-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/bloomfilter-datamap-guide.md 
b/docs/datamap/bloomfilter-datamap-guide.md
index 8955cde..ccbcabe 100644
--- a/docs/datamap/bloomfilter-datamap-guide.md
+++ b/docs/datamap/bloomfilter-datamap-guide.md
@@ -1,4 +1,4 @@
-# CarbonData BloomFilter DataMap (Alpha feature in 1.4.0)
+# CarbonData BloomFilter DataMap (Alpha Feature)
 
 * [DataMap Management](#datamap-management)
 * [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
@@ -41,10 +41,10 @@ Disable Datamap
 
 ## BloomFilter DataMap Introduction
 A Bloom filter is a space-efficient probabilistic data structure that is used 
to test whether an element is a member of a set.
-Carbondata introduce BloomFilter as an index datamap to enhance the 
performance of querying with precise value.
+Carbondata introduced BloomFilter as an index datamap to enhance the 
performance of querying with precise value.
 It is well suited for queries that do a precise match on high cardinality columns (such as Name/ID).
 Internally, CarbonData maintains a BloomFilter per blocklet for each index column to indicate whether a value of the column is in this blocklet.
-Just like the other datamaps, BloomFilter datamap is managed ablong with main 
tables by CarbonData.
+Just like the other datamaps, BloomFilter datamap is managed along with main 
tables by CarbonData.
 Users can create a BloomFilter datamap on specified columns with BloomFilter configurations such as size and probability.
 
 For instance, consider a main table called **datamap_test** which is defined as:
@@ -83,9 +83,9 @@ User can create BloomFilter datamap using the Create DataMap 
DDL:
 
 | Property | Is Required | Default Value | Description |
 |-------------|----------|--------|---------|
-| INDEX_COLUMNS | YES |  | Carbondata will generate BloomFilter index on these 
columns. Queries on there columns are usually like 'COL = VAL'. |
-| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as 
the number of expected insertions, it will affects the size of BloomFilter 
index. Since each blocklet has a BloomFilter here, so the default value is the 
approximate distinct index values in a blocklet assuming that each blocklet 
contains 20 pages and each page contains 32000 records. The value should be an 
integer. |
-| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as 
the False-Positive Probability, it will affects the size of bloomfilter index 
as well as the number of hash functions for the BloomFilter. The value should 
be in range (0, 1). In one test scenario, a 96GB TPCH customer table with 
bloom_size=320000 and bloom_fpp=0.00001 will result in 18 false positive 
samples. |
+| INDEX_COLUMNS | YES |  | Carbondata will generate BloomFilter index on these 
columns. Queries on these columns are usually like 'COL = VAL'. |
+| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as the number of expected insertions, and it will affect the size of the BloomFilter index. Since each blocklet has its own BloomFilter, the default value is the approximate number of distinct index values in a blocklet, assuming that each blocklet contains 20 pages and each page contains 32000 records. The value should be an integer. |
+| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as the False-Positive Probability, and it will affect the size of the bloomfilter index as well as the number of hash functions for the BloomFilter. The value should be in the range (0, 1). In one test scenario, a 96GB TPCH customer table with bloom_size=320000 and bloom_fpp=0.00001 resulted in 18 false positive samples. |
 | BLOOM_COMPRESS | NO | true | Whether to compress the BloomFilter index 
files. |
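A hedged sketch of the Create DataMap DDL described above, tying the properties in this table together (the datamap name and index columns are hypothetical; **datamap_test** is the main table mentioned earlier, and the property values are simply the documented defaults):

```
CREATE DATAMAP dm_bloom ON TABLE datamap_test
USING 'bloomfilter'
DMPROPERTIES ('INDEX_COLUMNS'='id,name',
              'BLOOM_SIZE'='640000',
              'BLOOM_FPP'='0.00001',
              'BLOOM_COMPRESS'='true')
```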
 
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/12725b75/docs/datamap/lucene-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/lucene-datamap-guide.md 
b/docs/datamap/lucene-datamap-guide.md
index 5f7a2e4..119b609 100644
--- a/docs/datamap/lucene-datamap-guide.md
+++ b/docs/datamap/lucene-datamap-guide.md
@@ -1,4 +1,4 @@
-# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
+# CarbonData Lucene DataMap (Alpha Feature)
   
 * [DataMap Management](#datamap-management)
 * [Lucene Datamap](#lucene-datamap-introduction)

http://git-wip-us.apache.org/repos/asf/carbondata/blob/12725b75/docs/datamap/timeseries-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/timeseries-datamap-guide.md 
b/docs/datamap/timeseries-datamap-guide.md
index bea5286..15ca3fc 100644
--- a/docs/datamap/timeseries-datamap-guide.md
+++ b/docs/datamap/timeseries-datamap-guide.md
@@ -4,7 +4,7 @@
 * [Compaction](#compacting-pre-aggregate-tables)
 * [Data Management](#data-management-with-pre-aggregate-tables)
 
-## Timeseries DataMap Introduction (Alpha feature in 1.3.0)
+## Timeseries DataMap Introduction (Alpha Feature)
 Timeseries DataMap is a pre-aggregate table implementation based on the 'pre-aggregate' DataMap.
 The difference is that Timeseries DataMap has a built-in understanding of the time hierarchy and
 levels: year, month, day, hour, minute, so that it supports automatic roll-up in the time dimension
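A hedged sketch of what such a roll-up datamap can look like, assuming the EVENT_TIME and <level>_GRANULARITY DMPROPERTIES used by the timeseries datamap (not shown in this excerpt); the table, column names, and aggregation are hypothetical:

```
CREATE DATAMAP agg_sales_hour ON TABLE sales
USING 'timeseries'
DMPROPERTIES ('EVENT_TIME'='order_time',
              'HOUR_GRANULARITY'='1')
AS SELECT order_time, country, SUM(quantity)
   FROM sales
   GROUP BY order_time, country
```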

http://git-wip-us.apache.org/repos/asf/carbondata/blob/12725b75/docs/sdk-guide.md
----------------------------------------------------------------------
diff --git a/docs/sdk-guide.md b/docs/sdk-guide.md
index 562269e..c7bff59 100644
--- a/docs/sdk-guide.md
+++ b/docs/sdk-guide.md
@@ -130,7 +130,7 @@ public class TestSdkJson {
        testJsonSdkWriter();
    }
    
-   public void testJsonSdkWriter() throws InvalidLoadOptionException {
+   public static void testJsonSdkWriter() throws InvalidLoadOptionException {
     String path = "./target/testJsonSdkWriter";
 
     Field[] fields = new Field[2];
@@ -297,7 +297,7 @@ public CarbonWriterBuilder persistSchemaFile(boolean 
persist);
 *               by default it is system time in nano seconds.
 * @return updated CarbonWriterBuilder
 */
-public CarbonWriterBuilder taskNo(String taskNo);
+public CarbonWriterBuilder taskNo(long taskNo);
 ```
 
 ```
@@ -340,7 +340,7 @@ public CarbonWriterBuilder withLoadOptions(Map<String, 
String> options);
 * @throws IOException
 * @throws InvalidLoadOptionException
 */
-public CarbonWriter buildWriterForCSVInput() throws IOException, 
InvalidLoadOptionException;
+public CarbonWriter 
buildWriterForCSVInput(org.apache.carbondata.sdk.file.Schema schema) throws 
IOException, InvalidLoadOptionException;
 ```
 
 ```  
@@ -351,7 +351,7 @@ public CarbonWriter buildWriterForCSVInput() throws 
IOException, InvalidLoadOpti
 * @throws IOException
 * @throws InvalidLoadOptionException
 */
-public CarbonWriter buildWriterForAvroInput() throws IOException, 
InvalidLoadOptionException;
+public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema schema) 
throws IOException, InvalidLoadOptionException;
 ```
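Taken together, the signature changes above mean the writer build methods now take a schema argument; below is a minimal hedged Java sketch of the CSV path, using only the build signature quoted in this diff plus SDK helper classes (Field, Schema, DataTypes, outputPath, write, close) that are assumed here from the rest of the SDK guide:

```
import java.io.IOException;

import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class CsvWriterSketch {
  public static void main(String[] args) throws IOException, InvalidLoadOptionException {
    String path = "./target/csvWriterSketch";

    // Two-column schema, mirroring the Field[] usage elsewhere in the SDK guide.
    Field[] fields = new Field[2];
    fields[0] = new Field("name", DataTypes.STRING);
    fields[1] = new Field("age", DataTypes.INT);

    // buildWriterForCSVInput now takes an org.apache.carbondata.sdk.file.Schema,
    // as shown in the diff above.
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath(path)
        .buildWriterForCSVInput(new Schema(fields));

    // The CSV writer consumes one row per write call as a String array.
    writer.write(new String[]{"robot0", "25"});
    writer.write(new String[]{"robot1", "30"});
    writer.close();
  }
}
```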
 
 ```
