Hi, 

I am trying to run the example from the CarbonData quick start guide:
https://carbondata.apache.org/quick-start-guide.html


I run it through spark-shell in local mode.
Start command:
/opt/spark2.3.2/bin/spark-shell --jars apache-carbondata-1.5.1-bin-spark2.3.2-hadoop2.7.2.jar --master local

Code:
val store = "hdfs:///user/spark2/data/carbondata/store"
val meta = "hdfs:///user/spark2/data/carbondata"

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.CarbonSession._

// Create a CarbonSession pointing at the store and metastore paths above.
val carbon = SparkSession.builder().appName("CarbonSessionExample").getOrCreateCarbonSession(store, meta)

// Read the CSV (the header row gives the column names) and write it as a CarbonData table.
val df = carbon.read.option("header", true).csv("hdfs:///user/spark2/carbon_test.csv")
df.write.format("carbondata").option("tableName", "carbon_test_t0").option("compress", "true").mode(SaveMode.Overwrite).save()

The carbon_test.csv data looks like this:

id,name,city,age
a,xx,cc,1
b,xxx,ccc,2
c,xxxxx,ccc,3
d,xxx,fd,4
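
(If it helps to reproduce this without HDFS, the same four rows can be built in memory; this is just a sketch against the carbon session above, with the toDF column names mirroring the CSV header:)

import carbon.implicits._

// Same four rows as carbon_test.csv; note that age becomes Int here, while
// the CSV read above leaves all columns as strings since inferSchema is off.
val df2 = Seq(
  ("a", "xx", "cc", 1),
  ("b", "xxx", "ccc", 2),
  ("c", "xxxxx", "ccc", 3),
  ("d", "xxx", "fd", 4)
).toDF("id", "name", "city", "age")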

The REPL output from the save:
2018-12-07 15:58:40 AUDIT audit:72 - {"time":"December 7, 2018 3:58:40 PM CST","username":"spark2","opName":"CREATE TABLE","opId":"254101956103089","opStatus":"START"}
2018-12-07 15:58:41 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`carbon_test_t0` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
2018-12-07 15:58:41 AUDIT audit:93 - {"time":"December 7, 2018 3:58:41 PM CST","username":"spark2","opName":"CREATE TABLE","opId":"254101956103089","opStatus":"SUCCESS","opTime":"359 ms","table":"default.carbon_test_t0","extraInfo":{"bad_record_path":"","streaming":"false","local_dictionary_enable":"true","external":"false","sort_columns":"id,name,city,age","comment":""}}
2018-12-07 15:58:41 AUDIT audit:72 - {"time":"December 7, 2018 3:58:41 PM CST","username":"spark2","opName":"LOAD DATA OVERWRITE","opId":"254102325557552","opStatus":"START"}
2018-12-07 15:58:41 WARN  UnsafeIntermediateMerger:88 - the configure spill size is 0 less than the page size 67108864,so no merge and spill in-memory pages to disk
2018-12-07 15:58:42 WARN  CarbonDataProcessorUtil:93 - dir already exists, skip dir creation: /home/spark2/app/tmp/carbon254102981358020_0/Fact/Part0/Segment_0/0
2018-12-07 15:58:43 AUDIT audit:93 - {"time":"December 7, 2018 3:58:43 PM CST","username":"spark2","opName":"LOAD DATA OVERWRITE","opId":"254102325557552","opStatus":"SUCCESS","opTime":"1972 ms","table":"default.carbon_test_t0","extraInfo":{"SegmentId":"0","DataSize":"1.19KB","IndexSize":"674.0B"}}


The result of SHOW SEGMENTS FOR TABLE carbon_test_t0:
2018-12-07 16:01:18 AUDIT audit:72 - {"time":"December 7, 2018 4:01:18 PM CST","username":"spark2","opName":"SHOW SEGMENTS","opId":"254259762484865","opStatus":"START"}
2018-12-07 16:01:18 AUDIT audit:93 - {"time":"December 7, 2018 4:01:18 PM CST","username":"spark2","opName":"SHOW SEGMENTS","opId":"254259762484865","opStatus":"SUCCESS","opTime":"57 ms","table":"default.carbon_test_t0","extraInfo":{}}
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+
|SegmentSequenceId| Status|     Load Start Time|       Load End Time|Merged To|File Format|Data Size|Index Size|
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+
|                0|Success|2018-12-07 15:58:...|2018-12-07 15:58:...|       NA|COLUMNAR_V3|   1.19KB|    674.0B|
+-----------------+-------+--------------------+--------------------+---------+-----------+---------+----------+

But when I run the query "select * from carbon_test_t0", the result is:
+---+----+----+---+
| id|name|city|age|
+---+----+----+---+
+---+----+----+---+

I am sure the data was inserted successfully, because a new segment appears
each time I run an "insert into" or "save overwrite" command.
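
One thing I want to rule out (only a guess on my part, since the store and meta paths are passed separately to getOrCreateCarbonSession): that the select reads from a different location than the load wrote to. Standard Spark SQL can show where the table metadata points:

// The Location row should match the store path passed to getOrCreateCarbonSession.
carbon.sql("DESCRIBE FORMATTED carbon_test_t0").show(100, false)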

Thanks, 
Odone




