BTW, can you log in to the Thrift server and run select * from <TABLE> limit 10?

Do you see the rows?
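
A minimal sketch of that check from beeline, assuming the Thrift server is
listening on the default port 10000:

$ beeline -u jdbc:hive2://localhost:10000
0: jdbc:hive2://localhost:10000> select * from <TABLE> limit 10;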

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 30 July 2016 at 07:20, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Works OK for me
>
> scala> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("inferSchema",
> "true").option("header",
> "false").load("hdfs://rhes564:9000/data/stg/accounts/ll/18740868")
> df: org.apache.spark.sql.DataFrame = [C0: string, C1: string, C2: string,
> C3: string, C4: string, C5: string, C6: string, C7: string, C8: string]
> scala>
> df.write.mode("overwrite").parquet("/user/hduser/ll_18740868.parquet")
> scala> sqlContext.read.parquet("/user/hduser/ll_18740868.parquet").count
> res2: Long = 3651
> scala> val ff = sqlContext.read.parquet("/user/hduser/ll_18740868.parquet")
> ff: org.apache.spark.sql.DataFrame = [C0: string, C1: string, C2: string,
> C3: string, C4: string, C5: string, C6: string, C7: string, C8: string]
> scala> ff.take(5)
> res3: Array[org.apache.spark.sql.Row] = Array([Transaction
> Date,Transaction Type,Sort Code,Account Number,Transaction
> Description,Debit Amount,Credit Amount,Balance,],
> [31/12/2009,CPT,'30-64-72,18740868,LTSB STH KENSINGTO CD 5710 31DEC09
> ,90.00,,400.00,null], [31/12/2009,CPT,'30-64-72,18740868,LTSB CHELSEA (3091
> CD 5710 31DEC09 ,10.00,,490.00,null],
> [31/12/2009,DEP,'30-64-72,18740868,CHELSEA ,,500.00,500.00,null],
> [Transaction Date,Transaction Type,Sort Code,Account Number,Transaction
> Description,Debit Amount,Credit Amount,Balance,])
>
> Now, in Zeppelin, create an external table and read it:
>
> [image: Inline images 2]
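>
> A minimal sketch of that DDL, assuming a hypothetical table name
> ll_18740868 and the C0..C8 string columns printed above:
>
> CREATE EXTERNAL TABLE ll_18740868 (C0 string, C1 string, C2 string,
> C3 string, C4 string, C5 string, C6 string, C7 string, C8 string)
> STORED AS PARQUET LOCATION '/user/hduser/ll_18740868.parquet';
>
> select * from ll_18740868 limit 10;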
>
>
> HTH
>
>
>
>
> On 29 July 2016 at 09:04, Chanh Le <giaosu...@gmail.com> wrote:
>
>> I continued debugging and compared the logs:
>>
>> 16/07/29 13:57:35 INFO FileScanRDD: Reading File path:
>> file:///Users/giaosudau/Documents/Topics.parquet/part-r-00000-8997050f-e063-427e-b53c-f0a61739706f.gz.parquet,
>> range: 0-3118, partition values: [empty row]
>>
>> versus the OK one:
>>
>> 16/07/29 15:02:47 INFO FileScanRDD: Reading File path:
>> file:///Users/giaosudau/data_example/FACT_ADMIN_HOURLY/time=2016-07-24-18/network_id=30206/part-r-00000-c5f5e18d-c8a1-4831-8903-3c60b02bdfe8.snappy.parquet,
>> range: 0-6050, partition values: [2016-07-24-18,30206]
>>
>> The failing read shows partition values: [empty row], while the working
>> one picks up the actual partition values [2016-07-24-18,30206].
>>
>> I attached 2 files.
>>
>>
>>
>>
>>
>>
>> On Jul 29, 2016, at 9:44 AM, Chanh Le <giaosu...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> For further investigation, I attached the file that I converted from CSV to Parquet.
>>
>> Spark code:
>>
>> I loaded from the CSV file:
>> val df = spark.sqlContext.read
>> .format("com.databricks.spark.csv").option("delimiter",
>> ",").option("header", "true").option("inferSchema",
>> "true").load("/Users/giaosudau/Downloads/Topics.xls - Sheet 1.csv")
>>
>> Then I created a Parquet file:
>>
>> df.write.mode("overwrite").parquet("/Users/giaosudau/Documents/Topics.parquet")
>>
>> It's OK in spark-shell:
>>
>> scala> df.take(5)
>> res22: Array[org.apache.spark.sql.Row] = Array([124,Nghệ thuật & Giải
>> trí,Arts & Entertainment,0,124,1], [53,Scandal,Scandal,124,124,53,2],
>> [54,Showbiz - World,Showbiz-World,124,124,54,2], [52,Âm
>> nhạc,Entertainment-Music,124,124,52,2], [47,Bar - Karaoke -
>> Massage,Bar-Karaoke-Massage-Prostitution,124,124,47,2])
>>
>> When I create a table in STS:
>>
>> 0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE topic (TOPIC_ID
>> int, TOPIC_NAME_VN String, TOPIC_NAME_EN String, PARENT_ID int, FULL_PARENT
>> String, LEVEL_ID int) STORED AS PARQUET LOCATION
>> '/Users/giaosudau/Documents/Topics.parquet';
>>
>> I get all results back as NULL:
>>
>> <Screen Shot 2016-07-29 at 9.42.26 AM.png>
>>
>>
>>
>> I think it's really a bug, right?
>>
>> Regards,
>> Chanh
>>
>>
>> <Topics.parquet>
>>
>>
>> <Topics.xls - Sheet 1.csv>
>>
>>
>>
>>
>>
>> On Jul 28, 2016, at 4:25 PM, Chanh Le <giaosu...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> I have a problem when I create an external table in the Spark Thrift
>> Server (STS) and query the data.
>>
>> Scenario:
>> Spark 2.0
>> Alluxio 1.2.0
>> Zeppelin 0.7.0
>>
>> STS start script:
>> /home/spark/spark-2.0.0-bin-hadoop2.6/sbin/start-thriftserver.sh
>> --master mesos://zk://master1:2181,master2:2181,master3:2181/mesos --conf
>> spark.driver.memory=5G --conf spark.scheduler.mode=FAIR --class
>> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --jars
>> /home/spark/spark-2.0.0-bin-hadoop2.6/jars/alluxio-core-client-spark-1.2.0-jar-with-dependencies.jar
>> --total-executor-cores 35 spark-internal --hiveconf
>> hive.server2.thrift.port=10000 --hiveconf
>> hive.metastore.warehouse.dir=/user/hive/warehouse --hiveconf
>> hive.metastore.metadb.dir=/user/hive/metadb --conf
>> spark.sql.shuffle.partitions=20
>>
>> I have a file store in Alluxio *alluxio://master2:19998/etl_info/TOPIC*
>>
>> Then I create a table in STS with:
>> CREATE EXTERNAL TABLE topic (topic_id int, topic_name_vn String,
>> topic_name_en String, parent_id int, full_parent String, level_id int)
>> STORED AS PARQUET LOCATION 'alluxio://master2:19998/etl_info/TOPIC';
>>
>> To compare STS with Spark, I create a temp table named topics:
>> spark.sqlContext.read.parquet("alluxio://master2:19998/etl_info/TOPIC
>> ").registerTempTable("topics")
>>
>> Then I run the same query against both and compare; a minimal sketch of
>> the check, assuming the table names above:
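>>
>> select * from topic limit 10;   -- external table created in STS
>> select * from topics limit 10;  -- temp table registered from Spark
>>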
>> <Screen Shot 2016-07-28 at 4.18.59 PM.png>
>>
>>
>> As you can see, the results are different.
>> Is that a bug, or did I do something wrong?
>>
>> Regards,
>> Chanh
>>
>>
>>
>>
>>
>
