In hive cli: msck repair table <table_name>;

Thanks
Deepak
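For concreteness, MSCK REPAIR scans a partitioned table's directory tree and registers any partition directories that exist on storage but are missing from the metastore. A sketch against a hypothetical partitioned table (table name and partition layout are assumptions, not from this thread):

```sql
-- Hypothetical table `sales`, partitioned by dt. MSCK REPAIR TABLE adds
-- partition directories (e.g. .../sales/dt=2018-02-10/) that exist on
-- disk but are not yet known to the Hive metastore.
MSCK REPAIR TABLE sales;

-- Verify the partitions were picked up:
SHOW PARTITIONS sales;
```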
On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com> wrote:

> No, can you please explain the command? Let me try now.
>
> Best,
>
> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
>> I am not sure about the exact issue, but I see you are partitioning while
>> writing from Spark.
>> Did you try msck repair on the table before reading it in Hive?
>>
>> Thanks
>> Deepak
>>
>> On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com> wrote:
>>
>>> All,
>>>
>>> Thanks for the inputs. Again I am not successful. I think we need to
>>> resolve this, as this is a very common requirement.
>>>
>>> Please go through my complete code:
>>>
>>> STEP 1: Started the Spark shell as spark-shell --master yarn
>>>
>>> STEP 2: The following code is given as input to the Spark shell:
>>>
>>> import org.apache.spark.sql.Row
>>> import org.apache.spark.sql.SparkSession
>>>
>>> val warehouseLocation = "/user/hive/warehouse"
>>>
>>> val spark = SparkSession.builder().appName("Spark Hive Example")
>>>   .config("spark.sql.warehouse.dir", warehouseLocation)
>>>   .enableHiveSupport()
>>>   .getOrCreate()
>>>
>>> import org.apache.spark.sql._
>>>
>>> var passion_df = spark.read.
>>>   format("jdbc").
>>>   option("url", "jdbc:mysql://localhost:3307/policies").
>>>   option("driver", "com.mysql.jdbc.Driver").
>>>   option("user", "root").
>>>   option("password", "root").
>>>   option("dbtable", "insurancedetails").
>>>   option("partitionColumn", "policyid").
>>>   option("lowerBound", "1").
>>>   option("upperBound", "100000").
>>>   option("numPartitions", "4").
>>>   load()
>>>
>>> // Made sure that passion_df is created, as passion_df.show(5) shows me
>>> // correct data.
>>>
>>> passion_df.write.saveAsTable("default.mine") // Default parquet
>>>
>>> STEP 3: Went to Hive. Started the Hive prompt.
>>>
>>> hive> show tables;
>>> OK
>>> callcentervoicelogs
>>> mine
>>> Time taken: 0.035 seconds, Fetched: 2 row(s)
>>> // As you can see, Hive is showing the table "mine" in the default schema.
>>>
>>> STEP 4: HERE IS THE PROBLEM.
>>>
>>> hive> select * from mine;
>>> OK
>>> Time taken: 0.354 seconds
>>> hive>
>>> // Where is the data???
>>>
>>> STEP 5:
>>>
>>> See the below command on Hive:
>>>
>>> describe formatted mine;
>>> OK
>>> # col_name              data_type       comment
>>>
>>> policyid                int
>>> statecode               string
>>> socialid                string
>>> county                  string
>>> eq_site_limit           decimal(10,2)
>>> hu_site_limit           decimal(10,2)
>>> fl_site_limit           decimal(10,2)
>>> fr_site_limit           decimal(10,2)
>>> tiv_2014                decimal(10,2)
>>> tiv_2015                decimal(10,2)
>>> eq_site_deductible      int
>>> hu_site_deductible      int
>>> fl_site_deductible      int
>>> fr_site_deductible      int
>>> latitude                decimal(6,6)
>>> longitude               decimal(6,6)
>>> line                    string
>>> construction            string
>>> point_granularity       int
>>>
>>> # Detailed Table Information
>>> Database:               default
>>> Owner:                  ravishankarnair
>>> CreateTime:             Sun Feb 11 00:26:40 EST 2018
>>> LastAccessTime:         UNKNOWN
>>> Protect Mode:           None
>>> Retention:              0
>>> Location:               file:/Users/ravishankarnair/spark-warehouse/mine
>>> Table Type:             MANAGED_TABLE
>>> Table Parameters:
>>>   spark.sql.sources.provider          parquet
>>>   spark.sql.sources.schema.numParts   1
>>>   spark.sql.sources.schema.part.0
>>>     {\"type\":\"struct\",\"fields\":[
>>>     {\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},
>>>     {\"name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"statecode\",\"scale\":0}},
>>>     {\"name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"Socialid\",\"scale\":0}},
>>>     {\"name\":\"county\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"county\",\"scale\":0}},
>>>     {\"name\":\"eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
>>>     {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"scale\":2}},
>>>     {\"name\":\"fl_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},
>>>     {\"name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"scale\":2}},
>>>     {\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},
>>>     {\"name\":\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},
>>>     {\"name\":\"eq_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\",\"scale\":0}},
>>>     {\"name\":\"hu_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},
>>>     {\"name\":\"fl_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}},
>>>     {\"name\":\"fr_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_deductible\",\"scale\":0}},
>>>     {\"name\":\"latitude\",\"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"latitude\",\"scale\":6}},
>>>     {\"name\":\"longitude\",\"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"longitude\",\"scale\":6}},
>>>     {\"name\":\"line\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",\"scale\":0}},
>>>     {\"name\":\"construction\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"construction\",\"scale\":0}},
>>>     {\"name\":\"point_granularity\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_granularity\",\"scale\":0}}]}
>>>   transient_lastDdlTime               1518326800
>>>
>>> # Storage Information
>>> SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
>>> InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
>>> OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
>>> Compressed:             No
>>> Num Buckets:            -1
>>> Bucket Columns:         []
>>> Sort Columns:           []
>>> Storage Desc Params:
>>>   path                  hdfs://localhost:8020/user/hive/warehouse/mine
>>>   serialization.format  1
>>> Time taken: 0.077 seconds, Fetched: 48 row(s)
>>>
>>> Now I seek your advice and support. What's the issue? Am I doing
>>> something wrong, or is it a bug? I am using Spark 2.2.1, Hive 1.2.1,
>>> Hadoop 2.7.3. All classpaths and configurations are set properly.
>>>
>>> Best,
>>>
>>> Ravion
>>>
>>> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian <nicholas.hakob...@rallyhealth.com> wrote:
>>>
>>>> It's possible that the format of your table is not compatible with your
>>>> version of Hive, so Spark saved it in a way such that only Spark can
>>>> read it. When this happens it prints out a very visible warning letting
>>>> you know this has happened.
>>>>
>>>> We've seen it most frequently when trying to save a parquet file with a
>>>> column in date format into a Hive table. In older versions of Hive, its
>>>> Parquet reader/writer did not support date formats (among a couple of
>>>> others).
>>>>
>>>> Nicholas Szandor Hakobian, Ph.D.
>>>> Staff Data Scientist
>>>> Rally Health
>>>> nicholas.hakob...@rallyhealth.com
>>>>
>>>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <prakashcjos...@gmail.com> wrote:
>>>>
>>>>> Ravi,
>>>>>
>>>>> Can you send the result of:
>>>>> show create table your_table_name
>>>>>
>>>>> Thanks
>>>>> Prakash
>>>>>
>>>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com> wrote:
>>>>>
>>>>>> All,
>>>>>>
>>>>>> It has been three days continuously that I am on this issue. Not
>>>>>> getting any clue.
>>>>>>
>>>>>> Environment: Spark 2.2.x, all configurations are correct.
>>>>>> hive-site.xml is in Spark's conf directory.
>>>>>>
>>>>>> 1) I created a data frame DF1 reading a csv file.
>>>>>>
>>>>>> 2) Did manipulations on DF1. The resulting frame is passion_df.
>>>>>>
>>>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>>>
>>>>>> 4) The metastore shows the Hive table; when I do "show tables" in
>>>>>> Hive, I can see the table name.
>>>>>>
>>>>>> 5) I can't select in Hive, though I can select from Spark as
>>>>>> spark.sql("select * from sampledb.passion")
>>>>>>
>>>>>> What's going on here? Please help. Why am I not seeing data from the
>>>>>> Hive prompt? The "describe formatted" command on the table in Hive
>>>>>> shows the data is in the default warehouse location
>>>>>> (/user/hive/warehouse), since I set it.
>>>>>>
>>>>>> I am not getting any definite answer anywhere. Many suggestions and
>>>>>> answers are given on Stack Overflow et al. Nothing really works.
>>>>>>
>>>>>> So asking experts here for some light on this, thanks.
>>>>>>
>>>>>> Best,
>>>>>> Ravion
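A workaround often suggested for this symptom is to materialize the data as a Hive-native table rather than a Spark datasource table, so older Hive versions can read it with their own serde. A sketch (editorial, not from the thread: the new table name is hypothetical, and it assumes the statement is run through spark.sql(...) in a session built with enableHiveSupport()):

```sql
-- Hive-style CTAS, run via spark.sql(...). Writing STORED AS ORC uses
-- Hive's own ORC serde instead of Spark's datasource format, so the
-- resulting table is readable from the Hive CLI as well as from Spark.
-- Table name `passion_hive` is hypothetical.
CREATE TABLE sampledb.passion_hive
STORED AS ORC
AS SELECT * FROM sampledb.passion;
```

Separately, note that the describe output earlier in the thread shows Location as file:/Users/ravishankarnair/spark-warehouse/mine (a local path) while the path storage parameter points at hdfs://localhost:8020/user/hive/warehouse/mine; since Hive reads a managed table's data from its Location, a mismatch like that would by itself be consistent with a query returning zero rows.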