Try this. Sorry about the cut and paste.
10 Apr 2016, 04:06 GMT
From: Maurin Lenglart
Subject: alter table add columns alternatives or hive refresh

Hi,

I am trying to add columns to a table that I created with the "saveAsTable" API. I add the columns using sqlContext.sql('alter table myTable add columns (mycol string)'). The next time I create a DataFrame and save it into the same table with the new columns, I get:

"ParquetRelation requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema."

Also, these two commands do not return the same columns:
1. sqlContext.table('myTable').schema.fields  <-- wrong result
2. sqlContext.sql('show columns in mytable')  <-- good result

It seems to be a known bug: https://issues.apache.org/jira/browse/SPARK-9764 (see related bugs). But I am wondering how else I can update the columns, or make sure that Spark picks up the new ones. I have already tried refreshTable and restarting Spark.

Thanks

10 Apr 2016, 10:41 GMT
From: Mich Talebzadeh
Subject: Re: alter table add columns alternatives or hive refresh

I have not tried it on Spark, but a column added in Hive to an existing table cannot be updated for existing rows. In other words, the new column is set to null, which does not require a change to the existing file length. So, as I understand it, when a column is added to an existing table:

1. The metadata for the underlying table is updated.
2. The new column defaults to null.
3. Existing rows cannot have the new column updated to a non-null value.
4. New rows can have non-null values set for the new column.
5. No SQL operation on that column will match the existing rows; for example, select * from <TABLE> where new_column IS NOT NULL will not return them.
6. The easiest option is to create a new table with the new column and do an insert/select from the existing table, with values set for the new column.

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com/
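The column-count error Maurin hits, and the null backfill described in the points above, can be modelled in a few lines of plain Python. This is an illustration only, not Spark or Hive code; `SimpleTable` is a made-up class used to show the mechanics:

```python
# A minimal in-memory model of the behaviour described above.
# 'SimpleTable' is hypothetical -- not a Hive or Spark API.

class SimpleTable:
    def __init__(self, columns):
        self.columns = list(columns)   # table metadata
        self.rows = []                 # each row is a dict keyed by column name

    def insert(self, *values):
        if len(values) != len(self.columns):
            # Mirrors the ParquetRelation error: the INSERT must supply
            # exactly as many columns as the table schema declares.
            raise ValueError("INSERT must match the table schema "
                             f"({len(self.columns)} columns, got {len(values)})")
        self.rows.append(dict(zip(self.columns, values)))

    def add_column(self, name):
        # Only the metadata changes; existing rows read back as NULL (None).
        self.columns.append(name)
        for row in self.rows:
            row[name] = None

t = SimpleTable(["col1"])
t.insert(1)
t.add_column("new_col")
print(t.rows)           # existing row carries None for new_col
t.insert(1, "London")   # new rows may set the new column
print(t.rows)
```

After `add_column`, an insert that still supplies the old number of values raises, which is the in-memory analogue of the error message Maurin quotes.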
10 Apr 2016, 18:34 GMT
From: Maurin Lenglart
Subject: Re: alter table add columns alternatives or hive refresh

Hi,

So basically you are telling me that I need to recreate the table and re-insert everything every time I update a column? I understand the constraints, but that solution doesn't look good to me. I update the schema every day, and the table holds a couple of TB of data. Do you see any other option that would not require me to move TBs of data every day?

Thanks for your answer

10 Apr 2016, 19:25 GMT
From: Mich Talebzadeh
Subject: Re: alter table add columns alternatives or hive refresh

Hi,

I am confining myself to Hive tables. As I stated before, I have not tried it in Spark, so I stand corrected.
Let us try this simple test in Hive:

-- Create table
hive> create table testme(col1 int);
OK

-- Insert a row
hive> insert into testme values(1);
Loading data to table test.testme
OK

-- Add a new column to testme
hive> alter table testme add columns (new_col varchar(30));
OK
Time taken: 0.055 seconds

-- Expect one row here, with NULL for the new column
hive> select * from testme;
OK
1       NULL

-- Add a new row including a value for new_col. This should work
hive> insert into testme values(1, 'London');
Loading data to table test.testme
OK
hive> select * from testme;
OK
1       NULL
1       London
Time taken: 0.074 seconds, Fetched: 2 row(s)

-- Now try to update the new column
hive> update testme set new_col = 'NY';
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table test.testme that does not use an AcidOutputFormat or is not bucketed

So this is Hive: you can add new rows including values for the new column, but you cannot update the null values in the existing rows. Will this work for you?

HTH

Dr Mich Talebzadeh
http://talebzadehmich.wordpress.com/
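Option 6 described earlier, rebuilding the table via insert/select, is the one route that lets existing rows carry a real (non-null) value in the new column. A rough pure-Python sketch of that copy-and-pad step follows; `migrate` is a hypothetical helper written for illustration, not a Hive command:

```python
# In-memory analogue of:
#   CREATE TABLE new_table (... plus the new column ...);
#   INSERT INTO new_table SELECT ..., <value for new column> FROM old_table;
# Every old row is re-emitted under the new schema, so the new column can
# hold a real value for existing rows, unlike ALTER TABLE ADD COLUMNS.

def migrate(old_rows, old_columns, new_columns, fill=None):
    """Re-emit every old row under new_columns, padding missing columns."""
    migrated = []
    for row in old_rows:
        rec = dict(zip(old_columns, row))
        migrated.append({c: rec.get(c, fill) for c in new_columns})
    return migrated

old = [(1, "a"), (2, "b")]
new_rows = migrate(old, ["id", "name"], ["id", "name", "city"], fill="unknown")
print(new_rows)   # every row now carries city='unknown'
```

The cost is exactly what Maurin objects to: every row is rewritten, which for a multi-TB table means moving all the data.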
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 16 May 2016 at 20:02, Matthew McCline <mmccl...@hortonworks.com> wrote:

> What version of Hive are you on?
>
> From: Mahender Sarangam <mahender.bigd...@outlook.com>
> Sent: Saturday, May 14, 2016 3:29 PM
> To: user@hive.apache.org
> Subject: Query Failing while querying on ORC Format
>
> Hi,
> We are dumping our data into an ORC partitioned, bucketed table. We have
> loaded almost 6 months of data, with month as the partition column. We have
> since modified the ORC table schema, adding 2 more columns. Now, whenever we
> run a select statement against an older month that lacks those columns, it
> throws an exception, even though the columns do not appear in the select
> clause (projection columns).
>
> A JIRA issue has already been raised for this requirement:
> https://issues.apache.org/jira/browse/HIVE-11981
>
> Can anyone please suggest a workaround for reading the older columns of an
> ORC partitioned table?
>
> Thanks
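For the ORC case, HIVE-11981 tracks making the reader tolerate partitions written under an older schema. The underlying idea, padding missing columns with NULL at read time instead of failing, can be sketched in plain Python; `read_partition` is a hypothetical function for illustration, since real readers operate on ORC files rather than dicts:

```python
# Sketch of read-time schema reconciliation: rows stored before the schema
# gained new columns are widened to the current table schema on the fly,
# with the missing columns padded as NULL (None).

def read_partition(partition_rows, table_columns):
    """Yield each stored row widened to the current table schema."""
    for row in partition_rows:
        yield {col: row.get(col) for col in table_columns}  # missing -> None

# Partition written with the old 2-column schema:
old_partition = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
# Table schema after 2 columns were added:
schema = ["id", "amount", "region", "channel"]
print(list(read_partition(old_partition, schema)))
```

With this behaviour a select over an old partition succeeds, returning NULL for the columns that did not exist when the partition was written, which is what the fix for HIVE-11981 aims at.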