Re: Parquet schema migrations

2014-10-24 Thread Gary Malouf
Hi Michael,

Does this affect people who use Hive for their metadata store as well? I'm wondering if the issue is as bad as I think it is - namely that if you build up a year's worth of data, adding a field forces you to have to migrate that entire year's data.

Gary

On Wed, Oct 8, 2014 at 5:08 P

Re: Parquet schema migrations

2014-10-08 Thread Cody Koeninger
On Wed, Oct 8, 2014 at 3:19 PM, Michael Armbrust wrote:
> I was proposing you manually convert each different format into one unified format (by adding literal nulls and such for missing columns) and then union these converted datasets. It would be weird to have union all try and do thi
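
For readers skimming the thread, the "convert to one unified format, then union" idea quoted above can be sketched roughly as follows. This is only an illustration in plain Python, with dicts standing in for rows; it is not the Spark SQL API, and the column list, helper names, and sample rows are all hypothetical.

```python
# Illustrative only: plain dicts stand in for rows read from Parquet files.
# UNIFIED_COLUMNS, to_unified, union_all and the sample rows are hypothetical.
UNIFIED_COLUMNS = ["id", "name", "email"]


def to_unified(rows, columns=UNIFIED_COLUMNS):
    """Project each row onto the unified column list, using None (a literal
    null) for any column the row's original schema did not have."""
    return [{col: row.get(col) for col in columns} for row in rows]


def union_all(*batches):
    """Concatenate batches that were already converted to the same schema."""
    merged = []
    for batch in batches:
        merged.extend(batch)
    return merged


old_rows = [{"id": 1, "name": "a"}]                      # written before "email" existed
new_rows = [{"id": 2, "name": "b", "email": "b@x.org"}]  # written with the newer schema

unified = union_all(to_unified(old_rows), to_unified(new_rows))
```

The conversion is done explicitly per input before the union, which matches the suggestion that the union itself should not try to reconcile schemas.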

Re: Parquet schema migrations

2014-10-08 Thread Michael Armbrust
> The kind of change we've made that it probably makes most sense to support is adding a nullable column. I think that also implies supporting "removing" a nullable column, as long as you don't end up with columns of the same name but different type.

Filed here: https://issues.apache.org
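
As a rough illustration of the rule discussed above (columns may be added or dropped as long as they are nullable, but a name must never reappear with a different type), here is a minimal sketch in plain Python. Schemas are modelled as hypothetical `{column: type_name}` dicts and every column is assumed nullable; this is not how Spark or Parquet represents schemas.

```python
def merge_schemas(old, new):
    """Merge two {column: type_name} schemas, treating every column as nullable.

    A column present in only one schema is kept, so rows from the other
    schema would read it as null. Raises ValueError if a column appears in
    both schemas with different types.
    """
    merged = dict(old)
    for name, type_name in new.items():
        if name in merged and merged[name] != type_name:
            raise ValueError(
                f"column {name!r} changes type: {merged[name]} -> {type_name}"
            )
        merged[name] = type_name
    return merged


v1 = {"id": "int64", "name": "string"}
v2 = {"id": "int64", "email": "string"}  # "name" removed, "email" added
merged = merge_schemas(v1, v2)
```

Under these assumptions, `merge_schemas({"id": "int64"}, {"id": "string"})` raises, which is the "same name but different type" case the message rules out.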

Re: Parquet schema migrations

2014-10-06 Thread Cody Koeninger
>> hange the on-disk sstable format, it will do a convert-on-read as you access the sstables, or you can run the upgradesstables command to convert them all at once post-upgrade.
>>
>> Andrew
>>
>> On Fri, Oct 3, 2014 at 4:33 PM, Cody Koening

Re: Parquet schema migrations

2014-10-05 Thread Michael Armbrust
4:33 PM, Cody Koeninger wrote:
> > Wondering if anyone has thoughts on a path forward for parquet schema migrations, especially for people (like us) that are using raw parquet files rather than Hive.
> >
> > So far we've gotten away with reading old files, converting, and writing to new directories, but that obviously becomes problematic above a certain data size.

Re: Parquet schema migrations

2014-10-05 Thread Andrew Ash
PM, Cody Koeninger wrote:
> Wondering if anyone has thoughts on a path forward for parquet schema migrations, especially for people (like us) that are using raw parquet files rather than Hive.
>
> So far we've gotten away with reading old files, converting, and writing to

Parquet schema migrations

2014-10-03 Thread Cody Koeninger
Wondering if anyone has thoughts on a path forward for parquet schema migrations, especially for people (like us) that are using raw parquet files rather than Hive.

So far we've gotten away with reading old files, converting, and writing to new directories, but that obviously becomes proble