Yeah. Database design is always subjective, so everybody has an opinion about it. But if you're just starting out, I'd recommend following the same rules you would in a traditional relational database system: two different datasets mean two different tables, in Hive just as in a relational database.
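
To make that concrete, here is a rough sketch of the two-table setup in HiveQL. The table names (contacts_v1, contacts_v2) and the HDFS directories are made up for illustration. It also assumes each CSV sits in its own directory, since a Hive table's LOCATION points at a directory rather than a single file.

```sql
-- Hypothetical names and paths; adjust to wherever the files were put in HDFS.
-- One external table per schema, each over its own directory.
CREATE EXTERNAL TABLE contacts_v1 (
  name    STRING,
  address STRING,
  phone   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/schema1';

CREATE EXTERNAL TABLE contacts_v2 (
  id      INT,
  name    STRING,
  address STRING,
  gender  STRING,
  phone   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/schema2';

-- If you later want one combined result, UNION ALL inside a subquery
-- is one option: pad the columns the first file lacks with typed NULLs.
SELECT u.name, u.phone
FROM (
  SELECT CAST(NULL AS INT) AS id, name, address,
         CAST(NULL AS STRING) AS gender, phone
  FROM contacts_v1
  UNION ALL
  SELECT id, name, address, gender, phone
  FROM contacts_v2
) u;
```

Two caveats with the sample files as posted: the header line in each CSV will be read as an ordinary data row, and the space after each comma ends up as a leading space in the field values, so you may want to clean the files up or trim() the columns when querying.
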
Start there anyway and get your feet wet. :)

On Wed, Aug 21, 2013 at 7:24 AM, Chris Driscol <cdris...@rallydev.com> wrote:

> Hi -
> I just started to get my feet wet with Hive and have a question that I
> have not been able to find an answer to..
>
> Suppose I have 2 CSV files:
>
> >cat Schema1.csv
> Name, Address, Phone
> Chris, address1, 999-999-9999
>
> and
>
> >cat Schema2.csv
> Id, Name, Address, Gender, Phone
> 13, Tom, address2, male, 888-888-8888
>
> I put these two files into Hadoop and want to be able to query these 2
> different schema's via Hive..
>
> Do I need to create two tables in Hive to represent both schemas and use a
> join? Or is there a better way that can handle these two different schemas?
>
> Please reply back with any other specific questions, I realize this is
> somewhat open-ended.. thanks!
>
> --
> -cd