need some Clarification about tmp folder(urgent)

2012-03-13 Thread hadoop hive
Hi folks, I have a question. In the tmp folder on HDFS, a user called hadoop is taking up a lot of space, and I need to free it. Please help me out. /tmp/hive-hadoop/hive_2012-03-13_04-32-14_701_8751021191431391338/-ext-10002/ Is this part of the data stored in HDFS, or is it just a

Re: need some Clarification about tmp folder(urgent)

2012-03-13 Thread Nitin Pawar
to my knowledge this is all temporary data. All the data related to your tables is stored on the location which you can get with desc formatted table_name this is a temporary hive storage place. If you kill the job in between, this data in /tmp/ is left as stale data On Tue, Mar 13, 2012 at

Re: Reduce the number of map/reduce jobs during join

2012-03-13 Thread Jagat
Hello Weidong Bian, did you see the following configuration property in the conf directory? <property> <name>mapred.reduce.tasks</name> <value>-1</value> <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when

RE: order by date

2012-03-13 Thread Tucker, Matt
Hi Keith, We generally store date columns as strings in a format similar to ISO 8601 (yyyy-MM-dd HH:mm:ss). This way, when we put the date column in the ORDER BY clause, it is sorted chronologically. It also saves us the trouble of whipping out a Unix timestamp calculator to figure out
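
This works because every field is zero-padded and the fields run from most to least significant, so a character-by-character string comparison agrees with a comparison of the instants. A small Python check of that property, using made-up sample timestamps (Python is used here only for illustration), which also shows why an American-style MM/dd/yyyy string lacks it:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # yyyy-MM-dd HH:mm:ss, every field zero-padded

timestamps = [
    "2012-03-13 16:54:00",
    "2011-12-31 23:59:59",
    "2012-03-02 08:05:00",
    "2012-10-01 00:00:00",
]

# Plain string ordering, which is what ORDER BY does on a STRING column.
lexicographic = sorted(timestamps)

# Ordering by the parsed datetime values, for comparison.
chronological = sorted(timestamps, key=lambda s: datetime.strptime(s, FMT))

assert lexicographic == chronological

# American-style dates put the month first and the year last, so as strings
# March 2012 sorts before October 2011.
american = ["10/01/2011", "03/13/2012"]
assert sorted(american) == ["03/13/2012", "10/01/2011"]
```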

Reduce the number of map/reduce jobs during join

2012-03-13 Thread Bruce Bian
Yes, it's in my hive-default.xml, and Hive decided to use only one reducer, so I thought increasing it to 5 might help, but it didn't. Anyway, scanning the largest table 6 times isn't efficient, hence my question. On Wed, Mar 14, 2012 at 12:37 AM, Jagat jagatsi...@gmail.com wrote: Hello Weidong Bian

Re: order by date

2012-03-13 Thread Keith Wiley
I see, you store the date-time as a lexicographically sortable string. That's fine, but I'm operating on existing CSV tables. I guess I could whip up a Hadoop job to convert all the date-time columns to lexicographic strings and then wrap Hive around the resulting converted tables. I was

Re: non-equality joins

2012-03-13 Thread mahsa mofidpoor
Hi Keith, Do you know exactly how an algorithm should be structured to fit into the MapReduce framework? Could you point me to some references? Thanks and Regards, Mahsa On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley kwi...@keithwiley.com wrote:

How to import extremely wide csv tables

2012-03-13 Thread Keith Wiley
Wrapping Hive around existing CSV files requires manually naming and typing every column in the CREATE TABLE command. I have several CSV tables, and some of them have a ton of columns. I would love a way to create Hive tables that automatically infers the column types by attempting various
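
One way to do that inference outside Hive is sketched below in Python. Everything here is hypothetical (the function names, the type-trial order INT, BIGINT, DOUBLE, BOOLEAN, STRING, and the assumption that only true/false spellings count as booleans); it simply tries progressively looser types on a sample of the file and prints a CREATE EXTERNAL TABLE statement. It assumes the first CSV row holds column names; that row is still in the file, so Hive will read it as data, and ROW FORMAT DELIMITED does not understand quoted fields that contain commas.

```python
import csv
import io
import re

INT_RE = re.compile(r"[+-]?\d+")
DOUBLE_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
BOOLEAN_WORDS = {"true", "false"}  # assumption: only true/false spellings

CANDIDATES = ["INT", "BIGINT", "DOUBLE", "BOOLEAN", "STRING"]


def fits(value, hive_type):
    """Return True if one field value can be read as the given Hive type."""
    if hive_type == "INT":
        return bool(INT_RE.fullmatch(value)) and -2**31 <= int(value) < 2**31
    if hive_type == "BIGINT":
        return bool(INT_RE.fullmatch(value)) and -2**63 <= int(value) < 2**63
    if hive_type == "DOUBLE":
        return bool(DOUBLE_RE.fullmatch(value))
    if hive_type == "BOOLEAN":
        return value.lower() in BOOLEAN_WORDS
    return True  # STRING accepts anything


def infer_type(values):
    """Pick the first candidate type that every non-empty value fits."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "STRING"  # no evidence either way
    for hive_type in CANDIDATES:
        if all(fits(v, hive_type) for v in present):
            return hive_type
    return "STRING"


def create_table_ddl(table, csv_text, location):
    """Build Hive DDL from a CSV sample whose first row holds column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, raw_name in enumerate(header):
        # Turn the header text into a simple lower-case identifier.
        name = re.sub(r"\W+", "_", raw_name.strip()).lower()
        values = [row[i] for row in data if i < len(row)]
        columns.append(f"  {name} {infer_type(values)}")
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        + ",\n".join(columns)
        + "\n)\nROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        + f"LOCATION '{location}';"
    )
```

For example, create_table_ddl("sales", sample_text, "/user/hive/sales") (a hypothetical table name and path) would emit one typed column per header field; reading only the first few thousand lines of each file is usually enough for a first guess, at the risk of misjudging a column whose odd values appear later.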

RE: non-equality joins

2012-03-13 Thread Tucker, Matt
For theta joins, you'll have to convert the query to an equi-join, and then filter for non-equality in the WHERE clause. Depending upon the size of each table, you might consider looking at map-side joins, which will allow for doing non-equality filters during a join before it's passed to the
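
A minimal Python sketch of that rewrite, assuming two hypothetical tables that share a user_id column and whose non-equality condition is a time range: the equi-join handles the user_id = user_id part, and the range test is applied afterwards as the filter, giving the same rows as evaluating the full condition on every pair. If the tables share no equality column at all, the equi-join degenerates into a cross product, which can be very expensive on large tables.

```python
from collections import defaultdict

# Hypothetical sample data: clicks(user_id, ts) and sessions(user_id, start_ts, end_ts).
clicks = [(1, 5), (1, 12), (2, 7), (3, 1)]
sessions = [(1, 0, 10), (1, 10, 20), (2, 8, 9)]


def theta_join_reference(clicks, sessions):
    """Evaluate the whole join condition on every pair of rows."""
    return sorted(
        (user, ts, start, end)
        for (user, ts) in clicks
        for (s_user, start, end) in sessions
        if user == s_user and start <= ts < end
    )


def equi_join_then_filter(clicks, sessions):
    """Equi-join on user_id, then apply the range test as a separate filter."""
    # Build side of a hash join keyed on the equality column.
    sessions_by_user = defaultdict(list)
    for s_user, start, end in sessions:
        sessions_by_user[s_user].append((start, end))

    # JOIN ... ON c.user_id = s.user_id
    joined = [
        (user, ts, start, end)
        for (user, ts) in clicks
        for (start, end) in sessions_by_user.get(user, [])
    ]

    # WHERE s.start_ts <= c.ts AND c.ts < s.end_ts
    return sorted(row for row in joined if row[2] <= row[1] < row[3])


assert equi_join_then_filter(clicks, sessions) == theta_join_reference(clicks, sessions)
```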

Re: Reduce the number of map/reduce jobs during join

2012-03-13 Thread shule ney
Do the joins share the same key?

Re: order by date

2012-03-13 Thread Mark Grover
Hi Keith, You should also consider writing your own UDF that takes a date in American format and returns a lexicographically sortable string. That way you don't have to modify your base data; just use the newly created from_american_date(String date) UDF to get your new date string in Mark Grover,
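
Hive UDFs are typically written as Java classes; the Python below only sketches the string transformation such a from_american_date UDF would perform. It assumes input like 3/13/2012 or 03/13/2012 16:54:00 (month/day/year with an optional 24-hour time), and it returns None, standing in for NULL, when the input does not parse; both are choices of this sketch, not details from the thread.

```python
from datetime import datetime
from typing import Optional

# (input format, output format) pairs; the output keeps the year first so
# the result sorts chronologically as a plain string.
FORMATS = (
    ("%m/%d/%Y %H:%M:%S", "%Y-%m-%d %H:%M:%S"),
    ("%m/%d/%Y", "%Y-%m-%d"),
)


def from_american_date(date: Optional[str]) -> Optional[str]:
    """Convert 'M/D/YYYY[ HH:MM:SS]' into 'YYYY-MM-DD[ HH:MM:SS]'."""
    if date is None:
        return None
    text = date.strip()
    for in_fmt, out_fmt in FORMATS:
        try:
            return datetime.strptime(text, in_fmt).strftime(out_fmt)
        except ValueError:
            continue  # try the next accepted layout
    return None  # unparseable input, the analogue of a NULL result
```

Sorting the converted values, for example sorted(map(from_american_date, ["10/01/2011", "03/13/2012"])), then yields them in chronological order.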

Re: How to import extremely wide csv tables

2012-03-13 Thread Edward Capriolo
You could do something like that. However, you can also structure the table as: CREATE TABLE X (stuff MAP<STRING,STRING>) or CREATE TABLE X (stuff ARRAY<STRING>). You can then define a view over these structures that lets you cherry-pick the fields you want. Edward On Tue, Mar 13, 2012 at 1:03 PM, Keith

RE: order by date

2012-03-13 Thread Tucker, Matt
If you don't want to modify your CSV files, I would suggest doing the conversion as part of the query. For that, you can either include the conversion in each query, or you can create a view of your table that includes a column with the converted date. Either way, you may want to try

order by having no effect?!

2012-03-13 Thread Keith Wiley
Um, this is weird. It simply isn't modifying the order of the returned rows at all. I get the same result with no 'order by' clause as with one. Adding a limit or specifying 'asc' has no effect. Using 'sort by' also has no effect. The column used for ordering is type INT. In the example

Re: order by having no effect?!

2012-03-13 Thread Igor Tatarinov
You have attributevalue in quotes, which makes it a constant literal. igor decide.com

Re: order by having no effect?!

2012-03-13 Thread Edward Capriolo
This syntax is wrong for both Hive and SQL: hive> select * from stringmap where attributename='foo' order by 'attributevalue'; This is right: hive> select * from stringmap where attributename='foo' order by attributevalue; On Tue, Mar 13, 2012 at 4:54 PM, Keith Wiley kwi...@keithwiley.com wrote:
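
The difference can be mimicked in Python with made-up rows for the stringmap table from the thread. Sorting on the literal 'attributevalue' gives every row the same key, so nothing is actually ordered, while sorting on the column value is. Python's sort is stable, so tied rows keep their input order; in Hive, tied rows simply come back in whatever order the job produces them, which is why the query looked as if ORDER BY were being ignored.

```python
# Hypothetical rows from the stringmap table.
rows = [
    {"attributename": "foo", "attributevalue": 42},
    {"attributename": "foo", "attributevalue": 7},
    {"attributename": "foo", "attributevalue": 19},
]

# ORDER BY 'attributevalue': the key is the same string literal for every row.
by_literal = sorted(rows, key=lambda r: "attributevalue")

# ORDER BY attributevalue: the key is the column's value.
by_column = sorted(rows, key=lambda r: r["attributevalue"])

assert by_literal == rows  # every key ties, so nothing moves
assert [r["attributevalue"] for r in by_column] == [7, 19, 42]
```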

Re: order by having no effect?!

2012-03-13 Thread Keith Wiley
Argh! You are correct, good sir! Thanks, Keith Wiley

csv boolean type

2012-03-13 Thread Keith Wiley
What string values in a CSV field can Hive parse as booleans? If I declare a column as type boolean when wrapping an external table around a CSV file, what are the legal values? I can imagine numerous possibilities, for example (for the true values): 0 t T true True TRUE y Y

Re: csv boolean type

2012-03-13 Thread Keith Wiley
I obviously intended '1', not '0', as an example of a true value. Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com