R is another programming language geared to statistical analysis. It
has some features that make it easy to write this sort of data
manipulation. http://www.r-project.org/
Here's a sample R function that does something similar to what you
need (note: I'm not a real R programmer, so this might be ugly, but it
works):
abCountsToMatrix <- function(csvFilename, outputmatrixfile){
data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
cat("Loaded", nrow(data), "rows.\n")
idrows <- sort(unique(c(data[ , 1])))
idcols <- sort(unique(c(data[ , 2])))
cat("Found", length(idrows), " X ", length(idcols), ".")
output <- array(0, c(length(idrows), length(idcols)))
rownames(output) <- idrows
colnames(output) <- idcols
for(i in 1:nrow(data)){
if (i %% 100 == 0) {
cat("On ", i, " of ", nrow(data), "\n")
}
rowidx <- which(idrows == data[i, 1])
colidx <- which(idcols == data[i, 2])
count <- data[i,3]
output[rowidx, colidx] <- count
}
write.csv(output, file=outputmatrixfile, quote=F)
}
On Wed, Aug 8, 2012 at 8:58 PM, <[email protected]> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:[email protected]]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: [email protected]
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive. I've always output
> from Hive a structure like your query output and used R to convert it into a
> dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM, <[email protected]> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:[email protected]]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: [email protected]
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and
>> then using a combination of the max and if statement... something on
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>> select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta ==
>> 'pqr', Gamma, NUL) as pqr, ....
>> from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <[email protected]> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha Beta Gamma
>>
>> 123 xyz 1.0
>>
>> 123 abc 0.5
>>
>> 123 pqr 1.3
>>
>> 123 def 2.1
>>
>> 456 xyz 0.1
>>
>> 456 abc 0.6
>>
>> 456 pqr 1.9
>>
>> 456 def 3.2
>>
>> 456 lmn 1.1
>>
>> 456 sdf 1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha xyz abc pqr def lmn
>> sdf
>>
>> 123 1.0 0.5 1.3 2.1
>>
>> 456 0.1 0.6 1.9 3.2
>> 1.1 1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>