Assuming the records differ in only one column, i.e. the first two columns
identify a record and only the date varies, you can simply use group by with
max as the aggregation function to eliminate duplicates, i.e.
select cno, sqno, max(date)
from table
group by cno, sqno
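
To sanity-check the idea outside Hive, here is a minimal sketch that runs the same group-by/max query against an in-memory SQLite database from Python. The table name `records`, the column name `dt` (used in place of `date`), the helper `dedupe_latest` and the sample rows are all made up for illustration, and SQLite is only a stand-in, so treat it as a rough check rather than a Hive recipe.

```python
import sqlite3


def dedupe_latest(rows):
    """Keep one row per (cno, sqno) pair, carrying the latest date.

    Assumes dates are ISO-formatted strings (YYYY-MM-DD), so that
    MAX() on text picks the most recent one.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, for illustration only.
    conn.execute("CREATE TABLE records (cno INTEGER, sqno INTEGER, dt TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT cno, sqno, MAX(dt) FROM records "
        "GROUP BY cno, sqno ORDER BY cno, sqno"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: the first two rows differ only in the date column.
sample = [
    (1, 10, "2013-04-01"),
    (1, 10, "2013-04-03"),
    (2, 20, "2013-04-02"),
]
deduped = dedupe_latest(sample)
print(deduped)  # [(1, 10, '2013-04-03'), (2, 20, '2013-04-02')]
```

Note that this only collapses duplicates cleanly when the date is the sole column that varies; any further columns would need their own aggregate or a different approach.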
If the above assumption is not true i
Hello,
I am using some legacy binaries for streaming in Hive.
These binaries depend on libraries that are installed on all the
nodes of the cluster under /user/project_name/lib
The environment variable I want to set is LD_LIBRARY_PATH,
something like LD_LIBRARY_PATH=/user/project_name/lib
I tried
the overhead is relatively small (reading 15MB per mapper is
> negligible compared to several GB of processed data).
>
> Best regards,
> Jan
>
>
> On Wed, Apr 3, 2013 at 10:35 PM, vivek thakre wrote:
>
>> Hello,
>>
>> I want to write a functionality using UDT
Hello,
I want to implement some functionality using a UDTF. It involves
reading 7 different text files and creating lookup structures such as Map,
Set, List, Map of String to List, etc. to be used in the logic.
These files are small, about 15 MB each on average.
I can add these files in distributed ca
Hello,
I have a table with userId, movieId and some more columns, say c1, c2, c3.
I want to group the records by userId, do some processing on each user's
records, and output fewer records (or the same number) based on some
logic.
The processing involves conside