Ryan,

Thank you! Our Amper framework is built around Hive, so the Hive Windowing and 
Analytics functionality fits very well. I had used the T-SQL ROW_NUMBER function 
previously but had no idea of the full power of SQL windowing functions 
until you pointed me to the Hive documentation. Only one small gap remained in 
implementing sessionization: the need to persist the last SessionID assigned 
while we stitch records together. We are testing a "Persist" UDF for that 
now.

Again, thanks for the pointers; they were right on point!

J. B. Rawlings
Senior Consultant
C: 425.233.1315
http://www.societyconsulting.com/

From: Ryan Harris [mailto:ryan.har...@zionsbancorp.com]
Sent: Friday, February 5, 2016 12:58 PM
To: user@hive.apache.org
Subject: RE: Sessionize using Hive

I don't have a textbook example to point you to, but you should be able to 
handle the problem using one of the following:
a) a UDF
b) an external TRANSFORM script in a language of your choosing
c) Hive Windowing and Analytics functions (LEAD/LAG, OVER, etc.) 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Which one fits best depends on the version of Hive you are using as well as 
your programming language preferences.
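
As a rough illustration of option (b), here is a minimal sketch of a streaming script in Python. It relies on Hive's default TRANSFORM behavior of passing rows to the script as tab-separated lines and reading tab-separated lines back. Everything else is an assumption made for the example and does not come from this thread: the four-column layout (userID, startTimestamp, endTimestamp, recordType), timestamps as epoch seconds, the 30-minute gap, and a query that delivers each user's rows together and in time order (for instance with DISTRIBUTE BY userID SORT BY userID, startTimestamp).

```python
import sys
from typing import Iterable, Iterator

# Hypothetical threshold: the thread does not say how large a gap ends a session.
MAX_GAP_SECONDS = 30 * 60


def add_session_ids(lines: Iterable[str],
                    max_gap: int = MAX_GAP_SECONDS) -> Iterator[str]:
    """Append a session id column to tab-separated rows.

    Assumes rows arrive grouped by user and ordered by start time, with
    columns: userID, startTimestamp, endTimestamp, recordType.
    """
    prev_user = None
    prev_type = None
    prev_end = 0
    session_id = 0
    for line in lines:
        user, start, end, rtype = line.rstrip("\n").split("\t")
        continues = (
            user == prev_user
            and rtype == prev_type
            and int(start) - prev_end <= max_gap
        )
        if not continues:
            session_id += 1  # first row, new user, type change, or long gap
        yield "\t".join([user, start, end, rtype, str(session_id)])
        prev_user, prev_type, prev_end = user, rtype, int(end)


def main() -> None:
    # Hive feeds rows on stdin and collects whatever is printed to stdout.
    for row in add_session_ids(sys.stdin):
        print(row)

# A real TRANSFORM script would end with a call to main().
```

Note that the counter restarts in every script instance, so the ids are unique only within the rows one instance sees. Producing globally unique ids would need an extra step, such as combining the counter with the userID.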

From: JB Rawlings [mailto:jrawli...@societyconsulting.com]
Sent: Friday, February 05, 2016 1:53 PM
To: user@hive.apache.org
Subject: RE: Sessionize using Hive

Ryan,

Can you perhaps point me to example(s) of how this is done in Hive?

Thanks,

J. B. Rawlings
Senior Consultant
C: 425.233.1315
http://www.societyconsulting.com/

From: Ryan Harris [mailto:ryan.har...@zionsbancorp.com]
Sent: Monday, February 1, 2016 6:19 PM
To: user@hive.apache.org
Subject: RE: Sessionize using Hive

It can be done in Hive. Whether or not it is the "best choice" depends on 
whether you have any other reason for your data to be in Hive.
If you are wondering whether Hive is the best tool for accomplishing this one 
task, it would probably be easier to do in Pig.

From: JB Rawlings [mailto:jrawli...@societyconsulting.com]
Sent: Monday, February 01, 2016 7:11 PM
To: user@hive.apache.org
Subject: Sessionize using Hive

We are considering whether Hive is the best choice for "sessionizing" a set of 
data given the following parameters:

* Input data set: a series of records with userID, startTimestamp, 
endTimestamp, recordType, etc.

* Output data set: the same records (no aggregation) with an added 
SessionId, based on the time difference between the endTimestamp of the 
previous record and the startTimestamp of the current record, plus other 
criteria such as current.recordType = previous.recordType. As long as a 
series of records meets the criteria for sessionization, they would all have 
the same SessionId appended to each record.
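
To make the rule above concrete, here is a minimal sketch in Python. It flags every record that starts a new session by comparing it with the previous record, then takes a running sum of those flags; this is the same shape as combining LAG with a cumulative SUM in SQL windowing functions. The Record type, the 30-minute threshold, and the use of epoch seconds are assumptions made for the example, not details given in this message.

```python
from itertools import accumulate
from typing import List, NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout based on the fields named above.
    user_id: str
    start_ts: int      # epoch seconds (assumed)
    end_ts: int        # epoch seconds (assumed)
    record_type: str


# Hypothetical threshold: the maximum gap allowed inside one session.
MAX_GAP_SECONDS = 30 * 60


def session_ids(records: List[Record],
                max_gap: int = MAX_GAP_SECONDS) -> List[int]:
    """Return one session id per record.

    The input must already be sorted by (user_id, start_ts).
    """
    flags = []
    for i, cur in enumerate(records):
        prev = records[i - 1] if i > 0 else None
        continues = (
            prev is not None
            and cur.user_id == prev.user_id
            and cur.record_type == prev.record_type
            and cur.start_ts - prev.end_ts <= max_gap
        )
        flags.append(0 if continues else 1)  # 1 marks the start of a session
    # The running sum of the start flags is the session id.
    return list(accumulate(flags))
```

For example, three records for one user with gaps of 100 and 4700 seconds, all of the same type, would get ids 1, 1, 2 under the assumed 1800-second threshold.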

Based on a brief analysis, it appears that this problem would be better 
suited to MapReduce using Java, but I would be interested in hearing from those 
with more experience in this area.

J. B. Rawlings

________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL 
and may contain information that is privileged and exempt from disclosure under 
applicable law. If you are neither the intended recipient nor responsible for 
delivering the message to the intended recipient, please note that any 
dissemination, distribution, copying or the taking of any action in reliance 
upon the message is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately. Thank you.
