It is OK, but I actually want to categorize the URLs by session.

*DATA:* (sorted by time)
(userid1_time1, url1)
(userid1_time2, url2)
(userid1_time3, url3)
(userid1_time4, url4)
*RESULT:*
url1 is added to session1.
time2 - time1 < 30 min, so url2 goes to session1.
time3 - time2 > 30 min, so url3 goes to session2.
time4 - time3 < 30 min, so url4 goes to session2.
(user1, [url1, url2], [url3, url4])

Does your solution fit my problem?

2015-12-25 12:23 GMT+02:00 Xingchi Wang <regrec...@gmail.com>:
> map{case(x, y) => s = x.split("_"), (s(0), (s(1),
> y)))}.groupByKey().filter{case (_, (a, b)) => abs(a._1, a._1) < 30min}
>
> Does it work for you?
>
> 2015-12-25 16:53 GMT+08:00 Yasemin Kaya <godo...@gmail.com>:
>
>> Hi,
>>
>> I have been struggling with this data for a couple of days and can't
>> find a solution. Could you help me?
>>
>> *DATA:*
>> (userid1_time1, url1)
>> (userid1_time2, url2)
>>
>> I want to get the URLs that are within 30 minutes of each other.
>>
>> *RESULT:*
>> If time2 - time1 < 30 min:
>> (user1, [url1, url2])
>>
>> Best,
>> Yasemin

--
hiç ender hiç
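The session rule described above (a URL joins the current session if it comes less than 30 minutes after the previous visit, otherwise it starts a new session) can be sketched in plain Scala, without Spark, as a fold over one user's visits. This is a minimal sketch under stated assumptions: the names `Sessionize`, `sessionize` and `parseKey` are hypothetical; times are assumed to be epoch milliseconds (`Long`), since the original only says "time"; keys are assumed to look like `userId_time` and are split at the last underscore; and a gap of exactly 30 minutes is assumed to start a new session.

```scala
// Hypothetical sketch: sessionize one user's visits by a 30-minute inactivity gap.
// Assumption: times are epoch milliseconds (the original data only says "time").
object Sessionize {
  val SessionGapMs: Long = 30L * 60 * 1000 // 30 minutes in milliseconds

  /** Splits (time, url) visits into sessions; a gap >= gapMs starts a new session. */
  def sessionize(visits: Seq[(Long, String)], gapMs: Long = SessionGapMs): List[List[String]] = {
    val sorted = visits.sortBy(_._1) // make sure visits are in time order
    // Accumulator: sessions built so far (newest first, each session reversed)
    // plus the time of the previous visit, if any.
    val (sessions, _) = sorted.foldLeft((List.empty[List[String]], Option.empty[Long])) {
      case ((acc, prevTime), (time, url)) =>
        prevTime match {
          case Some(prev) if time - prev < gapMs =>
            // Within the gap: add the URL to the current (head) session.
            ((url :: acc.head) :: acc.tail, Some(time))
          case _ =>
            // First visit, or gap too large: open a new session.
            (List(url) :: acc, Some(time))
        }
    }
    // Restore chronological order of sessions and of URLs inside each session.
    sessions.map(_.reverse).reverse
  }

  /** Parses a key like "userid1_1450000000000" into (userId, time).
    * Assumption: the time follows the last underscore and is a Long. */
  def parseKey(key: String): (String, Long) = {
    val idx = key.lastIndexOf('_')
    (key.substring(0, idx), key.substring(idx + 1).toLong)
  }
}
```

With visits at 0, 10, 50 and 60 minutes, this yields `List(List("url1", "url2"), List("url3", "url4"))`, matching the expected `(user1, [url1, url2], [url3, url4])`. In Spark, one way to apply it (again a sketch, not a tested pipeline) would be to map each `(key, url)` pair to `(userId, (time, url))` with `parseKey`, then call `groupByKey()` followed by `mapValues(v => Sessionize.sessionize(v.toSeq))`. Note that `groupByKey` collects all of a user's visits into memory on one executor, which may be a concern for very active users.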