I believe I tried that, both in the return argument and the outer query. If
memory serves me, I got an error about the array index needing to be a
constant value.

I will try again when I get back to a computer.


On Aug 21, 2013, at 6:48 PM, Harish Butani <hbut...@hortonworks.com> wrote:

Can you try this:

select search_terms, productid, clicks_to_product from npath ( on clicks
                distributed by sessionid sort by timestamp
                arg1('SEARCH.NOTPRODUCT*.PRODUCT'),
                arg2('SEARCH'), arg3(page = 'SEARCH'),
                arg4('PRODUCT'), arg5(page = 'PRODUCT'),
                arg6('NOTPRODUCT'), arg7(page != 'PRODUCT'),
                arg8('search_terms,  (size(tpath)-1) as clicks_to_product,
tpath[size(tpath) -1].productid as productid')
                );


- added NOTPRODUCT to capture clicks between SEARCH and PRODUCT
- you don't need first_value for search_terms, because the row you get
back is the one at which the pattern match starts.
- to get the last value, I am hoping this works:
tpath[size(tpath)-1].productid
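
To make the intent of the pattern concrete, here is a rough sketch in plain
Python (not Hive) of what SEARCH.NOTPRODUCT*.PRODUCT is meant to pick out of
one session's rows, already sorted by timestamp. The function names
match_from and summarize and the sample rows are hypothetical, and the sketch
assumes the match is anchored at a given starting row. It only illustrates
the expected result, not how npath evaluates the expression internally.

```python
def match_from(rows, start):
    """Try SEARCH.NOTPRODUCT*.PRODUCT beginning at rows[start].

    Returns the matched path (a list of rows) or None if there is no match.
    """
    # SEARCH: the starting row must be a search page.
    if start >= len(rows) or rows[start]["page"] != "SEARCH":
        return None
    # NOTPRODUCT*: skip any number of non-product rows.
    # PRODUCT: stop at the first product page after the search.
    for end in range(start + 1, len(rows)):
        if rows[end]["page"] == "PRODUCT":
            return rows[start:end + 1]
    return None


def summarize(tpath):
    """Mirror the result expression: first search terms, last productid, click count."""
    return {
        "search_terms": tpath[0]["search_terms"],
        "productid": tpath[-1]["productid"],   # like tpath[size(tpath)-1].productid
        "clicks_to_product": len(tpath) - 1,   # like size(tpath)-1
    }


# Hypothetical session data, for illustration only.
session = [
    {"page": "SEARCH", "search_terms": "red shoes", "productid": None},
    {"page": "LISTING", "search_terms": None, "productid": None},
    {"page": "PRODUCT", "search_terms": None, "productid": 42},
]

tpath = match_from(session, 0)
result = summarize(tpath)
```

The productid is read from the last element of the matched path rather than
from an aggregate over the whole path, which is the same idea as indexing
tpath with size(tpath)-1.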


On Aug 21, 2013, at 12:25 PM, Justin Workman <justinjwork...@gmail.com>
wrote:

Assuming clickstream-style data, I want to get the search terms from the
first search request, and return the product id that was eventually viewed
and the number of clicks it took to reach the product. So something like this:

select search_terms, productid, clicks_to_product from npath ( on clicks
                distributed by sessionid sort by timestamp
                arg1('SEARCH.PRODUCT'),
                arg2('SEARCH'), arg3(page = 'SEARCH'),
                arg4('PRODUCT'), arg5(page = 'PRODUCT'),
                arg6('first_value(search_terms) as search_terms,
last_value(productid) as productid, (size(tpath)-1) as clicks_to_product')
                );

From what I have seen, I get the search terms from the first search even
without first_value, but it would be nice to be able to use first_value to
guarantee that. I cannot get the productid from the last tpath object this
way. I did try last_value(tpath.productid) in the outer query, but that
returned the productid (and all the nulls leading up to the product page
view) from the tpath of the very last row returned by the inner npath
select, i.e. not the last productid for the current row. I can use
tpath.productid in place of productid in the outer query, and it returns
the nulls for each row in the current tpath, up to the final product view.


