Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Yuri Astrakhan
Please clarify why the field "is_zero" is needed, as it is nothing more than a test for ("zero=" in x_analytics). Does having this field significantly improve performance for zero queries, e.g. "select count(*) from requests where is_zero = true"? Because otherwise it simply identifies "zero partne
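A minimal sketch of the check described in this message, assuming the x_analytics value is a semicolon-separated list of key=value pairs. The function names and the sample header values are hypothetical and are not taken from the refinement code behind the table. It shows the plain substring test next to a slightly stricter key lookup:

```python
def parse_x_analytics(header: str) -> dict:
    """Split an x_analytics value into a dict (assumed format: 'k1=v1;k2=v2')."""
    pairs = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip fragments that carry no '='
            pairs[key] = value
    return pairs


def is_zero_substring(header: str) -> bool:
    """The test as phrased in the message: 'zero=' appears anywhere in the value."""
    return "zero=" in header


def is_zero_key(header: str) -> bool:
    """Stricter variant: true only if a key named exactly 'zero' is present."""
    return "zero" in parse_x_analytics(header)


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    for sample in ("zero=250-99;https=1", "https=1", "notzero=1"):
        print(sample, is_zero_substring(sample), is_zero_key(sample))
```

The two variants only disagree when another key happens to end in "zero", so whether the stricter form matters depends on which keys actually occur in x_analytics.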

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Oliver Keyes
Cool!

On 10 April 2015 at 17:12, Joseph Allemandou wrote:
> Yes Oliver, the agent_type = spider includes IsCrawler UDF.
>
> On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes wrote:
>>
>> What does agent-type add? In the sense that if we're pre-parsing the
>> user agent, surely the difference is bet

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
Yes Oliver, the agent_type = spider includes IsCrawler UDF.

On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes wrote:
> What does agent-type add? In the sense that if we're pre-parsing the
> user agent, surely the difference is between "WHERE agent_type !=
> 'spider'" and "WHERE user_agent_map['devi

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Oliver Keyes
What does agent-type add? In the sense that if we're pre-parsing the user agent, surely the difference is between "WHERE agent_type != 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"? Does agent_type include the isCrawler UDF results?

On 10 April 2015 at 16:47, Joseph Allemandou

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
And I forgot one field:
- is_zero - True if a request is made on a zero provider.

On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia wrote:
> Hi Joseph,
>
> Thanks for the update, and for doing this. These three items make the
> analysis of the data much easier on our end. We've had many reque

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Leila Zia
Hi Joseph,

Thanks for the update, and for doing this. These three items make the analysis of the data much easier on our end. We've had many requests in the past that required agent_type and access_method information and having them readily available is awesome! :-)

Have a great weekend!

Leil

[Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
Hi Analytics people,

Today brings another batch of additions to the refined webrequest table in Hive. The table now contains:
- ts - The unix timestamp (milliseconds) version of the dt date
- access_method - The method used to access the site, being one of the three [mobile app | mobile