Hello,

I took the liberty of creating a few Zeppelin notebooks that collect Apache project profiles and posts, normalize them to Activity Streams format, and let you interact with them using Spark DataFrames. They build on previous work by Trevor and me.
The notebooks are currently hosted in my ZeppelinHub account, and anyone with the links below can access them:

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24
https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC9lNzQzZjRkZGVkMGY0YjA3YTkzZTQ2NWFkYjU2ZTQxOS9ub3RlLmpzb24
https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC8zZmQ3M2Y1OWEzOGE0YmM2YjFkMGM4MzBkNTczZDU0Mi9ub3RlLmpzb24

I’m planning to send these links over to dev@community. A discussion on that list during ApacheCon EU mentioned tracking social media metrics for Apache projects as something the group is interested in, and helping them out could be a good way to increase awareness of Streams within Apache.

I’d appreciate it if this group would take a look first and let me know if you see anything that can be obviously and easily improved. Definitely let me know if anything looks awry or misfires while you are perusing the notebooks.

Steve