Spark Talks: Using MLlib to Predict Popular Tweets & Using Zeppelin Notebooks

Felix Cheung Tue, 19 Jan 2016 14:29:26 -0800

FYI



    _____________________________

*Note: to expedite your check-in at Galvanize, register here 

Talk 1: Using Spark MLlib To Predict Most Popular Tweets 
Spark's Machine Learning Library (MLlib) enables running Machine Learning 
algorithms in a scalable way on massive datasets. In this talk we will use 
Spark and MLlib to analyze tweets and predict the number of stars and retweets 
that a tweet will get. The talk will include a tutorial on Spark and MLlib. 

Prerequisites: 
Beginner. Familiarity with a programming language will be helpful. 

What You'll Learn:  
After this talk you will be able to: 
1. Use Spark to process large data sets. 
2. Use Spark MLlib to apply Machine Learning algorithms to large data sets. 
3. Understand the pros and cons of using Spark vs. other Machine Learning 
technologies. 

What to Bring: 
Asim will share source code for the talk. Attendees can bring a laptop to 
download and try the demos on their own machines. 

Meet Your Speaker: 
Asim Jalis is a Lead Instructor in Data Engineering at Galvanize. He has worked 
as a software engineer and instructor at Cloudera, Microsoft, and 
Hewlett-Packard. He has an MS in computer science from the University of 
Virginia and an MA in mathematics from the University of Wisconsin—Madison. 


Talk 2: Using Zeppelin Notebooks for Spark Streaming and Live Monitoring 
We will discuss the rapidly evolving open source Zeppelin notebook project and 
how it can be used for data science applications, including those with 
streaming data. Zeppelin notebooks can use the scheduler functionality to 
update data and generate plots, which allows live monitoring applications to 
be rapidly prototyped in a production environment. Simple end-to-end examples 
will be discussed. 

Prerequisites: 
Intermediate to Advanced. Some experience with Spark and Zeppelin (or notebooks 
in general) will be helpful. 

Meet Your Speaker:  
Jerome Nilmeier is a Data Scientist and Engineer at IBM in the Spark Enablement 
Team and the Spark Technology Center, where he works on all things 
Spark-related. He contributes to open source, participates in community outreach, and 
works with clients on Spark in production environments. 

Prior to his journey into big data, he was a computational scientist at the 
Lawrence Livermore and Berkeley National Laboratories. He holds a PhD in 
Computational Biophysics from UC San Francisco, and a BS from UC Berkeley in 
Chemical Engineering. 
