You can put the database files in a central location accessible to all the 
workers and build the GeoIP object once per partition, inside a 
mapPartitions over your dataset, loading from that central location.
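A minimal sketch of that pattern in plain Python (so it runs without a cluster). `GeoReader` and the `/shared/...` path are hypothetical stand-ins for MaxMind's `DatabaseReader` and your central location; with Spark the same `lookup_partition` function would be passed to `rdd.mapPartitions`:

```python
build_count = 0  # tracks how often the reader is constructed, for illustration

class GeoReader:
    """Stand-in for an expensive, non-serializable lookup object
    (e.g. MaxMind's DatabaseReader)."""
    def __init__(self, path):
        global build_count
        build_count += 1
        self.path = path

    def lookup(self, ip):
        # Real code would query the GeoIP database here.
        return "country-for-" + ip

def lookup_partition(ips):
    # Runs once per partition on the worker: the reader is built locally
    # from the shared file, so it never has to be serialized from the driver.
    reader = GeoReader("/shared/GeoLite2-City.mmdb")  # hypothetical central path
    for ip in ips:
        yield (ip, reader.lookup(ip))

# With Spark: results = rdd.mapPartitions(lookup_partition).collect()
# Simulated here with two in-memory "partitions":
partitions = [["1.2.3.4", "5.6.7.8"], ["9.9.9.9"]]
results = [rec for part in partitions for rec in lookup_partition(iter(part))]
```

The point of the pattern is that the reader is constructed once per partition (twice above, not once per record), which sidesteps the serialization problem entirely.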


___


From: Filli Alem [alem.fi...@ti8m.ch]

Sent: Wednesday, July 29, 2015 12:04 PM

To: user@spark.apache.org

Subject: IP2Location within spark jobs

Hi,
 
I would like to use ip2Location databases during my spark jobs (MaxMind).
So far I haven’t found a way to properly serialize the database offered by the 
Java API of the database.

The CSV version isn’t easy to handle as it consists of multiple files.
 
Any recommendations on how to do this?
 
Thanks
Alem
 
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org