I came across this a few weeks ago. In a nutshell, you can use it to generate test data and for other scenarios where you need realistic-looking but not necessarily real data. With so many regulations, copyright restrictions and so on, it is a viable alternative to real data. I used it to generate 1000 lines of mixed genuine and fraudulent transactions to build a machine learning model that detects fraudulent transactions using PySpark's MLlib library. You can install it via pip install Faker
Details from https://github.com/joke2k/faker

HTH

Mich Talebzadeh