I'm trying to work through some examples from the Agile Data Science book, but I'm running into some pretty fundamental roadblocks. The book was written against Pig 0.11.1, but because I'm hardheaded I'd rather start with a more modern stack.
Whether I read from an Avro collection or from MongoDB, the only exception I get is that Pig doesn't know what the schema is. I've attached the Pig Latin script for reference, but it's a pretty simple count of the number of times one person emails another. I can run the equivalent map-reduce directly in MongoDB, but the goal here is to get the infrastructure set up so I can build on this simple foundation and experiment beyond the examples in the book.

I also have Hadoop 2.6.0 installed, and I've had to fix a number of things in the DOS (.cmd) scripts just so that it could find and execute Java from the default install path. It's painfully obvious to me now that Pig 0.14.0 was not built against Hadoop 2.6.0, but I have no idea what it was built against. BTW, there are a number of things you'll have to change in the DOS scripts just to get them to find hadoop-config.cmd.
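To make the computation concrete: the count described above (how many times one person emails another) can be sketched in plain Python as below. This is not the attached Pig script, just a minimal illustration of the intended result. The record layout (a `from` string and a `tos` list) and the sample addresses are assumptions for illustration only, not the real schema.

```python
from collections import Counter


def count_sent(emails):
    """Count how many times each sender emailed each recipient.

    Each email is assumed to be a dict with a 'from' string and a
    'tos' list of strings (hypothetical field names).
    """
    counts = Counter()
    for email in emails:
        sender = email.get("from")
        # Treat a missing or null recipient list as empty.
        for recipient in email.get("tos") or []:
            if sender and recipient:
                counts[(sender, recipient)] += 1
    return counts


# Hypothetical sample records, not taken from the real data set.
sample = [
    {"from": "alice@example.com", "tos": ["bob@example.com"]},
    {"from": "alice@example.com", "tos": ["bob@example.com", "carol@example.com"]},
    {"from": "bob@example.com", "tos": None},
]

for (sender, recipient), total in sorted(count_sent(sample).items()):
    print(sender, recipient, total)
```

Whatever the Pig script produces once the schema problem is solved, it should be the same kind of (sender, recipient, total) rows.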
