Cool - thanks Dmitriy!
On Jun 15, 2011, at 12:54 PM, Dmitriy Ryaboy wrote:
> Another tip:
> If you parametrize your load statements, it becomes easy to switch
> between loading from something like Cassandra, and reading from HDFS
> or local fs directly.
>
> Also:
> Try using Pig's "illustrate" c
Another tip:
If you parametrize your load statements, it becomes easy to switch
between loading from something like Cassandra, and reading from HDFS
or local fs directly.
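
A minimal sketch of what a parametrized load could look like, using Pig's parameter substitution. The parameter names INPUT and LOADER, the file path, and the keyspace and column family names are made up for illustration, and the Cassandra line assumes the CassandraStorage load function that ships with Cassandra's Pig support is registered:

```pig
-- Hypothetical defaults for a local test run; override with -param.
%default INPUT 'data/users.tsv'
%default LOADER 'PigStorage()'

-- Both the location and the load function are substituted as text.
rows = LOAD '$INPUT' USING $LOADER;

-- Assumed invocations (names are placeholders):
--   pig -x local myscript.pig
--   pig -param INPUT=cassandra://MyKeyspace/MyColumnFamily \
--       -param LOADER='org.apache.cassandra.hadoop.pig.CassandraStorage()' \
--       myscript.pig
```

Note that the two loaders will generally not return the same schema, so the statements after the load may need to account for that.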
Also:
Try using Pig's "illustrate" command when working through your flows
-- it does some clever things that go far beyond sim
We started doing this recently and thought it might be useful to others.
Pig (and Hive) have a sample function that allows you to sample data from your
data store.
In Pig it looks something like this:
mysample = SAMPLE myrelation 0.01;
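
For context, here is a slightly fuller sketch of the same idea as a complete script. The relation names, schema, and paths are hypothetical. SAMPLE keeps each row with the given probability, so 0.01 yields roughly, not exactly, 1% of the rows:

```pig
-- Hypothetical input path and schema, for illustration only.
myrelation = LOAD 'input/events.tsv' USING PigStorage('\t')
             AS (user_id:chararray, action:chararray);

-- Keep each row with probability 0.01 (approximately 1% of the data).
mysample = SAMPLE myrelation 0.01;

-- Write the sample out so later runs can read the small copy instead.
STORE mysample INTO 'output/events_sample' USING PigStorage('\t');
```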
One possible use for this, with Pig and Cassandra, is to sol