Re: Fwd: Extracting ALL Data using multiple java processes

2016-10-16 Thread Josh Elser
There should be a static setRanges(Configuration, Collection) method somewhere in the type hierarchy of AccumuloInputFormat which lets you specify the Range[s]. If you can't use the TimestampFilter (that is, if the timestamp can't be used for this filtering), you have two options to perform row-filtering
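If the rows happen to be keyed by a sortable date, the Range[s] handed to setRanges can be one slice per day, and those slices can be dealt out to several Java processes so each one extracts a disjoint part of the table. This is a minimal, self-contained sketch under that assumption: the yyyyMMdd row prefix and the names RowRange, dailyRanges, and assign are all hypothetical, and RowRange only stands in for Accumulo's Range class.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DailyRangeSketch {

    // Hypothetical stand-in for Accumulo's Range: rows in [startRow, endRow).
    static final class RowRange {
        final String startRow; // inclusive
        final String endRow;   // exclusive

        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    // Formats dates as yyyyMMdd, the (assumed) row-ID prefix.
    private static final DateTimeFormatter ROW_PREFIX = DateTimeFormatter.BASIC_ISO_DATE;

    // Builds one half-open row range per day, from 'from' through 'to' inclusive.
    static List<RowRange> dailyRanges(LocalDate from, LocalDate to) {
        List<RowRange> ranges = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            ranges.add(new RowRange(d.format(ROW_PREFIX), d.plusDays(1).format(ROW_PREFIX)));
        }
        return ranges;
    }

    // Deals ranges round-robin into 'workers' buckets so each process gets a disjoint subset.
    static List<List<RowRange>> assign(List<RowRange> ranges, int workers) {
        List<List<RowRange>> buckets = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < ranges.size(); i++) {
            buckets.get(i % workers).add(ranges.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<RowRange> ranges = dailyRanges(LocalDate.of(2016, 10, 14), LocalDate.of(2016, 10, 16));
        List<List<RowRange>> perWorker = assign(ranges, 2);
        for (int i = 0; i < perWorker.size(); i++) {
            System.out.println("worker " + i + ": " + perWorker.get(i));
        }
    }
}
```

In real code each slice would roughly become new Range(startRow, true, endRow, false), and each worker's list would be passed to setRanges; the exact setRanges signature (Configuration vs. Job as the first argument) depends on the Accumulo version, so check the Javadoc for yours.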

Re: Fwd: Extracting ALL Data using multiple java processes

2016-10-16 Thread Bob Cook
Josh, thanks. I was able to get TimestampFilter to work for my needs. But I originally wanted to filter on "createdDate": our application creates that date, it is known to the user, and it may differ from the Accumulo timestamp depending on when the data actually got processed into Accumulo. So if I wanted to u

Re: Fwd: Extracting ALL Data using multiple java processes

2016-10-16 Thread Josh Elser
The TimestampFilter will return only the Keys whose timestamps fall in the range you specify. The timestamp is an attribute on every Key: a long value which, when not set by the client at write time, is the number of milliseconds since the epoch. You specify the numeric range of timestamps you want. T
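To make those semantics concrete, here is a plain-Java sketch of what a timestamp-range filter does, including turning calendar dates into the epoch-millisecond bounds it compares against. SimpleKey, millis, and filterByTimestamp are hypothetical names, and treating both bounds as inclusive is an assumption of this sketch; the real TimestampFilter should let you configure inclusiveness, so check its Javadoc for your version.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimestampRangeSketch {

    // Hypothetical simplified key: a row ID plus the long timestamp every Key carries.
    static final class SimpleKey {
        final String row;
        final long timestamp; // milliseconds since the epoch

        SimpleKey(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "@" + timestamp;
        }
    }

    // Converts an ISO-8601 instant such as "2016-10-15T00:00:00Z" to epoch milliseconds.
    static long millis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    // Keeps only keys whose timestamp lies in [start, end]; both bounds inclusive in this sketch.
    static List<SimpleKey> filterByTimestamp(List<SimpleKey> keys, long start, long end) {
        List<SimpleKey> kept = new ArrayList<>();
        for (SimpleKey k : keys) {
            if (k.timestamp >= start && k.timestamp <= end) {
                kept.add(k);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleKey> keys = Arrays.asList(
                new SimpleKey("row1", millis("2016-10-14T12:00:00Z")),
                new SimpleKey("row2", millis("2016-10-15T08:30:00Z")),
                new SimpleKey("row3", millis("2016-10-16T01:00:00Z")));
        // Everything stamped on 2016-10-15 (UTC): from midnight to one millisecond before the next midnight.
        long start = millis("2016-10-15T00:00:00Z");
        long end = millis("2016-10-16T00:00:00Z") - 1;
        System.out.println(filterByTimestamp(keys, start, end));
    }
}
```

Because the check only ever looks at the Key's timestamp, a date the application stores elsewhere in the entry (such as a createdDate) is invisible to it; filtering on such a field needs a different approach, such as row-filtering.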

Re: Data Replication

2016-10-16 Thread Josh Elser
Exactly right, Vaibhav.

vaibhav thapliyal wrote:
> I think neither of these would contribute much to load balancing. HDFS replication is mostly a safeguard against single points of failure in a Hadoop cluster. However, data center replication would ensure the availability of an Accumulo instance.

Re: Data Replication

2016-10-16 Thread vaibhav thapliyal
I think neither of these would contribute much to load balancing. HDFS replication is mostly a safeguard against single points of failure in a Hadoop cluster. However, data center replication would ensure the availability of an Accumulo instance.

On 16 October 2016 at 21:02, Yamini Joshi wrote:

Re: Data Replication

2016-10-16 Thread Yamini Joshi
In other words, which helps with load balancing: HDFS replication or data center replication?

Best regards,
Yamini Joshi

On Sat, Oct 15, 2016 at 10:44 PM, Yamini Joshi wrote:
> So HDFS is for durability while replication is for availability? I'm
> assuming that the client is unaware of the replica