There should be a static setRanges(Configuration, Collection<Range>)
method somewhere in the type hierarchy of AccumuloInputFormat that lets
you specify the Range(s).
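Here is a minimal plain-Java sketch of what a collection of ranges does to a job's input: only rows inside one of the ranges are read. It does not use Accumulo at all. RowRange and RangeScanModel are hypothetical stand-ins, and the sketch assumes, as with Accumulo's Range(startRow, endRow) constructor, that both row bounds are inclusive. In a real job you would build org.apache.accumulo.core.data.Range objects and pass that collection to setRanges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for org.apache.accumulo.core.data.Range:
// a row range with an inclusive start row and an inclusive end row.
final class RowRange {
    final String startRow;
    final String endRow;

    RowRange(String startRow, String endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }
}

final class RangeScanModel {
    // Returns the row ids that fall inside any of the given ranges.
    // Rows are kept sorted (as in Accumulo), so each range is one contiguous slice.
    // Assumes the ranges are sorted and non-overlapping; overlapping ranges
    // would return the shared rows more than once in this sketch.
    static List<String> scan(NavigableMap<String, String> table, List<RowRange> ranges) {
        List<String> rows = new ArrayList<>();
        for (RowRange r : ranges) {
            rows.addAll(table.subMap(r.startRow, true, r.endRow, true).keySet());
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"row1", "row2", "row3", "row4", "row5"}) {
            table.put(row, "value-" + row);
        }
        List<RowRange> ranges = List.of(new RowRange("row2", "row3"), new RowRange("row5", "row5"));
        System.out.println(scan(table, ranges)); // [row2, row3, row5]
    }
}
```

Rows outside every range (row1 and row4 here) are never read, which is why restricting ranges is a cheap form of row filtering when the row id itself encodes what you want to select.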
If you can't use the TimestampFilter (i.e., you can't use the timestamp for
this filtering), you have two options to perform row filtering
Josh,
Thanks. I was able to get TimestampFilter to work for my needs. But I
originally wanted to use "createdDate": our application creates that date,
it is known to the user, and it may differ from the Accumulo timestamp
depending on when the data actually got processed into Accumulo.
So if I wanted to u
The TimestampFilter will return only the Keys whose timestamp falls in
the range you specify. The timestamp is an attribute on every Key: a
long value which, when not set by the client at write time, is the
number of milliseconds since the epoch. You specify the numeric range of
timestamps you want. T
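As a sketch of the semantics described above, here is a plain-Java model that keeps only keys whose timestamp (milliseconds since the epoch) falls within a numeric range. SimpleKey and TimestampFilterModel are hypothetical and omit everything else a real Key carries; treating both ends of the range as inclusive is an assumption of this sketch, since the real TimestampFilter is configured through an IteratorSetting and makes inclusivity configurable.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified key: real Accumulo keys also carry column family,
// column qualifier and visibility, which this sketch omits.
final class SimpleKey {
    final String row;
    final long timestamp; // milliseconds since the epoch

    SimpleKey(String row, long timestamp) {
        this.row = row;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return row + "@" + timestamp;
    }
}

final class TimestampFilterModel {
    // Keeps only keys whose timestamp lies in [start, end], both ends inclusive.
    static List<SimpleKey> filter(List<SimpleKey> keys, long start, long end) {
        List<SimpleKey> kept = new ArrayList<>();
        for (SimpleKey k : keys) {
            if (k.timestamp >= start && k.timestamp <= end) {
                kept.add(k);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Convert human-readable dates into the numeric range the filter needs.
        long start = Instant.parse("2016-10-01T00:00:00Z").toEpochMilli();
        long end = Instant.parse("2016-10-15T23:59:59.999Z").toEpochMilli();
        List<SimpleKey> keys = List.of(
                new SimpleKey("row1", Instant.parse("2016-09-30T12:00:00Z").toEpochMilli()),
                new SimpleKey("row2", Instant.parse("2016-10-05T08:30:00Z").toEpochMilli()),
                new SimpleKey("row3", Instant.parse("2016-10-16T00:00:00Z").toEpochMilli()));
        for (SimpleKey k : filter(keys, start, end)) {
            System.out.println(k.row); // prints only row2
        }
    }
}
```

Note that this filters on the Key's timestamp only. An application-level date stored elsewhere in the key or value is not seen by this kind of filter.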
Exactly right, Vaibhav.
vaibhav thapliyal wrote:
I think neither of these would contribute much to load balancing. HDFS
replication is mostly a safeguard against single points of failure in a
Hadoop cluster. However, data center replication would ensure the
availability of an Accumulo instance.
On 16 October 2016 at 21:02, Yamini Joshi wrote:
In other words, which helps with load balancing: HDFS replication or data
center replication?
Best regards,
Yamini Joshi
On Sat, Oct 15, 2016 at 10:44 PM, Yamini Joshi wrote:
> So HDFS is for durability while replication is for availability? I'm
> assuming that the client is unaware of the replica