On Fri, May 16, 2014 at 6:04 PM, Corey Nolet wrote:
> What's the expected size of your unique key set? Thousands? Millions?
> Billions?
>
> You could probably use a table structure similar to
> https://github.com/calrissian/accumulo-recipes/tree/master/store/metrics-store but
> just have it emit
I have only 1.5.0. Perhaps I need to expend the effort to upgrade. Time
being precious I've been procrastinating.
On Fri, May 16, 2014 at 11:59 AM, Josh Elser wrote:
> On 5/16/14, 10:38 AM, David Medinets wrote:
>
>> I tried both of the following ways:
>>
>> scan -c :name
>>
>
> This worked for
Whoops, sorry for the empty response, but I'm new to e-mail. The bitset
within HLL supports union and intersection. You should be able to estimate
cardinality without re-reading the data. In effect, you can segment your
estimation and keep the error below about 2%.
Union is straightforward, whereas int
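To make the union point concrete, here is a minimal HyperLogLog sketch in plain Python. It is an illustration only, not the stream-lib API: the class name, the choice of p=12 (4096 registers, for a nominal standard error of roughly 1.6%), and the SHA-1 based hashing are all assumptions made for this example. Two sketches built over separate segments are merged by taking the register-wise maximum, so the combined estimate needs no second pass over the data. Intersection is not shown.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (hypothetical helper, not a library class)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from SHA-1 (an illustrative choice).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other):
        # Merging is a register-wise max; both sketches must use the same p.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    # Two overlapping segments of hypothetical keys.
    seg_a, seg_b = HyperLogLog(), HyperLogLog()
    for i in range(0, 6000):
        seg_a.add("user-%d" % i)
    for i in range(4000, 10000):
        seg_b.add("user-%d" % i)
    print(round(seg_a.union(seg_b).estimate()))     # approximate distinct count
```

The merged registers are identical to those of a single sketch fed every key, which is why per-segment sketches can be stored and combined later.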
What's the expected size of your unique key set? Thousands? Millions?
Billions?
You could probably use a table structure similar to
https://github.com/calrissian/accumulo-recipes/tree/master/store/metrics-store but
just have it emit 1's instead of summing them.
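As a rough sketch of the "emit 1's instead of summing" idea, the following plain-Python model stands in for the table; it does not use the metrics-store or Accumulo APIs, and the function names and the dict-as-table are assumptions for illustration. Each unique (group, value) key collapses to a single entry holding 1, so a distinct count per group is just the number of entries under that group.

```python
from collections import defaultdict


def ingest(table, group, value):
    # A summing combiner would accumulate: table[key] += 1.
    # Here every write for the same key collapses to a single 1.
    table[(group, value)] = 1


def distinct_counts(table):
    # Adding up the 1's per group yields the number of unique values.
    counts = defaultdict(int)
    for (group, _value), one in table.items():
        counts[group] += one
    return dict(counts)


if __name__ == "__main__":
    table = {}
    for name in ["John", "Jake", "John", "Mary"]:
        ingest(table, "NAME", name)
    print(distinct_counts(table))  # {'NAME': 3}
```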
I'm thinking maybe your mappings cou
Thanks, Josh. I'll take a look through the Hadoop web UI.
-Russ
On Fri, May 16, 2014 at 1:37 PM, Josh Elser wrote:
> Hi Russ,
>
> I believe that the AccumuloInputFormat will use the splits on the table
> you're reading to generate the MR InputSplits. The InputFormat should be
> trying to run th
Has the table been compacted since loading the data?
Hi, quick question,
I’m attempting to optimize the ingest rates for a document-partitioned table. I
am currently presplitting the tables and have an even spread of data across tablet
servers. However, I was wondering if changing the size of mutations would have
a major impact on the ingest rates.
Yes, the data has not yet been ingested. I can control the table structure;
hopefully by integrating (or extending) the D4M schema.
I'm leaning towards using https://github.com/addthis/stream-lib as part of
the ingest process. Upon start up, existing tables would be analyzed to
find cardinality. T
Hi Russ,
I believe that the AccumuloInputFormat will use the splits on the table
you're reading to generate the MR InputSplits. The InputFormat should be
trying to run the Mappers on the same machine where the tserver serving the
data is located.
If you're only getting a few mappers, adding mor
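To illustrate how table splits relate to the number of mappers, here is a small pure-Python model. It assumes one InputSplit per tablet, as described above, and that each tablet covers the row range (previous split, split]; it does not call any Accumulo or Hadoop API, and the function names are hypothetical.

```python
def tablet_ranges(split_points):
    """Return (exclusive_start, inclusive_end) row ranges, one per tablet.

    None stands for an unbounded start or end.
    """
    points = sorted(set(split_points))
    bounds = [None] + points + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def tablet_for_row(row, ranges):
    """Return the index of the tablet (and so the mapper) that owns a row."""
    for i, (lo, hi) in enumerate(ranges):
        if (lo is None or row > lo) and (hi is None or row <= hi):
            return i
    raise ValueError("row not covered by any range")


if __name__ == "__main__":
    # No splits: a single tablet, so a single mapper reads everything.
    print(len(tablet_ranges([])))
    # Three split points: four tablets, so up to four mappers.
    ranges = tablet_ranges(["g", "n", "t"])
    print(len(ranges), tablet_for_row("h", ranges))
```

Under this model a table with few or no split points yields only a handful of map tasks no matter how large the cluster is.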
Can we assume this data has not yet been ingested? Do you have control over
the way in which you structure your table?
On Fri, May 16, 2014 at 1:54 PM, David Medinets wrote:
> If I have the following simple set of data:
>
> NAME John
> NAME Jake
> NAME John
> NAME Mary
>
> I want to end up with
On 5/16/14, 10:38 AM, David Medinets wrote:
> I tried both of the following ways:
>
> scan -c :name

This worked for me with 1.6.0. Does it fail with 1.5.1?

> scan -c "":name
>
> Neither worked. Is there a way?
Josh, this morning I woke up and remembered that I wrote
http://affy.blogspot.com/2012/11/how-can-i-use-reverse-sort-on-generic.html
about 18 months ago. I can easily add a reverse index in order to extend
the D4M schema.
I'm glad to see that reverse scanning is possible in HBase.
On Thu, May 15
Hi, folks,
When I execute an MR job with AccumuloInputFormat, are there any guarantees
about which mappers process which rows? I'm trying to minimize crosstalk in
my cluster but either I haven't split my table properly or I'm expecting
too much, because I'm only seeing 1 or 2 nodes running MR task
Yes. It will be less useful if you can't scan only the newest data, as
you'll be recombining the same pieces of data on subsequent runs.
On Fri, May 16, 2014 at 1:54 PM, David Medinets wrote:
> If I have the following simple set of data:
>
> NAME John
> NAME Jake
> NAME John
> NAME Mary
>
> I wa
Reverse scanning isn't necessarily infeasible:
https://issues.apache.org/jira/browse/HBASE-4811
This might be something cool that could be implemented to make this sort
of thing easier.
The pagination isolation you mention in Approach B is interesting. I'm
curious as to how clone'ing tables
If I have the following simple set of data:
NAME John
NAME Jake
NAME John
NAME Mary
I want to end up with the following:
NAME 3
I'm thinking that perhaps a HyperLogLog approach should work. See
http://en.wikipedia.org/wiki/HyperLogLog for more information.
Has anyone done this before in Accumu
I tried both of the following ways:
scan -c :name
scan -c "":name
Neither worked. Is there a way?
Just to be clear, if the warning message eventually goes away (should be
within seconds, maybe minutes), then it's probably just an asynchronous delay.
If a substantial time later you're still getting the warning, that's
probably a sign that tracing is being used incorrectly (opened and not closed as
Eric said