It depends on the row; the rows only share about 5% of the qualifier names. Each row could have about 500-3,000 columns across 3 column families, and one of the families holds 80% of the columns.
The table has around 75M rows.

On Tue, May 28, 2019 at 17:33, <s...@comcast.net> wrote:

> Guillermo,
>
> How large is your table? How many columns?
>
> Sincerely,
>
> Sean
>
> On May 28, 2019 at 10:11 AM Guillermo Ortiz <konstt2...@gmail.com> wrote:
> >
> > I have a doubt. When you process an HBase table with MapReduce you can
> > use TableInputFormat, which I understand goes directly to the HDFS
> > files (StoreFiles in HDFS), so you can do some filtering in the map
> > phase; it's not the same as going through the region servers to run
> > massive queries. It's possible to do the same using TableInputFormat
> > with Spark, and it's more efficient than using a scan with filters
> > and so on (again) when you want to run a massive query over the whole
> > table. Am I right?
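For reference, a minimal sketch of the Spark-side approach the question
describes: reading the table through TableInputFormat via
newAPIHadoopRDD. The table and column-family names ("my_table", "cf1")
are placeholders, not from the thread. One caveat on the premise:
TableInputFormat still scans through the RegionServers; it is
TableSnapshotInputFormat that reads the StoreFiles in HDFS directly.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.sql.SparkSession

object HBaseTableScan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-tif").getOrCreate()
    val sc = spark.sparkContext

    // Standard HBase client config; TableInputFormat takes the table
    // name (and optional scan narrowing) from it.
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table")   // placeholder
    // Push the family restriction server-side instead of filtering
    // every cell in Spark:
    conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf1") // placeholder

    // One Spark partition per region; rows come back as
    // (rowkey, Result) pairs through the RegionServers' scan API.
    val rdd = sc.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Any further filtering happens client-side in the executors,
    // analogous to filtering in the map phase of a MapReduce job.
    println(s"rows scanned: ${rdd.count()}")

    spark.stop()
  }
}

So the efficiency question mostly comes down to how much of the scan
you can narrow server-side (families, column ranges, timeranges,
filters) before the remaining rows are shipped to Spark.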