If you exported tables from 4.8 and are importing them into preexisting
tables in 4.12, make sure that you created those tables with COLUMN_ENCODED_BYTES
= 0 or have phoenix.default.column.encoded.bytes.attrib set to 0 in
hbase-site.xml.
I believe that the problem you see is the column name encoding that was
introduced after 4.8 (column mapping is on by default starting with 4.10).
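As a sketch of the DDL side (the table and column names here are placeholders,
not taken from this thread), creating the target table with encoding disabled
would look like:

    CREATE TABLE IF NOT EXISTS MY_TABLE (
        ID BIGINT NOT NULL,
        COL1 VARCHAR,
        COL2 VARCHAR,
        CONSTRAINT pk PRIMARY KEY (ID)
    ) COLUMN_ENCODED_BYTES = 0;

With COLUMN_ENCODED_BYTES = 0 the physical column qualifiers should stay the
plain column names, i.e. the same layout a 4.8 export contains; the
hbase-site.xml property mentioned above sets that behavior as the cluster-wide
default.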
Forgot to mention: that kind of problem can be mitigated by increasing the
number of threads used to open regions. By default it's 3 (?), but we haven't
seen any problems with increasing it up to several hundred on clusters
that have up to 2k regions per RS.
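If anyone wants to try that, the region server setting in question should be
hbase.regionserver.executor.openregion.threads (the value below is only an
example), set in hbase-site.xml and picked up after a region server restart:

    <property>
      <name>hbase.regionserver.executor.openregion.threads</name>
      <value>100</value>
    </property>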
Thanks,
Sergey
That was the real problem quite a long time ago (a couple of years?). I can't
say for sure in which version it was fixed, but now indexes have priority
over regular tables and their regions are opened first. So by the time we
replay WALs for data tables, all index regions are supposed to be online. If you
Thomas is absolutely right that there will be a possibility of hotspotting.
Salting is the mechanism that should prevent that in all cases (because all
row ids are different). The partitioning described above can actually be
implemented by using id2 as the first column of the PK and pre-splitting the
table on id2 values.
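To put the two approaches side by side, here is a sketch in Phoenix DDL (the
table names, columns, and split points are made up for illustration):

    -- Salted: Phoenix prepends a salt byte derived from the whole row key,
    -- spreading consecutive writes across SALT_BUCKETS regions.
    CREATE TABLE metrics_salted (
        id1 VARCHAR NOT NULL,
        id2 VARCHAR NOT NULL,
        val DOUBLE,
        CONSTRAINT pk PRIMARY KEY (id1, id2)
    ) SALT_BUCKETS = 16;

    -- Manual partitioning: lead the PK with id2 and pre-split on its values.
    CREATE TABLE metrics_by_id2 (
        id2 VARCHAR NOT NULL,
        id1 VARCHAR NOT NULL,
        val DOUBLE,
        CONSTRAINT pk PRIMARY KEY (id2, id1)
    ) SPLIT ON ('g', 'n', 't');

Salting spreads writes evenly at the cost of one scan per bucket for range
queries, while the pre-split variant keeps all rows sharing an id2 value
contiguous.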
Yeah, I think that's his point :)
For a fine-grained facet, the hotspotting is desirable because it co-locates the
data for the query. To try to make an example that drives this point home:
consider a primary key CONSTRAINT pk PRIMARY KEY (col1, col2, col3, col4).
If I defined the SALT_HASH based on "col1" alone, you'd get every row that
shares the same col1 value hashed to the same bucket: the facet's data is
co-located for reads, but its writes all land on one region server.
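Purely to illustrate that point, a hypothetical single-column salt in Scala
(this is not a real Phoenix option; Phoenix's SALT_BUCKETS hashes the entire
row key):

    // Hypothetical: hash only col1. Every row sharing a col1 value maps to the
    // same bucket, co-locating the facet for scans but concentrating its writes.
    def bucketFor(col1: String, numBuckets: Int): Int =
      ((col1.hashCode % numBuckets) + numBuckets) % numBuckets

    // e.g. bucketFor("user-42", 16) is the same for the whole "user-42" facet.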
Hi, I am attempting to connect to Phoenix from Spark, but with no success so far.
For writing into Phoenix, I am trying this:
tdd.toDF("ID", "COL1", "COL2",
"COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
"zookeper-host-url:2181").option("table",
htablename).mode("overwrite").save()
But
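For comparison, a minimal self-contained sketch of that write path, assuming
Spark 2.x with the phoenix-spark connector on the driver and executor classpath
(the host, table, and sample data are placeholders):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("phoenix-write-example")
      .getOrCreate()
    import spark.implicits._

    // Column names must match the (upper-cased) Phoenix column names.
    val df = Seq((1L, "a", "b", "c")).toDF("ID", "COL1", "COL2", "COL3")

    df.write
      .format("org.apache.phoenix.spark")
      .option("table", "MY_TABLE")
      .option("zkUrl", "zookeeper-host:2181")
      .mode(SaveMode.Overwrite) // the connector expects Overwrite; it upserts rather than truncating
      .save()

If the format class cannot be found, the phoenix-client jar is likely missing
from spark.driver.extraClassPath / spark.executor.extraClassPath.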