Hi,

Hive indexes won¹t really help you speed up that query right now, because
of the plan it generates due to the <= clauses.

> CREATETABLE ipv4table
> AS
> SELECT logon.IP, ipv4.Country
> FROM
> (SELECT * FROM logontable WHERE isIpv4(IP)) logon
> LEFT OUTER JOIN
> (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON
> isIpv4(logon.IP) 
> WHERE ipv4.StartIp <=logon.IP AND logon.IP <= ipv4.EndIp;

That¹s a cross-product join, which can¹t be distributed at all & will take
forever, even if you use Tez/LLAP.

Range join queries have partial distribution rewrites, but AFAIK none of
them use filter indexes in hive.

But before I suggest a rewrite, can you post the ³explain <query>;² for
the above query, so that I can check if this is indeed producing a
cross-product + filter?

Cheers,
Gopal








Reply via email to