> I have a idea to put temp table in tachyon to speed query.
> I found the jira which is similar to my idea.
> https://issues.apache.org/jira/browse/HIVE-7313

It is an easy thing from a logical standpoint: temporary tables can be hosted in any Hadoop FS location.

    create temporary table x(x int) location 'tachyon://tmp/_tmp.db/x';

But Tachyon is inherently unreliable, because its LRU eviction policy removes blocks from Tachyon entirely (i.e. final-tier eviction). You have to figure out a way to recompute part of a temp table when Tachyon throws away a block. Or are you going to pin everything into memory and potentially fill it with junk temp tables? That is probably a bad idea.

The patch you're looking at is only half of what Hive does.

The HDFS in-memory implementation massively improves write throughput (since we can write into it faster than into any disk); the data is then flushed to disk as a 2nd replica, asynchronously, within a few seconds.

The LLAP in-memory implementation handles the life cycle of the table in memory while it's being processed, caching only a fraction of the table (like only 1 column) or just the ORC bloom-filter indexes instead of the whole file. A filter clause is evaluated against this bloom filter before it produces a cache miss: if the bloom filter says "don't read this", it just skips the data read entirely.

When LLAP evicts, it is only evicting the 3rd replica of the data set, not the source of truth (which is the disk replica, which has checksums).

Cheers,
Gopal
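P.S. To make the bloom-filter step concrete, here is a toy sketch in Python. It is not how ORC or LLAP actually implement it; the BloomFilter, Stripe and scan_equals names are hypothetical and exist only for illustration. The idea it shows is the one above: keep a small per-chunk filter in memory, test the predicate against it first, and only pay for the data read when the filter cannot rule the chunk out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class Stripe:
    """Hypothetical stand-in for one chunk of a file plus its cached filter."""

    def __init__(self, values):
        self.values = list(values)
        self.bloom = BloomFilter()
        for v in self.values:
            self.bloom.add(v)
        self.reads = 0  # counts how often the (expensive) data read happens

    def read(self):
        self.reads += 1
        return self.values


def scan_equals(stripes, wanted):
    """Evaluate `col = wanted`, consulting each stripe's filter before reading."""
    matches = []
    for stripe in stripes:
        if not stripe.bloom.might_contain(wanted):
            continue  # filter says "don't read this": skip the data read
        matches.extend(v for v in stripe.read() if v == wanted)
    return matches
```

A false positive only costs one unnecessary read; a chunk that really holds the value is never skipped, which is why the filter is safe to keep in memory on its own while the data stays on disk.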
