Hi, this is a very common question, as many people who know SQL are used to RDBMS like MySQL, Oracle, or SQL Server. The first thing to understand about Hive is that it was not designed to replace such databases in the first place. Not when they are used for transaction processing, anyway.
This is stated in the Hive Tutorial, in the section "What Hive is NOT":
https://cwiki.apache.org/confluence/display/Hive/Tutorial

Transactionality is a feature that was added afterwards, and as Jörn stated, it can be fast when used with TEZ + LLAP.

In the logs you sent, you can see that Hive is running a MapReduce job to perform your task:

MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.7 sec HDFS Read: 3836 HDFS Write: 81 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 700 msec
OK

If you are familiar with MapReduce, you should know that however small your dataset is, what takes time here is:
1. Provisioning executors on YARN
2. Starting one JVM per mapper and reducer (here you have only 1 mapper and 0 reducers, as stated by the logs). The start time of a JVM is generally a few seconds nowadays, except when it has to load 200 MB of Hadoop jars first, in which case it takes around 10 seconds.
3. Performing the task itself (a few milliseconds in your case)
4. Persisting the results on HDFS (this requires a few HDFS operations and can take from a few seconds to minutes if you write a lot of files, but in your case it should be quick)

The main improvement that a Tez LLAP or Spark backend brings is that the first two steps are already done and waiting for your queries to run.

Hive was designed in the first place to go faster than an RDBMS in the cases where:
- your data is too large to fit on a single instance, and sharding is painful
- your jobs mostly consist of analytical processing, like full-table aggregations

In such cases, the correct way to use Hive is to partition your table by day (if you run nightly batches) and generate a new partition every day. If you want to change something in your table (e.g. fix a bug), you just regenerate the partition.

If you need fast response times for updating and fetching records, plus scalability, you should perhaps look into HBase, Cassandra, or Kudu.
On Mon, Sep 11, 2017 at 9:18 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Why do you want to do single inserts?
> It has been more designed for bulk loads.
> In any case newer version of Hive 2 using TEZ +llap improve it
> significantly (also for bulk analysis). Nevertheless, it is good practice
> to not use single inserts in an analysis systems, but try to combine and
> bulk-load them.
>
> On 11. Sep 2017, at 21:01, Jinhui Qin <qin.jin...@gmail.com> wrote:
>
>
> Hi,
>
> I am new to Hive. I just created a simple table in hive and inserted two
> records, the first insertion took 16.4 sec, while the second took 14.3 sec.
> Why is that very slow? is this the normal performance you get in Hive using
> INSERT ? Is there a way to improve the performance of a single "insert" in
> Hive? Any help would be really appreciated. Thanks!
>
> Here is the record from a terminal in Hive shell:
>
> =========================
>
> hive> show tables;
> OK
> Time taken: 2.758 seconds
> hive> create table people(id int, name string, age int);
> OK
> Time taken: 0.283 seconds
> hive> insert into table people(1,'Tom A', 20);
> Query ID = hive_20170911134052_04680c79-432a-43e0-827b-29a4212fbbc0
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1505146047428_0098, Tracking URL =
> http://iop-hadoop-bi.novalocal:8088/proxy/application_1505146047428_0098/
> Kill Command = /usr/iop/4.1.0.0/hadoop/bin/hadoop job -kill
> job_1505146047428_0098
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 0
> 2017-09-11 13:41:01,492 Stage-1 map = 0%, reduce = 0%
> 2017-09-11 13:41:06,940 Stage-1 map = 100%, reduce = 0%, Cumulative CPU
> 2.7 sec
> MapReduce Total cumulative CPU time: 2 seconds 700 msec
> Ended Job = job_1505146047428_0098
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to: hdfs://iop-hadoop-bi.novalocal:8020/apps/hive/warehouse/
> people/.hive-staging_hive_2017-09-11_13-40-52_106_462156758110461544
> 1-1/-ext-10000
> Loading data to table default.people
> Table default.people stats: [numFiles=1, numRows=1, totalSize=11,
> rawDataSize=10]
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1 Cumulative CPU: 2.7 sec HDFS Read: 3836 HDFS
> Write: 81 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 700 msec
> OK
> Time taken: 16.417 seconds
> hive> insert into table people values(1,'Tom A', 20);
> Query ID = hive_20170911134128_c8f46977-7718-4496-9a98-cce0f89ced79
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1505146047428_0099, Tracking URL =
> http://iop-hadoop-bi.novalocal:8088/proxy/application_1505146047428_0099/
> Kill Command = /usr/iop/4.1.0.0/hadoop/bin/hadoop job -kill
> job_1505146047428_0099
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 0
> 2017-09-11 13:41:36,289 Stage-1 map = 0%, reduce = 0%
> 2017-09-11 13:41:40,721 Stage-1 map = 100%, reduce = 0%, Cumulative CPU
> 2.28 sec
> MapReduce Total cumulative CPU time: 2 seconds 280 msec
> Ended Job = job_1505146047428_0099
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to: hdfs://iop-hadoop-bi.novalocal:8020/apps/hive/warehouse/
> people/.hive-staging_hive_2017-09-11_13-41-28_757_445847252207124056
> 7-1/-ext-10000
> Loading data to table default.people
> Table default.people stats: [numFiles=2, numRows=2, totalSize=22,
> rawDataSize=20]
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1 Cumulative CPU: 2.28 sec HDFS Read: 3924 HDFS
> Write: 81 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 280 msec
> OK
> Time taken: 14.288 seconds
> hive> exit;
> =================
>
>
> Jinhui
>