Hi. I set dfs-dir-aware to true, but the performance wasn't what I expected, so for testing purposes I left it with resource.disks. About the HDFS space cleaning: which directories can I delete from my Hadoop installation? For example, is there a problem if I delete the query detail directory? Can I delete any other folders? Thanks
2015-10-09 10:15 GMT-05:00 Jihoon Son <[email protected]>:

> Hi Odin, yes you can make your query faster.
>
> First of all, you can increase the disk resource for Tajo workers by
> setting '*tajo.worker.resource.disks*'. This disk resource determines the
> number of tasks executed in parallel: a higher disk resource increases
> that number. For example, given 10 tasks, each of which reads data from
> HDFS, a Tajo worker will execute those tasks one by one. With a disk
> resource of 2, two tasks can be executed simultaneously, which can
> improve performance.
> However, as you may know, if too many tasks access a single disk at the
> same time, the resulting random accesses will make query performance
> worse.
> So, I recommend using the real number of physical disks for this
> configuration. Or, if you have already configured multiple disks for
> HDFS, Tajo can detect them automatically and use them for the worker's
> disk resource when '*tajo.worker.resource.dfs-dir-aware*' is set to true.
> Please refer to
> http://tajo.apache.org/docs/devel/configuration/worker_configuration.html
> for more information.
> After changing configuration values, you need to restart your Tajo
> cluster.
>
> In addition, I *strongly recommend* enabling
> '*dfs.datanode.hdfs-blocks-metadata.enabled*' for your HDFS. With this
> configuration, Tajo can achieve higher data locality when assigning its
> tasks to workers. This will improve Tajo's performance significantly. You
> need to restart your HDFS after configuring this, too.
>
> Best regards,
> Jihoon
>
> On Fri, Oct 9, 2015 at 11:43 PM, Odin Guillermo Caudillo Gallegos <
> [email protected]> wrote:
>
>> Hi.
>> I ran a SELECT COUNT on an HDFS table, which returned a total of almost
>> 17 million records.
>> The count was done in 2 minutes.
>> I have the current config for the worker:
>>
>> <property>
>>   <name>tajo.worker.resource.memory-mb</name>
>>   <value>4096</value>
>>   <description>Available memory size (MB)</description>
>> </property>
>>
>> <property>
>>   <name>tajo.worker.resource.disks</name>
>>   <value>1</value>
>>   <description>Available disk capacity (usually the number of disks)</description>
>> </property>
>>
>> <property>
>>   <name>tajo.worker.tmpdir.locations</name>
>>   <value>/tmp/tajo-11/tmpdir,/tmp/tajo-11/tmpdir1,/tmp/tajo-11/tmpdir2</value>
>>   <description>A base for other temporary directories.</description>
>> </property>
>>
>> Is there any way to give the query more power to make it faster?
>> Do I need to do another configuration?
>>
>>
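For reference, the two settings recommended in the thread above could be sketched like this (a hedged example, not from the original thread: the file locations `conf/tajo-site.xml` and `hdfs-site.xml` are the usual defaults and may differ in your deployment, and the property names are the ones quoted in Jihoon's reply):

```xml
<!-- In conf/tajo-site.xml (assumed location): let Tajo derive the worker's
     disk resource from the configured HDFS data directories, instead of
     setting tajo.worker.resource.disks manually. Restart Tajo afterwards. -->
<property>
  <name>tajo.worker.resource.dfs-dir-aware</name>
  <value>true</value>
</property>

<!-- In hdfs-site.xml (assumed location): expose block metadata so Tajo can
     assign tasks with higher data locality. Restart HDFS afterwards. -->
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
```

With `dfs-dir-aware` enabled, the value of `tajo.worker.resource.disks` should no longer matter; otherwise, set it to the real number of physical disks, as the reply suggests.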
