Hi. I set dfs-dir-aware to true, but the performance wasn't what I expected, so for testing purposes I left it with resource.disks. About the HDFS space cleaning: which directories can I delete from my Hadoop installation? For example, is there a problem if I delete the query detail directory? Can I delete any other folders? Thanks
2015-10-09 10:15 GMT-05:00 Jihoon Son <[email protected]>:

> Hi Odin, yes you can make your query faster.
>
> First of all, you can increase the disk resource for Tajo workers by
> setting '*tajo.worker.resource.disks*'. This disk resource determines the
> number of tasks executed in parallel: a higher disk resource increases
> that number. For example, given 10 tasks, each of which reads data from
> HDFS, a Tajo worker will execute those tasks one by one. With a disk
> resource of 2, two tasks can be executed simultaneously, which can
> improve performance.
> However, as you may know, if too many tasks access a single disk at the
> same time, the resulting random accesses will make query performance
> worse.
> So, I recommend using the real number of physical disks for this
> configuration. Or, if you have already configured multiple disks for
> HDFS, Tajo can detect them automatically and use them for the worker's
> disk resource when '*tajo.worker.resource.dfs-dir-aware*' is set to true.
> Please refer to
> http://tajo.apache.org/docs/devel/configuration/worker_configuration.html
> for more information.
> After changing configuration values, you need to restart your Tajo
> cluster.
>
> In addition, I *strongly recommend* enabling
> '*dfs.datanode.hdfs-blocks-metadata.enabled*' for your HDFS. With this
> configuration, Tajo can achieve higher data locality when assigning its
> tasks to workers. This will improve Tajo's performance significantly. You
> need to restart your HDFS after configuring this, too.
>
> Best regards,
> Jihoon
>
> On Fri, Oct 9, 2015 at 11:43 PM, Odin Guillermo Caudillo Gallegos <
> [email protected]> wrote:
>
>> Hi.
>> I ran a SELECT COUNT on an HDFS table, which returned a total of almost
>> 17 million records.
>> The count was done in 2 minutes.
>> I have the current config for the worker:
>>
>> <property>
>>   <name>tajo.worker.resource.memory-mb</name>
>>   <value>4096</value>
>>   <description>Available memory size (MB)</description>
>> </property>
>>
>> <property>
>>   <name>tajo.worker.resource.disks</name>
>>   <value>1</value>
>>   <description>Available disk capacity (usually the number of disks)</description>
>> </property>
>>
>> <property>
>>   <name>tajo.worker.tmpdir.locations</name>
>>   <value>/tmp/tajo-11/tmpdir,/tmp/tajo-11/tmpdir1,/tmp/tajo-11/tmpdir2</value>
>>   <description>A base for other temporary directories.</description>
>> </property>
>>
>> Is there any way to give the query more power to make it faster?
>> Do I need to do another configuration?
>>
>>
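For reference, the two settings recommended in the thread above could be sketched like this (a hedged example, not from the original thread: the file locations `conf/tajo-site.xml` and `hdfs-site.xml` are the usual defaults and may differ in your deployment, and the property names are the ones quoted in Jihoon's reply):

```xml
<!-- In conf/tajo-site.xml (assumed location): let Tajo derive the worker's
     disk resource from the configured HDFS data directories, instead of
     setting tajo.worker.resource.disks manually. Restart Tajo afterwards. -->
<property>
  <name>tajo.worker.resource.dfs-dir-aware</name>
  <value>true</value>
</property>

<!-- In hdfs-site.xml (assumed location): expose block metadata so Tajo can
     assign tasks with higher data locality. Restart HDFS afterwards. -->
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
```

With `dfs-dir-aware` enabled, the value of `tajo.worker.resource.disks` should no longer matter; otherwise, set it to the real number of physical disks, as the reply suggests.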
