Hi Odin, yes you can make your query faster.

First of all, you can increase the disk resource for Tajo workers by
setting 'tajo.worker.resource.disks'. This value determines how many tasks
a worker executes in parallel: the higher the disk resource, the more tasks
run at the same time. For example, given 10 tasks, each of which reads data
from HDFS, a Tajo worker with a disk resource of 1 (your current setting)
will execute those tasks one by one. With a disk resource of 2, two tasks
can be executed simultaneously, which can improve performance.
However, as you may know, if too many tasks access a single disk at the
same time, the resulting random accesses will make query performance
worse.
So, I recommend setting this value to the actual number of physical disks.
Or, if you have already configured multiple disks for HDFS, Tajo can
detect them automatically and use them as the worker's disk resource when
you set 'tajo.worker.resource.dfs-dir-aware' to true. Please refer to
http://tajo.apache.org/docs/devel/configuration/worker_configuration.html
for more information.
After changing configuration values, you need to restart your Tajo cluster.
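
As a rough sketch, the worker settings described above might look like the
following in your Tajo site configuration (I am assuming conf/tajo-site.xml
here, in the same format as the properties you posted). The value 4 is only
a placeholder for a machine with four physical disks, so please substitute
your own disk count.

```xml
<!-- Option 1: set the disk resource explicitly.
     The value 4 is a hypothetical example; use your real number of
     physical disks. -->
<property>
  <name>tajo.worker.resource.disks</name>
  <value>4</value>
  <description>Available disk capacity (usually number of disks)</description>
</property>

<!-- Option 2: let Tajo derive the disk resource from the disks already
     configured for HDFS, instead of relying on the value above. -->
<property>
  <name>tajo.worker.resource.dfs-dir-aware</name>
  <value>true</value>
</property>
```

You would normally pick one of the two options rather than depend on both.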

In addition, I *strongly recommend* enabling
'dfs.datanode.hdfs-blocks-metadata.enabled' for your HDFS. With this
configuration, Tajo can achieve higher data locality when assigning
tasks to workers, which will improve Tajo's performance significantly. You
need to restart HDFS after configuring this, too.
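
For illustration, the HDFS setting might be added like this, assuming it
goes into hdfs-site.xml on your datanodes (the file location depends on
your Hadoop installation):

```xml
<!-- Assumed to live in hdfs-site.xml; restart HDFS after changing it. -->
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
```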

Best regards,
Jihoon

On Fri, Oct 9, 2015 at 11:43 PM, Odin Guillermo Caudillo Gallegos <
[email protected]> wrote:

> Hi.
> I ran a select count on data in HDFS, which returned a total of almost
> 17 million records.
> The count took 2 minutes.
> I have the current config for the worker:
>
> <property>
>   <name>tajo.worker.resource.memory-mb</name>
>   <value>4096</value>
>   <description>Available memory size (MB)</description>
> </property>
>
> <property>
>   <name>tajo.worker.resource.disks</name>
>   <value>1</value>
>   <description>Available disk capacity (usually number of disks)</description>
> </property>
>
> <property>
>   <name>tajo.worker.tmpdir.locations</name>
>   <value>/tmp/tajo-11/tmpdir,/tmp/tajo-11/tmpdir1,/tmp/tajo-11/tmpdir2</value>
>   <description>A base for other temporary directories.</description>
> </property>
>
> Is there any way to give the query more power to make it faster?
> Do I need to change any other configuration?
>
>
