This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5b32a0d60110be7c21184819c2dffbb7cbff750f
Author: Alex Rodoni <arod...@cloudera.com>
AuthorDate: Tue Feb 12 12:40:42 2019 -0800

    IMPALA-7214: [DOCS] More on decoupling impala and DataNodes

    Change-Id: I4b6f1c704c1e328af9f0beec73f8b6b61fba992e
    Reviewed-on: http://gerrit.cloudera.org:8080/12457
    Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
    Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
---
 docs/topics/impala_processes.xml       | 10 +++------
 docs/topics/impala_troubleshooting.xml | 39 +++++++++++++++++-----------------
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/docs/topics/impala_processes.xml b/docs/topics/impala_processes.xml
index 71986d3..70366dd 100644
--- a/docs/topics/impala_processes.xml
+++ b/docs/topics/impala_processes.xml
@@ -55,10 +55,7 @@ under the License.
         Start one instance of the Impala catalog service.
       </li>
 
-      <li>
-        Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
-        processing and avoid network traffic due to remote reads.
-      </li>
+      <li> Start the main Impala daemon services. </li>
     </ol>
 
     <p>
@@ -101,9 +98,8 @@ under the License.
 
 <codeblock rev="1.2">$ sudo service impala-catalog start</codeblock>
 
-    <p>
-      Start the Impala service on each DataNode using a command similar to the following:
-    </p>
+    <p> Start the Impala daemon services using a command similar to the
+      following: </p>
 
     <p>
 <codeblock>$ sudo service impala-server start</codeblock>
diff --git a/docs/topics/impala_troubleshooting.xml b/docs/topics/impala_troubleshooting.xml
index 250c899..80b7363 100644
--- a/docs/topics/impala_troubleshooting.xml
+++ b/docs/topics/impala_troubleshooting.xml
@@ -123,17 +123,17 @@ terminate called after throwing an instance of 'boost::exception_detail::clone_i
   <concept id="trouble_io" rev="">
     <title>Troubleshooting I/O Capacity Problems</title>
     <conbody>
-      <p>
-        Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices,
-        or with HDFS itself, Impala queries could show slow response times with no obvious cause
-        on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown, because
-        queries involving clauses such as <codeph>ORDER BY</codeph>, <codeph>GROUP BY</codeph>, or <codeph>JOIN</codeph>
-        do not start returning results until all DataNodes have finished their work.
-      </p>
-      <p>
-        To test whether the Linux I/O system itself is performing as expected, run Linux commands like
-        the following on each DataNode:
-      </p>
+      <p> Impala queries are typically I/O-intensive. If there is an I/O problem
+        with storage devices, or with HDFS itself, Impala queries could show
+        slow response times with no obvious cause on the Impala side. Slow I/O
+        on even a single Impala daemon could result in an overall slowdown,
+        because queries involving clauses such as <codeph>ORDER BY</codeph>,
+        <codeph>GROUP BY</codeph>, or <codeph>JOIN</codeph> do not start
+        returning results until all executor Impala daemons have finished their
+        work. </p>
+      <p> To test whether the Linux I/O system itself is performing as expected,
+        run Linux commands like the following on each host Impala daemon is
+        running: </p>
 <codeblock>
 $ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
 vm.drop_caches = 3
@@ -265,14 +265,15 @@ $ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
             </p>
 
             <p>
-              <note>
-                Replace <varname>hostname</varname> and <varname>port</varname> with the hostname and port of
-                your Impala state store host machine and web server port. The default port is 25010.
-              </note>
-              The number of <codeph>impalad</codeph> instances listed should match the expected number of
-              <codeph>impalad</codeph> instances installed in the cluster. There should also be one
-              <codeph>impalad</codeph> instance installed on each DataNode
-            </p>
+              <note> Replace <varname>hostname</varname> and
+                  <varname>port</varname> with the hostname and port of your
+                Impala state store host machine and web server port. The
+                default port is 25010. </note> The number of
+                <codeph>impalad</codeph> instances listed should match the
+              expected number of <codeph>impalad</codeph> instances
+              installed in the cluster. There should also be one
+                <codeph>impalad</codeph> instance installed on each
+              DataNode.</p>
           </entry>
           <entry>
             <p>