Repository: kafka
Updated Branches:
  refs/heads/0.10.1 401fe0a9b -> 6956a3819


http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/implementation.html
----------------------------------------------------------------------
diff --git a/docs/implementation.html b/docs/implementation.html
index 12846fb..c22f4cf 100644
--- a/docs/implementation.html
+++ b/docs/implementation.html
@@ -199,7 +199,7 @@ value length   : 4 bytes
 value          : V bytes
 </pre>
 <p>
-The use of the message offset as the message id is unusual. Our original idea 
was to use a GUID generated by the producer, and maintain a mapping from GUID 
to offset on each broker. But since a consumer must maintain an ID for each 
server, the global uniqueness of the GUID provides no value. Furthermore the 
complexity of maintaining the mapping from a random id to an offset requires a 
heavy weight index structure which must be synchronized with disk, essentially 
requiring a full persistent random-access data structure. Thus to simplify the 
lookup structure we decided to use a simple per-partition atomic counter which 
could be coupled with the partition id and node id to uniquely identify a 
message; this makes the lookup structure simpler, though multiple seeks per 
consumer request are still likely. However once we settled on a counter, the 
jump to directly using the offset seemed natural&mdash;both after all are 
monotonically increasing integers unique to a partition. Since the offset
is hidden from the consumer API this decision is ultimately an
implementation detail and we went with the more efficient approach.
+The use of the message offset as the message id is unusual. Our original idea 
was to use a GUID generated by the producer, and maintain a mapping from GUID 
to offset on each broker. But since a consumer must maintain an ID for each 
server, the global uniqueness of the GUID provides no value. Furthermore, the 
complexity of maintaining the mapping from a random id to an offset requires a 
heavy weight index structure which must be synchronized with disk, essentially 
requiring a full persistent random-access data structure. Thus to simplify the 
lookup structure we decided to use a simple per-partition atomic counter which 
could be coupled with the partition id and node id to uniquely identify a 
message; this makes the lookup structure simpler, though multiple seeks per 
consumer request are still likely. However once we settled on a counter, the 
jump to directly using the offset seemed natural&mdash;both after all are 
monotonically increasing integers unique to a partition. Since the offset
is hidden from the consumer API this decision is ultimately an
implementation detail and we went with the more efficient approach.
 </p>
 <img src="images/kafka_log.png">
 <h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>
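
[Editor's note] To make the offset-as-id discussion in the hunk above concrete, here is a minimal Java sketch (not part of this commit) that assigns a consumer to a single partition and prints each record's offset; the broker address and the "my-topic" topic name are assumptions for illustration. The (topic, partition, offset) triple is all that is needed to identify a record.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class OffsetIdExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("enable.auto.commit", "false");            // no group management needed here
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Pin the consumer to one partition and start from the beginning of the log.
                TopicPartition tp = new TopicPartition("my-topic", 0); // assumed topic
                consumer.assign(Collections.singletonList(tp));
                consumer.seekToBeginning(Collections.singletonList(tp));

                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // The offset is a per-partition, monotonically increasing id:
                    // (topic, partition, offset) uniquely identifies this record.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }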

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 484c0e7..e32ae7b 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -17,9 +17,9 @@
 <h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that 
mean?</h3>
 <p>We think of a streaming platform as having three key capabilities:</p>
 <ol>
-       <li>It let's you publish and subscribe to streams of records. In this 
respect it is similar to a message queue or enterprise messaging system.
-       <li>It let's you store streams of records in a fault-tolerant way.
-       <li>It let's you process streams of records as they occur.
+       <li>It lets you publish and subscribe to streams of records. In this 
respect it is similar to a message queue or enterprise messaging system.
+       <li>It lets you store streams of records in a fault-tolerant way.
+       <li>It lets you process streams of records as they occur.
 </ol>
 <p>What is Kafka good for?</p>
 <p>It gets used for two broad classes of application:</p>
@@ -56,7 +56,7 @@ In Kafka the communication between the clients and the 
servers is done with a si
 <p> Each partition is an ordered, immutable sequence of records that is 
continually appended to&mdash;a structured commit log. The records in the 
partitions are each assigned a sequential id number called the <i>offset</i> 
that uniquely identifies each record within the partition.
 </p>
 <p>
-The Kafka cluster retains all published records&mdash;whether or not they have 
been consumed&mdash;using a configurable retention period. For example if the 
retention policy is set to two days, then for the two days after a record is 
published, it is available for consumption, after which it will be discarded to 
free up space. Kafka's performance is effectively constant with respect to data 
size so storing data for a long time is not a problem.
+The Kafka cluster retains all published records&mdash;whether or not they have 
been consumed&mdash;using a configurable retention period. For example, if the 
retention policy is set to two days, then for the two days after a record is 
published, it is available for consumption, after which it will be discarded to 
free up space. Kafka's performance is effectively constant with respect to data 
size so storing data for a long time is not a problem.
 </p>
 <img class="centered" src="images/log_consumer.png" style="width:400px">
 <p>
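
[Editor's note] As an aside on the retention paragraph above, a two-day retention policy is simply a topic-level retention.ms setting. The sketch below uses the AdminClient API from later Kafka releases, which does not exist on this 0.10.1 branch (there you would use kafka-topics.sh or kafka-configs.sh instead); the topic name, partition count and replication factor are assumptions.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class TwoDayRetentionExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // Two days expressed in milliseconds as the topic-level retention.
                NewTopic topic = new NewTopic("my-topic", 3, (short) 1) // assumed name/partitions/replication
                        .configs(Collections.singletonMap("retention.ms",
                                String.valueOf(2 * 24 * 60 * 60 * 1000L)));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }
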
@@ -124,7 +124,7 @@ More details on these guarantees are given in the design 
section of the document
 How does Kafka's notion of streams compare to a traditional enterprise 
messaging system?
 </p>
 <p>
-Messaging traditionally has two models: <a 
href="http://en.wikipedia.org/wiki/Message_queue";>queuing</a> and <a 
href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern";>publish-subscribe</a>.
 In a queue, a pool of consumers may read from a server and each record goes to 
one of them; in publish-subscribe the record is broadcast to all consumers. 
Each of these two models has a strength and a weakness. The strength of queuing 
is that it allows you to divide up the processing of data over multiple 
consumer instances, which lets you scale your processing. Unfortunately queues 
aren't multi-subscriber&mdash;once one process reads the data it's gone. 
Publish-subscribe allows you broadcast data to multiple processes, but has no 
way of scaling processing since every message goes to every subscriber.
+Messaging traditionally has two models: <a 
href="http://en.wikipedia.org/wiki/Message_queue";>queuing</a> and <a 
href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern";>publish-subscribe</a>.
 In a queue, a pool of consumers may read from a server and each record goes to 
one of them; in publish-subscribe the record is broadcast to all consumers. 
Each of these two models has a strength and a weakness. The strength of queuing 
is that it allows you to divide up the processing of data over multiple 
consumer instances, which lets you scale your processing. Unfortunately, queues 
aren't multi-subscriber&mdash;once one process reads the data it's gone. 
Publish-subscribe allows you broadcast data to multiple processes, but has no 
way of scaling processing since every message goes to every subscriber.
 </p>
 <p>
 The consumer group concept in Kafka generalizes these two concepts. As with a 
queue the consumer group allows you to divide up processing over a collection 
of processes (the members of the consumer group). As with publish-subscribe, 
Kafka allows you to broadcast messages to multiple consumer groups.
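
[Editor's note] A minimal sketch of the consumer-group behaviour described above: consumers sharing a group.id split the partitions of a topic between them (queue semantics within the group), while a second group with a different group.id would independently receive every record (publish-subscribe semantics across groups). Broker address, group name and topic name are assumptions.

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("group.id", "billing");                    // assumed group name
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("my-topic"));   // assumed topic
                while (true) {
                    // Partitions of "my-topic" are divided among all consumers that share
                    // group.id "billing"; another group would also see every record.
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records)
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
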
@@ -164,7 +164,7 @@ It isn't enough to just read, write, and store streams of 
data, the purpose is t
 In Kafka a stream processor is anything that takes continual streams of  data 
from input topics, performs some processing on this input, and produces 
continual streams of data to output topics.
 </p>
 <p>
-For example a retail application might take in input streams of sales and 
shipments, and output a stream of reorders and price adjustments computed off 
this data.
+For example, a retail application might take in input streams of sales and 
shipments, and output a stream of reorders and price adjustments computed off 
this data.
 </p>
 <p>
 It is possible to do simple processing directly using the producer and 
consumer APIs. However for more complex transformations Kafka provides a fully 
integrated <a href="/documentation.html#streams">Streams API</a>. This allows 
building applications that do non-trivial processing that compute aggregations 
off of streams or join streams together.
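
[Editor's note] A rough sketch of the kind of stream processor described above, written against the current Streams API (StreamsBuilder), which postdates the 0.10.1 branch this commit targets; the "sales" and "reorders" topic names and the filtering rule are assumptions for illustration only.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class ReorderStreamExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "retail-reorders");   // assumed app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Read the (assumed) "sales" input topic, keep only low-stock events,
            // and write them to an (assumed) "reorders" output topic continuously.
            KStream<String, String> sales = builder.stream("sales");
            sales.filter((sku, event) -> event.contains("low-stock"))
                 .to("reorders");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }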

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/ops.html
----------------------------------------------------------------------
diff --git a/docs/ops.html b/docs/ops.html
index a65269a..b1f1d0c 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -129,7 +129,10 @@ Here is an example showing how to mirror a single topic 
(named <i>my-topic</i>)
 </pre>
 Note that we specify the list of topics with the <code>--whitelist</code> 
option. This option allows any regular expression using <a 
href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html";>Java-style
 regular expressions</a>. So you could mirror two topics named <i>A</i> and 
<i>B</i> using <code>--whitelist 'A|B'</code>. Or you could mirror <i>all</i> 
topics using <code>--whitelist '*'</code>. Make sure to quote any regular 
expression to ensure the shell doesn't try to expand it as a file path. For 
convenience we allow the use of ',' instead of '|' to specify a list of topics.
 <p>
-Sometimes it is easier to say what it is that you <i>don't</i> want. Instead 
of using <code>--whitelist</code> to say what you want to mirror you can use 
<code>--blacklist</code> to say what to exclude. This also takes a regular 
expression argument. However, <code>--blacklist</code> is not supported when 
using <code>--new.consumer</code>.
+Sometimes it is easier to say what it is that you <i>don't</i> want. Instead 
of using <code>--whitelist</code> to say what you want
+to mirror you can use <code>--blacklist</code> to say what to exclude. This 
also takes a regular expression argument.
+However, <code>--blacklist</code> is not supported when the new consumer has 
been enabled (i.e. when <code>bootstrap.servers</code>
+has been defined in the consumer configuration).
 <p>
 Combining mirroring with the configuration 
<code>auto.create.topics.enable=true</code> makes it possible to have a replica 
cluster that will automatically create and replicate all data in a source 
cluster even as new topics are added.
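
[Editor's note] This is not MirrorMaker itself, but a hedged sketch of what a --whitelist regular expression amounts to once the new (bootstrap.servers-based) consumer is in play: the consumer subscribes to every topic whose name matches a Java Pattern. The source-cluster address, group id and the 'A|B' expression are assumptions.

    import java.util.Collection;
    import java.util.Properties;
    import java.util.regex.Pattern;

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class WhitelistConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "source-cluster:9092"); // assumed source cluster
            props.put("group.id", "mirror-example");                // assumed group name
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            // A whitelist of 'A|B' is just a Java regular expression over topic names.
            Pattern whitelist = Pattern.compile("A|B");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(whitelist, new ConsumerRebalanceListener() {
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                });
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records)
                        System.out.printf("topic=%s offset=%d%n", record.topic(), record.offset());
                }
            }
        }
    }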
 
@@ -555,7 +558,7 @@ Note that durability in Kafka does not require syncing data 
to disk, as a failed
 <p>
 We recommend using the default flush settings which disable application fsync 
entirely. This means relying on the background flush done by the OS and Kafka's 
own background flush. This provides the best of all worlds for most uses: no 
knobs to tune, great throughput and latency, and full recovery guarantees. We 
generally feel that the guarantees provided by replication are stronger than 
sync to local disk, however the paranoid still may prefer having both and 
application level fsync policies are still supported.
 <p>
-The drawback of using application level flush settings is that it is less 
efficient in it's disk usage pattern (it gives the OS less leeway to re-order 
writes) and it can introduce latency as fsync in most Linux filesystems blocks 
writes to the file whereas the background flushing does much more granular 
page-level locking.
+The drawback of using application level flush settings is that it is less 
efficient in its disk usage pattern (it gives the OS less leeway to re-order 
writes) and it can introduce latency as fsync in most Linux filesystems blocks 
writes to the file whereas the background flushing does much more granular 
page-level locking.
 <p>
 In general you don't need to do any low-level tuning of the filesystem, but in 
the next few sections we will go over some of this in case it is useful.
 

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
index 5216d33..7a77692 100644
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -67,7 +67,7 @@ test
 
 <h4><a id="quickstart_send" href="#quickstart_send">Step 4: Send some 
messages</a></h4>
 
-<p>Kafka comes with a command line client that will take input from a file or 
from standard input and send it out as messages to the Kafka cluster. By 
default each line will be sent as a separate message.</p>
+<p>Kafka comes with a command line client that will take input from a file or 
from standard input and send it out as messages to the Kafka cluster. By 
default, each line will be sent as a separate message.</p>
 <p>
 Run the producer and then type a few messages into the console to send to the 
server.</p>
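
[Editor's note] As a rough illustration of what the console producer in step 4 does, the Java sketch below sends one message per line of standard input to the quickstart's "test" topic; the broker address is assumed to be the quickstart default.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class LinePerMessageProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                 BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in))) {
                String line;
                while ((line = stdin.readLine()) != null) {
                    // One message per input line, mirroring the console producer's default.
                    producer.send(new ProducerRecord<>("test", line));
                }
            }
        }
    }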
 
@@ -119,7 +119,7 @@ config/server-2.properties:
     listeners=PLAINTEXT://:9094
     log.dir=/tmp/kafka-logs-2
 </pre>
-<p>The <code>broker.id</code> property is the unique and permanent name of 
each node in the cluster. We have to override the port and log directory only 
because we are running these all on the same machine and we want to keep the 
brokers from all trying to register on the same port or overwrite each others 
data.</p>
+<p>The <code>broker.id</code> property is the unique and permanent name of 
each node in the cluster. We have to override the port and log directory only 
because we are running these all on the same machine and we want to keep the 
brokers from all trying to register on the same port or overwrite each other's 
data.</p>
 <p>
 We already have Zookeeper and our single node started, so we just need to 
start the two new nodes:
 </p>
@@ -197,7 +197,7 @@ java.exe    java  -Xmx1G -Xms1G -server -XX:+UseG1GC ... 
build\libs\kafka_2.10-0
 Topic:my-replicated-topic      PartitionCount:1        ReplicationFactor:3     
Configs:
        Topic: my-replicated-topic      Partition: 0    Leader: 2       
Replicas: 1,2,0 Isr: 2,0
 </pre>
-<p>But the messages are still be available for consumption even though the 
leader that took the writes originally is down:</p>
+<p>But the messages are still available for consumption even though the leader 
that took the writes originally is down:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 
--from-beginning --topic my-replicated-topic</b>
 ...
@@ -305,7 +305,7 @@ unbounded input data, it will periodically output its 
current state and results
 because it cannot know when it has processed "all" the input data.
 </p>
 <p>
-We will now prepare input data to a Kafka topic, which will subsequently 
processed by a Kafka Streams application.
+We will now prepare input data to a Kafka topic, which will subsequently be 
processed by a Kafka Streams application.
 </p>
 
 <!--

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/security.html
----------------------------------------------------------------------
diff --git a/docs/security.html b/docs/security.html
index 2e77c93..24cd771 100644
--- a/docs/security.html
+++ b/docs/security.html
@@ -31,7 +31,7 @@ It's worth noting that security is optional - non-secured 
clusters are supported
 The guides below explain how to configure and use the security features in 
both clients and brokers.
 
 <h3><a id="security_ssl" href="#security_ssl">7.2 Encryption and 
Authentication using SSL</a></h3>
-Apache Kafka allows clients to connect over SSL. By default SSL is disabled 
but can be turned on as needed.
+Apache Kafka allows clients to connect over SSL. By default, SSL is disabled 
but can be turned on as needed.
 
 <ol>
     <li><h4><a id="security_ssl_key" href="#security_ssl_key">Generate SSL key 
and certificate for each Kafka broker</a></h4>
@@ -425,7 +425,7 @@ Apache Kafka allows clients to connect over SSL. By default 
SSL is disabled but
         <ul>
           <li>SASL/PLAIN should be used only with SSL as transport layer to 
ensure that clear passwords are not transmitted on the wire without 
encryption.</li>
           <li>The default implementation of SASL/PLAIN in Kafka specifies 
usernames and passwords in the JAAS configuration file as shown
-            <a href="#security_sasl_plain_brokerconfig">here</a>. To avoid 
storing passwords on disk, you can plugin your own implementation of
+            <a href="#security_sasl_plain_brokerconfig">here</a>. To avoid 
storing passwords on disk, you can plug in your own implementation of
             <code>javax.security.auth.spi.LoginModule</code> that provides 
usernames and passwords from an external source. The login module 
implementation should
             provide username as the public credential and password as the 
private credential of the <code>Subject</code>. The default implementation
             
<code>org.apache.kafka.common.security.plain.PlainLoginModule</code> can be 
used as an example.</li>
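
[Editor's note] A hedged skeleton of the plug-in approach mentioned above: a custom javax.security.auth.spi.LoginModule that pulls the username and password from an external source instead of the JAAS file. ExternalCredentialStore is a hypothetical helper standing in for whatever secret store is actually used; it is not a Kafka class.

    import java.util.Map;

    import javax.security.auth.Subject;
    import javax.security.auth.callback.CallbackHandler;
    import javax.security.auth.spi.LoginModule;

    // Minimal sketch of a SASL/PLAIN login module that avoids storing passwords on disk.
    public class ExternalPlainLoginModule implements LoginModule {

        @Override
        public void initialize(Subject subject, CallbackHandler callbackHandler,
                               Map<String, ?> sharedState, Map<String, ?> options) {
            // Fetch credentials from the external source and expose them on the Subject,
            // as described above: username as a public credential, password as a private
            // credential. ExternalCredentialStore is hypothetical.
            String username = ExternalCredentialStore.lookupUsername(options);
            String password = ExternalCredentialStore.lookupPassword(options);
            subject.getPublicCredentials().add(username);
            subject.getPrivateCredentials().add(password);
        }

        @Override
        public boolean login() { return true; }

        @Override
        public boolean commit() { return true; }

        @Override
        public boolean abort() { return false; }

        @Override
        public boolean logout() { return true; }
    }
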
@@ -616,7 +616,7 @@ Kafka Authorization management CLI can be found under bin 
directory with all the
     <li><b>Adding Acls</b><br>
 Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed 
to perform Operation Read and Write on Topic Test-Topic from IP 198.51.100.0 
and IP 198.51.100.1". You can do that by executing the CLI with following 
options:
         <pre>bin/kafka-acls.sh --authorizer-properties 
zookeeper.connect=localhost:2181 --add --allow-principal User:Bob 
--allow-principal User:Alice --allow-host 198.51.100.0 --allow-host 
198.51.100.1 --operation Read --operation Write --topic Test-topic</pre>
-        By default all principals that don't have an explicit acl that allows 
access for an operation to a resource are denied. In rare cases where an allow 
acl is defined that allows access to all but some principal we will have to use 
the --deny-principal and --deny-host option. For example, if we want to allow 
all users to Read from Test-topic but only deny User:BadBob from IP 
198.51.100.3 we can do so using following commands:
+        By default, all principals that don't have an explicit acl that allows 
access for an operation to a resource are denied. In rare cases where an allow 
acl is defined that allows access to all but some principal we will have to use 
the --deny-principal and --deny-host option. For example, if we want to allow 
all users to Read from Test-topic but only deny User:BadBob from IP 
198.51.100.3 we can do so using following commands:
         <pre>bin/kafka-acls.sh --authorizer-properties 
zookeeper.connect=localhost:2181 --add --allow-principal User:* --allow-host * 
--deny-principal User:BadBob --deny-host 198.51.100.3 --operation Read --topic 
Test-topic</pre>
         Note that ``--allow-host`` and ``deny-host`` only support IP addresses 
(hostnames are not supported).
         Above examples add acls to a topic by specifying --topic [topic-name] 
as the resource option. Similarly user can add acls to cluster by specifying 
--cluster and to a consumer group by specifying --group [group-name].</li>

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/upgrade.html
----------------------------------------------------------------------
diff --git a/docs/upgrade.html b/docs/upgrade.html
index d140ec2..05b55e0 100644
--- a/docs/upgrade.html
+++ b/docs/upgrade.html
@@ -139,7 +139,7 @@ work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients 
should be upgraded to 0.9
 
     To avoid such message conversion before consumers are upgraded to 
0.10.0.0, one can set log.message.format.version to 0.8.2 or 0.9.0 when 
upgrading the broker to 0.10.0.0. This way, the broker can still use zero-copy 
transfer to send the data to the old consumers. Once consumers are upgraded, 
one can change the message format to 0.10.0 on the broker and enjoy the new 
message format that includes new timestamp and improved compression.
 
-    The conversion is supported to ensure compatibility and can be useful to 
support a few apps that have not updated to newer clients yet, but is 
impractical to support all consumer traffic on even an overprovisioned cluster. 
Therefore it is critical to avoid the message conversion as much as possible 
when brokers have been upgraded but the majority of clients have not.
+    The conversion is supported to ensure compatibility and can be useful to 
support a few apps that have not updated to newer clients yet, but is 
impractical to support all consumer traffic on even an overprovisioned cluster. 
Therefore, it is critical to avoid the message conversion as much as possible 
when brokers have been upgraded but the majority of clients have not.
 </p>
 <p>
     For clients that are upgraded to 0.10.0.0, there is no performance impact.
@@ -233,7 +233,7 @@ work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients 
should be upgraded to 0.9
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with 
non-zero exit code on failure. </li>
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) will now print 
a warning when topic names risk metric collisions due to the use of a '.' or 
'_' in the topic name, and error in the case of an actual collision. </li>
     <li> The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) 
will use the Java producer instead of the old Scala producer be default, and 
users have to specify 'old-producer' to use the old producer. </li>
-    <li> By default all command line tools will print all logging messages to 
stderr instead of stdout. </li>
+    <li> By default, all command line tools will print all logging messages to 
stderr instead of stdout. </li>
 </ul>
 
 <h5><a id="upgrade_901_notable" href="#upgrade_901_notable">Notable changes in 
0.9.0.1</a></h5>
