Hi Liam

Many thanks. I was able to get the file stream connector demo working. See the
attached file for details.

Andy

On 3/29/22, 5:27 PM, "Liam Clarke-Hutchinson" <lclar...@redhat.com> wrote:

    Hi Andrew,

    So if you've downloaded Apache Kafka, you can run a standalone connect
    instance using the bin/connect-standalone.sh script mentioned. And while a
    lot of the connector documentation is on the Confluent website, you can
    still use them with FOSS Kafka so long as you're in line with the Confluent
    Community Licence (basically, IIRC, you can use them for free, but not to
    run a SAAS or similar that competes with Confluent, but IANAL).

    I agree that there's not much useful documentation for your use case. I
    will look into writing a tutorial for it; would you be happy to give me
    feedback on it as I go?

    The most important configuration initially is the plugin.path, where your
    standalone KC process will look for those JARs. You can see an example
    properties file for standalone Connect under the config/ dir in the Kafka
    you downloaded. Note that it has the plugin path commented out initially.

    So, Kafka ships with a connector that exposes a file source and file sink,
    which is good for testing out KC and getting used to it. You can either
    build it from source, or download it from here:
    https://mvnrepository.com/artifact/org.apache.kafka/connect-file - choose
    the version that matches the version of Kafka you've downloaded, and then
    you can download the JAR under files up the top. This documentation from
    Confluent is useful:
    https://docs.confluent.io/platform/current/connect/filestream_connector.html

    Note that if you don't provide a file property (this isn't documented
    either!), it will use standard input for the file source, and standard
    output for the file sink. You can see example configs for this connector
    reading from a file or console under that same config/ directory, and ditto
    for writing.

    These connectors might also be useful for playing with KC, and are all free
    and downloadable:
    https://www.confluent.io/hub/confluentinc/kafka-connect-datagen <-
    generates a stream of test data
    https://www.confluent.io/hub/jcustenborder/kafka-connect-twitter <-
    disregard, I saw you mentioned not having Twitter
    https://www.confluent.io/hub/C0urante/kafka-connect-reddit <- I haven't
    used this, but could be interesting?


    I hope this helps you get started, and please let me know if I can help
    with anything else :)

    Cheers,

    Liam Clarke



    On Wed, 30 Mar 2022 at 11:54, andrew davidson <a...@santacruzanalytics.com>
    wrote:

    > I found the quick start https://kafka.apache.org/quickstart example very
    > helpful. It made it really easy to understand how to download, start up,
    > create a topic, and push some data through Kafka. I did not find
    > https://kafka.apache.org/quickstart#quickstart_kafkaconnect useful.
    >
    > I am looking for something very simple to learn how to configure and
    > use connectors using the Apache Kafka distribution, not Confluent. I can
    > run on my Mac or Linux server. Being a newbie I want to keep things super
    > simple.
    > I do not want to have to debug firewalls, ACLs, …
    >
    > I do not have a database, access to Twitter, …
    >
    > I thought maybe some sort of source/sink using the local file
    > system?
    >
    > Any suggestions?
    >
    > Kind regards
    >
    > Andy
    >
    > p.s. I have read a lot of documentation; most of it is very high level. Can
    > anyone recommend a “hands on” tutorial?
    >
    >
    >
# Connector quick start, based on Liam Clarke's email
a...@santacruzanalytics.com
4/1/22

- example worker properties file
  * kafka_2.13-3.1.0/config/connect-standalone.properties
  * configure plugin.path
- Apache Kafka ships with a file source and sink (not production grade)
  * kafka_2.13-3.1.0/libs/connect-file-3.1.0.jar 
  * [documentation](https://docs.confluent.io/platform/current/connect/filestream_connector.html)
  * Note that if you don't provide a file property (this isn't documented either!), it will use standard input for the file source and standard output for the file sink. You can see example configs for this connector reading from a file or the console under that same config/ directory, and ditto for writing; see also the sketch after this list.
    ```
    ls kafka_2.13-3.1.0/config/connect-file-*.properties
        kafka_2.13-3.1.0/config/connect-file-sink.properties
        kafka_2.13-3.1.0/config/connect-file-source.properties
    ```
  * https://mvnrepository.com/artifact/org.apache.kafka/connect-file 
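
As a sketch of the undocumented console behaviour above: a source config that simply omits the file= line should read from standard input and publish each line to the topic. The connector name below is made up; compare with the connect-console-*.properties examples shipped under config/.

```
# hypothetical minimal source config with no file= line, so the
# FileStreamSource task reads standard input instead of a file
name=console-source
connector.class=FileStreamSource
tasks.max=1
topic=connect-test
```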


Overview:
* We start all the servers
* create a file /tmp/test.txt
* the "file-source" plug running in the "connect server" will read the file and send the contents to the connect-test topic on the kafka broker
* the "sink connector" running in the connect server will read the contect-test topic message and write it to /tmp/test.sink.txt


Note:
if you remove the /tmp/test.txt file and create a new one, the new file will no longer be connected to /tmp/test.sink.txt (presumably the source task is still holding the handle of the deleted file)

Some hacks to figure out what is going on:

```
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
__consumer_offsets
connect-test

bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic connect-test 
Topic: connect-test	TopicId: UHhFV_XYSe6_GvbEFs8HAw	PartitionCount: 1	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: connect-test	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
    
```
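
Another handy hack is to watch the topic directly with the console consumer that ships with Kafka (with the default standalone worker settings each line shows up wrapped in a small JSON envelope by the JsonConverter):

```
# print everything the file source has published so far, then keep following
bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic connect-test \
    --from-beginning
```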

Once you have deleted /tmp/test.txt you need to restart the Connect server. There must be a way to change the config without a restart; see the sketch below.
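
One untested possibility: Kafka Connect exposes a REST API even in standalone mode (this sketch assumes the default REST port 8083), so restarting just the connector, rather than the whole worker, may be enough to re-open the file:

```
# list the connectors this worker is running
curl localhost:8083/connectors

# restart only the source connector (the name comes from its properties file)
curl -X POST localhost:8083/connectors/local-file-source/restart
```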

## demo instructions
1. start Kafka locally
   
   a. start ZooKeeper
       ```
       cd scaWorkSpace/breathBiopsy/realTime/bin/kafka_2.13-3.1.0
       bin/zookeeper-server-start.sh config/zookeeper.properties
       ```
   
   b. start the Kafka broker in a new terminal
       ```
       bin/kafka-server-start.sh config/server.properties
       ```
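
       A quick sanity check that the broker is up (this assumes the default
       listener on localhost:9092):
       ```
       # asks the broker for its supported API versions; fails if it is down
       bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092
       ```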
   
   c. create the connector config files if they do not exist.
       The file location does not matter.
       ```
       cat kafka_2.13-3.1.0/config/aedwip-connect-file-sink.properties 
       name=local-file-sink
       connector.class=FileStreamSink
       tasks.max=1
       file=/tmp/test.sink.txt
       topics=connect-test
       ```
       
       ```
       cat kafka_2.13-3.1.0/config/aedwip-connect-file-source.properties
       name=local-file-source
       connector.class=FileStreamSource
       tasks.max=1
       file=/tmp/test.txt
       topic=connect-test
       ```
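
       The demo relies on the broker auto-creating connect-test on the first
       write. If auto.create.topics.enable is off in your server.properties,
       create the topic by hand first:
       ```
       bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
           --topic connect-test --partitions 1 --replication-factor 1
       ```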

   d. configure kafka_2.13-3.1.0/config/connect-standalone.properties
       * set plugin.path 
       * notice the comma at the end
         ```
         plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,/Users/andrewdavidson/googleSCA/scaWorkSpace/breathBiopsy/realTime/bin/kafka_2.13-3.1.0/libs,
         ```
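
         A quick check that the file connector JAR is actually on that path
         (it ships in the libs/ dir of the download):
         ```
         ls /Users/andrewdavidson/googleSCA/scaWorkSpace/breathBiopsy/realTime/bin/kafka_2.13-3.1.0/libs/connect-file-*.jar
         ```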
         
   e. run the Connect server in a new terminal
       ```
       workerProp=config/connect-standalone.properties
       bin/connect-standalone.sh \
           $workerProp \
           config/aedwip-connect-file-source.properties \
           config/aedwip-connect-file-sink.properties
       ```
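
       Once the worker is up, both connectors should report a RUNNING state
       (again assuming the default REST port 8083):
       ```
       curl localhost:8083/connectors/local-file-source/status
       curl localhost:8083/connectors/local-file-sink/status
       ```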
   
2. pass a file through
   a. in a new terminal, create a file
       ```
       ls * > /tmp/test.txt
       ```
   b. the file /tmp/test.sink.txt will be created. It will be identical to /tmp/test.txt.
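       A quick way to confirm the two files match:
       ```
       diff /tmp/test.txt /tmp/test.sink.txt && echo "files match"
       ```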
   c. append some text to the bottom of the file
       ```
       echo "adding more content to the bottom of /tmp/test.txt" >> /tmp/test.txt
       ```
   d. we should see the changes in the sink file /tmp/test.sink.txt. 
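       One way to watch the changes arrive (plain tail, nothing Kafka-specific):
       ```
       tail -f /tmp/test.sink.txt
       ```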
