Manually Deploy an Apache Kafka Cluster on Akamai
Apache Kafka is a scalable open source, distributed system for managing real-time data streams. Kafka supports a wide range of applications, including those used for log aggregation, monitoring, and real-time analytics. Kafka is considered an industry standard for use cases such as data pipelines, event-driven architectures, and stream-processing applications.
This guide includes steps for deploying a Kafka cluster using Ansible. The provided Ansible playbook creates a functioning Kafka cluster of three broker nodes configured to authenticate using encrypted secrets. Steps for producing and consuming sample data to test cluster functionality are also included.
If you wish to deploy Kafka automatically rather than manually, consider our Apache Kafka cluster marketplace deployment.
Architecture Diagram
Architecture Components
- Kafka Cluster: A minimum three-node cluster for a fault-tolerant architecture
- Kafka Brokers: Individual servers used to receive and send data
- Commit log: A file that lives on each broker server to log event data
- Producers: Any app, software, or code that connects to Kafka to produce data
- Consumer group: Any app, software, or code that consumes produced data to perform a function
Prerequisites
The following software and components must be installed and configured on your local system in order for the provided playbooks to function:
- Python version 3.8 or later
- The venv Python module
- A Linode API access token
- A configured SSH key pair along with your public key
- The Git utility
Deployment Details, Software, and Supported Distributions
Deployment Details
- The minimum cluster size is three nodes, as three controllers are configured for fault-tolerance at all times.
- The nodes used in this deployment default to 4GB dedicated instances. For production environments, a minimum plan size of 8GB dedicated instances (up to 32GB) is recommended. Individual use cases may vary.
- The manual deployment in this guide uses Kafka’s native consensus protocol, KRaft.
- During the provisioning process, the cluster is configured with mTLS for authentication. This means inter-broker communication and client authentication are established via certificate identity.
- Clients that connect to the cluster need their own valid certificate. All certificates are created using a self-signed Certificate Authority (CA). Client keystores and truststores are found on the first Kafka node in the following directories:
  - /etc/kafka/ssl/keystore
  - /etc/kafka/ssl/truststore
- The CA key and certificate pair are stored on the first Kafka node in the /etc/kafka/ssl/ca directory.
Included Software
- Apache Kafka version 3.8.0
- KRaft
- UFW (Uncomplicated Firewall) version 0.36.1
- Fail2ban version 0.11.2
Supported Deployment Distribution
- Ubuntu 24.04 LTS
Clone the docs-cloud-projects GitHub Repository
In order to run the Kafka deployment in this guide, the docs-cloud-projects GitHub repository must be cloned to your local machine. This includes all playbooks, configurations, and files for all project directories in the repository, including those needed to successfully deploy the Kafka cluster.
Using git, clone the docs-cloud-projects repository. This clones the repository to the current working directory on your local machine:
git clone https://github.com/linode/docs-cloud-projects.git
Navigate to the manual-kafka-cluster directory within your local cloned repository:
cd docs-cloud-projects/apps/manual-kafka-cluster
Confirm the manual-kafka-cluster directory contents on your system:
ls
The following contents should be visible:
LICENSE ansible.cfg group_vars images requirements.txt scripts README.md collections.yml hosts provision.yml roles site.yml
Installation
Using python, create a virtual environment with the venv utility. This isolates dependencies from other packages on your local system:
python3 -m venv env
source env/bin/activate
pip install -U pip
Install all packages in the requirements.txt file. This includes Ansible collections and required Python packages:
pip install -r requirements.txt
ansible-galaxy collection install -r collections.yml
Confirm Ansible is installed by checking the version:
ansible --version
Sample output:
ansible [core 2.17.5]
(...)
python version = 3.12.4 (main, Jun 18 2024, 08:58:27) [Clang 15.0.0 (clang-1500.0.40.1)]
(...)
jinja version = 3.1.4
libyaml = True
Upgrading the ansible-core package: Some ansible-core package versions may contain older parameters. Should you experience any errors related to out-of-date or deprecated parameters, you can update the ansible-core version with the below command:
python -m pip install --upgrade ansible-core
Setup
All secrets are encrypted with the Ansible Vault utility as a best practice.
Export VAULT_PASSWORD, replacing MY_VAULT_PASSWORD with a password of your choosing. This password acts as a key for decrypting encrypted secrets. Save this password for future use:
export VAULT_PASSWORD=MY_VAULT_PASSWORD
Using the ansible-vault utility, encrypt the following: a root password, a sudo user password, your Linode APIv4 token, a truststore password, a keystore password, and a certificate authority (CA) password.
In the command, replace the below values with your own:
- ROOT_PASSWORD with a root password
- SUDO_PASSWORD with a sudo user password
- API_TOKEN with your Linode APIv4 token
- TRUSTSTORE_PASSWORD with a truststore password
- KEYSTORE_PASSWORD with a keystore password
- CA_PASSWORD with a certificate authority password
This command generates encrypted output and also assigns values to the following variables for Ansible to reference later. Do not replace these values:
- root_password
- sudo_password
- api_token
- truststore_password
- keystore_password
- ca_password
When running the command, leave the single quotation marks (') around each value:
ansible-vault encrypt_string 'ROOT_PASSWORD' --name 'root_password'
ansible-vault encrypt_string 'SUDO_PASSWORD' --name 'sudo_password'
ansible-vault encrypt_string 'API_TOKEN' --name 'api_token'
ansible-vault encrypt_string 'TRUSTSTORE_PASSWORD' --name 'truststore_password'
ansible-vault encrypt_string 'KEYSTORE_PASSWORD' --name 'keystore_password'
ansible-vault encrypt_string 'CA_PASSWORD' --name 'ca_password'
Password Requirements: Each password must meet Akamai's strong password requirements. If your passwords do not meet these requirements, deployment will fail during the provisioning stage.
Copy the generated outputs for root_password, sudo_password, api_token, truststore_password, keystore_password, and ca_password. Save them in the group_vars/kafka/secret_vars file. Sample output:
root_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          38306438386334663834633634363930343233373066353234616363356534653033346232333538
          3163313031373138383965383739356339663831613061660a666332636564356236656331323361
          61383134663166613462363633646330356561386230383332313564643135343538383161383236
          6432396332643232620a393630633132336134613039666336326337376566383531393464303864
          34306435376534653961653739653232383262613336383837343962633565356546
sudo_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          38306438386334663834633634363930343233373066353234616363356534653033346232333538
          3163313031373138383965383739356339663831613061660a666332636564356236656331323361
          61383134663166613462363633646330356561386230383332313564643135343538383161383236
          6432396332643232620a393630633132336134613039666336326337376566383531393464303864
          34306435376534653961653739653232383262613336383837343962633565356546
api_token: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          38306438386334663834633634363930343233373066353234616363356534653033346232333538
          3163313031373138383965383739356339663831613061660a666332636564356236656331323361
          61383134663166613462363633646330356561386230383332313564643135343538383161383236
          6432396332643232620a393630633132336134613039666336326337376566383531393464303864
          34306435376534653961653739653232383262613336383837343962633565356546
truststore_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          38306438386334663834633634363930343233373066353234616363356534653033346232333538
          3163313031373138383965383739356339663831613061660a666332636564356236656331323361
          61383134663166613462363633646330356561386230383332313564643135343538383161383236
          6432396332643232620a393630633132336134613039666336326337376566383531393464303864
          34306435376534653961653739653232383262613336383837343962633565356546
keystore_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          38306438386334663834633634363930343233373066353234616363356534653033346232333538
          3163313031373138383965383739356339663831613061660a666332636564356236656331323361
          61383134663166613462363633646330356561386230383332313564643135343538383161383236
          6432396332643232620a393630633132336134613039666336326337376566383531393464303864
          34306435376534653961653739653232383262613336383837343962633565356546
ca_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          38306438386334663834633634363930343233373066353234616363356534653033346232333538
          3163313031373138383965383739356339663831613061660a666332636564356236656331323361
          61383134663166613462363633646330356561386230383332313564643135343538383161383236
          6432396332643232620a393630633132336134613039666336326337376566383531393464303864
          34306435376534653961653739653232383262613336383837343962633565356546
Only save the encrypted values: When saving the generated encrypted outputs, omit any Encryption successful text.
Using a text editor, open and edit the Linode instance parameters in the group_vars/kafka/vars file. Replace the values for the following variables with your preferred deployment specifications, and save your changes when complete:
- ssh_keys: Your SSH public key(s); replace the example keys with your own and remove any unused keys.
- type: Compute Instance type and plan for each Kafka instance.
- region: The data center region for the cluster.
- image: The distribution image to be installed on each Kafka instance. The deployment in this guide supports the ubuntu24.04 image.
- group and linode_tags (optional): Any groups or tags you wish to apply to your cluster's instances for organizational purposes.
- cluster_size: The number of Kafka instances in the cluster deployment. Minimum value of 3.
- sudo_username: A sudo username for each cluster instance.
- country_name, state_or_province_name, locality_name, and organization_name: The geographical and organizational information for your self-signed TLS certificate.
- email_address: A functioning SOA administrator email for your self-signed TLS certificate.
- File: group_vars/kafka/vars

ssh_keys:
    - ssh-ed25519 YOUR_PUBLIC_KEY
    - ssh-rsa YOUR_PUBLIC_KEY
instance_prefix: kafka
type: g6-dedicated-2
region: us-southeast
image: linode/ubuntu24.04
group:
linode_tags:
cluster_size: 3
client_count: 2
sudo_username: SUDO_USERNAME
#tls
country_name: US
state_or_province_name: Pennsylvania
locality_name: Philadelphia
organization_name: Akamai Technologies
email_address: administrator@example.com
ca_common_name: Kafka RootCA
See Linode API: List Types for information on Linode API parameters.
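As a sanity check before provisioning, you can validate your edited values against the constraints described above. The following is a minimal, hypothetical check in Python; it is not part of the docs-cloud-projects repository, and the key names mirror the group_vars/kafka/vars file shown above:

```python
# Hypothetical sanity check for group_vars/kafka/vars values;
# not part of the docs-cloud-projects repository.

REQUIRED_KEYS = {"ssh_keys", "type", "region", "image", "cluster_size", "sudo_username"}

def validate_vars(config: dict) -> list:
    """Return a list of problems found in the deployment variables."""
    problems = []
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # This guide's deployment supports only the ubuntu24.04 image.
    if config.get("image") != "linode/ubuntu24.04":
        problems.append("image must be linode/ubuntu24.04")
    # Minimum cluster size is three: three controllers are needed for fault tolerance.
    if config.get("cluster_size", 0) < 3:
        problems.append("cluster_size must be at least 3")
    return problems

example = {
    "ssh_keys": ["ssh-ed25519 YOUR_PUBLIC_KEY"],
    "type": "g6-dedicated-2",
    "region": "us-southeast",
    "image": "linode/ubuntu24.04",
    "cluster_size": 3,
    "sudo_username": "SUDO_USERNAME",
}
print(validate_vars(example))  # an empty list means no problems were found
```

A check like this can catch a typo before the playbooks spend time creating instances.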
Provision Your Cluster
Using the ansible-playbook utility, run the provision.yml playbook with verbose options to keep track of the deployment process. This creates Linode instances and dynamically writes the Ansible inventory to the hosts file:
ansible-playbook -vvv provision.yml
Once the playbook has finished running, you should see the following output:
PLAY RECAP *******************************************************************************************************
localhost                  : ok=6    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Run the site.yml playbook with the hosts inventory file. This playbook configures and installs all required dependencies in the cluster:
ansible-playbook -vvv -i hosts site.yml
Once complete, you should see similar output to the first playbook, this time including the public IP addresses for each Kafka instance:
PLAY RECAP *******************************************************************************************************
192.0.2.21                 : ok=25   changed=24   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0
198.51.100.17              : ok=25   changed=24   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0
203.0.113.24               : ok=49   changed=46   unreachable=0   failed=0   skipped=3   rescued=0   ignored=0
Producing and Consuming Data
Once your cluster is up and running, you can begin producing and consuming messages from the brokers. The steps below produce and consume sample data to test the functionality of your Kafka cluster.
Install Python Dependency
Using pip, install the confluent_kafka Python client for Apache Kafka while still in your local virtual environment:
pip install confluent_kafka
Installing collected packages: confluent_kafka
Successfully installed confluent_kafka-2.6.0
Configure Your /etc/hosts File
On your local machine (or the machine from which you are producing data), you must configure your /etc/hosts file to resolve each Kafka node's IP address to a hostname. This facilitates certificate authentication between a client and broker.
On your local machine, open your /etc/hosts file using the text editor of your choice in a separate terminal session. You may need to edit the file path to match your local environment:
nano /etc/hosts
Add the following lines underneath your localhost information. Replace the IP addresses with those of your respective Kafka broker nodes. Save your changes when complete:
- File: /etc/hosts
127.0.0.1 localhost
(...)
192.0.2.21 kafka1
198.51.100.17 kafka2
203.0.113.24 kafka3
In the file, kafka1, kafka2, and kafka3 define the hostnames associated with each Kafka node and must be included.
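Before continuing, you can confirm that the new entries actually resolve. A minimal check in Python, assuming the kafka1, kafka2, and kafka3 hostnames used in this guide:

```python
import socket

# Hostnames this guide maps in /etc/hosts; adjust if you chose different names.
BROKER_HOSTNAMES = ["kafka1", "kafka2", "kafka3"]

def resolve(hostname: str):
    """Return the IPv4 address a hostname resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

for host in BROKER_HOSTNAMES:
    print(host, "->", resolve(host) or "NOT RESOLVING - check /etc/hosts")
```

Each hostname should print the broker IP address you added; a "NOT RESOLVING" result means the /etc/hosts entry is missing or mistyped.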
Obtain Client Certificates
In order to send data to the Kafka broker, you must obtain three certificate files (ca-cert
, client1.crt
, and client1.key
) stored on the first Kafka server in your cluster, kafka1. For the purposes of this test, the certificate files must also be located in the same working directory as the produce.py
and consume.py
scripts used to produce and consume testing data, respectively. These scripts are located in the /scripts
directory of the manual-kafka-cluster folder that was cloned from the docs-cloud-projects repository.
The produce.py, consume.py, and getcerts.sh scripts are provided by Akamai for testing and are not endorsed by Apache.
In your local virtual environment, navigate to the scripts folder within manual-kafka-cluster:
cd scripts
Confirm the contents of the directory:
ls
You should see both the produce.py and consume.py scripts, along with the getcerts.sh script used to obtain the necessary certificate files from the kafka1 server:
consume.py  getcerts.sh  produce.py
To obtain the certificate files, run the getcerts.sh script. Replace IP_ADDRESS with the IP address of your first Kafka node, kafka1:
bash getcerts.sh IP_ADDRESS

[info] fetching /etc/kafka/ssl/cert/client1.crt from 192.0.2.21..
[info] fetching /etc/kafka/ssl/key/client1.key from 192.0.2.21..
[info] fetching /etc/kafka/ssl/ca/ca-crt from 192.0.2.21..
Confirm successful download of the certificate files ca-crt, client1.crt, and client1.key:
ls

ca-crt client1.crt client1.key consume.py getcerts.sh produce.py
Produce and Consume Data
Produce Data
The produce.py script connects to one of the three Kafka broker nodes to send sample message data over port 9092. This is the default port Kafka brokers use to communicate with clients that produce and consume data.
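The connection settings such a client needs can be approximated as follows. This is a hedged sketch of the mTLS configuration a confluent_kafka producer would use with this cluster, not the repository's actual produce.py; the certificate file names assume the files fetched by getcerts.sh sit in the working directory:

```python
# Sketch of the mTLS connection settings a produce.py-style client needs;
# hypothetical, the repository's actual script may differ.

def client_ssl_config(bootstrap="kafka1:9092",
                      ca="ca-crt", cert="client1.crt", key="client1.key"):
    """Build a confluent_kafka-style config dict for this cluster's mTLS setup."""
    return {
        "bootstrap.servers": bootstrap,    # any broker hostname from /etc/hosts
        "security.protocol": "SSL",        # the cluster authenticates clients by certificate
        "ssl.ca.location": ca,             # self-signed CA certificate fetched from kafka1
        "ssl.certificate.location": cert,  # client certificate
        "ssl.key.location": key,           # client private key
    }

# With confluent_kafka installed and the certificates in the working
# directory, a producer could then be created and used like:
#
#   from confluent_kafka import Producer
#   producer = Producer(client_ssl_config())
#   producer.produce("test", value=b"Event number 0")
#   producer.flush()
```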
While in the scripts directory, run the produce.py script to send message data to the broker node:
python3 produce.py

Message delivered to test [0] at offset 0
Message delivered to test [0] at offset 1
Message delivered to test [0] at offset 2
Consume Data
Similar to the produce.py script, the consume.py script is provided to test the consumption of message data. The consume.py script connects to one of the available Kafka nodes to consume the messages that were produced by the produce.py script.
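A consumer adds a few settings on top of the same mTLS configuration. The sketch below is hypothetical and not the repository's consume.py; the group ID and offset behavior are illustrative assumptions:

```python
# Hypothetical sketch of consume.py-style settings; the repository's
# actual script may differ.

def consumer_config(bootstrap="kafka1:9092", group_id="test-group"):
    """Consumer settings for the cluster's mTLS listener on port 9092."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SSL",
        "ssl.ca.location": "ca-crt",
        "ssl.certificate.location": "client1.crt",
        "ssl.key.location": "client1.key",
        "group.id": group_id,             # consumers in the same group share partitions
        "auto.offset.reset": "earliest",  # start from the oldest unread messages
    }

# With confluent_kafka installed, a consume loop could look like:
#
#   from confluent_kafka import Consumer
#   consumer = Consumer(consumer_config())
#   consumer.subscribe(["test"])
#   while True:
#       msg = consumer.poll(1.0)
#       if msg is not None and msg.error() is None:
#           print("Received event:", msg.value().decode())
```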
While in the same working directory, scripts, run the consume.py script to receive the sample data:
python3 consume.py

Received event: {'event_id': 0, 'timestamp': 1727888292, 'message': 'Event number 0'}
Received event: {'event_id': 1, 'timestamp': 1727888292, 'message': 'Event number 1'}
Received event: {'event_id': 2, 'timestamp': 1727888292, 'message': 'Event number 2'}
Once the consume.py script has successfully fetched the message data, you can break the connection with the broker by pressing Ctrl + C on your keyboard.
What’s Next
Once your Kafka cluster is up, running, and fully functional, you may consider the following next steps depending on your application or use case:
- Save the client certificate files on your "producer" and "consumer" servers so that they can communicate with Kafka. The certificates are located in /etc/kafka/ssl on the first Kafka node, kafka1.
- Update your connection string to connect to the Kafka brokers.
Familiarize yourself with the official Apache Kafka documentation, including use cases, community links, and Kafka support.
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.