Let us see what a CEPH cluster is all about, and then we will proceed with how to install a CEPH storage cluster on Rocky Linux 8 & Debian 10. If you need any support with OS installation, follow the previous blog post.
Introduction – What is a CEPH Cluster
CEPH is an open-source, software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block-, and file-level storage. CEPH aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.
There are several different ways to install CEPH. Choose the method that best suits your needs. According to the CEPH documentation, cephadm and Rook (a CEPH cluster running in Kubernetes) are the recommended installation methods. ceph-ansible is not integrated with the new orchestrator APIs introduced in Nautilus and Octopus, which means that newer management features and dashboard integration are not available, so it is not recommended. ceph-deploy is no longer actively maintained, is not tested on versions of CEPH newer than Nautilus, and does not support RHEL 8, CentOS 8, Rocky 8, or newer operating systems.
Cephadm installs and manages a CEPH cluster using containers and systemd, with tight integration with the CLI and dashboard GUI. Cephadm only supports Octopus and newer releases. It is fully integrated with the new orchestration API and fully supports the new CLI and dashboard features for managing cluster deployment.
Follow the official CEPH documentation for more details.
Requirements
- Installed and configured OS
- Network/Bonding configuration
- container support (podman or docker)
- systemd
- python3
- Time synchronization (such as chrony or NTP)
- LVM2 for provisioning storage devices
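Before moving on, a quick sanity check of these prerequisites on each node can save time later. The following is a minimal sketch, assuming Podman on Rocky (substitute docker --version on Debian):
# quick prerequisite check on each node
podman --version
systemctl --version | head -n 1
python3 --version
lvm version
timedatectl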
How to Install CEPH Cluster
The steps required to set up a CEPH cluster using cephadm on Rocky or Debian are listed below (commands common to both systems are not distinguished; where the commands differ, they are labeled with the name of the system):
1. Log in as root and add the host entries to /etc/hosts on each host:
cat <<EOF >> /etc/hosts
10.2.93.34 ceph-node0.linuxquery.org
10.2.93.35 ceph-node1.linuxquery.org
10.2.93.36 ceph-node2.linuxquery.org
EOF
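Once the entries are in place, you may want to confirm that every name resolves on each host; a small hedged check:
for h in ceph-node0 ceph-node1 ceph-node2; do getent hosts ${h}.linuxquery.org; done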
2. Set up passwordless SSH authentication:
ssh-keygen
ssh-copy-id ceph-node0.linuxquery.org
ssh-copy-id ceph-node1.linuxquery.org
ssh-copy-id ceph-node2.linuxquery.org
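To confirm that key-based login works before cephadm relies on it, a quick loop such as the following can be used (a hedged sketch, assuming root logins):
for h in ceph-node0 ceph-node1 ceph-node2; do ssh root@${h}.linuxquery.org hostname; done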
3. Set the hostname on every node of the CEPH cluster:
hostnamectl set-hostname ceph-node0.linuxquery.org
hostnamectl set-hostname ceph-node1.linuxquery.org
hostnamectl set-hostname ceph-node2.linuxquery.org
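You can optionally verify the result on each node:
hostnamectl status
hostname -f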
4. Update and upgrade all the packages
Rocky
# dnf update -y; dnf upgrade -y
Debian
# apt-get update; apt-get upgrade
5. Install python3, lvm2, and a container engine. On Debian we recommend choosing Docker instead of Podman, since the Podman packages available for Debian were not stable enough in our testing.
Rocky
# dnf install -y python3 lvm2 podman
Debian
# apt-get install ca-certificates curl gnupg lsb-release
# mkdir -p /etc/apt/keyrings
# curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# apt-get update
# apt-get install python3 lvm2 docker-ce docker-ce-cli containerd.io docker-compose-plugin
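cephadm expects the container engine to be up and running; on Debian it may be worth making sure Docker is enabled at boot (a small hedged addition):
systemctl enable --now docker
docker --version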
6. Configure time synchronization (NTP) on every node in the CEPH cluster
Rocky
[root@ceph-node0 ~]# vim /etc/chrony.conf
[root@ceph-node0 ~]# systemctl restart chronyd.service
[root@ceph-node0 ~]# chronyc sources
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* dilbert.linuxquery.org> 3 6 17 3 +25us[ +20us] +/- 44ms
^+ norbert.linuxquery.org> 3 6 17 3 -29us[ -34us] +/- 48ms
[root@ceph-node0 ~]#
Debian
apt install ntpdate
ntpdate 192.168.X.X
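Note that ntpdate only performs a one-shot synchronization; since cephadm checks for an active time-synchronization daemon, you may prefer installing chrony on Debian instead (a hedged alternative sketch):
apt-get install -y chrony
systemctl enable --now chrony
chronyc sources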
7. Install cephadm using the curl-based installation method
# set the desired CEPH release
version=quincy
curl --silent --remote-name --location \
  https://github.com/ceph/ceph/raw/$version/src/cephadm/cephadm
mv cephadm /usr/local/bin
chmod +x /usr/local/bin/cephadm
mkdir -p /etc/ceph
# add ceph common tools
cephadm add-repo --release $version
cephadm install ceph-common
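At this point you can optionally confirm that the cephadm binary and the ceph CLI are in place (a small hedged check; cephadm version may pull the default container image before reporting):
which cephadm
cephadm version
ceph -v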
8. Bootstrap the new CEPH cluster
cephadm bootstrap --mon-ip 10.X.X.20 --allow-fqdn-hostname [--cluster-network 172.30.135.0/24] [--docker]
(The options in square brackets are optional: --cluster-network sets a separate replication network, and --docker tells cephadm to use Docker instead of Podman.)
Example:
cephadm bootstrap --mon-ip 10.X.X.20 --cluster-network 172.30.135.0/24 --allow-fqdn-hostname
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 91568d64-1199-11ed-9928-54ab3a712c97
ERROR: hostname is a fully qualified domain name (ceph-node0.linuxquery.org); either fix (e.g., "sudo hostname ceph-node0" or similar) or pass --allow-fqdn-hostname
The error above appears when --allow-fqdn-hostname is omitted, so the bootstrap is rerun with that flag:
[root@ceph-node0 ~]# cephadm bootstrap --mon-ip 10.X.X.20 --allow-fqdn-hostname
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: c611c0e6-1199-11ed-b066-54ab3a712c97
Verifying IP 10.X.X.20 port 3300 ...
Verifying IP 10.X.X.20 port 6789 ...
Mon IP `10.X.X.20` is in CIDR network `10.X.X.18/27`
Mon IP `10.X.X.20` is in CIDR network `10.X.X.18/27`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
firewalld ready
Enabling firewalld service ceph-mon in current zone...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 10.X.X.18/27
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
firewalld ready
Enabling firewalld service ceph in current zone...
firewalld ready
Enabling firewalld port 9283/tcp in current zone...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host ceph-node0.linuxquery.org...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
firewalld ready
Enabling firewalld port 8443/tcp in current zone...
Ceph Dashboard is now available at:
URL: https://ceph-node0.linuxquery.org:8443/
User: admin
Password: xyx
Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/c611c0e6-1199-11ed-b066-54ab3a712c97/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
sudo /usr/sbin/cephadm shell --fsid c611c0e6-1199-11ed-b066-54ab3a712c97 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Or, if you are only running a single cluster on this host:
sudo /usr/sbin/cephadm shell
Please consider enabling telemetry to help improve Ceph:
ceph telemetry on
For more information see:
https://docs.ceph.com/docs/master/mgr/telemetry/
Bootstrap complete.
[root@ceph-node0 ~]#
9. Check the Ceph dashboard: browse to https://ceph-node0.linuxquery.org:8443/, log in with the credentials from the cephadm bootstrap output, and then set a new password.
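If the dashboard is not reachable from a browser right away, a quick reachability check from the CLI may help (a hedged sketch; -k skips verification of the self-signed certificate):
curl -k -I https://ceph-node0.linuxquery.org:8443/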
10. Verify the Ceph CLI. Check the status of the CEPH cluster; [HEALTH_WARN] is expected at this point because no OSDs have been added yet.
[root@ceph-node0 ~]#
[root@ceph-node0 ~]# ceph -s
cluster:
id: c611c0e6-1199-11ed-b066-54ab3a712c97
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum ceph-node0.linuxquery.org (age 25m)
mgr: ceph-node0.linuxquery.org.xtlcez(active, since 23m)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
[root@ceph-node0 ~]# ceph -v
ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
[root@ceph-node0 ~]#
[root@ceph-node0 ~]#
11. Verify that a container is running for each service and check the status of the systemd unit for each container:
[root@ceph-node0 ~]# podman ps
[root@ceph-node0 ~]# systemctl status ceph-* --no-page
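As an additional check, cephadm can list the daemons it manages on the local host (a hedged extra; the output is JSON):
cephadm ls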
12. Add the remaining hosts to the cluster; you can do this via the CLI or via the CEPH dashboard (the cluster's SSH key must already be present on the new hosts; see the sketch after the commands below):
[root@ceph-node0 ~]# ceph orch host add ceph-node1.linuxquery.org 10.2.93.35
[root@ceph-node0 ~]# ceph orch host add ceph-node2.linuxquery.org 10.2.93.36
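For ceph orch host add to succeed, the cluster's public SSH key (written to /etc/ceph/ceph.pub during bootstrap) must be installed on each new host; a minimal sketch, followed by a verification of the host list:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node1.linuxquery.org
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node2.linuxquery.org
ceph orch host ls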
13. Deploy OSDs to the cluster
[root@ceph-node0 ~]# ceph orch device ls
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
ceph-node0.linuxquery.org /dev/sdb ssd INTEL_SSDSC2BX80_BTHC642408QM800NGN 800G Yes 7m ago
ceph-node0.linuxquery.org /dev/sdc ssd INTEL_SSDSC2BX80_BTHC642409S6800NGN 800G Yes 7m ago
ceph-node0.linuxquery.org /dev/sdd ssd INTEL_SSDSC2KG96_BTYG111608R6960CGN 960G Yes 7m ago
ceph-node1.linuxquery.org /dev/sdb ssd INTEL_SSDSC2BX80_BTHC642408Q1800NGN 800G Yes 4m ago
ceph-node1.linuxquery.org /dev/sdd ssd INTEL_SSDSC2KG96_BTYG111609JH960CGN 960G Yes 4m ago
ceph-node2.linuxquery.org /dev/sdb ssd INTEL_SSDSC2KG96_BTYG112101MY960CGN 960G Yes 3m ago
ceph-node2.linuxquery.org /dev/sdc ssd INTEL_SSDSC2KG96_BTYG111608T1960CGN 960G Yes 3m ago
ceph-node2.linuxquery.org /dev/sdd ssd INTEL_SSDSC2KG96_BTYG112000LF960CGN 960G Yes 3m ago
[root@ceph-node0 ~]#
[root@ceph-node0 ~]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
[root@ceph-node0 ~]#
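If you do not want CEPH to consume every available device automatically, individual devices can be added instead of using --all-available-devices; a hedged example for a single device:
ceph orch daemon add osd ceph-node0.linuxquery.org:/dev/sdb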
[root@ceph-node0 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 7.27722 root default
-5 2.32867 host ceph-node0
0 ssd 0.72769 osd.0 up 1.00000 1.00000
4 ssd 0.72769 osd.4 up 1.00000 1.00000
6 ssd 0.87329 osd.6 up 1.00000 1.00000
-3 2.32867 host ceph-node1
1 ssd 0.72769 osd.1 up 1.00000 1.00000
3 ssd 0.87329 osd.3 up 1.00000 1.00000
8 ssd 0.72769 osd.8 up 1.00000 1.00000
-7 2.61987 host ceph-node2
2 ssd 0.87329 osd.2 up 1.00000 1.00000
5 ssd 0.87329 osd.5 up 1.00000 1.00000
7 ssd 0.87329 osd.7 up 1.00000 1.00000
[root@ceph-node0 ~]#
[root@ceph-node0 ~]# ceph -s
cluster:
id: c611c0e6-1199-11ed-b066-54ab3a712c97
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-node0.linuxquery.org,ceph-node1,ceph-node2 (age 21h)
mgr: ceph-node0.linuxquery.org.xtlcez(active, since 22h), standbys: ceph-node1.msyqqr
osd: 9 osds: 9 up (since 4h), 9 in (since 4h)
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 1.9 MiB
usage: 84 MiB used, 7.3 TiB / 7.3 TiB avail
pgs: 1 active+clean
14. Deploy ceph-mon (ceph monitor daemon)
[root@ceph-node0 ~]# ceph orch apply mon --placement="ceph-node0.linuxquery.org,ceph-node1.linuxquery.org,ceph-node2.linuxquery.org"
15. Deploy ceph-mgr (ceph manager daemon)
[root@ceph-node0 ~]# ceph orch apply mgr --placement="ceph-node0.linuxquery.org,ceph-node1.linuxquery.org,ceph-node2.linuxquery.org"
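The resulting placement of the services can be checked with the orchestrator (a quick hedged check):
ceph orch ls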
16. Now verify the CEPH cluster and confirm that everything is OK:
[root@ceph-node0 ~]# ceph -s
cluster:
id: c611c0e6-1199-11ed-b066-54ab3a712c97
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-node0.linuxquery.org,ceph-node1,ceph-node2 (age 5w)
mgr: ceph-node0.linuxquery.org.xtlcez(active, since 5w), standbys: ceph-node1.msyqqr
osd: 9 osds: 9 up (since 5w), 9 in (since 5w)
data:
pools: 2 pools, 33 pgs
objects: 1.95k objects, 7.5 GiB
usage: 27 GiB used, 7.3 TiB / 7.3 TiB avail
pgs: 33 active+clean
[root@ceph-node0 ~]# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.ceph-node0 ceph-node0.linuxquery.org *:9093,9094 running (5w) 2m ago 5w 43.6M - ba2b418f427c 8e45af1c44fa
crash.ceph-node0 ceph-node0.linuxquery.org running (5w) 2m ago 5w 6991k - 17.2.3 44957ee5ff33 4de179c6b8ba
crash.ceph-node1 ceph-node1.linuxquery.org running (5w) 8m ago 5w 7134k - 17.2.3 44957ee5ff33 488bc9a62ad8
crash.ceph-node2 ceph-node2.linuxquery.org running (5w) 2m ago 5w 7130k - 17.2.3 44957ee5ff33 9088993f932b
grafana.ceph-node0 ceph-node0.linuxquery.org *:3000 running (5w) 2m ago 5w 119M - 8.3.5 dad864ee21e9 2c63ce8d19dd
mgr.ceph-node0.linuxquery.org.xtlcez ceph-node0.linuxquery.org *:9283 running (5w) 2m ago 5w 990M - 17.2.3 44957ee5ff33 fa4b9d53eef5
mgr.ceph-node1.msyqqr ceph-node1.linuxquery.org *:8443,9283 running (5w) 8m ago 5w 440M - 17.2.3 44957ee5ff33 f2325ca62e49
mon.ceph-node0.linuxquery.org ceph-node0.linuxquery.org running (5w) 2m ago 5w 510M 2048M 17.2.3 44957ee5ff33 ef12e9e2ef10
mon.ceph-node1 ceph-node1.linuxquery.org running (5w) 8m ago 5w 499M 2048M 17.2.3 44957ee5ff33 c9fa415d9e95
mon.ceph-node2 ceph-node2.linuxquery.org running (5w) 2m ago 5w 492M 2048M 17.2.3 44957ee5ff33 ce3558a7dee2
node-exporter.ceph-node0 ceph-node0.linuxquery.org *:9100 running (5w) 2m ago 5w 60.1M - 1dbe0e931976 53276f0e94fc
node-exporter.ceph-node1 ceph-node1.linuxquery.org *:9100 running (5w) 8m ago 5w 49.1M - 1dbe0e931976 bc5bd78e205e
node-exporter.ceph-node2 ceph-node2.linuxquery.org *:9100 running (5w) 2m ago 5w 40.6M - 1dbe0e931976 3d9f67b86630
osd.0 ceph-node0.linuxquery.org running (5w) 2m ago 5w 1135M 4096M 17.2.3 44957ee5ff33 382cdd964b66
osd.1 ceph-node1.linuxquery.org running (5w) 8m ago 5w 1127M 4096M 17.2.3 44957ee5ff33 021ecddfd452
osd.2 ceph-node2.linuxquery.org running (5w) 2m ago 5w 1398M 4096M 17.2.3 44957ee5ff33 9b4e979c292b
osd.3 ceph-node1.linuxquery.org running (5w) 8m ago 5w 1830M 4096M 17.2.3 44957ee5ff33 b3c7314538fb
osd.4 ceph-node0.linuxquery.org running (5w) 2m ago 5w 915M 4096M 17.2.3 44957ee5ff33 17da1fa27c3b
osd.5 ceph-node2.linuxquery.org running (5w) 2m ago 5w 1267M 4096M 17.2.3 44957ee5ff33 402ac8da875b
osd.6 ceph-node0.linuxquery.org running (5w) 2m ago 5w 2021M 4096M 17.2.3 44957ee5ff33 903b9e857216
osd.7 ceph-node2.linuxquery.org running (5w) 2m ago 5w 1797M 4096M 17.2.3 44957ee5ff33 b8fb20dc44f7
osd.8 ceph-node1.linuxquery.org running (5w) 8m ago 5w 1106M 4096M 17.2.3 44957ee5ff33 a70315f638a0
prometheus.ceph-node0 ceph-node0.linuxquery.org *:9095 running (5w) 2m ago 5w 1835M - 514e6a882f6e 969a144d0f46
This marks the end of the installation and configuration of the 3-node CEPH cluster.