How to Install a CEPH Storage Cluster on Rocky Linux 8 & Debian 10 – Three-Node CEPH Cluster

Let us first see what a CEPH cluster is all about, and then we will proceed with how to install a CEPH storage cluster on Rocky Linux 8 & Debian 10. If you need any support with the OS installation, follow the previous blog post.

How to Install Rocky Linux 8

Introduction – What is a CEPH Cluster

CEPH is an open-source, software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block-, and file-level storage. CEPH aims primarily for completely distributed operation without a single point of failure, is scalable to the exabyte level, and is freely available.

There are several different ways to install CEPH; choose the method that best suits your needs. According to the CEPH documentation, cephadm and Rook (a CEPH cluster running in Kubernetes) are the recommended installation methods. ceph-ansible is not integrated with the new orchestrator APIs introduced in Nautilus and Octopus, which means that the newer management features and dashboard integration are not available, so it is not recommended. ceph-deploy is no longer actively maintained, is not tested on versions of CEPH newer than Nautilus, and does not support RHEL 8, CentOS 8, Rocky Linux 8 or newer operating systems.

Cephadm installs and manages a CEPH cluster using containers and systemd, with tight integration with the CLI and the dashboard GUI. It only supports Octopus and newer releases, is fully integrated with the new orchestration API, and fully supports the new CLI and dashboard features for managing cluster deployment.

Follow the official CEPH documentation for more details

Official CEPH Documentation

Requirements

  • Installed and configured OS
  • Network/Bonding configuration
  • container support (podman or docker)
  • systemd
  • python3
  • Time synchronization (such as chrony or NTP)
  • LVM2 for provisioning storage devices
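
Before proceeding, you can quickly confirm that these prerequisites are present on each node. This is an optional check (not part of the original walkthrough); only one of podman/docker is required, and chronyc may be absent if you use another time-sync tool:

# quick prerequisite check - run on every node
for cmd in python3 lvcreate systemctl chronyc podman docker; do
    command -v "$cmd" >/dev/null 2>&1 && echo "OK:      $cmd" || echo "missing: $cmd"
done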

How to Install CEPH Cluster

The steps required to set up a CEPH cluster using cephadm on Rocky or Debian are listed below. Commands that are common to both systems are not distinguished; commands that differ are marked with the name of the system.

1. Log in as root and add the host entries to /etc/hosts on each host:
cat<<EOF>>/etc/hosts
10.2.93.34 ceph-node0.linuxquery.org
10.2.93.35 ceph-node1.linuxquery.org
10.2.93.36 ceph-node2.linuxquery.org
EOF
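
To confirm that name resolution works, you can resolve each entry on every node (an optional check using the hostnames above):

for h in ceph-node0 ceph-node1 ceph-node2; do
    getent hosts $h.linuxquery.org
done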

2. Set up passwordless SSH authentication to every node of the cluster:

ssh-keygen
ssh-copy-id ceph-node0.linuxquery.org
ssh-copy-id ceph-node1.linuxquery.org
ssh-copy-id ceph-node2.linuxquery.org
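
A quick way to confirm that passwordless SSH works is to run a trivial command on each host; none of these should prompt for a password (optional check):

for h in ceph-node0 ceph-node1 ceph-node2; do
    ssh root@$h.linuxquery.org hostname
done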

3. Set the hostname on every node of the CEPH cluster (run each command on the corresponding node):

hostnamectl set-hostname ceph-node0.linuxquery.org
hostnamectl set-hostname ceph-node1.linuxquery.org
hostnamectl set-hostname ceph-node2.linuxquery.org

4. Update and upgrade all the packages

Rocky

# dnf update -y; dnf upgrade -y

Debian

# apt-get update; apt-get upgrade

5. Install python3, lvm2, and a container engine (Podman or Docker). On Debian we recommend choosing Docker instead of Podman, as Podman had some problems and was not stable enough in our testing.

Rocky

# dnf install -y python3 lvm2 podman

Debian

# apt-get install ca-certificates curl gnupg lsb-release
# mkdir -p /etc/apt/keyrings
# curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
$(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
# apt-get update
# apt-get install python3 lvm2 docker-ce docker-ce-cli containerd.io docker-compose-plugin
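
Before continuing, you can confirm that the Docker daemon is up (optional check):

# systemctl is-active docker
# docker info | head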

6. Configure time synchronization (NTP) on every node of the CEPH cluster. On Rocky, add your NTP servers to /etc/chrony.conf, restart chronyd, and verify the sources:

Rocky

[root@ceph-node0 ~]# vim /etc/chrony.conf
[root@ceph-node0 ~]# systemctl restart chronyd.service
[root@ceph-node0 ~]# chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* dilbert.linuxquery.org>     3   6    17     3    +25us[  +20us] +/-   44ms
^+ norbert.linuxquery.org>     3   6    17     3    -29us[  -34us] +/-   48ms
[root@ceph-node0 ~]#

Debian

# apt-get install ntpdate
# ntpdate 192.168.X.X
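
Note that ntpdate only performs a one-time synchronization, while CEPH expects ongoing time synchronization (see the requirements above). On Debian you may therefore prefer chrony; a minimal sketch, using the same placeholder NTP server as above:

# apt-get install -y chrony
# echo "server 192.168.X.X iburst" >> /etc/chrony/chrony.conf
# systemctl enable --now chrony
# chronyc sources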

7. Install cephadm using the curl-based installation method:

 
# set the desired CEPH release
version=quincy
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/$version/src/cephadm/cephadm
mv cephadm /usr/local/bin
chmod +x /usr/local/bin/cephadm
mkdir -p /etc/ceph
# add the ceph common tools (ceph, rbd, mount.ceph, ...)
cephadm add-repo --release $version
cephadm install ceph-common
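
You can verify the installation before bootstrapping (optional check; cephadm version may pull the default container image, so it can take a moment):

which cephadm
cephadm version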

8. Bootstrap the new CEPH cluster

cephadm bootstrap --mon-ip 10.X.X.20 --allow-fqdn-hostname [--cluster-network 172.30.135.0/24] [--docker]

The flags in square brackets are optional: --cluster-network dedicates a separate network to OSD replication, and --docker tells cephadm to use Docker instead of Podman (relevant for the Debian setup above).

Example (the first attempt below fails because the host uses an FQDN and --allow-fqdn-hostname was not passed; the second attempt adds it):

cephadm bootstrap --mon-ip 10.X.X.20 --cluster-network 172.30.135.0/24
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 91568d64-1199-11ed-9928-54ab3a712c97
ERROR: hostname is a fully qualified domain name (ceph-node0.linuxquery.org); either fix (e.g., "sudo hostname ceph-node0" or similar) or pass --allow-fqdn-hostname
[root@ceph-node0 ~]# cephadm bootstrap --mon-ip 10.X.X.20 --allow-fqdn-hostname
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: c611c0e6-1199-11ed-b066-54ab3a712c97
Verifying IP 10.X.X.20 port 3300 ...
Verifying IP 10.X.X.20 port 6789 ...
Mon IP `10.X.X.20` is in CIDR network `10.X.X.18/27`
Mon IP `10.X.X.20` is in CIDR network `10.X.X.18/27`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
firewalld ready
Enabling firewalld service ceph-mon in current zone...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 10.X.X.18/27
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
firewalld ready
Enabling firewalld service ceph in current zone...
firewalld ready
Enabling firewalld port 9283/tcp in current zone...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host ceph-node0.linuxquery.org...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
firewalld ready
Enabling firewalld port 8443/tcp in current zone...
Ceph Dashboard is now available at:
URL: https://ceph-node0.linuxquery.org:8443/
User: admin
Password: xyx
Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/c611c0e6-1199-11ed-b066-54ab3a712c97/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
sudo /usr/sbin/cephadm shell --fsid c611c0e6-1199-11ed-b066-54ab3a712c97 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Or, if you are only running a single cluster on this host:
sudo /usr/sbin/cephadm shell
Please consider enabling telemetry to help improve Ceph:
ceph telemetry on
For more information see:
https://docs.ceph.com/docs/master/mgr/telemetry/
Bootstrap complete.
[root@ceph-node0 ~]#

9. Check the CEPH dashboard: open https://ceph-node0.linuxquery.org:8443/ in a browser, log in with the credentials from the cephadm bootstrap output, and set a new password.

10. Verify the Ceph CLI and check the status of the cluster. HEALTH_WARN is expected at this point because no OSDs have been added yet.

 

[root@ceph-node0 ~]# ceph -s
  cluster:
    id:     c611c0e6-1199-11ed-b066-54ab3a712c97
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum ceph-node0.linuxquery.org (age 25m)
    mgr: ceph-node0.linuxquery.org.xtlcez(active, since 23m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

[root@ceph-node0 ~]# ceph -v
ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
[root@ceph-node0 ~]#

11. Verify that containers are running for each service and check the status of the systemd unit for each container:

[root@ceph-node0 ~]# podman ps
[root@ceph-node0 ~]# systemctl status ceph-* --no-pager
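
You can also list the daemons that cephadm manages on the local host, which is often easier to read than the raw container list (optional):

[root@ceph-node0 ~]# cephadm ls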

12. Add hosts to the cluster. You can do this via the CLI or via the CEPH dashboard:

[root@ceph-node0 ~]# ceph orch host add ceph-node1.linuxquery.org 10.2.93.35
[root@ceph-node0 ~]# ceph orch host add ceph-node2.linuxquery.org 10.2.93.36
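
If adding a host fails with an SSH error, it is usually because the cluster's own SSH key has not been distributed yet. cephadm uses the key written to /etc/ceph/ceph.pub during bootstrap, so copy it to each new host first:

ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node1.linuxquery.org
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node2.linuxquery.org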

13. Deploy OSDs to the cluster

[root@ceph-node0 ~]# ceph orch device ls

HOST                       PATH      TYPE  DEVICE ID                            SIZE  AVAILABLE  REFRESHED  REJECT REASONS
ceph-node0.linuxquery.org  /dev/sdb  ssd   INTEL_SSDSC2BX80_BTHC642408QM800NGN  800G  Yes        7m ago
ceph-node0.linuxquery.org  /dev/sdc  ssd   INTEL_SSDSC2BX80_BTHC642409S6800NGN  800G  Yes        7m ago
ceph-node0.linuxquery.org  /dev/sdd  ssd   INTEL_SSDSC2KG96_BTYG111608R6960CGN  960G  Yes        7m ago
ceph-node1.linuxquery.org  /dev/sdb  ssd   INTEL_SSDSC2BX80_BTHC642408Q1800NGN  800G  Yes        4m ago
ceph-node1.linuxquery.org  /dev/sdd  ssd   INTEL_SSDSC2KG96_BTYG111609JH960CGN  960G  Yes        4m ago
ceph-node2.linuxquery.org  /dev/sdb  ssd   INTEL_SSDSC2KG96_BTYG112101MY960CGN  960G  Yes        3m ago
ceph-node2.linuxquery.org  /dev/sdc  ssd   INTEL_SSDSC2KG96_BTYG111608T1960CGN  960G  Yes        3m ago
ceph-node2.linuxquery.org  /dev/sdd  ssd   INTEL_SSDSC2KG96_BTYG112000LF960CGN  960G  Yes        3m ago
[root@ceph-node0 ~]#
[root@ceph-node0 ~]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
[root@ceph-node0 ~]#
[root@ceph-node0 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         7.27722  root default
-5         2.32867      host ceph-node0
0    ssd  0.72769          osd.0            up   1.00000  1.00000
4    ssd  0.72769          osd.4            up   1.00000  1.00000
6    ssd  0.87329          osd.6            up   1.00000  1.00000
-3         2.32867      host ceph-node1
1    ssd  0.72769          osd.1            up   1.00000  1.00000
3    ssd  0.87329          osd.3            up   1.00000  1.00000
8    ssd  0.72769          osd.8            up   1.00000  1.00000
-7         2.61987      host ceph-node2
2    ssd  0.87329          osd.2            up   1.00000  1.00000
5    ssd  0.87329          osd.5            up   1.00000  1.00000
7    ssd  0.87329          osd.7            up   1.00000  1.00000
[root@ceph-node0 ~]#


[root@ceph-node0 ~]# ceph -s
  cluster:
    id:     c611c0e6-1199-11ed-b066-54ab3a712c97
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-node0.linuxquery.org,ceph-node1,ceph-node2 (age 21h)
    mgr: ceph-node0.linuxquery.org.xtlcez(active, since 22h), standbys: ceph-node1.msyqqr
    osd: 9 osds: 9 up (since 4h), 9 in (since 4h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 1.9 MiB
    usage:   84 MiB used, 7.3 TiB / 7.3 TiB avail
    pgs:     1 active+clean
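
The --all-available-devices option consumes every eligible empty device on every host. If you prefer to control exactly which devices become OSDs, you can add them one by one instead (an alternative sketch, using host and device names from the ceph orch device ls output above):

ceph orch daemon add osd ceph-node1.linuxquery.org:/dev/sdb
ceph orch daemon add osd ceph-node2.linuxquery.org:/dev/sdc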

14. Deploy ceph-mon (the CEPH monitor daemon) across all three nodes

[root@ceph-node0 ~]# ceph orch apply mon --placement="ceph-node0.linuxquery.org,ceph-node1.linuxquery.org,ceph-node2.linuxquery.org"

15. Deploy ceph-mgr (the CEPH manager daemon) across all three nodes

[root@ceph-node0 ~]# ceph orch apply mgr --placement="ceph-node0.linuxquery.org,ceph-node1.linuxquery.org,ceph-node2.linuxquery.org"
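
Alternatively, the orchestrator can place services by host label instead of an explicit host list, which scales better as nodes are added later. A minimal sketch (not part of the original walkthrough), using the hostnames above:

ceph orch host label add ceph-node0.linuxquery.org mon
ceph orch host label add ceph-node1.linuxquery.org mon
ceph orch host label add ceph-node2.linuxquery.org mon
ceph orch apply mon --placement="label:mon"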

16. Finally, verify the CEPH cluster and confirm that everything is OK

[root@ceph-node0 ~]# ceph -s
  cluster:
    id:     c611c0e6-1199-11ed-b066-54ab3a712c97
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-node0.linuxquery.org,ceph-node1,ceph-node2 (age 5w)
    mgr: ceph-node0.linuxquery.org.xtlcez(active, since 5w), standbys: ceph-node1.msyqqr
    osd: 9 osds: 9 up (since 5w), 9 in (since 5w)

  data:
    pools:   2 pools, 33 pgs
    objects: 1.95k objects, 7.5 GiB
    usage:   27 GiB used, 7.3 TiB / 7.3 TiB avail
    pgs:     33 active+clean
[root@ceph-node0 ~]# ceph orch ps

NAME                                  HOST                       PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
alertmanager.ceph-node0               ceph-node0.linuxquery.org  *:9093,9094  running (5w)  2m ago     5w   43.6M    -                 ba2b418f427c  8e45af1c44fa
crash.ceph-node0                      ceph-node0.linuxquery.org               running (5w)  2m ago     5w   6991k    -        17.2.3   44957ee5ff33  4de179c6b8ba
crash.ceph-node1                      ceph-node1.linuxquery.org               running (5w)  8m ago     5w   7134k    -        17.2.3   44957ee5ff33  488bc9a62ad8
crash.ceph-node2                      ceph-node2.linuxquery.org               running (5w)  2m ago     5w   7130k    -        17.2.3   44957ee5ff33  9088993f932b
grafana.ceph-node0                    ceph-node0.linuxquery.org  *:3000       running (5w)  2m ago     5w   119M     -        8.3.5    dad864ee21e9  2c63ce8d19dd
mgr.ceph-node0.linuxquery.org.xtlcez  ceph-node0.linuxquery.org  *:9283       running (5w)  2m ago     5w   990M     -        17.2.3   44957ee5ff33  fa4b9d53eef5
mgr.ceph-node1.msyqqr                 ceph-node1.linuxquery.org  *:8443,9283  running (5w)  8m ago     5w   440M     -        17.2.3   44957ee5ff33  f2325ca62e49
mon.ceph-node0.linuxquery.org         ceph-node0.linuxquery.org               running (5w)  2m ago     5w   510M     2048M    17.2.3   44957ee5ff33  ef12e9e2ef10
mon.ceph-node1                        ceph-node1.linuxquery.org               running (5w)  8m ago     5w   499M     2048M    17.2.3   44957ee5ff33  c9fa415d9e95
mon.ceph-node2                        ceph-node2.linuxquery.org               running (5w)  2m ago     5w   492M     2048M    17.2.3   44957ee5ff33  ce3558a7dee2
node-exporter.ceph-node0              ceph-node0.linuxquery.org  *:9100       running (5w)  2m ago     5w   60.1M    -                 1dbe0e931976  53276f0e94fc
node-exporter.ceph-node1              ceph-node1.linuxquery.org  *:9100       running (5w)  8m ago     5w   49.1M    -                 1dbe0e931976  bc5bd78e205e
node-exporter.ceph-node2              ceph-node2.linuxquery.org  *:9100       running (5w)  2m ago     5w   40.6M    -                 1dbe0e931976  3d9f67b86630
osd.0                                 ceph-node0.linuxquery.org               running (5w)  2m ago     5w   1135M    4096M    17.2.3   44957ee5ff33  382cdd964b66
osd.1                                 ceph-node1.linuxquery.org               running (5w)  8m ago     5w   1127M    4096M    17.2.3   44957ee5ff33  021ecddfd452
osd.2                                 ceph-node2.linuxquery.org               running (5w)  2m ago     5w   1398M    4096M    17.2.3   44957ee5ff33  9b4e979c292b
osd.3                                 ceph-node1.linuxquery.org               running (5w)  8m ago     5w   1830M    4096M    17.2.3   44957ee5ff33  b3c7314538fb
osd.4                                 ceph-node0.linuxquery.org               running (5w)  2m ago     5w   915M     4096M    17.2.3   44957ee5ff33  17da1fa27c3b
osd.5                                 ceph-node2.linuxquery.org               running (5w)  2m ago     5w   1267M    4096M    17.2.3   44957ee5ff33  402ac8da875b
osd.6                                 ceph-node0.linuxquery.org               running (5w)  2m ago     5w   2021M    4096M    17.2.3   44957ee5ff33  903b9e857216
osd.7                                 ceph-node2.linuxquery.org               running (5w)  2m ago     5w   1797M    4096M    17.2.3   44957ee5ff33  b8fb20dc44f7
osd.8                                 ceph-node1.linuxquery.org               running (5w)  8m ago     5w   1106M    4096M    17.2.3   44957ee5ff33  a70315f638a0
prometheus.ceph-node0                 ceph-node0.linuxquery.org  *:9095       running (5w)  2m ago     5w   1835M    -                 514e6a882f6e  969a144d0f46
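
As an optional final sanity check (not part of the original walkthrough), you can create a small test pool, write a few objects with rados bench, and then remove the pool; the pool name here is just an example:

ceph osd pool create testpool 32
rados bench -p testpool 10 write --no-cleanup
rados -p testpool ls | head
# pool deletion must be explicitly allowed before the pool can be removed
ceph config set mon mon_allow_pool_delete true
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it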


This marks the end of the installation and configuration of the three-node CEPH cluster.

