A regular filesystem is not designed to be mounted on more than one server at a time; doing so can lead to serious inconsistencies that damage its logical structure. For instance, two servers that are unaware of each other's activity may allocate the same block of storage to different files, each relying on the free-block information cached in its own memory. Or one server may modify certain blocks while the others, ignoring this fact, keep using the outdated content. Such problems are addressed by a clustered filesystem.
A clustered filesystem can be mounted on multiple servers at once, accessed by them at the block level and managed as a single entity. It pools the available storage capacity and shares it between the servers. At the same time, discrepancies are avoided, since each server stays in sync with the actual filesystem state, as if all of their applications were running on the same machine.
The clustered filesystem itself coordinates input/output operations and locks them where necessary to avoid collisions.
There are several clustered filesystems; here we use OCFS2.
Share a block volume on OCI
- create a block volume (here volNicoTest)
- create or choose 3 VM instances that will share this block volume (here instance-nicotest2, instance-nicotest3, and instance-nicotest4, running Oracle Linux 8.6)
- attach this block volume to each instance:
- use the OCI console to manually attach the volume to each instance with the "read/write shareable" option
- SSH to each instance and execute the iSCSI attach commands provided by OCI. You can find these commands on each "volume – instance" attachment via the three-dots menu (see the iscsiadm example below)
- now your volNicoTest volume is attached to each instance!
- create two ingress rules (TCP 7777 and UDP 7777) in the subnet of your instances
Then, on each instance, allow traffic on port 7777 through the local firewall:
firewall-cmd --add-port=7777/tcp --add-port=7777/udp --permanent
firewall-cmd --reload
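For reference, the iSCSI attach commands that OCI shows for each volume attachment typically look like the following (the target IQN and IP here are placeholders, not values from this setup; always copy the exact commands displayed in the console for your own attachment):
sudo iscsiadm -m node -o new -T <target IQN> -p <iSCSI IP>:3260
sudo iscsiadm -m node -o update -T <target IQN> -n node.startup -v automatic
sudo iscsiadm -m node -T <target IQN> -p <iSCSI IP>:3260 -l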
Now consider you have the following instances, with their private IPs:
- instance-nicotest2, 10.0.6.199
- instance-nicotest3, 10.0.6.239
- instance-nicotest4, 10.0.6.100
The following operations need to be executed on each instance:
firewall-cmd --add-port=7777/tcp --add-port=7777/udp --permanent
firewall-cmd --reload
dnf install ocfs2-tools
o2cb add-cluster ocfs2
# ocfs2 is the name we chose for the cluster; you can give it a different name
# this command causes the file /etc/ocfs2/cluster.conf to be created
/sbin/o2cb.init configure
Answer the /sbin/o2cb.init configure prompts as follows (the answers are the same on each instance):
[root@instance-nicotest2 ~]# /sbin/o2cb.init configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [n]: y
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [31]:
Specify network idle timeout in ms (>=5000) [30000]:
Specify network keepalive delay in ms (>=1000) [2000]:
Specify network reconnect delay in ms (>=2000) [2000]:
Writing O2CB configuration: OK
checking debugfs...
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Creating directory '/dlm': OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Setting cluster stack "o2cb": OK
Registering O2CB cluster "ocfs2": OK
Setting O2CB cluster timeouts : OK
→ this generates the file /etc/sysconfig/o2cb
Still on each node, execute the following commands:
service o2cb start
service ocfs2 start
systemctl enable o2cb
systemctl enable ocfs2
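# OCFS2 relies on panic-based fencing: a node that loses contact with the cluster panics;
# kernel.panic_on_oops=1 turns kernel oopses into panics and kernel.panic=30 reboots the node 30 seconds after a panic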
sysctl -w kernel.panic=30
sysctl -w kernel.panic_on_oops=1
echo "kernel.panic=30" >> /etc/sysctl.d/99-yourcompanyname.conf
echo "kernel.panic_on_oops=1" >> /etc/sysctl.d/99-yourcompanyname.conf
o2cb register-cluster ocfs2
The following commands must be executed on only one instance, for example on instance-nicotest2:
o2cb add-node ocfs2 instance-nicotest2 --ip 10.0.6.199
o2cb add-node ocfs2 instance-nicotest3 --ip 10.0.6.239
o2cb add-node ocfs2 instance-nicotest4 --ip 10.0.6.100
These commands populate the cluster configuration file /etc/ocfs2/cluster.conf:
[root@instance-nicotest2 ~]# cat /etc/ocfs2/cluster.conf
cluster:
    heartbeat_mode = local
    node_count = 3
    name = ocfs2

node:
    number = 0
    cluster = ocfs2
    ip_port = 7777
    ip_address = 10.0.6.199
    name = instance-nicotest2

node:
    number = 1
    cluster = ocfs2
    ip_port = 7777
    ip_address = 10.0.6.239
    name = instance-nicotest3

node:
    number = 2
    cluster = ocfs2
    ip_port = 7777
    ip_address = 10.0.6.100
    name = instance-nicotest4
Copy the content of /etc/ocfs2/cluster.conf from instance-nicotest2 to /etc/ocfs2/cluster.conf on instance-nicotest3 and instance-nicotest4.
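One possible way to do it, assuming SSH access with the default opc user and passwordless sudo on the target nodes (adapt to your own access setup):
scp /etc/ocfs2/cluster.conf opc@10.0.6.239:/tmp/cluster.conf
ssh opc@10.0.6.239 "sudo cp /tmp/cluster.conf /etc/ocfs2/cluster.conf"
scp /etc/ocfs2/cluster.conf opc@10.0.6.100:/tmp/cluster.conf
ssh opc@10.0.6.100 "sudo cp /tmp/cluster.conf /etc/ocfs2/cluster.conf"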
And then restart o2cb and ocfs2 services on instance-nicotest3 and instance-nicotest4:
service o2cb restart
service ocfs2 reload
o2cb start-heartbeat ocfs2
Now we have to format our shared storage with OCFS2. This is done only once, on one of the instances, for example on instance-nicotest2:
mkfs.ocfs2 -L "my_ocfs2_vol" -N 32 /dev/sdb
# the "Formatting Journals:" step takes some time; this is when the disk is actually formatted
# my_ocfs2_vol is just a label we chose
# /dev/sdb is your disk; if you created a partition on it, you may have /dev/sdb1 instead. A partition is not mandatory, it depends on your needs
When you create the filesystem, it is important to consider the number of node slots you will need now and in the future. This controls how many nodes can concurrently mount the volume. While you can add node slots later, doing so may hurt performance because the new slots can end up at the far edge of the disk.
Here, -N 32 reserves 32 node slots, so up to 32 nodes can mount the volume concurrently.
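If you ever do need more slots later, tunefs.ocfs2 can add them, for example (a sketch; check the tunefs.ocfs2 man page for the constraints that apply to your version):
tunefs.ocfs2 -N 64 /dev/sdb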
[root@instance-nicotest2 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 46.6G 0 disk
├─sda1 8:1 0 100M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 45.5G 0 part
├─ocivolume-root 252:0 0 35.5G 0 lvm /
└─ocivolume-oled 252:1 0 10G 0 lvm /var/oled
sdb 8:16 0 70G 0 disk
Now we identify the UUID of this disk with the blkid command:
[root@instance-nicotest2 ~]# blkid
/dev/sda1: SEC_TYPE="msdos" UUID="2314-8847" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="ds454dsf456fd-b1a6-436a-87e0-ds454dsf456fd"
/dev/sda2: UUID="ds454dsf456fd-d067-48a1-84c0-ds454dsf456fd" BLOCK_SIZE="4096" TYPE="xfs" PARTUUID="f3ab92a7-63c7-479d-a813-0b2afcd9a7ff"
/dev/sda3: UUID="ds454dsf456fd-49vg-4Ces-d456-SDSQD-SDSQD-nLQwgd" TYPE="LVM2_member" PARTUUID="ds454dsf456fd-903e-4fdc-bf40-ds454dsf456fd"
/dev/mapper/ocivolume-root: UUID="b7dfdfdf-adfdf2-dfdfdf-a47e-df45dfs45" BLOCK_SIZE="4096" TYPE="xfs"
/dev/mapper/ocivolume-oled: UUID="df45dfs45-adfdf2-dfdfdf-a47e-df45dfs45" BLOCK_SIZE="4096" TYPE="xfs"
/dev/sdb: LABEL="my_ocfs2_vol" UUID="f985240d-f83c-493c-871e-d0ec89a6a529" BLOCK_SIZE="4096" TYPE="ocfs2"
Now on each node:
mkdir /sharedstorage
echo "UUID=f985240d-f83c-493c-871e-d0ec89a6a529 /sharedstorage ocfs2 _netdev,defaults 0 0" >> /etc/fstab
mount -a
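To check that the shared volume really behaves as one filesystem across the nodes, you can, for example, create a file on one node and list it on another; mounted.ocfs2 (part of ocfs2-tools) also shows which nodes currently mount the volume:
touch /sharedstorage/hello_from_nicotest2   # on instance-nicotest2
ls -l /sharedstorage                        # on instance-nicotest3: the file should be visible
mounted.ocfs2 -f /dev/sdb                   # lists the nodes that have mounted the volume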
Can't mount? If you get a "This node could not connect to nodes" error, check dmesg:
dmesg -T
[Thu Oct 12 12:37:54 2023] o2net: Connection to node instance-nicotest2 (num 0) at 10.0.6.199:7777 shutdown, state 7
[Thu Oct 12 12:37:55 2023] o2cb: This node could not connect to nodes:
[Thu Oct 12 12:37:55 2023] 0
[Thu Oct 12 12:37:55 2023] .
[Thu Oct 12 12:37:55 2023] o2cb: Cluster check failed. Fix errors before retrying.
[Thu Oct 12 12:37:55 2023] (mount.ocfs2,59941,1):ocfs2_dlm_init:3355 ERROR: status = -107
[Thu Oct 12 12:37:55 2023] (mount.ocfs2,59941,0):ocfs2_mount_volume:1803 ERROR: status = -107
[Thu Oct 12 12:37:55 2023] (mount.ocfs2,59941,0):ocfs2_fill_super:1177 ERROR: status = -107
In this case you need to restart the o2cb and ocfs2 services on the affected node.
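For example:
service o2cb restart
service ocfs2 restart
mount -a   # retry the mount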
Add a node to your clustered filesystem
For example, if you have a new instance-nicotest5 with private IP 10.0.6.200, SSH to this instance and run:
firewall-cmd --add-port=7777/tcp --add-port=7777/udp --permanent
firewall-cmd --reload
dnf install ocfs2-tools
o2cb add-cluster ocfs2
# then copy /etc/ocfs2/cluster.conf from another instance before adding the new node
o2cb add-node ocfs2 instance-nicotest5 --ip 10.0.6.200
/sbin/o2cb.init configure
service o2cb start
service ocfs2 start
systemctl enable o2cb
systemctl enable ocfs2
echo "kernel.panic=30" >> /etc/sysctl.d/99-novrh.conf
echo "kernel.panic_on_oops=1" >> /etc/sysctl.d/99-novrh.conf
o2cb register-cluster ocfs2
mkdir /sharedstorage
echo "UUID=f985240d-f83c-493c-871e-d0ec89a6a529 /sharedstorage ocfs2 _netdev,defaults 0 0" >> /etc/fstab
mount -a
And on each of the other instances:
o2cb add-node ocfs2 instance-nicotest5 --ip 10.0.6.200
service o2cb restart
service ocfs2 restart
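To confirm the new node has been taken into account, one possible check is to list the nodes registered in the o2cb configfs tree and ask o2cb whether the cluster is online:
ls /sys/kernel/config/cluster/ocfs2/node/
o2cb cluster-status ocfs2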
Increase your shared volume size
Go to the OCI console, list the block volumes, and edit your volume to increase its size.
OCI then gives you rescan commands that you must execute on each instance:
sudo dd iflag=direct if=/dev/oracleoci/oraclevd<paste device suffix here> of=/dev/null count=1
echo "1" | sudo tee /sys/class/block/`readlink /dev/oracleoci/oraclevd<paste device suffix here> | cut -d'/' -f 2`/device/rescan
To find the value for <paste device suffix here>, look at the symlinks:
[root@instance-nicotest2 ~]# ll /dev/oracleoci/oraclevd*
lrwxrwxrwx. 1 root root 6 Oct 13 12:30 /dev/oracleoci/oraclevda -> ../sdb
lrwxrwxrwx. 1 root root 7 Oct 6 13:22 /dev/oracleoci/oraclevda1 -> ../sda1
lrwxrwxrwx. 1 root root 7 Oct 6 13:22 /dev/oracleoci/oraclevda2 -> ../sda2
lrwxrwxrwx. 1 root root 7 Oct 6 13:22 /dev/oracleoci/oraclevda3 -> ../sda3
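In the listing above, the shared 70 GB volume /dev/sdb corresponds to /dev/oracleoci/oraclevda, so the device suffix is "a" and the rescan commands become (verify the symlink target on your own instances before running them):
sudo dd iflag=direct if=/dev/oracleoci/oraclevda of=/dev/null count=1
echo "1" | sudo tee /sys/class/block/`readlink /dev/oracleoci/oraclevda | cut -d'/' -f 2`/device/rescan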
Grow your filesystem:
tunefs.ocfs2 -S /dev/sdb
# -S is the short form of --volume-size; with no size specified, the filesystem is grown to fill the device
Be careful: lsblk does not show whether the filesystem has grown (it only shows the device size); use the df command:
[root@instance-nicotest2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.7G 0 4.7G 0% /dev
tmpfs 4.8G 0 4.8G 0% /dev/shm
tmpfs 4.8G 97M 4.7G 2% /run
tmpfs 4.8G 0 4.8G 0% /sys/fs/cgroup
/dev/mapper/ocivolume-root 36G 9.3G 27G 27% /
/dev/sda2 1014M 314M 701M 31% /boot
/dev/sda1 100M 5.1M 95M 6% /boot/efi
/dev/mapper/ocivolume-oled 10G 122M 9.9G 2% /var/oled
tmpfs 967M 0 967M 0% /run/user/0
tmpfs 967M 0 967M 0% /run/user/987
tmpfs 967M 0 967M 0% /run/user/1000
/dev/sdb 100G 2.1G 98G 2% /sharedstorage