!!! Running OpenShift Container Platform in bhyve zones on OpenIndiana
!! Overview
The scenario is as follows: we have two physical machines:
* sin (runs various stuff like NFS)
* spin (only runs bhyve VMs for OCP)
Both run [OpenIndiana|https://www.openindiana.org/].
On "sin" we run multiple zones for LDAP, DNS, DHCP, etc...I'm not going into full detail here.
The host "spin" is actually empty. It's an old Sun X4270 M2 machine with 2 Sockets, 6 Cores X5675 @ 3.07GHz and 144GB RAM.
The chassis has twelve disks and I also added two (consumer) NVMe's on PCIe adapters.
Two disks form a ZFS boot mirror (rpool), the other disks form a raidz stripe (localstripe) which has 2 hotspares, one slog device (NVMe) and one l2arc (also NVMe):
{{{
root@spin:~# zpool status
  pool: localstripe
 state: ONLINE
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        localstripe                 ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            c8t2d0                  ONLINE       0     0     0
            c8t3d0                  ONLINE       0     0     0
            c8t4d0                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
          raidz1-1                  ONLINE       0     0     0
            c8t6d0                  ONLINE       0     0     0
            c8t7d0                  ONLINE       0     0     0
            c8t8d0                  ONLINE       0     0     0
            c8t9d0                  ONLINE       0     0     0
        logs
          c5t0026B7682D581035d0     ONLINE       0     0     0
        cache
          c6t0026B7682D1B8DA5d0     ONLINE       0     0     0
        spares
          c8t10d0                   AVAIL
          c8t11d0                   AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 19.2G in 0 days 00:03:25 with 0 errors on Mon Feb 14 21:29:12 2022
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0

errors: No known data errors
}}}
The disks are standard 600GB SAS II drives.\\
\\
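For reference, a pool with this layout could be created roughly like this (a sketch reconstructed from the status output above, not necessarily the exact command used):
{{{
root@spin:~# zpool create localstripe \
    raidz c8t2d0 c8t3d0 c8t4d0 c8t5d0 \
    raidz c8t6d0 c8t7d0 c8t8d0 c8t9d0 \
    log c5t0026B7682D581035d0 \
    cache c6t0026B7682D1B8DA5d0 \
    spare c8t10d0 c8t11d0
}}}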
The machine has a 10G ixgbe NIC, on which I configured the VNICs:
{{{
root@spin:~# dladm show-vnic
LINK            OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
spinpub0        ixgbe0       10000  2:8:20:d2:1f:63   random            100
bootstrapint0   ixgbe0       10000  2:8:20:3b:26:2    random            100
master02int0    ixgbe0       10000  2:8:20:22:c1:10   random            100
master00int0    ixgbe0       10000  2:8:20:61:3b:13   random            100
master01int0    ixgbe0       10000  2:8:20:28:b3:a8   random            100
worker02int0    ixgbe0       10000  2:8:20:7c:12:3a   random            100
worker03int0    ixgbe0       10000  2:8:20:62:e:e0    random            100
worker00int0    ixgbe0       10000  2:8:20:ad:2b:6c   random            100
bastionint0     ixgbe0       10000  2:8:20:f9:2e:58   random            100
worker01int0    ixgbe0       10000  2:8:20:12:51:4d   random            100
}}}
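The VNICs themselves are plain dladm VNICs on VLAN 100; one of them could be created like this (the random MAC address is the dladm default, matching the MACADDRTYPE column above):
{{{
root@spin:~# dladm create-vnic -l ixgbe0 -v 100 master00int0
}}}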
!! Setup Zones
! General
I used an Ansible playbook to create the bhyve zones, but to keep this more generic I'll show the zone config here. All zones have the same config except for the bastion, which has less RAM and CPU.
For OpenShift we need to have:
* 1 bastion node (if you don't have any other Linux machine)
* 1 bootstrap node
* 1 LoadBalancer zone
* 3 master servers
* 2 workers
! Bastion
The bastion will be used to run the OpenShift installer, which is only available for Linux/x64 and MacOS X/x64.
The zone config for the bastion looks like this:
{{{
root@spin:~# zonecfg -z bastion export
create -b
set zonepath=/localstripe/zones/bastion
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add fs
set dir="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set special="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set type="lofs"
add options ro
add options nodevices
end
add net
set physical="bastionint0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/bastiond0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/bastiond0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="vcpus"
set type="string"
set value="2"
end
add attr
set name="ram"
set type="string"
set value="6G"
end
add attr
set name="cdrom"
set type="string"
set value="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
end
}}}
I took a Red Hat Enterprise Linux 8.4 DVD, but CentOS or Fedora would also do the job.
This zone gets access to a ZVOL:
{{{
root@spin:~# zfs get volsize localstripe/vm/bastiond0
NAME                      PROPERTY  VALUE  SOURCE
localstripe/vm/bastiond0  volsize   20G    local
}}}
No other special attributes are set on this, so that's it.
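Creating the ZVOL and installing the zone is a one-liner each; a minimal sketch, assuming the zone has already been configured as shown above:
{{{
root@spin:~# zfs create -p -V 20G localstripe/vm/bastiond0
root@spin:~# zoneadm -z bastion install
}}}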
Boot the zone, attach socat to the VNC socket, and install Linux the usual way:
{{{
root@spin:~# socat TCP-LISTEN:5905,reuseaddr,fork UNIX-CONNECT:/localstripe/zones/bastion/root/tmp/vm.vnc
(1227) x230:/export/home/olbohlen$ vncviewer spin::5905
}}}
! LoadBalancer
OpenShift runs on different nodes (bhyve VMs) and we need an external load balancer to access the Kubernetes API and the OpenShift router. In a production environment you would want to make that load balancer highly available with VRRP, but in this scenario we keep it simple.
First, set up a standard ipkg OpenIndiana zone (I did that on the other machine, "sin"):
{{{
root@sin:~# zonecfg -z api export
create -b
set zonepath=/localstripe/zones/api
set brand=ipkg
set autoboot=true
set ip-type=exclusive
add net
set physical="api0"
end
root@sin:~# zoneadm -z api install
[...]
}}}
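The net resource references a VNIC named api0, which has to exist on "sin" before the zone will boot. A minimal sketch, assuming sin's uplink is also called ixgbe0 and uses the same VLAN 100 (adjust both to your network):
{{{
root@sin:~# dladm create-vnic -l ixgbe0 -v 100 api0
}}}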
We don't need anything fancy.\\
Once the api zone is installed, we install the OpenIndiana integrated LoadBalancer (ilb) and start it:
{{{
root@api:~# pkg install service/network/load-balancer/ilb
[...]
root@api:~# svcadm enable ilb
}}}
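Before any rules can take effect, the zone needs its IP address configured and IP forwarding enabled (a prerequisite for ILB's NAT modes). A sketch, assuming the zone itself answers on the VIP 172.18.3.100 with a /22 prefix and 172.18.0.200 as default router, as the DHCP configuration further down suggests:
{{{
root@api:~# ipadm create-if api0
root@api:~# ipadm create-addr -T static -a 172.18.3.100/22 api0/v4
root@api:~# route -p add default 172.18.0.200
root@api:~# ipadm set-prop -p forwarding=on ipv4
}}}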
The configuration I use looks like the following:
{{{
root@api:~# ilbadm export-cf
create-servergroup masters
add-server -s server=172.18.3.10 masters
add-server -s server=172.18.3.20 masters
add-server -s server=172.18.3.30 masters
create-servergroup workers
add-server -s server=172.18.3.50 workers
add-server -s server=172.18.3.60 workers
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-masters
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-workers
create-rule -e -p -i vip=172.18.3.100,port=6443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=6443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mastersrule
create-rule -e -p -i vip=172.18.3.100,port=22623,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=22623 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mcsrule
create-rule -e -p -i vip=172.18.3.100,port=80,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=80 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httprule
create-rule -e -p -i vip=172.18.3.100,port=443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httpsrule
}}}
Save this output to a file and import it with "ilbadm import-cf -p filename".
I use full NAT for load balancing here, as the API load balancer and the nodes are in the same IP range.
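One thing to keep in mind: while the cluster bootstraps, ports 6443 and 22623 are also served by the bootstrap node, so it must temporarily be a member of the masters servergroup (it gets removed again at the end of the installation, see below). Assuming the bootstrap gets 172.18.3.5, as in the DNS zone further down:
{{{
root@api:~# ilbadm add-server -s server=172.18.3.5 masters
}}}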
! bootstrap, master and worker zones
These zones look identical, just replace the host name:
{{{
root@spin:~# zonecfg -z master00 export
create -b
set zonepath=/localstripe/zones/master00
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add net
set physical="master00int0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/master00d0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/master00d0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="ram"
set type="string"
set value="16G"
end
add attr
set name="vcpus"
set type="string"
set value="4"
end
}}}
They all have access to their own ZVOL:
{{{
root@spin:~# zfs get volsize localstripe/vm/master00d0
NAME                       PROPERTY  VALUE  SOURCE
localstripe/vm/master00d0  volsize   250G   local
}}}
Thankfully bhyve will try PXE if the boot-disk ZVOL is empty, so we don't have to set up anything additional here.
Install all the zones with "zoneadm -z zonename install", which should be pretty fast for bhyve zones. Just don't boot them yet.
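A minimal sketch for creating the ZVOLs and installing all of these zones in one go, assuming the zone and volume names used above:
{{{
root@spin:~# for z in bootstrap master00 master01 master02 worker00 worker01; do
>   zfs create -p -V 250G localstripe/vm/${z}d0
>   zoneadm -z $z install
> done
}}}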
We will follow the OpenShift installation instructions for UPI (user-provisioned infrastructure) installations; see the appropriate docs for your OpenShift version at [https://docs.openshift.com].
!! Installing OpenShift
! Download required material
You need to log in to [https://cloud.redhat.com], select "OpenShift" and "Create Cluster", and scroll down to "Platform agnostic". This will take you to a page where you have to download:
* The Openshift installer (openshift-install)
* The OpenShift client (oc)
* The pull secret (access tokens for Red Hat registries)
Save these files on the bastion node which we created earlier.
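On the bastion, unpack the installer and the client and put them into the PATH; a sketch, assuming the default tarball names from the download page:
{{{
[localadm@bastion ~]$ tar xzf openshift-install-linux.tar.gz
[localadm@bastion ~]$ tar xzf openshift-client-linux.tar.gz
[localadm@bastion ~]$ sudo mv openshift-install oc kubectl /usr/local/bin/
[localadm@bastion ~]$ openshift-install version
}}}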
! Set up install-config.yaml
Create an empty directory somewhere as a non-root user; inside this directory create a file called "install-config.yaml":
{{{
[localadm@bastion ~]$ cat install-config.yaml
apiVersion: v1
baseDomain: home.eenfach.de
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 2
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: |
  {paste pull secret in here}
sshKey: |
  {paste a ssh public key in here}
}}}
Copy this file into a backup location somewhere, as the openshift-install command will consume the file and delete it. If you need to restart, you can use the backup.
! Create Ignition Files
After creating the install-config.yaml we can process it and create Ignition files for Red Hat CoreOS. We first create manifests (which you could review or customize, but we won't) and then create the Ignition configs:
{{{
[localadm@bastion ocp4]$ ls
install-config.yaml
[localadm@bastion ocp4]$ openshift-install create manifests
[...]
[localadm@bastion ocp4]$ openshift-install create ignition-configs
[...]
[localadm@bastion ocp4]$ ls
auth  bootstrap.ign  master.ign  metadata.json  worker.ign
}}}
The Ignition files (*.ign) need to be pushed to a web server that is accessible to the OpenShift nodes.
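For example, copying them to the web server used in the GRUB configs below (the host name "webhost" and the document root are assumptions; use whatever serves http://172.18.1.80/install/ocp4/ on your network) and checking that they are reachable:
{{{
[localadm@bastion ocp4]$ scp bootstrap.ign master.ign worker.ign webhost:/var/www/html/install/ocp4/ignitions/
[localadm@bastion ocp4]$ curl -sI http://172.18.1.80/install/ocp4/ignitions/bootstrap.ign
}}}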
! Set up PXE boot
The relevant part of my dhcpd4.conf looks like this:
{{{
subnet 172.18.0.0 netmask 255.255.252.0 {
  range 172.18.3.101 172.18.3.101;
  option routers 172.18.0.200;
  option domain-name "srv.home.eenfach.de";
  option domain-search "srv.home.eenfach.de","home.eenfach.de","eenfach.de";
  option domain-name-servers 172.18.1.53;
  option subnet-mask 255.255.224.0;
  option ntp-servers 192.53.103.108,192.53.103.104,192.53.103.103;

  group {
    filename "shimx64.efi";
    next-server 172.18.1.67;

    host master00.ocp4.home.eenfach.de {
      hardware ethernet 2:8:20:61:3b:13;
      fixed-address 172.18.3.10;
      option host-name "master00";
      [...]
    }
  }
}
}}}
Of course you will want to add all masters, workers, and the bootstrap node in there as well.
Then you need the shimx64.efi and grubx64.efi binaries from the RHEL shim-x64.x86_64 and grub2-efi-x64.x86_64 RPMs. I simply installed them on the bastion and copied the files to /tftpboot on my DHCP server.
The reason is that bhyve boots UEFI only, while the classic pxelinux from RHEL only supports BIOS boot, so we use the shim/GRUB UEFI boot path.
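A sketch of how to get the two binaries onto the TFTP server, assuming they land under /boot/efi/EFI/redhat/ after installing the packages on the bastion (172.18.1.67 is the next-server from the DHCP config above):
{{{
[localadm@bastion ~]$ sudo dnf install -y shim-x64 grub2-efi-x64
[localadm@bastion ~]$ scp /boot/efi/EFI/redhat/shimx64.efi /boot/efi/EFI/redhat/grubx64.efi root@172.18.1.67:/tftpboot/
}}}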
Create a GRUB config for every MAC address; GRUB's network boot looks for a file named grub.cfg-01-<mac>, with the MAC zero-padded, lowercase, and dash-separated, as in the examples below.
An example for a master:
{{{
root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-22-c1-10
menuentry 'Master: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/master.ign console=tty0 console=ttyS0
        initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
}}}
An example for a worker:
{{{
root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-12-51-4d
menuentry 'Worker: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/worker.ign console=tty0 console=ttyS0
        initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
}}}
And an example for the bootstrap node:
{{{
root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-3b-26-02
menuentry 'Bootstrap: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/bootstrap.ign console=tty0 console=ttyS0
        initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
}}}
Ensure that you also have rhcos-4.6.1-x86_64-live-initramfs.x86_64.img and rhcos-4.6.1-x86_64-live-kernel-x86_64 in /tftpboot, and that rhcos-4.6.1-x86_64-live-rootfs.x86_64.img is accessible at the specified web URL (try it with curl, for example).
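For example (the curl call should report an HTTP 200; adjust host names and paths to your setup):
{{{
[localadm@bastion ~]$ curl -sI http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img | head -1
root@skirnir:/tftpboot# ls rhcos-4.6.1-x86_64-live-kernel-x86_64 rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}}}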
! Setting up DNS
My DNS zone looks like this:
{{{
root@voluspa:/var/named# cat ocp4.home.eenfach.de
$TTL 3600
@                           IN SOA   ocp4.home.eenfach.de. hostmaster.eenfach.de. (
                                     2022021500 ; Serial
                                     3600       ; Refresh
                                     300        ; Retry
                                     3600000    ; Expire
                                     3600 )     ; Minimum
                            IN NS    voluspa.srv.home.eenfach.de.
bastion                     IN A     172.18.3.1
bootstrap                   IN A     172.18.3.5
master00                    IN A     172.18.3.10
etcd00                      IN A     172.18.3.10
_etcd-server-ssl._tcp.ocp4  IN SRV   0 10 2380 etcd00
master01                    IN A     172.18.3.20
etcd01                      IN A     172.18.3.20
_etcd-server-ssl._tcp.ocp4  IN SRV   0 10 2380 etcd01
master02                    IN A     172.18.3.30
etcd02                      IN A     172.18.3.30
_etcd-server-ssl._tcp.ocp4  IN SRV   0 10 2380 etcd02
worker00                    IN A     172.18.3.50
worker01                    IN A     172.18.3.60
*.apps                      IN A     172.18.3.100
api                         IN A     172.18.3.100
api-int                     IN A     172.18.3.100
dns                         IN CNAME voluspa.srv.home.eenfach.de.
}}}
Ensure that you set up matching PTR records also.
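A fragment of the matching reverse zone could look like this (a sketch, assuming a 3.18.172.in-addr.arpa zone; SOA and NS records omitted):
{{{
1    IN PTR  bastion.ocp4.home.eenfach.de.
5    IN PTR  bootstrap.ocp4.home.eenfach.de.
10   IN PTR  master00.ocp4.home.eenfach.de.
20   IN PTR  master01.ocp4.home.eenfach.de.
30   IN PTR  master02.ocp4.home.eenfach.de.
50   IN PTR  worker00.ocp4.home.eenfach.de.
60   IN PTR  worker01.ocp4.home.eenfach.de.
}}}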
That should be it; now we can kick off the installation.
! Starting the RHCOS installation
Boot your bootstrap zone, run "zlogin -C bootstrap", and watch the console.
When the GRUB menu appears, hit return (or set timeout values in the grub.cfg files) and let the boot proceed. The bootstrap node will reboot after a while and finally show a login prompt.
Now boot your master zones as well, observe their consoles, and wait until they show a login prompt; after that, do the same with the workers.
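On "spin" the whole boot sequence boils down to something like this (zone names as configured above; run each zlogin -C in its own terminal if you want to watch all consoles):
{{{
root@spin:~# zoneadm -z bootstrap boot && zlogin -C bootstrap
root@spin:~# for z in master00 master01 master02; do zoneadm -z $z boot; done
root@spin:~# for z in worker00 worker01; do zoneadm -z $z boot; done
}}}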
! Observing the installation
On the bastion node, observe the installation with
{{{ openshift-install wait-for bootstrap-complete }}}
At some point it will say that it is now safe to remove the bootstrap node. Log in to the load balancer zone, remove the bootstrap from the masters servergroup, and then shut down the bootstrap zone.
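On the ILB zone this amounts to looking up the server ID that belongs to 172.18.3.5 and removing it, then halting the bootstrap zone on spin; a sketch (the server ID _masters.3 is an example, use whatever show-servergroup reports):
{{{
root@api:~# ilbadm show-servergroup
root@api:~# ilbadm remove-server -s server=_masters.3 masters
root@spin:~# zoneadm -z bootstrap halt
}}}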
Back on the bastion, run
{{{ openshift-install wait-for install-complete --log-level=debug }}}
Open another shell to the bastion and watch the cluster:
{{{
export KUBECONFIG=$HOME/ocp4/auth/kubeconfig
oc get nodes; oc get csr
}}}
You should see the nodes getting ready. If CSRs are in a pending state, approve them with "oc adm certificate approve".
Depending on your hardware and network this takes some time, but eventually your cluster should be ready.
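If several CSRs pile up, approving them one by one gets tedious; a sketch for approving all pending ones in one go (check the OpenShift docs for the officially suggested command):
{{{
[localadm@bastion ~]$ oc get csr | awk '/Pending/ {print $1}' | xargs -r oc adm certificate approve
}}}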