Overview#
The scenario is as follows. We have two physical machines:
- sin (runs various stuff like NFS)
- spin (only runs bhyve VMs for OCP)
Both run OpenIndiana. On "sin" we run multiple zones for LDAP, DNS, DHCP, etc. I'm not going into full detail here.
The host "spin" is actually empty. It's an old Sun X4270 M2 machine with 2 sockets, 6-core X5675 CPUs @ 3.07GHz and 144GB RAM. The chassis has twelve disks and I also added two (consumer) NVMe drives on PCIe adapters.
Two disks form a ZFS boot mirror (rpool). The other disks form a pool striped over two raidz1 vdevs (localstripe), which has two hot spares, one slog device (NVMe) and one L2ARC device (also NVMe):
root@spin:~# zpool status
  pool: localstripe
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        localstripe                ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c8t2d0                 ONLINE       0     0     0
            c8t3d0                 ONLINE       0     0     0
            c8t4d0                 ONLINE       0     0     0
            c8t5d0                 ONLINE       0     0     0
          raidz1-1                 ONLINE       0     0     0
            c8t6d0                 ONLINE       0     0     0
            c8t7d0                 ONLINE       0     0     0
            c8t8d0                 ONLINE       0     0     0
            c8t9d0                 ONLINE       0     0     0
        logs
          c5t0026B7682D581035d0    ONLINE       0     0     0
        cache
          c6t0026B7682D1B8DA5d0    ONLINE       0     0     0
        spares
          c8t10d0                  AVAIL
          c8t11d0                  AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 19.2G in 0 days 00:03:25 with 0 errors on Mon Feb 14 21:29:12 2022
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0
            c8t1d0    ONLINE       0     0     0

errors: No known data errors

The disks are standard 600GB SAS II drives.
The machine has a 10G ixgbe NIC, on which I configured the VNICs:
root@spin:~# dladm show-vnic
LINK           OVER    SPEED  MACADDRESS       MACADDRTYPE  VID
spinpub0       ixgbe0  10000  2:8:20:d2:1f:63  random       100
bootstrapint0  ixgbe0  10000  2:8:20:3b:26:2   random       100
master02int0   ixgbe0  10000  2:8:20:22:c1:10  random       100
master00int0   ixgbe0  10000  2:8:20:61:3b:13  random       100
master01int0   ixgbe0  10000  2:8:20:28:b3:a8  random       100
worker02int0   ixgbe0  10000  2:8:20:7c:12:3a  random       100
worker03int0   ixgbe0  10000  2:8:20:62:e:e0   random       100
worker00int0   ixgbe0  10000  2:8:20:ad:2b:6c  random       100
bastionint0    ixgbe0  10000  2:8:20:f9:2e:58  random       100
worker01int0   ixgbe0  10000  2:8:20:12:51:4d  random       100
Setup Zones#
General#
I used an Ansible playbook to create the bhyve zones, but to keep this more generic I'll show you the zone config here. All zones have the same config, except for the bastion, which has less RAM and fewer CPUs.
For OpenShift we need to have:
- 1 bastion node (if you don't have any other linux machine)
- 1 bootstrap node
- 1 LoadBalancer zone
- 3 master servers
- 2 workers
Bastion#
The bastion will be used to run the OpenShift installer, which is only available for Linux/x64 and MacOS X/x64.
The zone config for the bastion looks like this:
root@spin:~# zonecfg -z bastion export
create -b
set zonepath=/localstripe/zones/bastion
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add fs
set dir="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set special="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set type="lofs"
add options ro
add options nodevices
end
add net
set physical="bastionint0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/bastiond0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/bastiond0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="vcpus"
set type="string"
set value="2"
end
add attr
set name="ram"
set type="string"
set value="6G"
end
add attr
set name="cdrom"
set type="string"
set value="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
end
I used a Red Hat Enterprise Linux 8.4 DVD, but CentOS or Fedora would also do the job. This zone gets access to a ZVOL:
root@spin:~# zfs get volsize localstripe/vm/bastiond0
NAME                      PROPERTY  VALUE  SOURCE
localstripe/vm/bastiond0  volsize   20G    local
No other special attributes are set on this, so that's it. Boot the zone, attach socat to the VNC socket and install Linux the usual way:
root@spin:~# socat TCP-LISTEN:5905,reuseaddr,fork UNIX-CONNECT:/localstripe/zones/bastion/root/tmp/vm.vnc
(1227) x230:/export/home/olbohlen$ vncviewer spin::5905
LoadBalancer#
OpenShift runs on different nodes (bhyve VMs) and we need an external LoadBalancer to access the Kubernetes API and the OpenShift Router. In a production environment you would want to make that LoadBalancer HA with VRRP, but in this scenario we go the simple way.
First set up a standard ipkg OI zone (I did that on the other hardware, "sin"):
root@sin:~# zonecfg -z api export
create -b
set zonepath=/localstripe/zones/api
set brand=ipkg
set autoboot=true
set ip-type=exclusive
add net
set physical="api0"
end
root@sin:~# zoneadm -z api install
[...]
We don't need anything fancy.
Once the api zone is installed, we install the OpenIndiana integrated LoadBalancer (ilb) and start it:
root@api:~# pkg install service/network/load-balancer/ilb
[...]
root@api:~# svcadm enable ilb
The configuration I use looks like the following:
root@api:~# ilbadm export-cf
create-servergroup masters
add-server -s server=172.18.3.10 masters
add-server -s server=172.18.3.20 masters
add-server -s server=172.18.3.30 masters
create-servergroup workers
add-server -s server=172.18.3.50 workers
add-server -s server=172.18.3.60 workers
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-masters
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-workers
create-rule -e -p -i vip=172.18.3.100,port=6443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=6443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mastersrule
create-rule -e -p -i vip=172.18.3.100,port=22623,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=22623 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mcsrule
create-rule -e -p -i vip=172.18.3.100,port=80,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=80 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httprule
create-rule -e -p -i vip=172.18.3.100,port=443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httpsrule
Save this output to a file and import it with "ilbadm import-cf -p filename". I use full NAT for load balancing here, as the API load balancer and the nodes are in the same IP range.
bootstrap, master and worker zones#
These zones look identical, just replace the host name:
root@spin:~# zonecfg -z master00 export
create -b
set zonepath=/localstripe/zones/master00
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add net
set physical="master00int0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/master00d0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/master00d0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="ram"
set type="string"
set value="16G"
end
add attr
set name="vcpus"
set type="string"
set value="4"
end
They all have access to their own ZVOL:
root@spin:~# zfs get volsize localstripe/vm/master00d0
NAME                       PROPERTY  VALUE  SOURCE
localstripe/vm/master00d0  volsize   250G   local
Thankfully bhyve will try PXE if the bootdisk ZVOL is empty, so we don't have to set up additional things here.
Install all the zones with "zoneadm -z zonename install", which should be pretty fast for bhyve zones. Just don't boot them up yet.
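The per-node preparation can be sketched as a small loop. This is a dry run that only prints the commands; the link name (ixgbe0), VLAN ID (100), ZVOL size (250G) and naming scheme are taken from the output shown above, while the `run` and `provision_node` helpers are hypothetical names for this example. It assumes the parent dataset localstripe/vm already exists and that each zone is configured with zonecfg before the install step.

```shell
#!/bin/sh
# Dry run: print the per-node commands instead of executing them.
# Replace the body of run() with "$@" once the output looks right.
run() { echo "$@"; }

provision_node() {
    node=$1
    # VNIC on the 10G NIC, tagged with VLAN 100 as in "dladm show-vnic"
    run dladm create-vnic -l ixgbe0 -v 100 "${node}int0"
    # Empty boot ZVOL, so that bhyve falls back to PXE
    run zfs create -V 250G "localstripe/vm/${node}d0"
    # The zone has to be configured with zonecfg before this step
    run zoneadm -z "$node" install
}

for node in bootstrap master00 master01 master02 worker00 worker01; do
    provision_node "$node"
done
```

Keeping the commands behind `run` makes it easy to review the full list before anything touches the pool or the network stack.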
We will follow the OpenShift installation instructions for UPI installations, see the appropriate docs for your OpenShift version on https://docs.openshift.com.
Installing OpenShift#
Download required material#
You need to log in to https://cloud.redhat.com, select "OpenShift" and "Create Cluster". There, scroll down to "Platform agnostic". This will take you to a page where you have to download:
- The OpenShift installer (openshift-install)
- The OpenShift client (oc)
- The pull secret (access tokens for Red Hat registries)
Save these files on the bastion node which we created earlier.
Set up install-config.yaml#
Create an empty directory as a non-root user somewhere. Inside this directory, create a file called "install-config.yaml":
[localadm@bastion ~]$ cat install-config.yaml
apiVersion: v1
baseDomain: home.eenfach.de
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 2
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: |
  {paste pull secret in here}
sshKey: |
  {paste a ssh public key in here}
Copy this file into a backup location somewhere, as the openshift-install command will consume the file and delete it. If you need to restart, you can use the backup.
Create Ignition Files#
After creating the install-config.yaml we can process this and create ignition files for Red Hat CoreOS. We first create manifests (which you could review or customize, but we don't) and then we create ignitions:
[localadm@bastion ocp4]$ ls
install-config.yaml
[localadm@bastion ocp4]$ openshift-install create manifests
[...]
[localadm@bastion ocp4]$ openshift-install create ignition-configs
[...]
[localadm@bastion ocp4]$ ls
auth           metadata.json  worker.ign
bootstrap.ign  master.ign
The ignition files (*.ign) need to be pushed to a web-server that is accessible for the OpenShift nodes.
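Before booting any node it can help to check that all three ignition files are really reachable. The following is a minimal sketch, assuming the base URL that the grub configs in this article use for coreos.inst.ignition_url; `ignition_urls` is a hypothetical helper name.

```shell
#!/bin/sh
# Base URL as used by coreos.inst.ignition_url in the grub configs
BASE_URL=http://172.18.1.80/install/ocp4/ignitions

# Print the URL of every ignition file the nodes will request
ignition_urls() {
    for role in bootstrap master worker; do
        printf '%s/%s.ign\n' "$BASE_URL" "$role"
    done
}

ignition_urls

# On a host that can reach the web server, each URL can be probed with:
#   ignition_urls | while read -r url; do
#       curl -fsSI "$url" >/dev/null && echo "ok   $url" || echo "FAIL $url"
#   done
```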
Set up PXE boot#
The relevant part of my dhcpd4.conf looks like this:
subnet 172.18.0.0 netmask 255.255.252.0 {
  range 172.18.3.101 172.18.3.101;
  option routers 172.18.0.200;
  option domain-name "srv.home.eenfach.de";
  option domain-search "srv.home.eenfach.de","home.eenfach.de","eenfach.de";
  option domain-name-servers 172.18.1.53;
  option subnet-mask 255.255.224.0;
  option ntp-servers 192.53.103.108,192.53.103.104,192.53.103.103;
  group {
    filename "shimx64.efi";
    next-server 172.18.1.67;
    host master00.ocp4.home.eenfach.de {
      hardware ethernet 2:8:20:61:3b:13;
      fixed-address 172.18.3.10;
      option host-name "master00";
      [...]
    }
  }
}
Of course you want to put all masters, workers and the bootstrap in there also.
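Writing six nearly identical host stanzas by hand is error-prone, so they can be generated. The sketch below combines the MAC addresses from the "dladm show-vnic" output with the addresses from the DNS zone in this article; `dhcp_hosts` is a hypothetical helper, and it only emits the three options visible in the example stanza (whatever is elided there is not reproduced).

```shell
#!/bin/sh
# Print one dhcpd "host" stanza per input line.
# Input columns: short host name, MAC address, fixed IP address.
dhcp_hosts() {
    while read -r name mac ip; do
        [ -n "$name" ] || continue          # skip empty lines
        printf 'host %s.ocp4.home.eenfach.de {\n' "$name"
        printf '  hardware ethernet %s;\n' "$mac"
        printf '  fixed-address %s;\n' "$ip"
        printf '  option host-name "%s";\n' "$name"
        printf '}\n'
    done
}

dhcp_hosts <<'EOF'
bootstrap 2:8:20:3b:26:2  172.18.3.5
master00  2:8:20:61:3b:13 172.18.3.10
master01  2:8:20:28:b3:a8 172.18.3.20
master02  2:8:20:22:c1:10 172.18.3.30
worker00  2:8:20:ad:2b:6c 172.18.3.50
worker01  2:8:20:12:51:4d 172.18.3.60
EOF
```

The output is meant to be pasted into the group block, so that every host inherits the filename and next-server settings.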
Then you need to get the shimx64.efi and grubx64.efi from the RHEL shim-x64.x86_64 rpm. I just installed it on the bastion and copied it to the /tftpboot on my DHCP Server.
The reason is that bhyve boots UEFI only and the classic pxelinux from RHEL only supports BIOS boots, so we use the grub shim boot.
Create grub configs for every MAC.
An example for a Master:
root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-22-c1-10
menuentry 'Master: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
  linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/master.ign console=tty0 console=ttyS0
  initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
An example for a worker:
root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-12-51-4d
menuentry 'Worker: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
  linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/worker.ign console=tty0 console=ttyS0
  initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
And an example for the bootstrap node:
root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-3b-26-02
menuentry 'Bootstrap: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
  linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/bootstrap.ign console=tty0 console=ttyS0
  initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
Ensure that you have rhcos-4.6.1-x86_64-live-initramfs.x86_64.img and rhcos-4.6.1-x86_64-live-kernel-x86_64 in the /tftpboot as well, and that rhcos-4.6.1-x86_64-live-rootfs.x86_64.img is accessible from the specified web url (try with curl for example).
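One detail that is easy to get wrong: "dladm show-vnic" prints MAC addresses without leading zeros (2:8:20:3b:26:2), while the grub config file names use two hex digits per octet, separated by dashes and prefixed with "01" (grub.cfg-01-02-08-20-3b-26-02). A minimal sketch of that conversion follows; `grub_cfg_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a MAC address as printed by "dladm show-vnic" into the
# per-MAC grub config file name used in /tftpboot.
grub_cfg_name() {
    printf '%s\n' "$1" | awk -F: '{
        name = "grub.cfg-01"                 # "01" prefix as in the file names above
        for (i = 1; i <= NF; i++) {
            octet = tolower($i)
            if (length(octet) == 1)          # pad single-digit octets with a zero
                octet = "0" octet
            name = name "-" octet
        }
        print name
    }'
}

grub_cfg_name 2:8:20:3b:26:2     # bootstrapint0
grub_cfg_name 2:8:20:12:51:4d    # worker01int0
```

For the two addresses above this yields the bootstrap and worker file names shown in the examples.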
Setting up DNS#
My DNS zone looks like this:
root@voluspa:/var/named# cat ocp4.home.eenfach.de
$TTL 3600
@ IN SOA ocp4.home.eenfach.de. hostmaster.eenfach.de. (
        2022021500 ; Serial
        3600       ; Refresh
        300        ; Retry
        3600000    ; Expire
        3600 )     ; Minimum
        IN NS voluspa.srv.home.eenfach.de.
bastion IN A 172.18.3.1
bootstrap IN A 172.18.3.5
master00 IN A 172.18.3.10
etcd00 IN A 172.18.3.10
_etcd-server-ssl._tcp.ocp4 IN SRV 0 10 2380 etcd00
master01 IN A 172.18.3.20
etcd01 IN A 172.18.3.20
_etcd-server-ssl._tcp.ocp4 IN SRV 0 10 2380 etcd01
master02 IN A 172.18.3.30
etcd02 IN A 172.18.3.30
_etcd-server-ssl._tcp.ocp4 IN SRV 0 10 2380 etcd02
worker00 IN A 172.18.3.50
worker01 IN A 172.18.3.60
*.apps IN A 172.18.3.100
api IN A 172.18.3.100
api-int IN A 172.18.3.100
dns IN CNAME voluspa.srv.home.eenfach.de.
Ensure that you set up matching PTR records also.
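The reverse records can be derived mechanically from the forward zone. Here is a minimal sketch that turns "name address" pairs into PTR lines with fully qualified owner names, assuming the ocp4.home.eenfach.de domain from the zone file above; `ptr_records` is a hypothetical helper, and the layout of your reverse zone may call for relative names instead.

```shell
#!/bin/sh
# Read "name IPv4-address" pairs and print matching PTR records.
ptr_records() {
    awk '{
        split($2, o, ".")    # o[1]..o[4] hold the four octets
        printf "%s.%s.%s.%s.in-addr.arpa. IN PTR %s.ocp4.home.eenfach.de.\n",
               o[4], o[3], o[2], o[1], $1
    }'
}

ptr_records <<'EOF'
bootstrap 172.18.3.5
master00 172.18.3.10
master01 172.18.3.20
master02 172.18.3.30
worker00 172.18.3.50
worker01 172.18.3.60
EOF
```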
That should be it. We can now kick off the installation.
Starting the RHCOS installation#
Boot your bootstrap zone, run "zlogin -C bootstrap" and watch the console. When the grub menu appears, hit return (or put timeout values in the grub.cfg files) and let the boot proceed. The bootstrap will reboot after a while and finally it will show a login prompt.
Now boot your master zones as well, observe their consoles and wait until they show the login prompt. After that, do the same with the workers.
Observing the installation#
On the bastion node, observe the installation with
openshift-install wait-for bootstrap-complete
At some point it should say that it is now safe to remove the bootstrap node. Log in to the loadbalancer zone and remove the bootstrap from the masters servergroup. Then shut down the bootstrap.
Again on the bastion run
openshift-install wait-for install-complete --log-level=debug
Open another shell to the bastion and watch the cluster:
export KUBECONFIG=$HOME/ocp4/auth/kubeconfig
oc get nodes; oc get csr
You should see the nodes getting ready. If CSRs are in a pending state, approve them with "oc adm certificate approve" followed by the CSR name. Depending on your hardware and network this takes some time, but finally your cluster should be ready.
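Approving CSRs one by one gets tedious, since new ones appear while the nodes join. A minimal sketch of a filter that picks the pending ones out of the "oc get csr" listing follows; it assumes that the last column of that listing is the condition, and `pending_csrs` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the names of all CSRs whose last column (the condition) is "Pending".
# Expects the output of "oc get csr --no-headers" on stdin.
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# On the bastion, with KUBECONFIG set as shown above:
#   oc get csr --no-headers | pending_csrs | xargs -r oc adm certificate approve
```

The "-r" option of GNU xargs (available on the RHEL bastion) skips the approve call when nothing is pending. You will probably have to repeat the command a few times until all nodes are ready.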