16-Feb-2022 18:15
Running OpenShift Container Platform in bhyve containers on OpenIndiana#

Overview#

The scenario is as follows. We have two physical machines:
  • sin (runs various stuff like NFS)
  • spin (only runs bhyve VMs for OCP)

Both run OpenIndiana. On "sin" we run multiple zones for LDAP, DNS, DHCP, etc. I'm not going into full detail here.

The host "spin" is otherwise empty. It's an old Sun X4270 M2 machine with two sockets (six-core Xeon X5675 @ 3.07GHz) and 144GB RAM. The chassis has twelve disks, and I also added two (consumer) NVMe drives on PCIe adapters.

Two disks form a ZFS boot mirror (rpool). The other disks form a pool of two striped raidz1 vdevs (localstripe), which has two hot spares, one slog device (NVMe) and one L2ARC device (also NVMe):

root@spin:~# zpool status
  pool: localstripe
 state: ONLINE
  scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	localstripe              ONLINE       0     0     0
	  raidz1-0               ONLINE       0     0     0
	    c8t2d0               ONLINE       0     0     0
	    c8t3d0               ONLINE       0     0     0
	    c8t4d0               ONLINE       0     0     0
	    c8t5d0               ONLINE       0     0     0
	  raidz1-1               ONLINE       0     0     0
	    c8t6d0               ONLINE       0     0     0
	    c8t7d0               ONLINE       0     0     0
	    c8t8d0               ONLINE       0     0     0
	    c8t9d0               ONLINE       0     0     0
	logs	
	  c5t0026B7682D581035d0  ONLINE       0     0     0
	cache
	  c6t0026B7682D1B8DA5d0  ONLINE       0     0     0
	spares
	  c8t10d0                AVAIL   
	  c8t11d0                AVAIL   

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 19.2G in 0 days 00:03:25 with 0 errors on Mon Feb 14 21:29:12 2022
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    c8t0d0  ONLINE       0     0     0
	    c8t1d0  ONLINE       0     0     0

errors: No known data errors
The disks are standard 600GB SAS II drives.

The machine has a 10G ixgbe NIC, on which I configured the VNICs:
root@spin:~# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
spinpub0     ixgbe0       10000  2:8:20:d2:1f:63   random              100
bootstrapint0 ixgbe0      10000  2:8:20:3b:26:2    random              100
master02int0 ixgbe0       10000  2:8:20:22:c1:10   random              100
master00int0 ixgbe0       10000  2:8:20:61:3b:13   random              100
master01int0 ixgbe0       10000  2:8:20:28:b3:a8   random              100
worker02int0 ixgbe0       10000  2:8:20:7c:12:3a   random              100
worker03int0 ixgbe0       10000  2:8:20:62:e:e0    random              100
worker00int0 ixgbe0       10000  2:8:20:ad:2b:6c   random              100
bastionint0  ixgbe0       10000  2:8:20:f9:2e:58   random              100
worker01int0 ixgbe0       10000  2:8:20:12:51:4d   random              100
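
VNICs like these can be created with "dladm create-vnic". The following is a minimal sketch, assuming the "<node>int0" naming, the ixgbe0 link and VLAN ID 100 from the output above. It only prints the commands (a dry run), and the MAC addresses dladm assigns will differ from the ones shown here.

```shell
# Dry run: print one "dladm create-vnic" command per node instead of running it.
# LINK and VID mirror the "dladm show-vnic" output above; adjust for your setup.
LINK=ixgbe0
VID=100

print_vnic_cmds() {
    for node in "$@"; do
        # VNIC names follow the "<node>int0" convention used in this article
        echo "dladm create-vnic -l $LINK -v $VID ${node}int0"
    done
}

print_vnic_cmds bastion bootstrap master00 master01 master02 worker00 worker01
```

After reviewing the output, it can be piped to sh as root on "spin". Note the MAC addresses from "dladm show-vnic" afterwards, as DHCP and PXE depend on them.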

Setup Zones#

General#

I used an Ansible playbook to create the bhyve zones, but to keep this generic I'll show the zone config here. All zones have the same config, except for the bastion, which has less RAM and fewer CPUs.

For OpenShift we need to have:

  • 1 bastion node (if you don't have any other Linux machine)
  • 1 bootstrap node
  • 1 LoadBalancer zone
  • 3 master servers
  • 2 workers

Bastion#

The bastion will be used to run the OpenShift installer, which is only available for Linux/x64 and macOS/x64.

The zone config for the bastion looks like this:

root@spin:~# zonecfg -z bastion export
create -b
set zonepath=/localstripe/zones/bastion
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add fs
set dir="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set special="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
set type="lofs"
add options ro
add options nodevices
end
add net
set physical="bastionint0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/bastiond0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/bastiond0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="vcpus"
set type="string"
set value="2"
end
add attr
set name="ram"
set type="string"
set value="6G"
end
add attr
set name="cdrom"
set type="string"
set value="/localstripe/install/rhel-8.4-x86_64-dvd.iso"
end

I used a Red Hat Enterprise Linux 8.4 DVD, but CentOS or Fedora would also do the job. This zone gets access to a ZVOL:

root@spin:~# zfs get volsize localstripe/vm/bastiond0
NAME                      PROPERTY  VALUE    SOURCE
localstripe/vm/bastiond0  volsize   20G      local

No other special attributes are set on this, so that's it. Boot the zone, attach socat to the VNC socket and install Linux the usual way:

root@spin:~# socat TCP-LISTEN:5905,reuseaddr,fork UNIX-CONNECT:/localstripe/zones/bastion/root/tmp/vm.vnc

(1227) x230:/export/home/olbohlen$ vncviewer spin::5905

LoadBalancer#

OpenShift runs on different nodes (bhyve VMs) and we need an external load balancer to access the Kubernetes API and the OpenShift Router. In a production environment you would want to make that load balancer highly available with VRRP, but in this scenario we go the simple way.

First set up a standard ipkg OI zone (I did that on the other hardware, "sin"):

root@sin:~# zonecfg -z api export
create -b
set zonepath=/localstripe/zones/api
set brand=ipkg
set autoboot=true
set ip-type=exclusive
add net
set physical="api0"
end
root@sin:~# zoneadm -z api install
[...]
We don't need anything fancy.
Once the api zone is installed, we install the Integrated Load Balancer (ilb) that ships with OpenIndiana and start it:
root@api:~# pkg install service/network/load-balancer/ilb
[...]
root@api:~# svcadm enable ilb

The configuration I use looks like the following:

root@api:~# ilbadm export-cf
create-servergroup masters
add-server -s server=172.18.3.10 masters
add-server -s server=172.18.3.20 masters
add-server -s server=172.18.3.30 masters
create-servergroup workers
add-server -s server=172.18.3.50 workers
add-server -s server=172.18.3.60 workers
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-masters
create-healthcheck -n -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=60 hc-workers
create-rule -e -p -i vip=172.18.3.100,port=6443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=6443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mastersrule
create-rule -e -p -i vip=172.18.3.100,port=22623,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-masters,hc-port=22623 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=masters mcsrule
create-rule -e -p -i vip=172.18.3.100,port=80,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=80 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httprule
create-rule -e -p -i vip=172.18.3.100,port=443,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=172.18.3.100-172.18.3.100,pmask=/32 -h hc-name=hc-workers,hc-port=443 -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=workers httpsrule

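Save this output to a file and import it with "ilbadm import-cf -p filename". I use full NAT for load balancing here because the load balancer VIP and the nodes are in the same IP range: with half-NAT a node would reply directly to a client on the same subnet, so the reply would never pass back through the load balancer to be translated.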
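
The four rules differ only in port, server group and rule name, so they can be generated. This is a small sketch, assuming the VIP, timeouts and "hc-<group>" health check names from the export above; "ilb_rule" is a hypothetical helper, not part of ilbadm.

```shell
# Print an ilbadm "create-rule" line in the same form as the export above.
VIP=172.18.3.100

ilb_rule() {
    # $1 = port, $2 = server group (also selects health check hc-<group>), $3 = rule name
    printf 'create-rule -e -p -i vip=%s,port=%s,protocol=tcp -m lbalg=roundrobin,type=NAT,proxy-src=%s-%s,pmask=/32 -h hc-name=hc-%s,hc-port=%s -t conn-drain=70,nat-timeout=70,persist-timeout=70 -o servergroup=%s %s\n' \
        "$VIP" "$1" "$VIP" "$VIP" "$2" "$1" "$2" "$3"
}

ilb_rule 6443  masters mastersrule   # Kubernetes API
ilb_rule 22623 masters mcsrule       # machine config server
ilb_rule 80    workers httprule      # router, HTTP
ilb_rule 443   workers httpsrule     # router, HTTPS
```

The output can be appended to a file that already contains the servergroup and healthcheck definitions and then imported as described above.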

bootstrap, master and worker zones#

These zones look identical, just replace the host name:
root@spin:~# zonecfg -z master00 export
create -b
set zonepath=/localstripe/zones/master00
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add net
set physical="master00int0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/master00d0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/master00d0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="ram"
set type="string"
set value="16G"
end
add attr
set name="vcpus"
set type="string"
set value="4"
end
Each zone has access to its own ZVOL:
root@spin:~# zfs get volsize localstripe/vm/master00d0
NAME                       PROPERTY  VALUE    SOURCE
localstripe/vm/master00d0  volsize   250G     local

Thankfully bhyve will try PXE if the bootdisk ZVOL is empty, so we don't have to set up additional things here.

Install all the zones with "zoneadm -z zonename install", which should be pretty fast for bhyve zones. Just don't boot them up yet.
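
Since only the host name changes, the zone configs can be generated from a template. The sketch below assumes the master00 export above as the template (16G RAM, 4 vCPUs, a 250G ZVOL); "node_zonecfg" and "print_node_cmds" are hypothetical helpers, and the commands are only printed, not executed.

```shell
# Emit a zonecfg command file for one node, following the master00 export above.
node_zonecfg() {
    node=$1
    cat <<EOF
create -b
set zonepath=/localstripe/zones/$node
set brand=bhyve
set autoboot=true
set ip-type=exclusive
add net
set physical="${node}int0"
end
add device
set match="/dev/zvol/rdsk/localstripe/vm/${node}d0"
end
add attr
set name="bootdisk"
set type="string"
set value="localstripe/vm/${node}d0"
end
add attr
set name="vnc"
set type="string"
set value="on"
end
add attr
set name="ram"
set type="string"
set value="16G"
end
add attr
set name="vcpus"
set type="string"
set value="4"
end
EOF
}

# Dry run: print the commands that create the ZVOL, configure and install each zone.
print_node_cmds() {
    for node in "$@"; do
        echo "zfs create -p -V 250G localstripe/vm/${node}d0"
        echo "zonecfg -z $node -f $node.zonecfg"
        echo "zoneadm -z $node install"
    done
}

print_node_cmds bootstrap master00 master01 master02 worker00 worker01
```

Write each config with "node_zonecfg master00 > master00.zonecfg" before running the printed commands. The matching VNIC has to exist already.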

We will follow the OpenShift installation instructions for UPI installations; see the appropriate docs for your OpenShift version on https://docs.openshift.com.

Installing OpenShift#

Download required material#

You need to log in to https://cloud.redhat.com, select "OpenShift" and "Create Cluster". Then scroll down to "Platform agnostic". This takes you to a page where you have to download:
  • The OpenShift installer (openshift-install)
  • The OpenShift client (oc)
  • The pull secret (access tokens for Red Hat registries)

Save these files on the bastion node which we created earlier.

Set up install-config.yaml#

Create an empty directory as a non-root user somewhere, and inside this directory create a file called "install-config.yaml":
[localadm@bastion ~]$ cat install-config.yaml 
apiVersion: v1
baseDomain: home.eenfach.de
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 2
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: |
  {paste pull secret in here}
sshKey: |
  {paste a ssh public key in here}

Copy this file into a backup location somewhere, as the openshift-install command will consume the file and delete it. If you need to restart, you can use the backup.
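
A restart then boils down to recreating a clean working directory from the backup. This is a minimal sketch; "fresh_install_dir" is a hypothetical helper and both paths in the usage comment are assumptions.

```shell
# Create a new, empty working directory and seed it with the saved install-config.yaml.
# $1 = backup copy of install-config.yaml, $2 = working directory to create
fresh_install_dir() {
    backup=$1
    workdir=$2
    if [ ! -f "$backup" ]; then
        echo "missing backup: $backup" >&2
        return 1
    fi
    # mkdir without -p fails if the directory already exists, so a previous
    # run is never silently overwritten
    mkdir "$workdir" || return 1
    cp "$backup" "$workdir/install-config.yaml"
}

# Usage (paths are examples only):
#   fresh_install_dir "$HOME/install-config.yaml.bak" "$HOME/ocp4"
```

Refusing to reuse an existing directory is deliberate: leftovers from an earlier installer run should be moved away by hand rather than mixed with a new one.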

Create Ignition Files#

After creating the install-config.yaml we can process it and create ignition files for Red Hat Enterprise Linux CoreOS (RHCOS). We first create manifests (which you could review or customize, but we don't) and then we create the ignition configs:

[localadm@bastion ocp4]$ ls
install-config.yaml
[localadm@bastion ocp4]$ openshift-install create manifests
[...]
[localadm@bastion ocp4]$ openshift-install create ignition-configs
[...]
[localadm@bastion ocp4]$ ls
auth  metadata.json  worker.ign  bootstrap.ign  master.ign

The ignition files (*.ign) need to be pushed to a web server that is reachable by the OpenShift nodes.
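
Publishing the files can be sketched as follows. "publish_ignitions" is a hypothetical helper and the target directory is an assumption; it has to be whatever directory your web server maps to the ignition URL. The chmod is there because the generated files may not be world-readable, which a web server running as another user would trip over.

```shell
# Copy the three ignition files into a web server directory and make them readable.
# $1 = directory containing the *.ign files, $2 = directory served by the web server
publish_ignitions() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in bootstrap master worker; do
        cp "$src/$f.ign" "$dst/$f.ign" || return 1
        chmod 644 "$dst/$f.ign" || return 1
    done
}

# Usage (paths are examples only):
#   publish_ignitions "$HOME/ocp4" /path/to/docroot/install/ocp4/ignitions
```

A quick "curl -I" against one of the resulting URLs from another machine confirms that the nodes will be able to fetch them.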

Set up PXE boot#

The relevant part of my dhcpd4.conf looks like this:
subnet 172.18.0.0 netmask 255.255.252.0 {
  range 172.18.3.101 172.18.3.101;
  option routers 172.18.0.200;
  option domain-name "srv.home.eenfach.de";
  option domain-search "srv.home.eenfach.de","home.eenfach.de","eenfach.de";
  option domain-name-servers 172.18.1.53;
  option subnet-mask 255.255.224.0;
  option ntp-servers 192.53.103.108,192.53.103.104,192.53.103.103;
  group {
    filename "shimx64.efi";
    next-server 172.18.1.67;
    host master00.ocp4.home.eenfach.de {
      hardware ethernet 2:8:20:61:3b:13;
      fixed-address 172.18.3.10;
      option host-name "master00";
[...]
    }
  }
}
Of course you need to add all masters, workers and the bootstrap node there as well.
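
The host entries can be generated from the name, MAC address and IP address of each node. The sketch below assumes the MAC addresses from the "dladm show-vnic" output and the addresses I use in DNS; "dhcp_host" is a hypothetical helper, and it only prints the options visible in the excerpt above, not the part I left out there.

```shell
# Print a dhcpd host block for one node.
# $1 = short host name, $2 = MAC address as printed by dladm, $3 = fixed IP address
dhcp_host() {
    cat <<EOF
    host $1.ocp4.home.eenfach.de {
      hardware ethernet $2;
      fixed-address $3;
      option host-name "$1";
    }
EOF
}

dhcp_host bootstrap 2:8:20:3b:26:2  172.18.3.5
dhcp_host master00  2:8:20:61:3b:13 172.18.3.10
dhcp_host master01  2:8:20:28:b3:a8 172.18.3.20
dhcp_host master02  2:8:20:22:c1:10 172.18.3.30
dhcp_host worker00  2:8:20:ad:2b:6c 172.18.3.50
dhcp_host worker01  2:8:20:12:51:4d 172.18.3.60
```

The output goes inside the "group" block, so that all nodes inherit the filename and next-server settings.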

Then you need shimx64.efi and grubx64.efi. On RHEL 8, shimx64.efi comes from the shim-x64 rpm and grubx64.efi from the grub2-efi-x64 rpm. I just installed them on the bastion and copied both files to /tftpboot on my DHCP server.

The reason is that bhyve boots UEFI only and the classic pxelinux from RHEL only supports BIOS boots, so we use the grub shim boot.

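Create a grub config for every MAC address. Each file is named "grub.cfg-01-" followed by the MAC address, written in lowercase with zero-padded octets separated by hyphens.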

An example for a Master:

root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-22-c1-10
menuentry 'Master: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/master.ign console=tty0 console=ttyS0
        initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}

An example for a worker:

root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-12-51-4d
menuentry 'Worker: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/worker.ign console=tty0 console=ttyS0
        initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}

And an example for the bootstrap node:

root@skirnir:/tftpboot# cat grub.cfg-01-02-08-20-3b-26-02
menuentry 'Bootstrap: Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos-4.6.1-x86_64-live-kernel-x86_64 coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=http://172.18.1.80/install/ocp4/rhcos-4.6.1-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://172.18.1.80/install/ocp4/ignitions/bootstrap.ign console=tty0 console=ttyS0
        initrdefi rhcos-4.6.1-x86_64-live-initramfs.x86_64.img
}
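
Note that dladm prints MAC addresses without leading zeros (2:8:20:3b:26:2), while the grub config file names use two hex digits per octet. A small sketch to derive the file name from the dladm notation; "grub_cfg_name" is a hypothetical helper.

```shell
# Turn a MAC address as printed by dladm into the matching grub config file name.
grub_cfg_name() {
    echo "$1" | tr 'A-F' 'a-f' | awk -F: '{
        name = "grub.cfg-01"
        for (i = 1; i <= NF; i++) {
            # pad single-digit octets with a leading zero
            octet = (length($i) == 1) ? ("0" $i) : $i
            name = name "-" octet
        }
        print name
    }'
}

grub_cfg_name 2:8:20:3b:26:2    # bootstrapint0
grub_cfg_name 2:8:20:22:c1:10   # master02int0
grub_cfg_name 2:8:20:12:51:4d   # worker01int0
```

The three calls reproduce the file names used in the examples above.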

Ensure that you have rhcos-4.6.1-x86_64-live-initramfs.x86_64.img and rhcos-4.6.1-x86_64-live-kernel-x86_64 in /tftpboot as well, and that rhcos-4.6.1-x86_64-live-rootfs.x86_64.img is accessible from the specified URL (try with curl, for example).

Setting up DNS#

My DNS zone looks like this:

root@voluspa:/var/named# cat ocp4.home.eenfach.de
$TTL    3600
@       IN      SOA     ocp4.home.eenfach.de. hostmaster.eenfach.de.  (
                                2022021500      ; Serial
                                3600            ; Refresh
                                300             ; Retry
                                3600000         ; Expire
                                3600 )          ; Minimum
                IN      NS      voluspa.srv.home.eenfach.de.
bastion         IN      A       172.18.3.1
bootstrap       IN      A       172.18.3.5
master00        IN      A       172.18.3.10
etcd00          IN      A       172.18.3.10
_etcd-server-ssl._tcp.ocp4      IN      SRV     0 10 2380 etcd00
master01        IN      A       172.18.3.20
etcd01          IN      A       172.18.3.20
_etcd-server-ssl._tcp.ocp4      IN      SRV     0 10 2380 etcd01
master02        IN      A       172.18.3.30
etcd02          IN      A       172.18.3.30
_etcd-server-ssl._tcp.ocp4      IN      SRV     0 10 2380 etcd02
worker00        IN      A       172.18.3.50
worker01        IN      A       172.18.3.60
*.apps          IN      A       172.18.3.100
api             IN      A       172.18.3.100
api-int         IN      A       172.18.3.100
dns             IN      CNAME   voluspa.srv.home.eenfach.de.

Ensure that you also set up matching PTR records.
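
The PTR records can be derived from the A records. This is a minimal sketch that prints fully qualified records for a reverse zone file; "ptr_record" is a hypothetical helper, and I assume one PTR per address pointing at the node name (not the etcd alias).

```shell
# Print a PTR record for an IPv4 address.
# $1 = IPv4 address, $2 = fully qualified host name without the trailing dot
ptr_record() {
    echo "$1" | awk -F. -v host="$2" '{
        # reverse the four octets and append the in-addr.arpa suffix
        printf "%s.%s.%s.%s.in-addr.arpa.\tIN\tPTR\t%s.\n", $4, $3, $2, $1, host
    }'
}

ptr_record 172.18.3.5  bootstrap.ocp4.home.eenfach.de
ptr_record 172.18.3.10 master00.ocp4.home.eenfach.de
ptr_record 172.18.3.50 worker00.ocp4.home.eenfach.de
```

Because the owner names are fully qualified, the lines can be pasted into the reverse zone regardless of its origin.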

That should be it; we can now kick off the installation.

Starting the RHCOS installation#

Boot your bootstrap zone, run "zlogin -C bootstrap" and watch the console. When the grub menu appears, hit return (or put timeout values in the grub.cfg files) and let the boot proceed. The bootstrap node will reboot after a while and finally show a login prompt.

Now boot your master zones as well, observe their consoles and wait until they show the login prompt. After that, do the same with the workers.

Observing the installation#

On the bastion node, observe the installation with

 openshift-install wait-for bootstrap-complete 

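At some point it should say that it is now safe to remove the bootstrap node. Log in to the load balancer zone and remove the bootstrap node from the masters servergroup (it has to be a member while bootstrapping, which the exported configuration above does not show). Then shut down the bootstrap zone.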

Again on the bastion run

 openshift-install wait-for install-complete --log-level=debug 

Open another shell to the bastion and watch the cluster:

export KUBECONFIG=$HOME/ocp4/auth/kubeconfig
oc get nodes; oc get csr

You should see the nodes becoming Ready. If CSRs are in a pending state, approve them with "oc adm certificate approve". Depending on your hardware and network this takes some time, but finally your cluster should be ready.
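
Approving pending CSRs one by one gets tedious, so here is a small sketch that approves everything currently pending. It assumes KUBECONFIG is exported as shown above; "approve_pending_csrs" is a hypothetical helper built on the go-template filter from the OpenShift documentation.

```shell
# Approve every CSR that has no status yet, i.e. is still pending.
approve_pending_csrs() {
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
    while read -r name; do
        if [ -n "$name" ]; then
            oc adm certificate approve "$name"
        fi
    done
}

# Usage: run it, then check "oc get csr" again after a minute or two
#   approve_pending_csrs
```

It may have to be run more than once, since nodes request a second certificate after the first one has been approved. This approves requests blindly, which is fine for a lab but not something to do without review elsewhere.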

Posted by Olaf Bohlen