TheLinuxguy18
Mission Specialist
  • 8,815 Views

OpenShift 4.6 deployment failed


I deployed an OpenShift 4.6 cluster using the UPI method in a VMware environment.

 

I have 1 bootstrap node, 3 masters, and 2 workers.

When the deployment finished, I ran the command below:

#openshift-install --dir ocp4 wait-for bootstrap-complete --log-level=debug

and it returns the errors below, taking a long time before it finally fails.

How can I investigate and fix this issue?

 

DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": net/http: TLS handshake timeout
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
ERROR Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.cluster-jkt01.consultme.info:6443/apis/config.openshift.io/v1/clusteroperators": EOF
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed waiting for Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
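
I guess the full gather command the installer is hinting at would look something like this (with placeholders for my real node IPs):

openshift-install gather bootstrap --dir ocp4 --bootstrap <bootstrap_ip> --master <master_ip>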

 

 

 

[root@helper ocp4]# vi .openshift_install.log
time="2021-09-28T10:33:12+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:33:42+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:34:12+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:34:42+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:35:12+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:35:42+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:36:12+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T10:36:21+03:00" level=error msg="Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get \"https://api.cluster-jkt01.consultme.info:6443/apis/config.openshift.io/v1/clusteroperators\": EOF"
time="2021-09-28T10:36:21+03:00" level=info msg="Use the following commands to gather logs from the cluster"
time="2021-09-28T10:36:21+03:00" level=info msg="openshift-install gather bootstrap --help"
time="2021-09-28T10:36:21+03:00" level=fatal msg="failed waiting for Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:08:35+03:00" level=debug msg="OpenShift Installer 4.6.1"
time="2021-09-28T11:08:35+03:00" level=debug msg="Built from commit ebdbda57fc18d3b73e69f0f2cc499ddfca7e6593"
time="2021-09-28T11:08:35+03:00" level=info msg="Waiting up to 20m0s for the Kubernetes API at https://api.cluster-jkt01.consultme.info:6443..."
time="2021-09-28T11:08:35+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:09:05+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:09:35+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:10:06+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:10:20+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": net/http: TLS handshake timeout"
time="2021-09-28T11:10:37+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:11:07+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:11:37+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:12:07+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:12:37+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:13:08+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:13:38+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:14:08+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:14:38+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:15:08+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:15:38+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:15:58+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": net/http: TLS handshake timeout"
time="2021-09-28T11:16:14+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:16:47+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:17:17+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:17:47+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:18:18+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:18:48+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:19:18+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"
time="2021-09-28T11:19:48+03:00" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s\": EOF"

 

6 Replies
AlejandroC
Mission Specialist
  • 8,749 Views

I see two options here:

1) the bootstrap node installation is failing

2) the load balancer for api.cluster-jkt01.consultme.info:6443 is not configured properly

----

In any case, it would help to generate an SSH key and insert it, so you can troubleshoot the bootstrap node.

To generate the SSH key:

https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-m...

Then, include the public key in the install-config.yaml file. Example:

https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-m...
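
As a rough sketch (the key type, file path, and key value here are only examples, not your actual values):

ssh-keygen -t ed25519 -N '' -f ~/.ssh/ocp4_key

Then add the contents of the .pub file to install-config.yaml, e.g.:

sshKey: 'ssh-ed25519 AAAA... core@helper'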

You should then be able to SSH into the bootstrap server with a command similar to:

ssh -i <path_to_private_SSH_key> core@<bootstrap_ip>

After you log into the bootstrap server, you will see an example journalctl command for troubleshooting the bootstrap installation, which will probably reveal more about what's going wrong.
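
The suggested command is usually along these lines (it is printed in the bootstrap node's login banner):

journalctl -b -f -u release-image.service -u bootkube.service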

Hope that helps!

TheLinuxguy18
Mission Specialist
  • 8,713 Views

Thanks for your reply.

Please check the journalctl output below from the bootstrap node:

============================================

[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
-- Logs begin at Sun 2021-09-26 23:03:44 +03. --
Oct 01 12:33:24 bootstrap.consultme.info bootkube.sh[142642]: Starting temporary bootstrap control plane...
Oct 01 12:33:24 bootstrap.consultme.info bootkube.sh[142642]: Error: open /etc/kubernetes/manifests/bootstrap-pod.yaml: file exists
Oct 01 12:33:24 bootstrap.consultme.info bootkube.sh[142642]: Tearing down temporary bootstrap control plane...
Oct 01 12:33:24 bootstrap.consultme.info bootkube.sh[142642]: Error: open /etc/kubernetes/manifests/bootstrap-pod.yaml: file exists
Oct 01 12:33:25 bootstrap.consultme.info systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Oct 01 12:33:25 bootstrap.consultme.info systemd[1]: bootkube.service: Failed with result 'exit-code'.
Oct 01 12:33:30 bootstrap.consultme.info systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Oct 01 12:33:30 bootstrap.consultme.info systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 17.
Oct 01 12:33:30 bootstrap.consultme.info systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Oct 01 12:33:30 bootstrap.consultme.info systemd[1]: Started Bootstrap a Kubernetes cluster.

 

journalctl -b -f -u release-image.service -u bootkube.service
Last login: Fri Oct 1 12:33:25 2021 from 192.168.43.116

bootkube service status

====================
[core@bootstrap ~]$ systemctl status bootkube.service
● bootkube.service - Bootstrap a Kubernetes cluster
Loaded: loaded (/etc/systemd/system/bootkube.service; static; vendor preset: disabled)
Active: active (running) since Fri 2021-10-01 12:38:41 +03; 12s ago
Main PID: 159535 (bash)
Tasks: 24 (limit: 50300)
Memory: 58.0M
CGroup: /system.slice/bootkube.service
├─159535 bash /usr/local/bin/bootkube.sh
├─160026 bash /usr/local/bin/bootkube.sh
├─160027 podman run --quiet --rm --net=none quay.io/openshift-release-dev/ocp-release@sha256:d78292e9730dd387ff6198197c8b0598da340be7678e8e1e4810b557a926c2b9 image pod
├─160116 /usr/bin/conmon --api-version 1 -s -c fb844a8b9faa8cd09046ac3821f25f3e4b04c9e16cd57bc4e06296be4356599c -u fb844a8b9faa8cd09046ac3821f25f3e4b04c9e16cd57bc4e06296be4356599c -r /usr/bin/runc -b /var/>
└─160142 /usr/bin/runc start fb844a8b9faa8cd09046ac3821f25f3e4b04c9e16cd57bc4e06296be4356599c

Oct 01 12:38:41 bootstrap.consultme.info systemd[1]: Started Bootstrap a Kubernetes cluster.

Also, I can ping, nslookup, and curl the API from the bootstrap machine:

[root@bootstrap core]# ping /api.cluster-jkt01.consultme.info
ping: /api.cluster-jkt01.consultme.info: Name or service not known
[root@bootstrap core]# ping api.cluster-jkt01.consultme.info
PING api.cluster-jkt01.consultme.info (192.168.43.116) 56(84) bytes of data.
64 bytes from api.consultme.info (192.168.43.116): icmp_seq=1 ttl=64 time=2.88 ms
64 bytes from api.consultme.info (192.168.43.116): icmp_seq=2 ttl=64 time=0.983 ms
64 bytes from api.consultme.info (192.168.43.116): icmp_seq=3 ttl=64 time=0.588 ms
--- api.cluster-jkt01.consultme.info ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 4ms
rtt min/avg/max/mdev = 0.588/1.482/2.876/0.999 ms

[core@bootstrap ~]$ nslookup
> api.cluster-jkt01.consultme.info
Server: 192.168.43.116
Address: 192.168.43.116#53

Name: api.cluster-jkt01.consultme.info
Address: 192.168.43.116

 

[core@bootstrap ~]$ curl -vv https://api.cluster-jkt01.consultme.info:6443
* Rebuilt URL to: https://api.cluster-jkt01.consultme.info:6443/
* Trying 192.168.43.116...
* TCP_NODELAY set
* Connected to api.cluster-jkt01.consultme.info (192.168.43.116) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.cluster-jkt01.consultme.info:6443
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.cluster-jkt01.consultme.info:6443

CRI-O is working and the pods are listed below.

============================

[root@bootstrap core]# crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
0a36ad628c4bf 8f13cacfce95f5b2a0c2b50df05869e3236b07dfc9a0c5102b41b04d07b8fea0 35 seconds ago Exited cloud-credential-operator 29 da63da4f69937
5e72fe0c251d3 ee1ad20f0063bb88b3bffd9cfd0ceddeca5ef14d9fb7e1b033fac4d8e61bc140 2 minutes ago Exited kube-scheduler 28 5b23ca938cf7f
8f6c53a244f78 ee1ad20f0063bb88b3bffd9cfd0ceddeca5ef14d9fb7e1b033fac4d8e61bc140 3 minutes ago Exited kube-controller-manager 29 8a240d242aaa4
051629551b8a3 ee1ad20f0063bb88b3bffd9cfd0ceddeca5ef14d9fb7e1b033fac4d8e61bc140 4 minutes ago Exited kube-apiserver 28 3ad6aab8a91a9
d8939968a9c5d 806a0bf7684a4a9e1b20f3c422d4fb96513412e7d11632c1a482d29734b89ba7 4 minutes ago Exited cluster-policy-controller 28 8a240d242aaa4
138e3f50c62b7 09a01faefd4fcc313b7653d468878bf3427265d995cf7ca621e7e18992d47c51 2 hours ago Running machine-config-server 0 cdf8a94e8a69a
0a35dff3b9f04 2abf30ff6ba99c2a03e2c6b4b9ccd6f389c1aa9f6bd0f020a21e5f7f0f5a493a 2 hours ago Running kube-apiserver-insecure-readyz 0 3ad6aab8a91a9
35fdf2518e813 quay.io/openshift-release-dev/ocp-release@sha256:d78292e9730dd387ff6198197c8b0598da340be7678e8e1e4810b557a926c2b9 2 hours ago Running cluster-version-operator 0 ec7074a9fa5e1
263aef2e3140d 09a01faefd4fcc313b7653d468878bf3427265d995cf7ca621e7e18992d47c51 2 hours ago Exited machine-config-controller 0 cdf8a94e8a69a
6438f41bc1a76 ee1ad20f0063bb88b3bffd9cfd0ceddeca5ef14d9fb7e1b033fac4d8e61bc140 2 hours ago Exited setup 0 3ad6aab8a91a9
16174593774e0 3264a20df581bff1f88671f55733f7f58bb4841702b6e6659858af640bc6855b 2 hours ago Running etcd-member 0 e5e3134bbfcf8

So is there any service or port that needs to be checked?

AlejandroC
Mission Specialist
  • 8,704 Views

From the logs, it looks like the bootstrap machine is installed correctly.

The issue seems to be with your LB configuration. Control plane and compute nodes need access to the API server and machine config server ports (6443 and 22623) on the bootstrap server, and it seems they can't reach the API server port at https://api.cluster-jkt01.consultme.info:6443.

Notice that your curl test went from the bootstrap node to itself, but the real connection should go from the control plane nodes to the bootstrap node.
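
For instance, from one of the master node consoles you could try something like the following and check that the TCP connection and TLS handshake both complete:

curl -kv https://api.cluster-jkt01.consultme.info:6443/version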

As an example, this is a valid haproxy.cfg file:

#---------------------------------------------------------------------
# round robin balancing for RHOCP Kubernetes API Server
#---------------------------------------------------------------------
frontend k8s_api
    bind *:6443
    mode tcp
    default_backend k8s_api_backend
backend k8s_api_backend
    balance roundrobin
    mode tcp
    server bootstrap 192.168.50.9:6443 check
    server master01 192.168.50.10:6443 check
    server master02 192.168.50.11:6443 check
    server master03 192.168.50.12:6443 check

#---------------------------------------------------------------------
# round robin balancing for RHOCP Machine Config Server
#---------------------------------------------------------------------
frontend machine_config
    bind *:22623
    mode tcp
    default_backend machine_config_backend
backend machine_config_backend
    balance roundrobin
    mode tcp
    server bootstrap 192.168.50.9:22623 check
    server master01 192.168.50.10:22623 check
    server master02 192.168.50.11:22623 check
    server master03 192.168.50.12:22623 check

#---------------------------------------------------------------------
# round robin balancing for RHOCP Ingress Insecure Port
#---------------------------------------------------------------------
frontend ingress_insecure
    bind *:80
    mode tcp
    default_backend ingress_insecure_backend
backend ingress_insecure_backend
    balance roundrobin
    mode tcp
    server worker01 192.168.50.13:80 check
    server worker02 192.168.50.14:80 check

#---------------------------------------------------------------------
# round robin balancing for RHOCP Ingress Secure Port
#---------------------------------------------------------------------
frontend ingress_secure
    bind *:443
    mode tcp
    default_backend ingress_secure_backend
backend ingress_secure_backend
    balance roundrobin
    mode tcp
    server worker01 192.168.50.13:443 check
    server worker02 192.168.50.14:443 check

#---------------------------------------------------------------------
# Exposing HAProxy Statistic Page
#---------------------------------------------------------------------
listen stats
    bind :32700
    stats enable
    stats uri /
    stats hide-version
    stats auth admin:passwd

######################

After the installation is complete, you need to remove the bootstrap entries from the load balancer config, because those ports are "moved" to the control plane machines.
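
With the sample config above, that would mean commenting out (or deleting) the two bootstrap lines and reloading HAProxy, roughly:

# server bootstrap 192.168.50.9:6443 check
# server bootstrap 192.168.50.9:22623 check

systemctl reload haproxy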

Hope that helps!

TheLinuxguy18
Mission Specialist
  • 8,672 Views

Thanks. I built the bootstrap node from scratch again, without any control plane or compute nodes, but it failed again. You can check the error messages in the video below:

https://youtu.be/fYHPw8Kzbog 

 

[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service

Oct 03 19:37:58 bootstrap.consultme.info bootkube.sh[25157]: Failed to create "99_kubeadmin-password-secret.yaml" secrets.v1./kubeadmin -n kube-system: Post "https://localhost:64ets": net/http: TLS handshake timeout
Oct 03 19:37:59 bootstrap.consultme.info bootkube.sh[25157]: E1003 19:37:59.856732 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failedost:6443/api/v1/pods": net/http: TLS handshake timeout
Oct 03 19:38:08 bootstrap.consultme.info bootkube.sh[25157]: Failed to create "99_openshift-cluster-api_master-user-data-secret.yaml" secrets.v1./master-user-data -n openshift-ma43/api/v1/namespaces/openshift-machine-api/secrets": net/http: TLS handshake timeout
Oct 03 19:38:20 bootstrap.consultme.info bootkube.sh[25157]: Skipped "99_openshift-cluster-api_worker-user-data-secret.yaml" secrets.v1./worker-user-data -n openshift-machine-api
Oct 03 19:38:20 bootstrap.consultme.info bootkube.sh[25157]: Skipped "99_openshift-machineconfig_99-master-ssh.yaml" machineconfigs.v1.machineconfiguration.openshift.io/99-master
Oct 03 19:38:20 bootstrap.consultme.info bootkube.sh[25157]: Skipped "99_openshift-machineconfig_99-worker-ssh.yaml" machineconfigs.v1.machineconfiguration.openshift.io/99-worker
Oct 03 19:38:20 bootstrap.consultme.info bootkube.sh[25157]: Skipped "cco-cloudcredential_v1_credentialsrequest_crd.yaml" customresourcedefinitions.v1beta1.apiextensions.k8s.io/censhift.io -n as it already exists
Oct 03 19:38:20 bootstrap.consultme.info bootkube.sh[25157]: Skipped "cco-cloudcredential_v1_operator_config_custresdef.yaml" customresourcedefinitions.v1.apiextensions.k8s.io/cl-n as it already exists

From the helper node

[root@helper ~]# oc get csr
No resources found
[root@helper ~]# oc get nodes
No resources found

 

Here is also the output of the bootstrap wait:

DEBUG OpenShift Installer 4.6.1
DEBUG Built from commit ebdbda57fc18d3b73e69f0f2cc499ddfca7e6593
INFO Waiting up to 20m0s for the Kubernetes API at https://api.cluster-jkt01.consultme.info:6443...
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.cluster-jkt01.consultme.info:6443/version?timeout=32s": net/http: TLS handshake timeout
INFO API v1.19.0+d59ce34 up
INFO Waiting up to 30m0s for bootstrapping to complete...
W1003 21:42:27.304934 4908 reflector.go:326] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
E1003 21:42:30.447515 4908 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get "https://api.cluster-jkt01.consultme.info:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector...": EOF
E1003 21:42:31.450321 4908 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get "https://api.cluster-jkt01.consultme.info:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector...": EOF
E1003 21:42:32.453220 4908 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get "https://api.cluster-jkt01.consultme.info:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector...": EOF

 

Please, I need to isolate the error and determine the root cause.

 

Here is also the haproxy.cfg:

[root@helper ~]# cat /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.* /var/log/haproxy.log
    #
    log 127.0.0.1 local2

    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

    # clitimeout 100s
    # srvtimeout 100s
    # contimeout 100s

#listen stats
#    bind *:1936
#    mode http
#    log global
#    maxconn 10
#    timeout client 100s
#    timeout server 100s
#    timeout connect 100s
#    timeout queue 100s
#    stats enable
#    stats hide-version
#    stats refresh 30s
#    stats show-node
#    stats auth admin:password
#    stats uri /haproxy?stats

listen api-server-6443
    bind *:6443
    mode tcp
    balance source
    server bootstrap.consultme.info 192.168.43.119:6443 check
    server master01.consultme.info 192.168.43.114:6443 check
    server master02.consultme.info 192.168.43.117:6443 check
    server master03.consultme.info 192.168.43.118:6443 check

listen machine-config-server-22623
    bind *:22623
    mode tcp
    balance source
    # server bootstrapnode 192.168.43.140:22623 check backup # Can be removed or commented out after install completes
    server bootstrap.consultme.info 192.168.43.119:6443 check
    server master01.consultme.info 192.168.43.114:6443 check
    server master02.consultme.info 192.168.43.117:6443 check
    server master03.consultme.info 192.168.43.118:6443 check

listen ingress-router-80
    bind *:80
    mode tcp
    balance source
    server master01.consultme.info 192.168.43.114:80 check
    server master02.consultme.info 192.168.43.117:80 check
    server master03.consultme.info 192.168.43.118:80 check
    server worker01.consultme.info 192.168.43.120:80 check
    server worker02.consultme.info 192.168.43.121:80 check

listen ingress-router-443
    bind *:443
    mode tcp
    balance source
    server master01.consultme.info 192.168.43.114:443 check
    server master02.consultme.info 192.168.43.117:443 check
    server master03.consultme.info 192.168.43.118:443 check
    server worker01.consultme.info 192.168.43.120:443 check
    server worker02.consultme.info 192.168.43.121:443 check

 

AlejandroC
Mission Specialist
  • 8,648 Views

Hello.

Your bootstrap node was installed correctly in one of your previous installation attempts on Friday:

"Oct 01 12:38:41 bootstrap.consultme.info systemd[1]: Started Bootstrap a Kubernetes cluster."

In your latest video, however, it seems to be failing to provision again.

Remember that you must wipe all your servers, masters and workers included, before attempting a new installation. Also, do not attempt an installation using only a bootstrap server; it will always fail.

You can check how to provision the master nodes here: https://docs.openshift.com/container-platform/4.6/installing/installing_vsphere/installing-vsphere.h...

After the bootstrap installation is complete, you should see attempts to connect to the cluster API in the console (TTY) of the master nodes. Once the masters are able to connect to the API, the bootstrap process continues and "moves" the API server, etcd, etc. to the master nodes.
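
From the helper node, a rough way to follow that handoff is the installer's own wait commands (reusing the assets directory from your earlier output):

openshift-install --dir ocp4 wait-for bootstrap-complete --log-level=debug
openshift-install --dir ocp4 wait-for install-complete --log-level=debug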

In the following link, we performed a UPI bare metal installation (second half of the video recording). There you can see the installation flow for clarification.

https://www.redhat.com/en/events/webinar/sneak-peek-upcoming-installation-training-red-hat-openshift...

If we are not able to figure out your issue, I'd recommend opening a support case at https://access.redhat.com/

  • 8,732 Views

Share your load balancer configuration for review; it is critical for the installation to succeed. You can reach me @ +91 9008088066
