I just passed the EX280 exam (with OpenShift 4.10). I completed almost everything and had more than an hour left for troubleshooting this particular scenario:
Pods were up and running in this project (one pod was Running and a deploy pod was Completed).
When I wanted to check whether the pod works, I tried:
oc rsh pod/thepod-0001 curl http://localhost:8080
oc rsh pod/thepod-0001 curl http://xx.xx.xx.xx:8080
There was a message like...
ERRO exec failed: xxxxxxxxx:XXX: starting container process caused "not recognize curl: executable file not found in $PATH"
It looked like curl was not in the container, so I tried to open a session in the pod with:
oc rsh pod/thepod-0001
But I got a message like this:
ERRO exec failed: xxxxxxxxx:XXX:XXX: starting container process caused "not recognize sh: executable file not found in $PATH"
So it looks like the container can't find sh (or bash) either, as if it can't reach /bin/ or /usr/bin/, where these commands live. I checked the logs with oc logs thepod-0001 and there were no error messages; I just saw two lines that said something like it connected to "port 8080" and "port 8443".
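Since neither curl nor sh seems to be available inside the container, one check that does not depend on the image contents is to forward the pod's port to the workstation and run curl locally. The sketch below is only an illustration: probe_pod_http is a hypothetical helper name, and it assumes the application really serves plain HTTP on the port mentioned in the logs.

```shell
# Hypothetical helper: probe a pod's HTTP port from the workstation,
# so neither curl nor a shell is needed inside the container.
probe_pod_http() {
  pod="$1"            # e.g. thepod-0001
  port="${2:-8080}"   # port reported in the pod logs (assumed plain HTTP)

  # Forward the local port to the same port on the pod, in the background.
  oc port-forward "pod/${pod}" "${port}:${port}" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2             # give the tunnel a moment to come up

  # curl runs on the workstation here, not in the container.
  curl -sS -o /dev/null -w '%{http_code}\n' "http://localhost:${port}"
  rc=$?

  # Stop the port-forward again; ignore the error if it already exited.
  kill "$pf_pid" 2>/dev/null || true
  return "$rc"
}

# Usage (needs a logged-in oc session in the right project):
#   probe_pod_http thepod-0001 8080
```

If this prints an HTTP status code, the application is answering, and the missing curl and sh are a property of the image rather than of the running process.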
I tried to debug with:
oc debug dc/thepod --image registry.access.redhat.com/ubi8/ubi
oc debug dc/thepod --image registry.redhat.io/ubi8/ubi
oc debug pod/thepod-0001 --image registry.access.redhat.com/ubi8/ubi
but the prompt never came back; it just froze, so I had to cancel it.
I bounced the pods:
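When oc debug hangs like this, the debug pod itself may be stuck before it reaches Running, for example while pulling the replacement image. A rough sketch of what could be run from a second terminal while the first one is frozen follows. inspect_debug_pod is a hypothetical helper, and the debug pod name is an assumption: look it up with oc get pods first.

```shell
# Hypothetical helper: run in a second terminal while `oc debug` appears hung.
inspect_debug_pod() {
  name="$1"   # assumed debug pod name, e.g. thepod-debug (check `oc get pods`)

  # Phase of the debug pod (Pending, ContainerCreating, Running, ...) and its node.
  oc get pod "$name" -o wide

  # The Events section at the end of describe usually names the blocker,
  # such as a failed image pull or a scheduling problem.
  oc describe pod "$name"

  # Recent events in the current project, sorted by time.
  oc get events --sort-by=.lastTimestamp
}

# Usage:
#   inspect_debug_pod thepod-debug
```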
oc delete pods --all
oc rollout latest dc/thepod
Everything stayed the same. I also compared the whole definition with another dc from another project:
oc get dc/thepod -o yaml > ./dc_proj1.yaml
oc get dc/otherpod -o yaml -n otherproject > ./dc_proj2.yaml
vim -d ./dc_proj1.yaml ./dc_proj2.yaml
Both definitions were the same (besides the obvious differences). They also declared the same image, so I didn't understand what was wrong.
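Two dc templates can declare the same image name and tag while the running pods still differ, because a tag can resolve to different image digests. A minimal sketch for comparing what each pod actually runs is shown below. show_runtime_image is a hypothetical helper, and the pod and project names in the usage lines are placeholders.

```shell
# Hypothetical helper: print what a pod actually runs, beyond the dc template.
show_runtime_image() {
  pod="$1"   # pod name, e.g. thepod-0001
  ns="$2"    # project (namespace) the pod lives in

  # Image reference as written in the pod spec.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].image}{"\n"}'

  # Resolved image ID (digest) as reported for the running container.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.status.containerStatuses[*].imageID}{"\n"}'

  # Any command/args that override the image's own entrypoint.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].command}{" "}{.spec.containers[*].args}{"\n"}'
}

# Usage (placeholder names):
#   show_runtime_image thepod-0001 proj1
#   show_runtime_image otherpod-0001 otherproject
```

If the two imageID values differ, the pods are not running the same image even though the dc definitions look identical.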
Besides that, I ran oc get pods -o wide and moved the pod to another node (with nodeSelector). The nodes looked OK (Ready), and since the pods were Completed/Running I didn't think this could be the issue.
I also checked the cm and its declaration in the dc; all was OK.
I googled this kind of error, but I only ran into Dockerfile error messages.
What else should I have checked? I tried everything I knew about troubleshooting.
So, I am not alone! Are you able to reproduce the error? Do you have the oc new-app command that triggers this issue? It would be great if the whole community could troubleshoot your cluster. In my case this happened in OCP 4.10. I'd appreciate it if you could share more information about your case.