I opened a case regarding the performance of the labs. The first feedback was that it was my firewall or VPN.
I explained that it was the lab itself, because you have a console to the compute resources in the lab, and it is slow there too.
Installing the Ceph cluster takes hours. I deleted and reprovisioned the lab, but it is still slow.
When the cluster is up, everything is healthy, but creating a pool takes minutes.
Are more people experiencing these issues?
I have the exact same issue here (France).
It took me more than an hour to run each installation playbook (the training course says it's supposed to take 5 minutes).
Likewise, creating a pool takes about 10 minutes instead of a few seconds.
It's extremely annoying, first because it wastes a lot of my lab time, and second because I have scheduled the corresponding certification exam pretty soon and I can't afford to waste time :(
@cschunke, could you please check?
Thanks for the response. I was so frustrated that I installed my own Ceph lab on my laptop.
It took 5 minutes for the playbook to install the same setup as the lab environment.
To mimic the lab, I used the following:
ceph-ansible branch table-3.2.30
ansible 2.6.9 => https://docs.ceph.com/ceph-ansible/master/#releases
I still haven't heard anything about my support case since I escalated it.
I will put this thread in the case and maybe escalate again.
After some performance tests, it seems that the issue comes from the network only.
The maximum attainable bandwidth between nodes is under 5 MB/s, which is way below the 1Gb/s minimum requirement for Ceph.
You can easily verify that by copying a large file from one server to another.
As a result, I'm stuck at the first step of lab 3.2, being totally unable to create a replicated pool on server[c-e]. It takes hours and never returns.
On the all-in-one cluster serverf, the same command completes in about 2 min 45 s, which is fine. This confirms that the issue comes from poor inter-node network performance.
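To put the two units side by side: 5 MB/s is 40 Mb/s, i.e. about 4% of the 1Gb/s requirement. Below is a minimal Python sketch of that arithmetic, assuming decimal units (1 Gb = 1000 Mb, 8 bits per byte). The function names are made up for illustration, and the file size and timing in the comment are placeholders to replace with your own measurement, not figures from the lab.

```python
# Minimum network speed quoted above for Ceph, in gigabits per second.
REQUIRED_GBIT_PER_S = 1.0


def mbytes_to_gbits(mbytes_per_s: float) -> float:
    """Convert a throughput in MB/s to Gb/s (decimal units)."""
    return mbytes_per_s * 8 / 1000


def throughput_mbytes_per_s(size_bytes: int, seconds: float) -> float:
    """Throughput of a file copy from its size and elapsed wall-clock time."""
    return size_bytes / seconds / 1_000_000


def share_of_requirement(mbytes_per_s: float) -> float:
    """Fraction of the 1 Gb/s requirement that a measured rate represents."""
    return mbytes_to_gbits(mbytes_per_s) / REQUIRED_GBIT_PER_S


if __name__ == "__main__":
    # Hypothetical measurement: a 1 GB file copied between two nodes in 200 s
    # would work out to 5 MB/s, the upper bound reported above.
    measured = throughput_mbytes_per_s(1_000_000_000, 200.0)
    print(f"{measured:.1f} MB/s = {mbytes_to_gbits(measured):.2f} Gb/s "
          f"({share_of_requirement(measured):.0%} of the requirement)")
```

Conversely, a link that actually delivers 1Gb/s should move roughly 125 MB/s, so a large file copy is a quick sanity check.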
First, sorry to hear about the performance issue that you are all experiencing. This must be frustrating. I will forward that to the team responsible for this course to see if there's a workable solution.
Hi @Razique, and thank you very much for your concern.
I had created a support case too. They have just moved my lab to a different datacenter. I had to re-provision the lab from scratch though, which I have just done. I can now transfer at 9 MB/s between my lab's VMs, which is almost twice as fast as before, but still pretty slow compared to Ceph network requirements (1Gb/s).
I will re-try the CEPH125 first labs later today and post an update here.
Hi @Razique :)
Thank you for asking.
I did the test only last night. Unfortunately, it still took 2 hours and 40 minutes to install Ceph on 3 nodes + 1 client with the ansible-playbook command.
I have updated my support case with that information.
It's troubling that even the skipped tasks seem to be pretty slow. Based on my personal experience with Ansible in general, they usually go by so fast you can't read them.
I had activated SSH pipelining for that run, but it didn't help at all.
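For reference, SSH pipelining is an Ansible setting in the `[ssh_connection]` section of `ansible.cfg`. Here is a minimal Python sketch that renders such a file. The helper name `render_ansible_cfg` is made up for illustration, and this is not necessarily the exact configuration used for that run.

```python
import configparser
import io


def render_ansible_cfg() -> str:
    """Return the text of an ansible.cfg that enables SSH pipelining."""
    cfg = configparser.ConfigParser()
    # Pipelining reduces the number of SSH operations needed per task.
    cfg["ssh_connection"] = {"pipelining": "True"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the file content; redirect it to ansible.cfg next to the playbook.
    print(render_ansible_cfg())
```

Note that pipelining only takes effect if `requiretty` is not enforced in sudoers on the managed hosts.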
Now I feel miserable because I have my EX125 exam scheduled early January with travel arrangements, and I feel like I won't be able to practice at all until then :(
Another hurdle is that even preparing a clean lab environment for such an Ansible installation is pretty tedious: the Ansible inventory file and all the group_vars files have to be populated manually before you can run the ansible-playbook command.
Thank you very much in advance if you can help here. Since the issue has been encountered by @rmokkink too, and since it has occurred in different datacenters, I guess it should be pretty easy for Red Hat to reproduce it.
I have heard from support that the issue has been escalated to the ROL backend team; this was on the 10th of December. If I hear anything, I will report back.
At the moment I am still doing the labs on my laptop; most of them I can do pretty easily.