OpenShift Data Foundation

Johnny · 08-18-2023

Hi,

Is there anybody with deeper Data Foundation experience?

On a bare-metal OCP cluster I ran into a problem with a full BlockPool. To free space I deleted one PVC and its PV, and afterwards used "ceph osd set-full-ratio 0.9" to switch the pool from read-only back to read-write. But it seems the pool doesn't reclaim the space: I deleted a PVC and PV with 1.5 TiB and the pool is still nearly full. It looks like the images in the rbd pool remain, but I can't check this directly with rbd, because "rbd ls <poolname>" hangs. The only idea I have now is cloning the existing PVCs on that BlockPool and deleting it.

Does anybody have experience with this, or other ideas?

Thank you in advance,

Hannes

Ravi_Shanker · 09-05-2023

Check for snapshots. Check PVC reclaim policy.

Certification ID: 111-010-393

AbbasMohammed · 09-05-2023

Dear all,

There are several things to check, one by one, to understand the problem and reclaim the space.

1. Ensure that the PVCs were properly deleted: oc get pvc

2. PVCs are bound to Persistent Volumes (PVs). Make sure that the PVs associated with the deleted PVCs have also been deleted: oc get pv

3. Review the configuration of the StorageClass used for the PVCs. Some StorageClasses may have settings that prevent immediate space reclamation upon PVC deletion.

4. Check the reclaim policy that the PVs received from the StorageClass. The reclaim policy can be Delete, Retain, or Recycle (deprecated). If it's set to Retain, the PV and its associated storage won't be deleted automatically.

5. Verify that the ODF CSI provisioner is properly configured. Configuration issues with the provisioner can lead to problems with reclaiming storage.

6. In the worst case you may need to reclaim the storage manually. This involves deleting the associated BlockPools or storage resources.

7. Review the logs and monitoring data for your ODF or CSI installation. Look for any error messages or warnings related to storage reclamation. This can help pinpoint the cause of the issue.

8. Finally, if the issue persists, raise a support case.
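
The PV and StorageClass checks in points 2 to 4 can be sketched in shell. This is only a minimal sketch, not an official ODF procedure: the function names are made up for illustration, and it assumes `oc` is logged in with permission to list PVs and StorageClasses.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes `oc` is logged in to the cluster.

# Print PVs whose backing storage may outlive the PVC: reclaim policy
# "Retain", or phase "Released" (claim gone, volume still present).
list_lingering_pvs() {
  oc get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy,PHASE:.status.phase \
    | awk '$2 == "Retain" || $3 == "Released" { print $1, $2, $3 }'
}

# Show the reclaim policy each StorageClass hands to the PVs it provisions.
list_storageclass_policies() {
  oc get storageclass \
    -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner,POLICY:.reclaimPolicy
}
```

Running `list_lingering_pvs` prints one line per suspicious PV (name, policy, phase); an empty result suggests the space is being held somewhere else, for example by snapshots.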

sharath_vutpala · 09-06-2023

One more thing you can try is trimming space on the RBD disks directly on the node. Starting with OpenShift 4.10, the ReclaimSpace feature can be enabled to do this automatically.
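
As a rough sketch of what enabling it can look like: CSI Addons lets you put a schedule annotation on a PVC so that reclaim-space jobs are created for it. The function name and the example values are made up, and whether the annotation is honoured depends on your ODF version, so please treat this as an assumption to verify against the documentation.

```bash
#!/usr/bin/env bash
# Hypothetical helper; assumes `oc` is logged in and the CSI Addons
# controller that ships with ODF is running.

# $1 = namespace, $2 = PVC name, $3 = schedule (cron expression or an
# alias such as @weekly)
schedule_reclaim_space() {
  oc annotate pvc "$2" -n "$1" --overwrite \
    "reclaimspace.csiaddons.openshift.io/schedule=$3"
}
```

For example, `schedule_reclaim_space my-project my-data @weekly` would request a weekly run for the (made-up) PVC `my-data`.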

If this feature is not enabled in the cluster, the operation has to be performed manually to regain the space from the filesystem:

1. Identify the PVC where data was deleted.
2. Check the Persistent Volume associated with it and copy the name of the PV.
3. Look for the pod that is using this PVC.
4. Find out the node on which this pod is running.
5. Start a debug session on that node.
6. df -kh | grep PV_NAME (this shows the mount point of the disk on the node)
7. fstrim MOUNT_POINT
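
The manual steps above can be sketched as a few shell helpers. This is a sketch under assumptions: the function names are hypothetical, it uses `oc debug node/... -- chroot /host` to run commands on the node, and it assumes the PV name appears in the mount path that `df` reports.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes `oc` is logged in with rights to debug nodes.

# Name of the PV bound to a PVC. $1 = namespace, $2 = PVC name
pv_for_pvc() {
  oc get pvc "$2" -n "$1" -o jsonpath='{.spec.volumeName}'
}

# Node a pod is scheduled on. $1 = namespace, $2 = pod name
node_for_pod() {
  oc get pod "$2" -n "$1" -o jsonpath='{.spec.nodeName}'
}

# Mount points on the node whose df line mentions the PV name.
# $1 = node, $2 = PV name. The mount point is the last df column.
mount_points_on_node() {
  oc debug "node/$1" -- chroot /host df -kh 2>/dev/null \
    | awk -v pv="$2" 'index($0, pv) { print $NF }'
}

# Discard unused filesystem blocks on one mount point. $1 = node, $2 = mount point
trim_mount() {
  oc debug "node/$1" -- chroot /host fstrim -v "$2"
}
```

A typical sequence would be `pv_for_pvc`, then `node_for_pod`, then `mount_points_on_node`, and finally `trim_mount` for each mount point that was printed.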

AhmedEidNassef · 09-06-2023

I asked an AI chat for its advice, and it replied with the following:

It seems like you are experiencing issues with reclaiming space in an OpenShift Data Foundation (ODF) cluster. While I can provide some general guidance, it's important to note that troubleshooting and resolving specific issues in an ODF environment may require deeper knowledge and expertise in the platform.

Here are some steps you can try to troubleshoot and reclaim space in your ODF cluster:

1- Verify PVC and PV deletion: Ensure that the PVC and PV you deleted are actually removed from the cluster. You can use the following commands to check their status:

oc get pvc

oc get pv

Confirm that the PVC and PV are not listed. If they are still present, there might be an issue with the deletion process.

2- Check Ceph pool and OSD status: Verify the status of the block pool and of the Ceph OSDs (Object Storage Daemons) in your ODF cluster. The following commands show the CephBlockPool resources and their status:

```bash
oc get cephblockpool -n openshift-storage
oc describe cephblockpool <pool-name> -n openshift-storage
```

Also ensure that all OSDs are in a healthy state (for example with `ceph osd status`). If any OSDs are down or in a warning state, it could affect the space reclamation process.

3- Monitor Ceph cluster utilization: Monitor the overall utilization of the Ceph cluster to get insights into the storage usage and identify any potential issues. You can use the Ceph Dashboard or command-line tools like `ceph df` to check the cluster utilization.

```bash
ceph df
```

Look for any abnormal utilization patterns or discrepancies that could indicate the cause of the full BlockPool.

4- Verify RBD image deletion: If you suspect that the RBD images are not getting deleted correctly, you can try listing the images using the `rbd` command-line tool from a pod with Ceph client access (in ODF this is typically the rook-ceph-tools toolbox pod rather than a monitor node):

```bash
rbd -p <pool-name> ls
```

If the command hangs, it suggests a potential issue with the RBD images. You may need to investigate further or consider involving the support or community resources for OpenShift Data Foundation.

5- Clone and delete PVCs: As a workaround, you can try cloning existing PVCs that are using the BlockPool and then deleting the clones. This process will create new RBD images associated with the clones and might potentially free up space. However, be cautious when performing operations like cloning, as it can impact the availability and data integrity of your applications.

Before cloning, ensure you have a backup of any important data stored in the PVCs.

Again, it's important to consider seeking assistance from the OpenShift Data Foundation community or reaching out to the support channels provided by Red Hat for more specific guidance tailored to your environment. They can provide you with expert advice and troubleshooting steps to resolve issues related to reclaiming space in an ODF cluster.
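
The `ceph` and `rbd` commands in the reply above need to run somewhere with Ceph credentials. A minimal sketch, assuming the rook-ceph-tools toolbox pod has been enabled in the `openshift-storage` namespace and carries the label `app=rook-ceph-tools`; the function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes `oc` is logged in and the toolbox pod exists.

# Run a command inside the rook-ceph-tools pod, e.g. `ceph_tools ceph df`.
ceph_tools() {
  local pod
  pod="$(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no rook-ceph-tools pod found" >&2
    return 1
  fi
  oc rsh -n openshift-storage "$pod" "$@"
}

# Show the thresholds that decide when OSDs are nearfull, backfillfull, or full.
show_full_ratios() {
  ceph_tools ceph osd dump | grep -E '^(full|backfillfull|nearfull)_ratio'
}
```

`show_full_ratios` makes it easy to confirm that a change such as `ceph osd set-full-ratio 0.9` actually took effect, and `ceph_tools ceph df` shows the per-pool usage before and after deleting volumes.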

Johnny · 09-10-2023

First of all, thank you all for the suggestions.

I think that after 10 years of playing and working with OpenShift I have enough experience with it and its most important resources.

Before I write to an open community I read, try, and google a lot.

In this case I am starting to gain experience with ODF, installed on a completely new bare-metal OpenShift cluster. I have three workers, each with one HDD and one SSD.
I already contacted support, but as an external Red Hat instructor I get access to the software but not to support. I tried anyway; maybe they will answer.

I now know from them that HDDs are not supported, but I don't think that is the issue.
I also now know that using disks of different sizes is not supported, but I don't think that is the issue either. This information puzzles me: if all your disks have the same size and one fails, you may not be able to get a replacement of exactly the same size.

I started with a freshly installed OCP cluster and a new ODF with the default configuration on it, and began to migrate my projects and data. I didn't check usage early enough and ran into a disk-full issue. I then deleted the PVC/PV before I changed the full_ratio with ceph, so the images on the disks were not deleted, because the blockpool was read-only at that point. After changing the full_ratio and getting the pool back to read-write, the space was not released.

I tried to list the images with "rbd ls ocs-storagecluster-cephblockpool", but this command hangs forever. That is my actual problem: it does not work on any cephblockpool, so I can't see the images in any of them. When a blockpool ran out of space a second time, I did it the right way and set it back to read-write before deleting the PVCs/PVs. Even so, you have to delete the whole blockpool with ceph directly, otherwise you don't get the space back.

So my question is: how can I get "rbd ls <poolname>" working, so that I can see all the images, compare them with the existing PVs, and manually delete the ones that are no longer needed?
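
For the comparison itself, once an image list is available, a sketch could look like this. It does not solve the hanging `rbd ls`. It assumes the PVs were provisioned by the Ceph CSI RBD driver, which records `pool` and `imageName` in the PV's CSI volume attributes; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes `oc` is logged in and the PVs come from
# the Ceph CSI RBD driver.

# RBD image names referenced by PVs in the given pool. $1 = pool name
pv_images() {
  oc get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeAttributes.pool}{" "}{.spec.csi.volumeAttributes.imageName}{"\n"}{end}' \
    | awk -v pool="$1" '$1 == pool && $2 != "" { print $2 }' | sort -u
}

# Read image names on stdin (one per line, e.g. saved `rbd ls` output)
# and print those that no PV in the pool refers to. $1 = pool name
unreferenced_images() {
  REFS="$(pv_images "$1")" awk '
    BEGIN { n = split(ENVIRON["REFS"], a, "\n"); for (i = 1; i <= n; i++) seen[a[i]] = 1 }
    $0 != "" && !($0 in seen)
  '
}
```

Usage would be something like `rbd ls <poolname> | unreferenced_images <poolname>`. An image missing from the PV list is not automatically safe to delete: images backing snapshots or clones are not referenced by a PV either, so each candidate should be checked before removal.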

I am now at the point of deleting the whole ODF setup and starting over: create a separate cephblockpool for every couple of PVs, and don't run out of disk space!

Thank you very much for your time; it took me a lot of time to get here.
Hannes

OpenShift Data Foundation - after deleting PVC and PV BlockPool is not reclaiming deleted space