Description
The problem occurred when the filesystem went into read-only mode. That was fixed, but the metrics still showed the counter and gauge set to 1.
I ran a test and injected FilesystemIsReadOnly messages into /dev/kmsg multiple times, matching the pattern from https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json:
1 log_monitor.go:160] New status generated: &{Source:kernel-monitor Events:[{Severity:info Timestamp:2020-10-08 06:44:16.09315274 +0000 UTC m=+1331754.148888064 Reason:FilesystemIsReadOnly Message:Node condition ReadonlyFilesystem is now: True, reason: FilesystemIsReadOnly}] Conditions:[{Type:KernelDeadlock Status:False Transition:2020-09-22 20:48:21.98500453 +0000 UTC m=+0.040739839 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:True Transition:2020-10-08 06:44:16.09315274 +0000 UTC m=+1331754.148888064 Reason:FilesystemIsReadOnly Message:Remounting filesystem read-only}]}
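For reproducibility, something like the following can be used to inject a matching line (a minimal Go sketch, run as root on the node; the EXT4-fs prefix is purely illustrative, any line containing the rule's pattern would do):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// /dev/kmsg is writable by root; writing a line appends it to the
	// kernel log, where the kernel monitor will pick it up.
	f, err := os.OpenFile("/dev/kmsg", os.O_WRONLY, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open /dev/kmsg:", err)
		os.Exit(1)
	}
	defer f.Close()

	// Hypothetical message; it only needs to contain the rule's pattern
	// "Remounting filesystem read-only".
	if _, err := f.WriteString("EXT4-fs (sda1): Remounting filesystem read-only\n"); err != nil {
		fmt.Fprintln(os.Stderr, "write /dev/kmsg:", err)
		os.Exit(1)
	}
}
```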
Still, the metrics showed 1 and did not drop back to 0. Even after the read-only filesystem issue was fixed, the metric was still 1:
problem_counter{reason="FilesystemIsReadOnly"} 1
problem_gauge{reason="FilesystemIsReadOnly",type="ReadonlyFilesystem"} 1
As a workaround, the pod was deleted, and after that the metrics were reset to 0.
What is the reason for this behaviour? Is it because the type is "permanent"? Is deleting the pod the only solution?
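My understanding of the two metric types, which may be part of the answer: a Prometheus counter is cumulative and only ever increases, while a gauge only goes back down if something explicitly sets it. A minimal sketch with prometheus/client_golang, reusing the metric names from the output above purely for illustration (this is not the exporter's actual code):

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

var (
	// Counters are cumulative and never decrease, so 1 stays 1 by design.
	problemCounter = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "problem_counter", Help: "Number of problems observed."},
		[]string{"reason"},
	)

	// A gauge reflects current state, but only returns to 0 if something
	// explicitly sets it back.
	problemGauge = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{Name: "problem_gauge", Help: "Current problem state."},
		[]string{"reason", "type"},
	)
)

func onConditionTrue(reason, condType string) {
	problemCounter.WithLabelValues(reason).Inc()
	problemGauge.WithLabelValues(reason, condType).Set(1)
}

// If nothing ever reports the condition back to False, nothing calls this,
// and the gauge remains at 1 until the process restarts.
func onConditionFalse(reason, condType string) {
	problemGauge.WithLabelValues(reason, condType).Set(0)
}

func main() {
	prometheus.MustRegister(problemCounter, problemGauge)
	onConditionTrue("FilesystemIsReadOnly", "ReadonlyFilesystem")
}
```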
kernel-monitor.json
```json
{
  "type": "permanent",
  "condition": "ReadonlyFilesystem",
  "reason": "FilesystemIsReadOnly",
  "pattern": "Remounting filesystem read-only"
}
```
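For completeness, the rule's pattern is, as I understand it, applied as a regular expression to each kmsg line, e.g.:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Pattern taken from the rule above.
	pattern := regexp.MustCompile("Remounting filesystem read-only")

	// Hypothetical kmsg line; the device prefix is just an example.
	msg := "EXT4-fs (sda1): Remounting filesystem read-only"

	// A match turns the permanent ReadonlyFilesystem condition True.
	fmt.Println(pattern.MatchString(msg)) // true
}
```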