Bcache writeback - cache fully used
Hey guys,
I'm using bcache to support Ceph. Each of the ten cluster nodes has a bcache device consisting of an HDD backing device and an NVMe cache. However, I am noticing what I consider to be a problem: my cache is 100% used even though 80% of the space on my HDD is still available.
It is true that more data has been written than would fit in the cache. However, I would expect most of it to live only on the HDD and not in the cache, since it is cold data that is almost never used.
I noticed a significant drop in write performance on the disks and went to investigate, and benchmark tests confirmed it. I then saw that the cache was 100% full, with 85% of it evictable and a small amount of dirty data. I found a post online mentioning the garbage collector, so I tried the following:
echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
That doesn't seem to have helped.
Then I collected the following data:
--- bcache ---
Device /dev/sdc (8:32)
UUID 38e81dff-a7c9-449f-9ddd-182128a19b69
Block Size 4.00KiB
Bucket Size 256.00KiB
Congested? False
Read Congestion 0.0ms
Write Congestion 0.0ms
Total Cache Size 553.31GiB
Total Cache Used 547.78GiB (99%)
Total Unused Cache 5.53GiB (1%)
Dirty Data 0B (0%)
Evictable Cache 503.52GiB (91%)
Replacement Policy [lru] fifo random
Cache Mode writethrough [writeback] writearound none
Total Hits 33361829 (99%)
Total Misses 185029
Total Bypass Hits 6203 (100%)
Total Bypass Misses 0
Total Bypassed 59.20MiB
--- Cache Device ---
Device /dev/nvme0n1p1 (259:1)
Size 553.31GiB
Block Size 4.00KiB
Bucket Size 256.00KiB
Replacement Policy [lru] fifo random
Discard? False
I/O Errors 0
Metadata Written 395.00GiB
Data Written 1.50 TiB
Buckets 2266376
Cache Used 547.78GiB (99%)
Cache Unused 5.53GiB (0%)
--- Backing Device ---
Device /dev/sdc (8:32)
Size 5.46TiB
Cache Mode writethrough [writeback] writearound none
Readahead
Sequential Cutoff 0B
Sequential merge? False
state clean
Writeback? true
Dirty Data 0B
Total Hits 32903077 (99%)
Total Misses 185029
Total Bypass Hits 6203 (100%)
Total Bypass Misses 0
Total Bypassed 59.20MiB
The dirty data is gone, but cache utilization is still 99%, down by only 1%. Meanwhile, the evictable cache has increased to 91%!
My impression is that this hurts write caching: if I write again, the data will go straight to the HDDs because there is no free space left in the cache.
Shouldn't bcache evict the least recently used data from the cache?
Does anyone know why this isn't happening?
I may be talking nonsense, but isn't there a way to tell bcache to automatically keep a certain percentage of the cache free for writes? Or, failing that, a command I could run manually during periods of low disk activity?
Thanks!
Comments
Whenever the cache decides to keep some new data and there is no free space left in the cache, it makes space according to the current replacement policy. Your output shows that you use the "lru" policy. This means that the least recently used item is evicted from the cache when space is needed.
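bcache's multiple-choice sysfs attributes list every allowed value and mark the active one in square brackets, which is where lines like "[lru] fifo random" in the output come from. A minimal POSIX shell sketch for extracting the active choice; the function name active_choice is hypothetical, and the sysfs paths in the comments are assumptions about where your cache and backing device attributes live:

```shell
#!/bin/sh
# Hypothetical helper: print the active (bracketed) entry of a bcache
# multiple-choice sysfs value such as "[lru] fifo random".
active_choice() {
    # Keep only the text between '[' and ']'; print nothing if there are no brackets.
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example (sysfs paths are assumptions; adjust to your devices):
#   active_choice "$(cat /sys/block/nvme0n1/nvme0n1p1/bcache/cache_replacement_policy)"
#   active_choice "$(cat /sys/block/bcache0/bcache/cache_mode)"
```

For the cache mode line in the output above ("writethrough [writeback] writearound none"), active_choice prints writeback.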
Having lots of cold data in the cache is not a problem for speed as long as the data you consider hot is actually hotter than the cold data, i.e. used more often. The LRU policy then ensures that cold data is evicted from the cache first when new data arrives. If the new data is cold, the LRU policy should pick it for eviction fairly soon and keep the hot data, since the hot data is requested frequently. This fails, however, if a very large amount of cold data is written and the hot data is not requested often enough: the hot data then becomes less recently used than the cold data and is evicted from the cache. There is also the problem that writing a lot of cold data to the cache needlessly wears out your SSD, so this may still be undesirable even if the hot data manages to stay in the cache (e.g. 20 GB of hot data and a 500 GB cache, but only 450 GB of cold data written). You could address this problem by switching to the "random" policy: as most data in your cache is cold, new data is then likely to replace cold data.
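Switching the policy is a single write to the cache device's cache_replacement_policy attribute. The sketch below validates the value before writing it. The function name is hypothetical, and the path is an assumption: the attribute normally sits in the cache device's bcache directory (for a partition, something like /sys/block/nvme0n1/nvme0n1p1/bcache), which is also reachable through the cache0 link under /sys/fs/bcache/<cset-uuid>/. Check that your kernel exposes this attribute before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: set the replacement policy of a bcache cache device.
# $1: the cache device's bcache sysfs directory (assumed location, see above)
# $2: one of lru, fifo, random
set_replacement_policy() {
    dir=$1
    policy=$2
    case "$policy" in
        lru|fifo|random) ;;
        *)
            echo "unsupported policy: $policy" >&2
            return 1
            ;;
    esac
    if [ ! -w "$dir/cache_replacement_policy" ]; then
        echo "cannot write $dir/cache_replacement_policy" >&2
        return 1
    fi
    echo "$policy" > "$dir/cache_replacement_policy"
}

# Example (run as root; path is an assumption):
#   set_replacement_policy /sys/block/nvme0n1/nvme0n1p1/bcache random
#   cat /sys/block/nvme0n1/nvme0n1p1/bcache/cache_replacement_policy
```

Rejecting unknown values in the script gives a clear error message before anything touches sysfs.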
To reduce the amount of new data that is added to the cache, you can set the sequential_cutoff parameter of your bcache device. Your current cutoff of zero means that there is no cutoff and all data is added to the cache, which explains the high cache utilisation. Depending on your load, a low but non-zero cutoff can cause some or most of your hot data not to make it into the cache. Some hot data may arrive in a large continuous read or write operation, e.g. if Ceph bundles metadata writes. Furthermore, bcache tries to detect tasks, such as backups, that read lots of hot and cold data based on the average read request size, and does not add this data to the cache. This task detection may be a problem if bcache treats all of Ceph as a single task and cannot see the underlying tasks, which may be running on a different cluster node.
To experiment, you'd want to detach, stop, reset and re-attach the caches, reset the Ceph nodes (in particular the CephFS metadata nodes) and the compute nodes (which also cache some CephFS data), and you'd need a realistic test load. This can be quite a challenge. A more practical solution is to leave the threshold at 0 and to ensure the SSDs are large enough not to lose substantial amounts of hot data when lots of cold data arrives or is being read.
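If you do run such an experiment, the cutoff and the statistics are both plain sysfs attributes of the backing device. The sketch below sets a cutoff, resets the running-total counters through clear_stats, and later computes an integer hit ratio from stats_total/cache_hits and stats_total/cache_misses. The function names are hypothetical, the path /sys/block/bcache0/bcache is an assumption, and it assumes your kernel accepts human-readable sizes such as 4M for sequential_cutoff.

```shell
#!/bin/sh
# Sketch for experimenting with sequential_cutoff on a bcache backing device.
# $1 is the device's bcache sysfs directory, e.g. /sys/block/bcache0/bcache
# (assumption: adjust to your device).

# Set the cutoff (0 disables it) and reset the running-total statistics
# so that the next test run starts from clean counters.
set_cutoff_and_reset() {
    dir=$1
    cutoff=$2
    echo "$cutoff" > "$dir/sequential_cutoff" || return 1
    echo 1 > "$dir/clear_stats" || return 1
}

# Print the cache hit ratio as an integer percentage:
# hits * 100 / (hits + misses), or 0 if there has been no traffic yet.
hit_ratio() {
    dir=$1
    hits=$(cat "$dir/stats_total/cache_hits") || return 1
    misses=$(cat "$dir/stats_total/cache_misses") || return 1
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $((hits * 100 / total))
}

# Example (run as root):
#   set_cutoff_and_reset /sys/block/bcache0/bcache 4M
#   ... run a realistic test load ...
#   hit_ratio /sys/block/bcache0/bcache
```

For example, with the backing-device figures from the question (32903077 hits, 185029 misses), hit_ratio prints 99, matching the 99% shown there.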
Also, if you haven't done so already, you should look into your Ceph configuration to separate storage of cephfs metadata and filedata so that you can use different bcache settings for metadata and filedata (or use SSDs exclusively for metadata).
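Putting the CephFS metadata pool on SSD-only OSDs is usually done with a CRUSH rule restricted to a device class. A minimal sketch, assuming the metadata pool is called cephfs_metadata, the CRUSH root is default with host as the failure domain, and the SSD-only OSDs report the device class ssd (bcache-backed OSDs may be classified differently, so check the CLASS column of ceph osd tree first). The rule name ssd-metadata and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: pin the CephFS metadata pool to OSDs of device class "ssd".
# Pool and rule names are assumptions; override them via the environment.
METADATA_POOL=${METADATA_POOL:-cephfs_metadata}
RULE=${RULE:-ssd-metadata}

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pin_metadata_to_ssd() {
    # Replicated rule that only selects OSDs with device class "ssd".
    run ceph osd crush rule create-replicated "$RULE" default host ssd
    # Switch the metadata pool to that rule; Ceph then migrates its PGs.
    run ceph osd pool set "$METADATA_POOL" crush_rule "$RULE"
}

# Example:
#   DRY_RUN=1 pin_metadata_to_ssd   # review the commands first
#   pin_metadata_to_ssd             # then apply them
```

With the metadata pool on dedicated SSDs, the bcache devices only carry file data, so their sequential_cutoff and replacement policy can be tuned for that workload alone.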