Introduction
I saw a discussion on LinkedIn by Nikita Gordeev about a .NET Core app running on AWS EC2. The app was experiencing memory spikes, and the team identified the problem and provided a solution. I thought I’d share this information in our dev notes, as it might help others facing the same issue."
The Problem They Found
They found that service memory utilization significantly increased, but there no additional load, no recent changes in code base, utilization increased for all containers. What can be the reason?
A bit more context:
- application in netcore 8
- hosted on AWS ECS based on EC2 (not fargate)
P.S. Yes, we found the reason, I'm not writing post when something wrong in the system 😅
The Key Insight: vCPU Mismatch in AWS and .NET Garbage Collection
From Raif A. The problem comes from AWS concept of vCPU which is not the same as a CPU core visible to the OS. vCPU is implemented on the hypervisor level by time slicing of CPU time in the kernel, but the OS is still exposing all physical CPU cores to all docker containers. So .NET GC (unaware of AWS) thinks it has all these cores, so it tries to "optimize" GC for multicore by allocating many GC heaps and many GC collector threads one per "core". So heap size increases by underlying EC2 cores accordingly. Setting the variable tells the GC to not be smart and optimize for X cores (as set in the variable) that should be approximately the same as number of vCPU. AWS 1 vCPU unit is roughly around 1 2-3 GHz Skylake CPU core in IPC.
The Fix
The solution that helped Nikita’s team was setting the environment variable DOTNET_GCHeapCount
to match the container’s vCPU limit. This change tells the .NET garbage collector to optimize based on the container’s actual vCPU count instead of the physical core count. By doing so, memory usage returned to normal levels.
Additional Insights from the Discussion
Feng Yuan also chimed in with a useful tip: If memory usage is unexpectedly high, it’s always a good idea to capture a dump for analysis. This will give you a deeper look into what’s happening under the hood. In this case, the dump showed that the memory was being allocated but not used, pointing to inefficient memory management by the GC.
Oualid E. pointed out another potential cause for memory increases: changing the EC2 instance type. If the instance type was changed to one with more CPUs, the .NET Thread Pool might automatically adjust to the increased core count. More threads result in more stack allocations, which could lead to higher memory usage. If the memory increase coincided with the instance change, this might be the root cause.
Conclusion
In summary, if your .NET Core application’s memory usage spikes unexpectedly in an AWS ECS environment, the issue may lie in how the application interprets vCPU allocation.
To avoid any misunderstandings in the whole discussion, I suggest reading the related discussion on LinkedIn..