kswapd0 at 99% - observations and conclusion [KERNEL ISSUE]

Excuse the apparent delay, but I've been testing things thoroughly. I have a knack for making things TL;DR, so let me give an exec summary first:

  • test kernel 4 is the best yet
  • my previous tests were flawed*
  • test kernel 4 has a new problem
  • test kernel 4 appears to have "moved" the kswapd0 issue to a later/further point

*I've kept my testing the same to ensure I was comparing apples with apples, but when I pulled up the Chrome task list at /u/reynhout's request, I noticed that one of my tabs (processes) in the Chrome browser was reloading itself automatically at regular intervals. This may have caused an apparent kswapd0 runaway condition, as kswapd0 never got to finish swapping out before the process reloaded itself.

As soon as I realised that, I cut that tab (ingress) from the tests. But that also meant I had no baseline.

So I have spent two days just pushing and pushing with various tabs inside Chrome and programs outside of Chrome, and I think we're at this stage:

  • everything goes fine until we hit the 1.7m out of 1.9 avail point. At that stage, both CPUs can go to 100%. On one occasion that didn't come back for many, many minutes. Even the mouse pointer wouldn't move, although input was being buffered, so it would move later on. I've also experienced this for a shorter period, like 5-10 seconds, and also not experienced it at all. This happens the first time that kswapd0 kicks in, and never subsequently. You'd have to reboot to try and get it to do it again.

  • kswapd0 goes bananas as it used to, but in a reasonable timeframe (10 to 30 seconds, up to a minute, depending) it does what it needs to do and goes back to 0% CPU. I can load more tabs and processes this way than previously. Every time kswapd0 kicks in, after some time, it completes.

  • Eventually, well before swap is full (15-25%), with many tabs/processes open, more than double what the 4.1 beta kernel allowed, adding the next process does seem to take us to the point where kswapd0 doesn't end any more. It seems that at that point, we're back to where we were previously.
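
For anyone who wants to put numbers on the points above (how much memory is in use when kswapd0 first kicks in, and how full swap is when it gets stuck), here is a minimal sketch that reads /proc/meminfo. It is not what I used for the observations above; the function names are made up for illustration, and it assumes a Linux kernel that exposes the usual MemTotal, MemFree, SwapTotal and SwapFree fields (MemAvailable is used when present).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def usage_summary(info):
    """Summarise memory in use and the percentage of swap in use."""
    mem_total = info["MemTotal"]
    # MemAvailable is missing on older kernels; fall back to MemFree there.
    mem_avail = info.get("MemAvailable", info["MemFree"])
    swap_total = info["SwapTotal"]
    swap_used = swap_total - info["SwapFree"]
    swap_pct = 100.0 * swap_used / swap_total if swap_total else 0.0
    return {
        "mem_used_kb": mem_total - mem_avail,
        "mem_total_kb": mem_total,
        "swap_used_pct": swap_pct,
    }


def read_summary(path="/proc/meminfo"):
    """Read the live values from the machine under test."""
    with open(path) as f:
        return usage_summary(parse_meminfo(f.read()))
```

Calling `read_summary()` just before opening the next tab or program, and writing the result down next to what kswapd0 did, should make runs on different kernels easier to compare.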

This is my current view of it - I'm going to push it some more. It's a time-consuming thing to do, and I need to repeat it a number of times to feel there is some consistency to it.
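
To make repeated runs comparable, one option is to log the CPU usage of kswapd0 at regular intervals and then reduce the log to "episodes": when it started hogging the CPU, when it stopped, and whether it stopped at all. A rough sketch follows, assuming the samples are already collected as (seconds, cpu percent) pairs by whatever means is convenient. The function name and the 50% threshold are arbitrary choices for illustration, not something from my testing.

```python
def kswapd_episodes(samples, busy_threshold=50.0):
    """Group time-ordered (seconds, cpu_percent) samples into busy episodes.

    Returns a list of (start, end, finished) tuples. finished is False when
    the log ends while kswapd0 is still busy, i.e. the "stuck" case.
    """
    episodes = []
    start = None
    last_t = None
    for t, cpu in samples:
        if cpu >= busy_threshold:
            if start is None:
                start = t  # a new busy episode begins here
        elif start is not None:
            episodes.append((start, t, True))  # back to idle, episode completed
            start = None
        last_t = t
    if start is not None:
        # Still busy at the last sample: report it as unfinished.
        episodes.append((start, last_t, False))
    return episodes
```

An episode that completes in 10 to 60 seconds would match the "goes bananas, then returns to 0%" behaviour, while a final unfinished episode would match the point where it never ends.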

That said, whatever the change in test 4 is, it appears to have pushed the problem further away. There is still a point where there is lots of swap left and it gets stuck.

The behaviour of kswapd0 returning to 0% was new to me, and may actually have happened on the test 3 and beta 4.1 kernels as well, hidden by my flawed (real life) test. I plan to rerun some of my new (real life) testing on the test 3 and beta 4.1 kernels to see what they do, and I'll report back so nobody is chasing ghosts. Give me a day to push test 4 some more so my observations are consistent, then I'll go back and run test 3 and beta 4.1 and report on any previous observations that may have been red herrings.

EDIT: corrected 4.0 to 4.1
