Tackling Gastropoda's memory usage - Round One
I’ve been having to power cycle my Gastropoda DigitalOcean droplet every day lately because something was hogging all the memory. It was a little annoying to diagnose: I knew the problem had to do with the recurring events that are run by a cron job, but there are quite a few of those. I know PHP ideally isn’t something you would use for a long-running server process, but these jobs weren’t long running - they were just frequent.
The problem ended up being something very obvious. Every minute I run an event called CheckIdleSnailActions. It loops through all living snails in the world and sees what they’re doing. Unfortunately, at some point that event began to take more than a minute to run, so a new instance would start before the previous one had finished. When I ran ps aux --sort -rss to sort all running processes by memory usage, I saw two instances of CheckIdleSnailActions running at the same time, one taking something like 60% of memory and the other ~30%.
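The same check can be scripted. Below is a minimal sketch in Python (not the PHP the game itself is written in) that scans the text output of ps aux and lists every process whose command line mentions a given event name. The function name find_instances is hypothetical, and the sketch assumes the standard ps aux column layout, where the PID is the second column, %MEM the fourth, and the command the eleventh.

```python
def find_instances(ps_output, needle):
    """Return a list of (pid, mem_percent) for commands containing `needle`.

    `ps_output` is the raw text printed by `ps aux`; the order of the
    rows is preserved, so output sorted by memory stays sorted.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # The command may contain spaces, so split at most 10 times
        # and keep everything after the tenth column as one field.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or malformed line
        pid, mem, command = int(fields[1]), float(fields[3]), fields[10]
        if needle in command:
            matches.append((pid, mem))
    return matches
```

Feeding it the output of ps aux --sort -rss and the string CheckIdleSnailActions would return more than one entry whenever two runs overlap.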
I always knew that running these checks every minute would eventually cause issues like this, and I know I need to find a more efficient way of doing it. In the meantime, the first step was to stop the same event from running in parallel with itself.
To do this I ended up using lock files. The cron job runs a Dispatcher command called RunRecurringEvents. This command grabs all recurring events that are scheduled to run at this time from the database and runs them. These events include things like:
I created a new directory in app/storage called lockFiles.
I didn’t want to stop all of these things from running in parallel for now. The heaviest event by far is CheckIdleSnailActions. This is where we generate the brain, where the neurons do their work for each snail, where we comb through each snail’s memories for recognition of an object, retrieve and record mood impact, etc.
When a recurring event is triggered, it runs only if there is no lock file matching the event’s name in app/storage/lockFiles. So if there is a file called app/storage/lockFiles/CheckIdleSnailActions.txt, the event will not run. If there is no lock file by that name, the event creates one and starts running. When it finishes, it deletes the lock file, which allows the next instance of the event to run when it is triggered.
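The lock file logic described above can be sketched roughly as follows. This is an illustration in Python, not the game's actual PHP code, and the helper name run_with_lock is hypothetical. It also makes two assumptions that go slightly beyond the description: the lock file is created with an exclusive-create flag, so that "check" and "create" happen in one step, and the lock is removed in a finally block, so that an event that throws does not leave its lock behind.

```python
import os
from pathlib import Path

# Directory taken from the post; everything else here is illustrative.
LOCK_DIR = Path("app/storage/lockFiles")


def run_with_lock(event_name, event, lock_dir=LOCK_DIR):
    """Run `event()` unless a lock file for `event_name` already exists.

    Returns True if the event ran, False if it was skipped.
    """
    lock_dir = Path(lock_dir)
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{event_name}.txt"
    try:
        # O_EXCL makes the call fail if the file already exists, so two
        # processes cannot both believe they created the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a previous run of this event is still going
    os.close(fd)
    try:
        event()
    finally:
        # Delete the lock so the next scheduled run is allowed to start.
        lock_path.unlink()
    return True
```

Because the lock is keyed on the event's name, different events can still run side by side; only a second copy of the same event is skipped. A process that is killed outright would still leave its lock file behind, and the sketch does not try to handle that case.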
So far I haven’t had to power cycle the droplet since this went in. It’s not a full solution (I can’t even imagine how long it might take for one CheckIdleSnailActions event to run if the world ever has thousands of living snails in it), so this is a work in progress.