ETH ZS server lags
We need your help to solve a server lag problem on the Ethernity ZS server.
It has been happening for months, ever since we switched to the official 2016 vanilla version from GitHub. The server was running the 2014 vanilla version before that, and we were not experiencing this problem.
What we know:
- The server starts lagging when ~40 or more players are connected
- Player latency then fluctuates between normal and about 10x higher (e.g. 11 to 120)
- The lag lasts around 1 minute, then latency goes back to normal for a few seconds
- The physical server can sustain a 64+ slot ZS server and has a decent fibre internet connection (gigabit)
- The server OS is the latest Debian Linux
- The logs don’t show any errors
- It’s not a DDoS attack
What we tried:
- Allocating up to 8 GB of RAM to the ZS server
- Changing the server tickrate to 33, then to 22
- Monitoring the physical server, which didn’t show any problem (see CPU usage, upload and download)
- Monitoring ping routes using PingPlotter, which didn’t show any problem
We are running out of leads. The last one is that the gamepanel might be causing trouble, which is unlikely since it’s installed properly on the same machine. The server is not vanilla anymore: we updated it to change the menus and balance the gameplay, but there are no addons and those changes don’t affect networking. The lag appeared right after we installed the latest official ZS version, before we made those changes.
It happens in any situation, from early game to the end, on survival or objective maps, when there is action and when it’s “calm”.
Do you have any idea how to fix this, or where it comes from?
Any help would be appreciated.
Prof. Dru last edited by
@azona I can assure you it isn’t a bandwidth issue, it’s most likely a CPU issue. I’m no developer, but most of the problems Nox faces are due to gmod’s heavy demands on both the server and client. If you look at the update log for the official ZS server you’ll see a ton of optimizations have been made to reduce server load.
My suggestion to you is to try the github build without any of your modifications, and see how that runs. Without knowing what kind of CPU your server is running I can’t really tell you if it can handle it though.
The Darker One last edited by
I don’t get how an older version that wasn’t as optimised as the current public one wasn’t lagging.
I didn’t specify this in the first post: the CPU is a Xeon E3-1245 v2, 4c/8t (3.4 GHz / 3.8 GHz).
Does a gmod server run on only 1 thread of a CPU core?
Looking at the CPU usage picture, that would explain a lot, since 1 thread represents 12.5% of 8 threads.
Concerning server load optimisation, I guess our release of ZS is up to date because it’s the latest available. We did apply some of the fixes released by C0nw0nk, but that was a few weeks ago.
The 2014 ZS version we used before rarely went over 40 players, and no one seems to remember similar lag, but we may be wrong.
Are the updates concerning server load included in the version we use?
Note: We are going to try another gamepanel (Gameserver manager) in case the one we currently use is causing trouble.
srcds currently uses, and probably always will use, just one thread. On an 8-thread CPU it can’t go above 12.5% total CPU usage, so single/dual core servers are best.
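To make the 12.5% figure concrete, here is a rough sketch of the arithmetic. It assumes the monitoring graph reports usage averaged over all 8 logical threads and that all of the reported load comes from the one thread srcds runs on; the function names are made up for illustration.

```python
def max_total_share(logical_threads: int) -> float:
    """Largest whole-machine CPU percentage a single-threaded process can show."""
    return 100.0 / logical_threads


def single_thread_load(total_percent: float, logical_threads: int) -> float:
    """Convert a whole-machine percentage into the load on the one busy thread.

    Assumption: all of the reported usage comes from that single thread.
    """
    return min(100.0, total_percent * logical_threads)


print(max_total_share(8))           # 12.5
print(single_thread_load(10.0, 8))  # 80.0
```

So a graph that looks almost idle at 10% overall could still mean the game server's thread is 80% busy.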
Thanks for your answers. I hope we can solve this soon, and I’ll post again when we do!
We finally found the problem: it was caused by a script preventing players from prop-launching objects at other players.
With the script enabled, 2 players on the server: 56% of one CPU thread used
With the script disabled, 2 players on the server: 20% of one CPU thread used
It still needs to be tested live, but it should be fixed!
Hello again!
We tested the previous fix one month ago. Unfortunately for us, the lag spikes continued as if nothing had changed.
Since then SliderCJ, the server programmer, and I have worked our asses off to fix this problem.
We are still convinced that a CPU overload is causing it.
Since last month he has tried the following:
- Testing and optimising all the scripts we implemented to fix known bugs, reducing the server CPU load to 16% with 2 players on the server (compared to the initial 56%, see above)
- Optimising the balancing code we implemented by removing everything possible that used the “Think” function inside the init.lua file
- Updating the gmod server build, which had not been set to update automatically (it was still a 2014 build); this eventually stopped the random server crashes during map changes we were experiencing
- Fine-tuning the network settings in the server.cfg file
As a result, the server stopped crashing randomly and can now handle ~45 players without unplayable lag.
But still, our game design was aimed at a maximum of 65-70 players on the server, so right now half of the gameplay is not visible to the players. It is finely balanced, and we can now easily tune it further for higher populations if needed, although we haven’t been able to test it live in those conditions.
That’s why I’m asking again for help.
The last lead we investigated: whether gmod’s SQLite database could be causing it, and the possible advantages of using MySQL instead.
We are ready to discuss this somewhere other than the forums, in a professional way, perhaps using TeamViewer if needed.
Since the internet failed to help us, if you can’t either we will just have to reduce the server slots to 35.
Thanks for reading
PS: If you actually help us resolve this issue, I’ll give away my personal, just installed, up-to-date (12/2017), optimised, ready-to-play version of Brutal Doom in this thread
( ͡⦿ ͜ʖ ͡⦿)
commandhat last edited by commandhat
But still, our game design was aimed at a maximum of 65-70 players on the server, so right now half of the gameplay is not visible to the players.
Gmod Tower (now Tower Unite, on a different engine) also had problems with any more than this many players – I suspect something kicks in past 64-ish players that makes it really, really hard to do a certain thing needed for networking.
What happened at Gmod Tower:
- At 71 players, physics no longer replicated correctly - everyone would see something different in a given physics situation. The server would only update a given prop’s position if a client or another prop interacted with it.
- At 81 players, the voice codec became extremely unreliable, almost to the point of unusability, even with just one player sending anything at all.
- At 91 players, you couldn’t see player movement - their network connection was entirely eaten by other things. Chat, some form of physics, and networked lua still worked – but player interaction didn’t. Granted, this may be from the Gmod Tower devs not having extremely slimmed down netcode.
- 100+ players was completely unworkable. Chat responded, but was insanely laggy, and you couldn’t move at all. Either from DDoS, SRCDS falling to its knees, or something in the hardware going ape shit at that much information to process.
Gmod Tower set their limit at 80 players shortly after the 144-player limit test, and about 6 months later they set their max to 70. That was as far as they were willing to go for player count, and I suspect that’s as far as SRCDS is willing to go stability-wise.
Note that this is a total of:
- 70 players in the lobby
- 12 players in PvP
- 8 players in Ball Race
- 12 players in Virus
for a total of 102 players running off of one box, across 4 SRCDS instances.
I suspect you could go higher, if you maintained only one SRCDS instance, if you trusted your life with your netcode, if you honestly believe your box has the resources to handle it, but I wouldn’t recommend it.
Yes and I agree, we don’t want or need to go above 65 / 70 / 75 players, this was a clear game design element since we started working on our ZS version 10 months ago.
We can’t go above 45 at the moment, and I guess that’s normal for a vanilla-like version. But we have reached the limit of our understanding of the ZS gamemode code and decided we can’t optimise it further based on what we know and have learned over the past 2 months. SliderCJ did his best to optimise the code he wrote himself, to the point where a few changes started to cause trouble and had to be reverted. That’s why we won’t push it any further unless we get external help.
I come here again because I know Noxiousnet ZS goes up to 75 players without unplayable lag, and that some other servers can go, or went, above that and ran into problems similar to the ones you described above.
On the same box, there is also a gmod Jailbreak server running (20 players max), installed 2 months ago, and a few inactive CSGO servers, plus the gamepanel.
We haven’t touched the netcode from the vanilla version, and I can post here the server.cfg network settings if needed, but I don’t think it’s coming from the tickrate or related parameters.
We could easily host 100 players if there was no reliance on every player calling a custom ShouldCollide. There’s just no way around having to use it though. I do wish there were better ways to do it. I optimized it as much as possible earlier on, but that’s probably the biggest CPU hog.
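As a rough illustration of why a per-player ShouldCollide gets expensive, assume (this is an assumption for the sketch, not something measured on any server) that in the worst case the hook can be asked about every pair of players standing in the same cluster. The number of pairs then grows quadratically with the size of the cluster. The function name is hypothetical.

```python
from math import comb


def worst_case_pairs(clustered_players: int) -> int:
    """Number of distinct player pairs in a cluster (n choose 2).

    Assumption: each pair can trigger a collision check.
    """
    return comb(clustered_players, 2)


for n in (10, 40, 62, 70):
    print(n, worst_case_pairs(n))
# 10 45
# 40 780
# 62 1891
# 70 2415
```

Under that assumption, going from 40 to 70 clustered players roughly triples the number of pairs, which would fit a server that copes at ~40 players and struggles well before 70.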
There’s an addon that helps you measure perf down to individual functions. fprofiler or dbugr or something. Wouldn’t leave it on your server since that’s a hog itself.
We will investigate a bit more then. Anyway, thank you Nox!
AtomiC last edited by
@jetboom This is for barricade and prop ghosting?
Raox last edited by Raox
Pretty sure it’s also used just for dealing with team to team collision. It gets very bad on moving objects where players are clustered together.
That’s mostly why. Bullets are completely custom and don’t use it. Using the stock engine behavior for non-solid team collision will screw up collision with other objects. Requested a fix a long time ago, never got one.
This reminds me of some people wrecking both their coding skills and all the collisions on a full live server while trying to fix the turret bullet collisions with human players lol
After days of brainstorming and testing, SliderCJ managed to improve the ghosting system along with the shouldcollide.
62 human players can now stand in the same spot, and the server CPU can keep up with the collision requests.
Once again, thanks for your help, from the whole ETH ZS community!
If you’re using the ShouldCollideWIthTeammates bind or whatever it is then you’ll be breaking some other stuff.
SliderCJ : “The Shouldcollide and Shouldnotcollide remain untouched, I removed the Shouldcollidewithteamates (it was creating trouble) and added restrictions for the execution of the ghosting check which was being executed all the time.”
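For readers wondering what "restrictions for the execution" of a check can look like in general, here is a minimal sketch of one common approach: run the expensive check at most once every few ticks per player and reuse the cached answer in between. This is not SliderCJ's actual code, and every name in it (ThrottledCheck, expensive_ghost_check, the 9-tick interval) is made up for illustration.

```python
class ThrottledCheck:
    """Run an expensive check at most once every `interval_ticks` ticks per key."""

    def __init__(self, check, interval_ticks: int):
        self.check = check
        self.interval_ticks = interval_ticks
        self._last_run = {}     # key -> tick of the last real check
        self._last_result = {}  # key -> cached result of that check

    def __call__(self, key, tick: int):
        last = self._last_run.get(key)
        if last is None or tick - last >= self.interval_ticks:
            # Enough ticks have passed: do the real work and cache the answer.
            self._last_run[key] = tick
            self._last_result[key] = self.check(key)
        return self._last_result[key]


calls = []  # records every time the expensive check really runs


def expensive_ghost_check(player_id):
    """Stand-in for a costly check; here it only records that it ran."""
    calls.append(player_id)
    return False


ghost_check = ThrottledCheck(expensive_ghost_check, interval_ticks=9)

# Simulate 100 server ticks for a single player.
for tick in range(100):
    ghost_check("player_1", tick)

print(len(calls))  # far fewer real checks than the 100 ticks simulated
```

The trade-off is that the cached answer can be a few ticks stale, so the interval has to stay short enough that players don't notice.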