Yup, that’s exactly the root of the problem. It has nothing to do with precision of data, or with networking itself. Variable processing times (i.e. a slight slowdown in when you process the snapshots) can cause it. So this happens all the time.
Ideally, you’d specify a constant time offset, let’s say 50 ms in the past, so that you have enough snapshots to interpolate.
If the snapshots get delayed, at some point you’re not going to receive the next snapshot in time. That’s where you do extrapolation. But then when you eventually receive your missing snapshot, you might receive the next 2 or 3 snapshots that got held up at the same time. At that point you’re too far behind the latest snapshot’s server time, so you need to catch up. If you do it instantly you’d see the position teleport. So what I’ve been doing is to slowly interpolate the client time to match up with the server’s time (minus the artificial offset of, say, 50 ms). That works, but results in the stuttering problem.
The effect is essentially what happens with cars on a highway. When one car at the front slows down then accelerates again, it creates a bottleneck kilometers behind.
This stuttering is basically the ship/server time slowing down and speeding up when the snapshots catch up again? I would have thought that when an extrapolation happens, the ship keeps its velocity/acceleration and just continues at the same (increasing/decreasing) speed. In my imagination, the only difference between the extrapolation and the snapshots that are catching up would be if the ship made any changes to its speed or direction during these missed/delayed snapshots.
So a visible correction would only occur then and not with every missed/delayed snapshot.
If I were to receive a timestamped message exactly every minute to move a figure one inch forward, and at the 30th minute I don’t receive the message at the exact minute, I would assume that the message is delayed and continue to move the figure. Say after 20 seconds I receive the delayed message and check whether my anticipated move was correct or whether the sender made any changes in movement in the new message. I would then move the figure to its correct position, or it stays where I moved it because my anticipated move was correct. After 40 seconds the next message is received and I move the figure forward one inch again.
How I understand it now is that I wait for the message and move the figure one inch. This happens again until the 30th minute: the message is delayed, so I don’t move the figure; after 20 seconds the delayed message arrives and I move the figure one inch; after 40 seconds the next message is received in time and I move the figure one inch again. This would then result in the movement slowing down and speeding up.
Could this stutter be resolved if the position were extrapolated every time a snapshot is delayed and not just when a snapshot is missing? The delayed snapshot would only be used to check whether the extrapolated position was correct and, if not, to correct the ship’s position accordingly.
After thinking about this for a while I think I understand. The only thing I don’t see is how the client can fall behind. I can see how he can miss snapshots, but I don’t see how the client’s clock can be slower than the server’s.
If the client receives 3 snapshots he can discard the ones that are behind his “playback” time (the 50 ms for instance).
On top of that, extrapolation should generate local “snaps” when one from the server is missing. These are never replaced, even if they have a timestamp ahead of the current “playback” time, because that could create hard snapping. An incoming snap will only be used if no local one has already been generated.
The client should then adjust his “playback” time to further avoid extrapolation. As said this shouldn’t be too aggressive due to possible side effects.
You could use extrapolation to lower bandwidth and server resources by only sending every other snapshot to the client. The extrapolation would then fill in the gaps.
The client clock should never get out of sync with the server clock. It doesn’t need to be all that precise either; I think precision down to one millisecond is enough, as you probably wouldn’t want to run any algorithm more often than that. What it should not do is shift.
Brainstorming:
So we have the server that collects user info and combines them into snapshots and numbers them. (And of course other stuff).
At login the clocks are synchronized with the client… or not, I don’t know how clock synchronization works over the net. That time we call “now time”.
The server himself needs time to receive data, process it and pack it into snaps. So the server also needs somewhat of a buffer/leeway to do that; depending on server power and calculation load this is set to something. We call that “truth time”. In order to do this, the server gives himself some time by declaring that the snapshot he is creating now is valid at some point in the future.
For instance: “time_truth” = “time_now” + 5ms
As soon as he has calculated that, he sends it to the clients, possibly even before the truth time has arrived.
There’s also a packet interval. Let’s say it is: “snap_interval” = 10ms.
The clients also need time to receive the snapshots and do their own calculations. So to allow for that, they play back the snapshots from the past. We call this “playback time”.
For instance: “time_playback” = “time_now” - 50ms
The client receives snapshots and adds them to its bank of snapshots if they’re not older than its “time_playback” and not older than the snapshot the playback is currently moving towards.
Example:
“time_now”= 1503575432068
Server prepares snapshot Nr. 5 with “time_truth”=1503575432073
Server sends packet at “time_now”=1503575432071
Client receives packet at “time_now”=1503575432090
Clients “time_playback”=1503575432040
Client has snapshot Nr.1 with “time_truth”=1503575432033
Client has “extrapolated” snapshot Nr.2 with “time_truth”=1503575432043
Client has snapshot Nr.3 with “time_truth”=1503575432053
Client has snapshot Nr.4 with “time_truth”=1503575432063
Client checks whether the truth time of snapshot Nr. 5 is older than that of the snapshot it is currently travelling towards, Nr. 2.
(Edit: Actually, just checking whether 5 > 2 does the same thing. A small sketch of this check follows after the example.)
Client adds snapshot Nr. 5 with “time_truth”=1503575432073
Client continues playback and sends own snapshot versions to the server (shots fired, hits landed … etc)
Snapshots could be set up so they carry as much or as little information as needed, with the rest being assembled/extrapolated by the server/client.
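Roughly, the bank logic from this example could look like the sketch below. The class and field names are made up for illustration; only the rule itself (drop anything behind playback time, only accept snapshots newer than the one we’re travelling towards) comes from the example:

```python
# Sketch of the client-side snapshot bank described above.
# Names are invented for illustration, not actual engine code.

class Snapshot:
    def __init__(self, number, time_truth, state):
        self.number = number          # e.g. Nr. 5
        self.time_truth = time_truth  # e.g. 1503575432073
        self.state = state            # positions, velocities, ...

class SnapshotBank:
    def __init__(self):
        self.snapshots = []           # kept sorted by number

    def try_add(self, snap, current_target, time_playback):
        # Discard anything that is already behind the playback time.
        if snap.time_truth < time_playback:
            return False
        # "Actually just checking if 5 > 2 is doing the same thing."
        if snap.number <= current_target.number:
            return False
        self.snapshots.append(snap)
        self.snapshots.sort(key=lambda s: s.number)
        return True
```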
It’s a pretty simple concept. Note the time (T0), send a message to the server. Server sends its current time back (TS). Client receives message and notes time again (TC). Do some math to determine the difference in clocks by assuming that the trip times between client/server and server/client are the same.
(TC-TS) - ((TC-T0) / 2)
Edit: While I suspect that the above is similar to NTP’s algorithm, I didn’t mean to suggest that’s the way they do it. It’s just what occurs to me as a way to do the synchronization.
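As a sketch of that math (the example numbers below are made up, roughly in the style of the millisecond timestamps used earlier in the thread):

```python
def clock_offset(t0, ts, tc):
    """Estimate how far ahead the client clock is of the server clock.

    t0: client time when the request was sent
    ts: server time reported in the reply
    tc: client time when the reply arrived
    Assumes the trip out and the trip back take the same amount of time.
    """
    one_way_latency = (tc - t0) / 2
    return (tc - ts) - one_way_latency

# Example: client sends at 1000, server replies with its own time 1730,
# client receives at 1100 -> round trip 100 ms, one-way 50 ms,
# offset = (1100 - 1730) - 50 = -680 ms (client clock is 680 ms behind).
print(clock_offset(1000, 1730, 1100))
```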
Thanks. I’ll give it another try - though I think those are the defaults
Okay, I think I understand the problem you are getting at now, but I’m still confused on some of the specifics.
Why are you changing time itself? The snapshot is either there at the time of rendering, or it isn’t. You extrapolate if it isn’t. You then receive the correct data after the rendering, but instead of simply replacing it, you apply interpolation corrections over time to bring it to the correct location without causing teleporting. Are you trying to vary the delta-T value in order to minimize extrapolated distance errors, in the hope of eventually getting the authoritative data and then smoothing the correct data back into the game? I.e. dD = v * dt: a smaller change in time means a smaller distance traveled, which means smaller corrective adjustments.
Are you interpolating the time at which the snapshots are played or are you interpolating the positional data between each snapshot?
If this is the case, wouldn’t you end up needing to interpolate all snapshots? Also, wouldn’t this be smoothed out within the “ideally” 50ms buffer?
Anyway, thanks for taking the time to answer Flavien. Just trying to understand the problem and I would love to be able to help you out.
As Flavien said earlier, the problem, whatever its cause, is evident only at high absolute velocities, i.e. 5 km/s.
Therefore, reducing these velocities to near-zero - by transmitting ‘bubble’-relative velocities for nearby ships - will solve the problem.
This is a design challenge, no doubt, but I think you would agree that such a system would resolve this issue.
Likely you’ll need to decouple client-side physics and interpolation from the server (which is already the case anyhow). Instead of receiving absolute values, clients within a bubble will receive bubble-relative coordinates/velocities for everything else (or you could have multiple physics sub-spaces, whatever… depends on the flexibility required). The server still operates exclusively with absolute positions, but tailors each client’s network physics update to respect the relativity bubbles. This is analogous to a moving-origin technique, but also considers the ship’s velocity ‘origin’ (i.e. the velocity of the local reference frame / bubble).
Thus the physical values used by the client engine will be near zero for nearby ships (because they represent relative velocities etc.), and so jitter, precision, or interpolation/extrapolation artifacts will still happen, but they will not be visible because they will be relegated back to a millimetre scale of error.
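Something like the following sketch is what I have in mind; the names (EntityState, to_bubble_relative, etc.) are invented for illustration, not from the actual engine. The server strips the bubble’s position/velocity out before sending, and the client adds them back, so the values that go through interpolation stay tiny:

```python
# Sketch of tailoring a network update to a client's reference "bubble".
# Everything here (types, field names) is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class EntityState:
    position: tuple   # (x, y, z) in metres
    velocity: tuple   # (vx, vy, vz) in m/s

def to_bubble_relative(state, bubble):
    # Server side: express the entity relative to the client's bubble frame.
    return EntityState(
        position=tuple(p - b for p, b in zip(state.position, bubble.position)),
        velocity=tuple(v - b for v, b in zip(state.velocity, bubble.velocity)),
    )

def to_absolute(rel_state, bubble):
    # Client side: reconstruct the absolute state for rendering/physics.
    return EntityState(
        position=tuple(p + b for p, b in zip(rel_state.position, bubble.position)),
        velocity=tuple(v + b for v, b in zip(rel_state.velocity, bubble.velocity)),
    )
```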
Symptom.
So you suggest we should just stop looking for the cause?
I find it quite irresponsible to start a whole redesign of some major systems without justifying it with some proof of why it is needed. If it turns out there’s a limitation, which currently we can’t think of or find any source for, in how the internet or high-precision arithmetic works, then sure, a redesign of the basis can be done to hide this limitation. As a side effect it would also allow transmitting dfloats only for bubbles instead of for every single entity, which could reduce load on both network and computation.
Flavien already tried out such a system. JB implemented it in his warp prototype from the beginning. We discussed it. It’s cool and all. But we didn’t compare their pros and cons; it was just claimed it would solve some symptom that appears in games and that people, including me, simply assumed was some inherent limitation that isn’t solvable with traditional coordinate systems. This doesn’t appear to be the case.
[quote=“thelazyjaguar, post:48, topic:5747”]
Why are you changing time itself? The snapshot is either there at the time of rendering, or it isn’t. You extrapolate if it isn’t.[/quote]
Alright, try to imagine this scenario: 20 snapshots per second, so you receive a snapshot every 50 ms. You introduce a delay of 100 ms to ensure you have enough interpolation data, i.e. you’re 2 snapshots behind for rendering. The latency to the server is 100 ms.
Everything is stable, you receive a bunch of snapshots every 50 ms, rendering is 100 ms behind, no problem. Then suddenly, latency to the server changes. To a whole second. Yeah, it’s extreme, but it’s for the sake of the example, bear with me. And worse… it stays there, forever.
On the client, there’s a 900 ms gap during which you extrapolate. After that you start receiving the snapshots from the server again. They’re obviously all out of date, so you reject them. In perfect conditions it’d just be a lag spike: latency would drop back to 100 ms, which means you’d receive 18 snapshots at once, and the latest one would end up being synchronized again. But that’s not what happens here. You’re now in hell: you’re extrapolating all the time, despite still receiving 20 snapshots per second from the server, because the latency changed durably.
So there’s a need to re-synchronize the client time to be 100 ms behind the latest snapshot. If you change that time instantly, you see actors get instantly teleported, which is no good. So instead you keep track of the “current” time on the interpolation curve and the “desired” time, which is 100 ms behind the latest snapshot. And you do a smooth interpolation of that time, i.e. speeding up or slowing down time itself. That way when latency changes, interpolation adapts smoothly to the varying conditions. But you get stuttering…
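In rough code, the idea looks something like the sketch below; the names and the smoothing constant are made up, it’s just the shape of the technique, not the actual implementation:

```python
# Sketch of smoothly re-synchronizing the interpolation time after a latency
# change, as described above. Names and constants are illustrative only.

RENDER_DELAY = 0.100      # stay 100 ms behind the latest snapshot
TIME_SMOOTHING = 0.05     # fraction of the time error absorbed per tick

class InterpolationClock:
    def __init__(self, start_time):
        self.current_time = start_time   # time actually used to sample the curve

    def tick(self, dt, latest_snapshot_time):
        # Where we want to be on the interpolation curve.
        desired_time = latest_snapshot_time - RENDER_DELAY
        # Advance with real time, then correct a fraction of the error.
        self.current_time += dt
        error = desired_time - self.current_time
        self.current_time += error * TIME_SMOOTHING
        # While the error is being worked off, time effectively runs slightly
        # faster or slower than real time, which is the speed-up/slow-down
        # that becomes visible at high velocities.
        return self.current_time
```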
So the client realizes that the last couple of snapshots he received are way behind his playback time.
I assume this desired time is taken once, before the time readjustment is started. If it’s not, and the truth time of whichever snapshot is newest at the moment is used, we have a situation where you try to interpolate towards a target that constantly changes, making your playback time stutter.
If you interpolate between two points you keep track of your progress from point A to point B. If point B changes during the interpolation, you will get a different result for the same progress you had before. That’s basic, but I just wanted to point it out.
So the client realises what’s going on and calculates a new target playback time based on the last few snapshots it got. The arrival time averages out at about “time_now” - 1000ms, so it chooses “time_playback_new” = “time_now” - 1000ms - 100ms.
Now it starts to interpolate its “time_playback” from “time_playback_old” to “time_playback_new”.
What happens in this timeframe? Each cycle the “time_playback” changes. So every calculation that takes that time into account and does stuff for more than one cycle is affected.
Position interpolation, for instance: it probably uses “time_playback” for its “progress”. Something like this (the most basic interpolation, just an example):
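(A minimal, made-up example of what I mean; the snapshot fields and names are placeholders, not actual engine code.)

```python
# Most basic linear interpolation between two snapshots, driven by time_playback.
def interpolate_position(snap_a, snap_b, time_playback):
    # How far time_playback has progressed from snapshot a towards snapshot b (0..1).
    progress = (time_playback - snap_a.time_truth) / (snap_b.time_truth - snap_a.time_truth)
    return [a + (b - a) * progress
            for a, b in zip(snap_a.position, snap_b.position)]
```

If “time_playback” itself is being interpolated, “progress” no longer advances at a constant rate, which is exactly the effect described next.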
When “time_playback” is interpolated, it may move faster, slower or even backwards, compared to normal operation where it moves exactly as time does: one second per second.
This depends on how large the lag spike is and how aggressively the “time_playback” interpolation is configured. I’ll keep it simple here and just assume linear interpolation, but you can imagine how other interpolation curves would influence it: it’s basically the same, except the speed is higher in some sections of the interpolation than in others, instead of being constant as it is in this simple linear example.
So in some situations positional interpolation is done faster, if the network connection gets better.
In some cases positional interpolation just stops, because the time interpolation moves backwards in time at exactly one second per second.
In some cases positional interpolation moves backwards, where the time interpolation moves backwards in time at more than one second per second.
If you constantly try to adjust “time_playback” to be exactly 100ms behind the time you receive snapshots, you will probably see what you are seeing with ships stuttering around. Why?
Because you will never have packets arrive with exactly consistent timing. Even on localhost. Network drivers and the operating system aren’t set up for this kind of consistency.
Companies trying to use TCP/UDP for communication that is dependent on consistency (motor control, safety) have to use real time network drivers that are run in a precisely crafted environment (read: not Windows, Linux or similar).
But you don’t need that consistency. That’s what the 100 ms buffer and the “time_truth” is there for.
I would suggest checking whether interpolation targets are moved while interpolation is going on, or whether “playback_time” interpolation just happens too often and/or with extreme settings/speed.
The algorithm that decides whether it’s needed should be quite picky; a rough sketch of what I mean follows below.
Both could result in stutters like you and Keith described.
Edit: Now, do these stutters increase with velocity? I think they do. The faster an entity gets, the bigger the difference in its position between two successive snapshots, making the distance that is interpolated over larger. Snapshot frequency, on the other hand, stays the same. The larger the interpolation distance is, the more it amplifies either of the two theoretical sources of the stutter.
So this theory lines up with observation.
It kind of acts like a control loop that hasn’t been tuned correctly and is oversensitive.
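To make the “picky” part concrete, here is a sketch of the kind of decision I mean; the thresholds and names are invented, the point is only that a readjustment should require a sustained error, not a single late packet:

```python
# Sketch of a deliberately picky decision for when to re-adjust time_playback.
# Thresholds and names are illustrative assumptions only.

ERROR_THRESHOLD = 0.030    # 30 ms of drift before we even consider correcting
SUSTAIN_TICKS   = 20       # error must persist this many ticks, not one spike

class ResyncGate:
    def __init__(self):
        self.ticks_out_of_band = 0

    def should_resync(self, playback_time, desired_time):
        if abs(desired_time - playback_time) > ERROR_THRESHOLD:
            self.ticks_out_of_band += 1
        else:
            self.ticks_out_of_band = 0   # back inside the dead band, reset
        return self.ticks_out_of_band >= SUSTAIN_TICKS
```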
@navyfish, the high velocity doesn’t matter if the precision is sufficient in the absolute method; also, he’s interpolating position, and the position changes at the same rate even in a relative scenario. As I said, [some relative island] + [offset] and [absolute value] are exactly the same thing if the precision in the absolute method is sufficient.
/edit @inovaeflavien I still don’t understand why there is stutter. You should be constantly interpolating the positions; even if you extrapolate at some point, you should interpolate from that point. In essence you should be smoothing all the time and there won’t be any stutter, because even if you do get it wrong, you keep interpolating. The presence of stutter indicates to me that at some point you are reverting to some position with a hard reset. Don’t do that, always interpolate.
Well that picture speaks volumes and certainly makes clear how a simpleminded approach would introduce jitter. I’m having a rather more difficult time figuring out Flavien’s description. For example:
Why isn’t the latency 50ms? I should be keeping a queue of 2 updates that I use for interpolation. As a new update arrives I kick out the head and add to the tail of the queue.
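In code, the sort of thing I have in mind is roughly the sketch below (names are mine, not Flavien’s): keep the two most recent updates, render at a fixed delay behind the newest one, and interpolate between them:

```python
# Sketch of interpolating from a two-entry update queue, as described above.
# All names are illustrative; this is not the actual engine code.
from collections import deque

RENDER_DELAY = 0.100  # render 100 ms behind "now"

class TwoUpdateBuffer:
    def __init__(self):
        self.queue = deque(maxlen=2)   # oldest at [0], newest at [1]

    def on_update(self, update):       # update: object with .time and .position
        self.queue.append(update)      # maxlen=2 drops the head automatically

    def sample(self, now):
        a, b = self.queue              # assumes two updates have arrived
        render_time = now - RENDER_DELAY
        t = (render_time - a.time) / (b.time - a.time)
        t = max(0.0, min(1.0, t))      # clamp: no extrapolation in this sketch
        return [pa + (pb - pa) * t for pa, pb in zip(a.position, b.position)]
```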
This seems to be the meat of the issue, but I can’t make heads or tails out of it simply because I don’t think this way.
Stuttering is due to something being introduced to the interpolation process instantaneously, as depicted in @hrobertson’s picture, where the updated position is introduced wholly and instantaneously into the interpolation process, snapping the ship to its updated position. That’s not what Flavien’s code is doing, but it’s a good illustration of a source of jitter.
As I don’t have access to Flavien’s code, it’s impossible for me to guess the true source of the problem. I’d debug it by putting constants into the interpolation process until I found the source of the problem. For example, put in fake updates that arrive at exactly uniform intervals. Still have jitter? Start making them arrive with exactly uniform delays. And so on, placing a tighter and tighter straightjacket on the code until some red flag pops up. Or red flags; there may be multiple sources, or an interaction of multiple sources.
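As a sketch of that straightjacket (everything here is invented for illustration): replace the network feed with a generator that produces perfectly spaced, perfectly predictable updates, and see whether the jitter survives:

```python
# Sketch of feeding the interpolation code fake, perfectly uniform updates
# to isolate the source of jitter. Names and numbers are illustrative only.

def fake_update_stream(start_time, interval=0.050, speed=5000.0, count=200):
    """Yield (arrival_time, snapshot_time, position) triples with zero jitter."""
    for i in range(count):
        snapshot_time = start_time + i * interval
        arrival_time = snapshot_time + 0.100         # constant, fake 100 ms latency
        position = (speed * i * interval, 0.0, 0.0)  # straight line at 5 km/s
        yield arrival_time, snapshot_time, position

# If jitter persists with this input, the problem is in the client pipeline,
# not in network timing; if it disappears, re-add variability (arrival jitter,
# dropped packets) one factor at a time until it comes back.
```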
That said, I get the impression that Flavien knows the source of the jitter, but it can’t be removed entirely.
I’ll relate a story of a little bug that produced a kind of jitter in my own code once upon a time.
I was working on a flight simulator. I had just introduced the aerodynamics code that used lift and drag curves, thrust, gravity and so forth. Well, once aircraft started to speed up, they’d start to experience buffeting. Keep piling on the speed and the buffeting would eventually destroy the aircraft. We lived with it for a couple versions because I couldn’t figure out why it was happening. The equations all looked solid and we certainly understood it inside and out.
The problem? The lift and drag curves were a series of points, and we linearly interpolated between them. It turned out that the linear interpolation code was interpolating ‘backwards’. So instead of a smooth curve, the system was working off what looked like a saw tooth curve. So you can imagine that as the planes changed angle of attack across a ‘tooth’, there would be a sudden change in lift or drag, dramatically changing the characteristics of the aircraft and introducing the observed buffet. A quick fix to the interpolation code and all the aircraft flew beautifully.
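For illustration, the bug was essentially of this shape (a made-up sketch of the kind of mistake described, not the original flight-sim code):

```python
# Sketch of a lookup table interpolated "backwards", producing a sawtooth.

def lerp(a, b, t):
    return a + (b - a) * t

def sample_curve(points, x, backwards=False):
    """points: list of (x, y) pairs sorted by x; piecewise-linear lookup."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            if backwards:
                return lerp(y1, y0, t)   # the bug: endpoints swapped
            return lerp(y0, y1, t)       # the fix: interpolate the right way
    raise ValueError("x outside the table")

# With backwards=True each segment slopes the wrong way and the value jumps at
# every table point, so a smooth lift/drag curve turns into a sawtooth and a
# small change in angle of attack causes a sudden change in lift or drag.
```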
You just never know where the source of a problem might lie.
I’m making a test app in Visual Studio C# to demonstrate the problem. That should speak more clearly about the need to introduce time smoothing, which is the cause of stuttering. I’ll upload it in a few hours.
Alright, so if I’m understanding this correctly, this problem revolves entirely around keeping the server and client synchronized.
In a perfect world, the clocks on the server and the client would agree when the old packets came in. However, when the packets come back, the snapshot identified by the server as being 100ms behind current might coincide with the client snapshot that’s, say, 90ms behind current. Therefore, you would have to slow the client’s clock by 10ms so that it is correctly synchronized with the server again, and as you said, at 5 km/s, 1 ms corresponds to 5 m of interpolated ship position, or 50 m at 10 ms.
If my understanding is now correct, then I have a problem with doing it this way.
Let me start at the point where the client lost contact with the server. The client tracks a ship at position A and time T1, which is the last known authoritative answer from the server. The client then extrapolates during this time until the lost packets are received. At this point, the client registers an extrapolated position of B at a time of T2. The server, on the other hand, says that the latest authoritative answer is position C at a time of T3. Mind you, when I say T3, I refer to the desired example time lag of 100ms.
Going forward, you would opt to synchronize T2 to the server, but why? Both position B and time T2 are flat-out wrong. The player (not the game client) only sees position B, not time T2. That’s the illusion for the player. Your only real objective is to interpolate from position B to position C so that the object moves smoothly across the player’s screen. If T3 is 2 snapshots behind the render, then that’s the number of snapshots you have to interpolate the position through. If you need more snapshots to make a smooth transition, you can create a forward bias that increasingly weights the interpolation towards the server’s value until it reaches a “good enough” threshold and then just uses the server’s value.
Use the snapshot at a given time T2 that is closest to time T3, or interpolate the two nearest ones if one is missing and that interpolation is closer to T3 than T2 is. Then throw out time T2 and use time T3. The client’s previous times are irrelevant; you simply restart the client’s clock.
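A rough sketch of that forward bias (the names and the “good enough” threshold are invented for illustration):

```python
# Sketch of correcting an extrapolated position towards the server's
# authoritative one with an increasing forward bias. Names and constants
# are illustrative assumptions, not an actual implementation.

SNAP_THRESHOLD = 0.01    # metres: close enough to just use the server value

def correct_towards_server(displayed, authoritative, bias):
    """Blend the displayed position towards the authoritative one.

    bias grows from ~0 towards 1 over the snapshots following the correction,
    so early frames mostly trust the local guess and later frames mostly trust
    the server, avoiding a visible teleport.
    """
    blended = [d + (a - d) * bias for d, a in zip(displayed, authoritative)]
    error = max(abs(a - b) for a, b in zip(authoritative, blended))
    if error < SNAP_THRESHOLD:
        return list(authoritative)   # "good enough" threshold reached
    return blended
```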