Weekly Update #76

Thanks for all the info, Flavien. Really appreciated.
Still confused and don’t see the problem. :confused:
I understand that packets can arrive out of order, jitter, and get lost. Maybe it’s something I’m missing in the interpolation code.
Anyway, I’m happy to now know more about the netcode. Looking forward to the next patch. Maybe a broader test with many different kinds of connections and stability levels could be a helpful source of information.


So introducing alien languages should be a snap. Just render with Wingdings. :grin:


I’m using a small tool called “Clumsy”. It’s really cool: nothing to install, just an .exe to run, and you can introduce artificial lag, throttle, and randomly drop packets (there’s a UI where you can select the numbers) to simulate Internet conditions. Not that it helps with the stuttering: as I said, it even happens on localhost, with 0 latency and no packets dropped.

I believe all networked games suffer from this problem, but no other games have such high velocities. In your typical shooter a character moves at a few meters per second, so the stuttering would be on the order of a centimeter or less, which isn’t visible. So we’re probably the only game that suffers from it to the point where it becomes noticeable.
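To put rough numbers on this: if the rendered time is off by Δt, the rendered position is off by roughly v·Δt along the direction of travel. A minimal sketch (the 1 ms timing error is an assumed figure for illustration, not a measured one):

```python
def position_error_m(speed_m_per_s: float, timing_error_s: float) -> float:
    """Approximate positional error caused by rendering at a slightly wrong time.

    Assumes straight-line motion at constant speed, so error = speed * timing error.
    """
    return speed_m_per_s * timing_error_s


TIMING_ERROR_S = 0.001  # hypothetical 1 ms of timing jitter

for label, speed in [("shooter character", 5.0), ("fast ship", 5_000.0)]:
    err = position_error_m(speed, TIMING_ERROR_S)
    print(f"{label}: {speed} m/s -> {err:.3f} m of error per ms of jitter")
```

With the same 1 ms of jitter, 5 mm is invisible while 5 m is an obvious jump.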

If some of you are interested in brainstorming a solution, I can go into details with a more concrete example of the problem, with numbers & stuff.


I don’t get why the stuttering would occur even without any lag and with zero latency.
Why do high velocities introduce positional errors?

If the speed of the moving object remains the same during these snapshots, why can the predicted position be different from the position that is sent in the following snapshot? I would think this occurs because the game renders the object at the position the snapshot provides, regardless of whether that snapshot was made a few milliseconds ago, and would therefore reset the object’s position to one that is ‘old’. If that is the case, stuttering would be inevitable: the game is always trying to catch up with the object’s position and moves it all over the place depending on the timing, because it doesn’t take into account that the snapshot isn’t timed perfectly.

I would think that decoupling the rendered state of the moving object from the snapshot would resolve this issue. Is the rendered position of the moving object always exactly the position from the snapshot, or does the game know when a snapshot is received out of timing and interpolate the actual current position based on the probable real position?

I would imagine it something like this:

snapshot 1: position x1 y0 z0, timestamp

snapshot 2: position x2 y0 z0, timestamp +1s

snapshot 3: position x3 y0 z0, timestamp +2s (received 0.5s out of timing; the object jumps back in the traveled space because the game thinks this position is correct and does not account for the delay)

snapshot 4: position x4 y0 z0, timestamp +3s (the object now jumps forward because the delay is gone and the object is back at its ‘real’ position)

Now couldn’t this just be resolved by the game looking at the timestamp from the server, seeing that the snapshot was received out of timing, and taking that into account? The rendered position would then be based on the snapshot combined with its timestamp, using this information to interpolate the ‘real’ probable position of the object.
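A minimal sketch of that idea, assuming each snapshot carries the server timestamp and the object’s velocity (the field names are hypothetical). Using the example above at 1 unit per second, snapshot 3 (x3, stamped +2s) arrives at +2.5s. Rendering it as-is puts the object back at x3, while projecting it forward by its age gives 3.5:

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    x: float          # position along the direction of travel
    vx: float         # velocity in units per second (hypothetical field)
    timestamp: float  # server time at which the snapshot was taken, in seconds


def naive_position(snap: Snapshot) -> float:
    """Render exactly where the snapshot says, ignoring how old it is."""
    return snap.x


def timestamp_aware_position(snap: Snapshot, now: float) -> float:
    """Project the snapshot forward by its age, assuming constant velocity."""
    age = now - snap.timestamp
    return snap.x + snap.vx * age


late = Snapshot(x=3.0, vx=1.0, timestamp=2.0)  # snapshot 3 from the example
arrival_time = 2.5                             # received 0.5 s late
print(naive_position(late))                          # 3.0
print(timestamp_aware_position(late, arrival_time))  # 3.5
```

This assumes the client and server clocks agree, which is its own problem.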

The result would be that predictions on the current positions are not only made when a lag or packet loss happens but also when the snapshot is out of timing.
Or maybe this post is total garbage because I don’t know anything about this kinda thing. :thinking: :joy:

You’d think that’d be obvious, right? And this is exactly what Lomsor has described too. There must be complications that we’re overlooking.

Hopefully some example data from Flavien will enable us to better understand the issues and then we can collaboratively achieve a good solution.


Yep, this seems like the logical theoretical solution to me as well. If two ships are flying next to each other at 5 km/s along the same vector, why calculate everything from a frame of reference that isn’t moving along with them?

It seems obvious that in a space game you could create a hierarchy where the player’s ship is always considered stationary and all other ships and objects move around it. Jitter would then happen only to objects that are moving too fast to see properly anyway. The server could still hold all vectors relative to an absolute frame, and the client could translate those absolute vectors into ones relative to the ship for rendering to the player.
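A sketch of the client-side translation described here: the server keeps absolute vectors, and the client subtracts the player ship’s position and velocity before rendering. The `Vec3` type and `to_ship_frame` name are illustrative, not from the game’s code. Two ships flying side by side at 5 km/s end up with zero relative velocity:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    x: float
    y: float
    z: float

    def __sub__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)


def to_ship_frame(obj_pos: Vec3, obj_vel: Vec3,
                  ship_pos: Vec3, ship_vel: Vec3) -> tuple[Vec3, Vec3]:
    """Express an object's absolute position/velocity relative to the player's ship."""
    return obj_pos - ship_pos, obj_vel - ship_vel


player_pos, player_vel = Vec3(1_000_000.0, 0.0, 0.0), Vec3(5_000.0, 0.0, 0.0)
wingman_pos, wingman_vel = Vec3(1_000_000.0, 50.0, 0.0), Vec3(5_000.0, 0.0, 0.0)

rel_pos, rel_vel = to_ship_frame(wingman_pos, wingman_vel, player_pos, player_vel)
print(rel_pos, rel_vel)  # wingman 50 m to the side, not moving relative to the player
```

The subtraction itself still runs on the absolute values, so this changes the frame used for rendering, not the precision of the stored data.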

That being said, I totally understand that just because there seems to be an obvious theoretical solution, it doesn’t mean that it’s a practical one. I’m also aware that it’s a solution Flavien has already explored:


This sounds like floating-point precision errors. You are probably taking the difference between two numbers whose difference should be essentially zero, but due to the high velocity even a small difference causes a noticeable jitter when it is applied to the position calculation somewhere.

If this is the case, then I think the best solution would be to drop the least significant digits from whatever calculation is being made in these situations, so that the difference becomes exactly zero.

FYI, I came to this conclusion based on this discussion of guard digits: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#693
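A small illustration of the cancellation effect, emulating single precision with Python’s `struct` module (the 5,000 km coordinate and 0.3 m step are made-up numbers). Near 5,000,000 the spacing between 32-bit floats is 0.5, so a 0.3 m move is measured as 0.5 m. In 64-bit doubles it comes out as 0.3 to within a nanometre. Flavien says further down that the code uses doubles everywhere, so this only demonstrates the mechanism being proposed:

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


start = 5_000_000.0  # hypothetical absolute coordinate: 5,000 km from the origin, metres
step = 0.3           # true distance moved during one tick, metres

# Double precision: compute the new absolute position, subtract the old one.
moved_f64 = (start + step) - start

# Single precision: the new absolute position is rounded to the float32 grid first.
moved_f32 = to_f32(to_f32(start) + step) - to_f32(start)

print(f"float64 sees a move of {moved_f64!r} m")
print(f"float32 sees a move of {moved_f32!r} m")
```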

Food for thought anyway


We’re talking about a problem that has no identified source… I think it’s a bit premature to say it is unsolvable if we can’t even pin down what the source of these jitters is.

We know that the client has all the information it needs to perfectly calculate the position of a player that has not changed his course.
The most rudimentary extrapolation method is very simple:

If the information provided by the snapshot is correct and accurate enough, and the pilot did nothing in the meantime, it should be a near-perfect match.
Even if the pilot did maneuver, that shouldn’t have any more influence on his vector whether he’s already moving at 100 m/s or at 100 km/s in the absolute frame.
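The point about speed not mattering follows from the constant-acceleration equations. If the pilot accelerates by `a` after the last snapshot, a constant-velocity extrapolation over `dt` is off by ½·a·dt², and the starting speed cancels out. A sketch with made-up numbers (10 m/s² for 0.1 s):

```python
def extrapolation_error(v0: float, accel: float, dt: float) -> float:
    """Error of a constant-velocity extrapolation when the ship actually accelerates.

    True motion:   x(dt) = x0 + v0*dt + 0.5*accel*dt**2
    Extrapolated:  x(dt) = x0 + v0*dt
    The v0*dt terms cancel, leaving 0.5*accel*dt**2 regardless of v0.
    """
    x0 = 0.0
    true_pos = x0 + v0 * dt + 0.5 * accel * dt * dt
    predicted = x0 + v0 * dt
    return true_pos - predicted


for v0 in (100.0, 100_000.0):  # 100 m/s and 100 km/s
    print(v0, extrapolation_error(v0, accel=10.0, dt=0.1))  # both close to 0.05 m
```

Any tiny difference between the two printed values comes from floating-point rounding, not from the physics.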

It could be precision. The faster one moves, the bigger the velocity values get. If there isn’t enough precision for velocity, it could generate floating-point errors.
From my test calculation, usual floats should be more than precise enough though. Theoretically.
The direction also plays an important role… although if the positional snapping is mostly back and forth along the velocity direction, it should be fine.
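One way to run that kind of test calculation is to compute the spacing between adjacent representable values (the ULP) at a given magnitude. `math.ulp` (Python 3.9+) gives it for doubles. The float32 helper below is a sketch that assumes normal, non-zero inputs:

```python
import math


def ulp_float32(x: float) -> float:
    """Spacing of 32-bit floats near x (24-bit significand; normal, non-zero x only)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)


for value in (100.0, 5_000.0, 100_000.0, 5_000_000.0):
    print(f"{value:>12,.0f}: float32 step {ulp_float32(value):.3g}, "
          f"float64 step {math.ulp(value):.3g}")
```

At 100 km/s a float32 velocity still has steps of about 8 mm/s, which fits the "floats should be precise enough" estimate for velocities. Positions far from the origin are a different matter (0.5 m steps at 5,000 km).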

I think in other games it’s the actual velocity precision that is the problem. They didn’t design the game for high velocities, so maybe they decided to use lower precision in order to save bandwidth. Just speculation, though. We don’t know what causes stutter in other games…

One thing it could also be is the interpolation/extrapolation algorithm. There can be many situations it isn’t prepared for. What if the spacing between snapshots changes, so it sometimes compares snapshots that are 20 ms apart and sometimes snapshots that are 30 ms apart?
Maybe floating-point errors are amplified through the algorithm?

Is there any place where information is lost? Are floating-point numbers reduced in precision before they are sent over the network?

Are velocity and direction “rasterized”? So if I speed up my ship, it isn’t using all the available floating-point precision but “steps” from one allowed value to the next?
I have the feeling this is quite important.
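If values were quantized before sending (the thread doesn’t say whether the game does this), the effect would look like this sketch, where `step` is a hypothetical quantization step. Small speed changes map onto a few discrete values, so the ship "steps" between them:

```python
def quantize(value: float, step: float) -> float:
    """Snap a value to the nearest multiple of `step` (hypothetical network quantization)."""
    return round(value / step) * step


speeds = [100.0, 100.2, 100.4, 100.6, 100.8]
print([quantize(v, 0.5) for v in speeds])  # [100.0, 100.0, 100.5, 100.5, 101.0]
```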

Ninjad a little … but eh.
I personally also seem to have issues with floating-point calculations in my current project… it’s also driving me nuts.

It kind of depends on the test he’s performing, but I’m leaning towards it being a projection calculation precision error.

If he flies along a parallel vector to the chase target at a constant velocity, then if all the calculations were done relative to each other, there wouldn’t be any difference between the calculation at one point in time and the next.

If they use absolute locations to perform a projection or derive relative values from the absolute locations, then those calculations will be numerically different every time, but should produce the same calculated result…unless you have a precision problem. This would also apply if they are using velocity to predict the next location.

This gets magnified if you end up using an angle calculation to project that ship onto your screen or predict its location in space in front of you. If that last significant digit keeps changing, then so does the position where you end up placing the ship on screen. If you then try to forward-predict based on current velocity, it gets magnified even more.

This is more critical in the chasing scenario where that angle is very small… essentially getting closer and closer to zero. That, or the jitter is just more noticeable because the difference should be zero but instead keeps dancing around because of that precision error, i.e. it’s probably jittering all the time, but you just don’t notice it until the position obviously shouldn’t change.

Kind of reminds me of the fun I had with python when I first tried it. I’d take an integer, divide it evenly by another integer, and then multiply it by that same integer. Instead of getting the original integer, I’d get a float that was really close boggle


Using relative positions/velocities as suggested by @JB47394 and seconded by @Crayfish is clearly the theoretically ‘correct’ solution here. I’m really surprised to discover you aren’t already doing so! I guess I’d assumed the ICP’s ‘bubble’ system had been transplanted to I:BS.

Having read about this problem over the past 4 (more?) weekly updates has me wondering if, from a project management perspective, it wouldn’t be more efficient to step back and implement a proper relative reference frame solution before moving forward. Given how fundamental such a thing is, fixing it later on once more layers of systems have been built on top of the duct tape would be daunting.

Anyhow, one way or another, if @INovaeFlavien is serious about brainstorming a solution, count me in.

Are you testing a connection between two machines? I tried using Clumsy recently and couldn’t get it to affect ARMA when running multiple clients locally. Clumsy’s author warns quite a bit about loopbacks, and I’m assuming that’s what is making things difficult for me.

It makes no difference whether velocities and positions are relative or absolute, as long as you have the precision. Transferring something like (Island-A + position [0.1]) or (abs-position [0.00001]) has exactly the same effect. It also doesn’t matter if the ships look stationary relative to each other: you still have the forward velocity, which can be a few km/s, so the position is changing regardless of whether the ships appear stationary to each other. An island just creates an offset for the position.

Even if they went with something extreme, like making every single ship its own island and sending each ship the relative positions of every other ship, it solves nothing if the absolute position can be given at the same precision.

As for the current problem of jitter with no lag whatsoever, it’s starting to smell like a code error or some technical oversight.


You would communicate relative vectors (position, speed, direction) over the network instead of absolute ones. That’s the difference. As @thelazyjaguar noted, it should not make a difference in the end result. Thanks for that link, by the way. Great read.
I like the island/relative idea, but there’s no point in implementing it if you don’t even know what it’s for.
Here’s more of that. In there is also a link to the first occurrence, on the new forums, of the discussion between JB and Flavien in the E:D thread:


But back then I forgot to think about if and why there was a problem at all. I just assumed there was one!

We theorize that it would shift this problem to the islands, and it might, but without actually knowing what causes these jitters it doesn’t make a lot of sense to build a workaround for them. And @cybercritic might even be right that it wouldn’t change anything at all if you just translate it back to absolute at some point…


How would one go about debugging floating point errors? Especially ones that might accumulate? You can determine when they get larger and smaller. But how do you use that information to track down the places in the code/calculation where errors appear and might accumulate (the most)?

I don’t know how complex the prediction algorithm is but an insight into that would be helpful. Or maybe at least some sample outputs from different inputs. Although that wouldn’t tell us more than before.
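One common way to debug accumulated floating-point error is to run a "shadow" copy of the calculation in exact arithmetic and compare. The sketch below uses Python’s `fractions.Fraction` as the exact reference for a simple repeated position update (step size and count are arbitrary). Logging the gap at each step of a real calculation would show where the error starts to grow:

```python
from fractions import Fraction


def accumulate(step: float, count: int) -> tuple[float, Fraction]:
    """Add `step` to a position `count` times, in floats and in exact arithmetic."""
    pos_float = 0.0
    pos_exact = Fraction(0)
    step_exact = Fraction(step)  # the exact value of the double we are actually adding
    for _ in range(count):
        pos_float += step
        pos_exact += step_exact
    return pos_float, pos_exact


pos_float, pos_exact = accumulate(0.1, 10)
error = Fraction(pos_float) - pos_exact  # rounding error introduced by the float additions
print(pos_float, float(error))
```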

No, it’s not. Besides, the code uses double precision everywhere. The randomness doesn’t come from the positions but from the time itself, which gets sped up to catch up with the latest snapshot or slowed down when snapshots are missing.

How do you initialize “Time_Now” ?

Also you can’t extrapolate all the time, only when missing snapshots. Otherwise you should interpolate in the past along the known path. If you extrapolate all the time it’ll render at the wrong position and you’ll even see ships passing through walls during collisions.
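A minimal sketch of what’s described here: render at a fixed delay behind the newest data, interpolate between the two snapshots that bracket the render time, and extrapolate from the newest snapshot only when nothing newer has arrived. The 50 ms delay matches the figure Flavien gives later in the thread. The one-dimensional `(time, position, velocity)` snapshot layout is an assumption for brevity:

```python
from bisect import bisect_right

RENDER_DELAY_S = 0.050  # constant offset into the past (assumed value)


def sample_position(snapshots, now, delay=RENDER_DELAY_S):
    """Return (position, mode) for the render time `now - delay`.

    `snapshots` is a non-empty list of (time, position, velocity) tuples sorted by time.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    render_time = now - delay
    times = [t for t, _, _ in snapshots]
    i = bisect_right(times, render_time)  # index of the first snapshot after render_time

    if i == 0:
        # Render time is older than anything we have: hold the oldest snapshot.
        return snapshots[0][1], "clamped"
    if i < len(snapshots):
        # Normal case: interpolate along the known path between two snapshots.
        t0, p0, _ = snapshots[i - 1]
        t1, p1, _ = snapshots[i]
        alpha = (render_time - t0) / (t1 - t0)
        return p0 + (p1 - p0) * alpha, "interpolated"
    # No newer snapshot yet: extrapolate from the latest one at constant velocity.
    t, p, v = snapshots[-1]
    return p + v * (render_time - t), "extrapolated"


snaps = [(0.00, 0.0, 5000.0), (0.02, 100.0, 5000.0), (0.04, 200.0, 5000.0)]
print(sample_position(snaps, now=0.08))  # render time 0.03 s: between the last two snapshots
print(sample_position(snaps, now=0.11))  # render time 0.06 s: past the newest snapshot
```

Extrapolation is only the fallback branch here; extrapolating all the time would render ships at positions they never reach, such as through walls during collisions.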


[quote=“NavyFish, post:34, topic:5747”]
Using relative positions/velocities as suggested by @JB47394 and seconded by @Crayfish is clearly the theoretically ‘correct’ solution here.[/quote]

The physics engine works in absolute coordinates though so at some point you have to add back your relative + absolute values into something you can feed to the physics engine.

No, it’s all localhost. Try using the “localhost ipv4 all” preset, or the filter “outbound and ip.DstAddr >= 127.0.0.1 and ip.DstAddr <= 127.255.255.255”.

That’s what I use, and it works even in other games (tested in Overwatch).


Even for the time? Shouldn’t matter there though.
Precision should be good, but there will still be rounding errors that might accumulate depending on the calculations made. As @thelazyjaguar mentioned, sometimes higher-precision floats have rounding errors that lower-precision ones don’t.

Yeah, maybe a bit too rudimentary. You would need to take the time at which you intend to hand these vectors to the physics code. So instead of “Time_Now” it would maybe be “Time_NextLocalTic”, with some safety measures to check whether there’s still enough time to calculate all of it before the next physics tick occurs. If there isn’t enough, it needs to skip one tick.

Edit: After doing that, you can interpolate between the point you have and the one from the “future”. But yeah, I forgot to handle what happens when the actual data comes in from the network. Having the game run in the past is much easier…
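The “Time_NextLocalTic” idea can be sketched like this, working in integer milliseconds to avoid rounding. `tick_ms` and `compute_budget_ms` are hypothetical parameters. If less than the budget remains before the next tick, the target moves to the tick after it:

```python
def next_local_tick(now_ms: int, tick_ms: int, compute_budget_ms: int) -> int:
    """Pick the physics tick at which freshly computed vectors should be applied.

    Returns the next tick boundary after `now_ms`, or the one after that if there
    would be less than `compute_budget_ms` left to do the work (i.e. skip one tick).
    """
    next_tick = (now_ms // tick_ms + 1) * tick_ms
    if next_tick - now_ms < compute_budget_ms:
        next_tick += tick_ms  # not enough time left: target the following tick
    return next_tick


print(next_local_tick(1003, tick_ms=16, compute_budget_ms=4))  # 1008: 5 ms left, enough
print(next_local_tick(1006, tick_ms=16, compute_budget_ms=4))  # 1024: only 2 ms left
```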

So you make the local physics engine run a little in the past instead of letting it try to “simulate FTL communications”. That’s alright. How far in the past? Does that vary? Is it fixed, like 50 ms / 5 ticks, etc.?
What happens if the code needs to extrapolate a certain tick and that tick is then received through the network? Probably it’s treated like interpolation, where it’s just “time between ticks”. But with interpolation the endpoint will not change, whereas with extrapolation it might. That needs to be considered.
By the sound of it, extrapolation shouldn’t happen in the localhost test. Or does it?

In-game time? Why is the speed of time varied? If there is no data, there is no data. What’s the point of making the data you have last longer?
If the client experiences a lot of jitter or the quality of the connection changes during the game, the client can choose to run further in the past to have more leeway.
But these changes to the distance from “Now” shouldn’t happen constantly. Maybe a few dozen times a match.
If it’s switched instantly, it will result in a “jolt”.
If it’s done with a ramp, it may create distortions (also note that if floats are used for the time, it may introduce rounding errors). I don’t know what influence a changing speed of time might have on the interpolation and extrapolation algorithms… it might be the source of the problem.

Yup, that’s exactly the root of the problem. It has nothing to do with the precision of the data, or with networking itself. Variable processing times (i.e. a slight slowdown in when you process the snapshots) can cause it. So this happens all the time.

Ideally, you’d specify a constant time offset, let’s say 50 ms in the past, so that you have enough snapshots to interpolate.

If the snapshots get delayed, at some point you’re not going to receive the next snapshot in time. That’s where you do extrapolation. But when you eventually receive your missing snapshot, you might receive the 2 or 3 next snapshots that got held up at the same time. At that point you’re too far behind the “latest snapshot” server time, so you need to catch up. If you do it instantly, you’d see the position getting teleported. So what I’ve been doing is slowly interpolating the client time to match up with the server’s time (minus the artificial offset of, say, 50 ms). That works but results in the stuttering problem.
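A sketch of the catch-up itself shows why it causes visible stutter. The client’s render clock runs slightly faster or slower than real time depending on how far it is from its target (server time minus the offset). `gain` and `max_adjust` are made-up tuning values, not the game’s. At any clock rate, every ship on screen appears to move at its speed multiplied by that rate, so a 4% speed-up on a 5 km/s ship looks like a sudden 200 m/s change:

```python
def clock_rate(render_time: float, target_time: float,
               gain: float = 2.0, max_adjust: float = 0.1) -> float:
    """Playback rate for the client's render clock (1.0 = real time).

    Speeds up when the render clock is behind the target time and slows down when
    it is ahead, proportionally to the gap but clamped to +/- max_adjust.
    """
    gap = target_time - render_time  # seconds; positive means we are behind
    adjust = max(-max_adjust, min(max_adjust, gain * gap))
    return 1.0 + adjust


def apparent_speed(true_speed: float, rate: float) -> float:
    """On-screen speed of an object when render time advances at `rate`."""
    return true_speed * rate


rate = clock_rate(render_time=0.0, target_time=0.020)  # 20 ms behind after late snapshots
print(rate, apparent_speed(5_000.0, rate))
```

A tighter clamp makes the speed changes smaller but the catch-up slower, which is the trade-off between the "jolt" and the "ramp" mentioned earlier.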

The effect is essentially what happens with cars on a highway. When one car at the front slows down then accelerates again, it creates a bottleneck kilometers behind.


This stuttering is basically the ship/server time slowing down and speeding up when the snapshots catch up again? I would have thought that when an extrapolation happens, the ship keeps its velocity/acceleration and just continues at the same (increasing/decreasing) speed. In my imagination, the only difference between the extrapolation and the snapshots that are catching up would be if the ship made any changes to its speed or direction during these missed/delayed snapshots.
So a visible correction would only occur then, and not with every missed/delayed snapshot.
If I were to receive a timestamped message exactly every minute telling me to move a figure one inch forward, and at the 30th minute I don’t receive the message on time, I would assume that the message is delayed and continue to move the figure. Say after 20 seconds I receive the delayed message and check whether my anticipated move was correct or the sender made any changes in movement in the new message. I would then move the figure to its correct position, or it stays where I moved it because my anticipated move was correct. After 40 more seconds the next message is received and I move the figure one inch again.

How I understand it now is that I wait for the message and move the figure one inch. This happens again until the 30th minute; the message is delayed, so I don’t move the figure. After 20 seconds the delayed message arrives and I move the figure one inch; after 40 more seconds the next message is received on time and I move the figure one inch again. This would then result in a slowdown and speedup in movement.

Could this stutter be resolved if the position is extrapolated every time a snapshot is delayed, and not just when a snapshot is missing? The delayed snapshot would then only be used to check whether the extrapolated position was correct and, if not, to correct the ship’s position accordingly.
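The proposal can be sketched as a reconciliation step: keep extrapolating while a snapshot is late. When it finally arrives, project it forward to the current time and only correct the rendered position if the prediction was off by more than a tolerance. One dimension and a hypothetical `tolerance` are used for brevity:

```python
def reconcile(predicted_pos: float, snap_pos: float, snap_vel: float,
              snap_time: float, now: float, tolerance: float) -> float:
    """Decide where to render once a delayed snapshot arrives.

    The late snapshot is first projected forward to `now` (constant velocity).
    If our extrapolated position is within `tolerance` of it, keep ours, so no
    visible correction happens; otherwise snap to the corrected position.
    """
    corrected = snap_pos + snap_vel * (now - snap_time)
    if abs(corrected - predicted_pos) <= tolerance:
        return predicted_pos
    return corrected


# Snapshot confirms the velocity we extrapolated with: nothing moves.
print(reconcile(105.0, snap_pos=100.0, snap_vel=5.0, snap_time=1.0, now=2.0, tolerance=0.5))
# Snapshot shows a different velocity: correct to the new position.
print(reconcile(105.0, snap_pos=100.0, snap_vel=6.0, snap_time=1.0, now=2.0, tolerance=0.5))
```

Snapping when the error exceeds the tolerance is still a visible jump; blending toward `corrected` over a few frames would be one way to soften it.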


After thinking about this for a while, I think I understand. The only thing I don’t see is how the client can fall behind. I can see how it can miss snapshots, but I don’t see how the client’s clock can be slower than the server’s.

If the client receives 3 snapshots at once, it can discard the ones that are behind its “playback” time (the 50 ms, for instance).

On top of that, extrapolation should generate local “snaps” when ones from the server are missing. These are never replaced, even if their timestamp is still ahead of the current “playback” time, because replacing them could create hard snapping. An incoming snap will only be used if no local one has been generated.
The client should then adjust its “playback” time to further avoid extrapolation. As said, this shouldn’t be too aggressive due to possible side effects.
You could use extrapolation to lower bandwidth and server resources by only sending every other snapshot to the client. Extrapolation would then fill in the gaps.

The client clock should never get out of sync with the server clock. It doesn’t need to be all that precise either. I think precision to one ms is enough, as you probably wouldn’t want to run any algorithm more often than that. What it should not do is drift.

Brainstorming:
So we have the server that collects user info and combines them into snapshots and numbers them. (And of course other stuff).
At login the clocks are synchronized with the client… or not, I don’t know how clock synchronization works over the net. That time we call “now time”.

The server itself needs time to receive data, process it, and pack it into snaps, so the server also needs some buffer/leeway to do that. Depending on server power and calculation load, this is set to some value. We call that “truth time”. To get that leeway, the server declares that the snapshot it is creating now is valid at some point in the future.
For instance: “time_truth” = “time_now” + 5 ms
As soon as it has calculated that, it sends it to the clients, possibly even before the truth time arrives.
There’s also a packet interval. Let’s say it is: “snap_interval” = 10 ms.

The clients also need time to receive and process the snapshots, so they play back the snapshots from the past. We call this “playback time”.
For instance: “time_playback” = “time_now” - 50 ms
The client receives snapshots and adds them to its bank of snapshots if they’re not older than its “time_playback” and aren’t the next snapshot the playback is moving towards.

Example:
“time_now”= 1503575432068
Server Prepares snapshot Nr. 5 with “time_truth”=1503575432073
Server Sends packet at “time_now”=1503575432071

Client receives packet at “time_now”=1503575432090
Clients “time_playback”=1503575432040
Client has snapshot Nr.1 with “time_truth”=1503575432033
Client has “extrapolated” snapshot Nr.2 with “time_truth”=1503575432043
Client has snapshot Nr.3 with “time_truth”=1503575432053
Client has snapshot Nr.4 with “time_truth”=1503575432063
Client checks whether the truth time of snapshot Nr. 5 is older than that of the snapshot it is currently traveling to, Nr. 2.
(Edit: Actually just checking if 5 > 2 is doing the same thing.)
Client adds snapshot Nr. 5 with “time_truth”=1503575432073
Client continues playback and sends own snapshot versions to the server (shots fired, hits landed … etc)
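The acceptance rules in this example can be written as a small check. This sketches the poster’s scheme, not the game’s code; the `Snap` type and function names are made up. It uses the numbers above: playback time is now minus 50 ms, the client is traveling towards extrapolated snapshot Nr. 2, and Nr. 5 arrives:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Snap:
    seq: int
    time_truth: int             # ms, time at which the server says this state is valid
    extrapolated: bool = False  # True if generated locally because the real one was missing


def current_target(bank: Dict[int, Snap], time_playback: int) -> Optional[int]:
    """Sequence number of the first snapshot ahead of the playback time, if any."""
    ahead = (s.seq for s in sorted(bank.values(), key=lambda s: s.seq)
             if s.time_truth > time_playback)
    return next(ahead, None)


def should_accept(bank: Dict[int, Snap], incoming: Snap, time_playback: int) -> bool:
    """Apply the bank rules from the example to a snapshot arriving from the server."""
    if incoming.time_truth < time_playback:
        return False  # already behind the playback time: discard
    target = current_target(bank, time_playback)
    if target is not None and incoming.seq <= target:
        return False  # playback is already heading to this one (checking 5 > 2 suffices)
    existing = bank.get(incoming.seq)
    if existing is not None and existing.extrapolated:
        return False  # never replace a local extrapolated snap: avoids hard snapping
    return True


time_now = 1503575432090
time_playback = time_now - 50  # 1503575432040
bank = {
    1: Snap(1, 1503575432033),
    2: Snap(2, 1503575432043, extrapolated=True),
    3: Snap(3, 1503575432053),
    4: Snap(4, 1503575432063),
}
print(current_target(bank, time_playback))                         # 2
print(should_accept(bank, Snap(5, 1503575432073), time_playback))  # True
```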

Snapshots could be set up so they could have as much and as little information as needed with the rest being assembled/extrapolated by the server/client

I work with realtime computers by the way. :grin:


You’d be lucky to get 1 ms accuracy. Between 5 ms and 10 ms is more likely.
With NTP, 90% of clients have offsets of less than 10 ms. (ref)
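For reference, NTP estimates the clock offset from four timestamps on one request/response exchange (RFC 5905): the client’s send time t0, the server’s receive time t1, the server’s reply time t2, and the client’s receive time t3. It assumes the network delay is the same in both directions; half of any asymmetry ends up directly in the offset error. A sketch with made-up numbers (server clock 30 ms ahead, 10 ms each way, 2 ms of server processing):

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Clock offset (server minus client) and round-trip delay, NTP style.

    t0: client send time (client clock)    t1: server receive time (server clock)
    t2: server send time (server clock)    t3: client receive time (client clock)
    Assumes the one-way delays are equal in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Client clock reads 1000 ms when it sends; the server clock is 30 ms ahead.
print(ntp_offset_and_delay(t0=1000, t1=1040, t2=1042, t3=1022))  # (30.0, 20)
```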

You typoed:
