View Issue Details

IDProjectCategoryLast Update
0023852AI War 2Bug - GameplayOct 6, 2020 4:29 pm
ReporterAndyroo Assigned ToChris_McElligottPark  
Status resolvedResolutionfixed 
Product Version2.603 Nanocaust And Civvies Strike Back 
Fixed in Version2.607 MP Stability, Exo Syncs, And Limbo Mods 
Summary0023852: MP saves breaking/stuck joining game
DescriptionClient is getting stuck on joining MP games after a few saves.
Client gets stuck forever on "sending profile name, expansion and mod status". Host just sees "waiting for client to build world from our data". Had it happen with and without mods on.

Attached are a few saves, one pair is a before/after a break.

Broken1 is simply broken, we haven't been able to recover it and the autosave before it is broken also.

City/City2 are a before/after of a break, when we had to reload to get a spire city working. City loads fine, City2 we are unable to play. We got the City save working by repeatedly building the city and reloading until it worked after a few goes, so cannot confirm it was the city that broke it.

The Broken1 save broke without any cities being built.
TagsNo tags attached.

Activities

Andyroo

Oct 1, 2020 4:20 pm

reporter  

city.save (493,820 bytes)
city.savemet (47 bytes)
city2.save (540,261 bytes)
city2.savemet (47 bytes)
Broken1.save (595,261 bytes)
broken1.savemet (47 bytes)

Chris_McElligottPark

Oct 1, 2020 4:36 pm

administrator   ~0058875

Sweet, thanks, hopefully I should be able to recreate the problem based on your saves.

On the client or server, do you see any exceptions logged in the PlayerData/ArcenDebugLogging.txt file? I would guess that there is a problem on the client, but it's hard to say.

Before I fix the actual bug here, I'd also like to fix the feedback to you so that it doesn't look like it's connecting when really it had an error.

Andyroo

Oct 1, 2020 5:43 pm

reporter   ~0058882

Host: 10/1/2020 8:59:52 PM STEAM NETWORKING: P2P connection from Mucosa connected.
10/1/2020 8:59:52 PM Client requested to join with the name 'Mucosa'
10/1/2020 8:59:52 PM Tell client 'Mucosa' that they are accepted.
10/1/2020 8:59:52 PM Send client 'Mucosa' the world data.
10/1/2020 9:00:17 PM Client Mucosa (connection 1) has closed their connection.

They all look like that from the host side of things, I will prod for the clients side of it. We have left it for several minutes before, to no change.

Andyroo

Oct 1, 2020 5:47 pm

reporter   ~0058883

Heres the clients debug log. It looks like most of it is errors trying to join, repeated forever. Hope it helps.
ArcenDebugLogMPsavebreak.txt (2,067,716 bytes)

Andyroo

Oct 2, 2020 12:42 pm

reporter   ~0058900

The save was still broken in .604, but after seeing there was a transport issue, I went back and tried taking all ships out of transports, saving, then rehosting. It has fixed it.

Chris_McElligottPark

Oct 2, 2020 6:38 pm

administrator   ~0058928

Not fixed yet, but these will help:

* PreCalculateAllRepairableTargetsForPlanetFaction now actually gives better error messages if something goes wrong.

* Data is probably very broken is this happens anyway, but it should no longer be possible for exceptions to happen during GetHasRelationshipWith() checks between two factions.

Thanks!

I should have an actual fix also in this release.

Chris_McElligottPark

Oct 2, 2020 9:25 pm

administrator   ~0058934

Bit more:

* It was not super common, but it was possible in multiplayer to have it silently fail to send a message from the host to the client for various reasons, and no error was shown anywhere and it just looked like they were connecting infinitely.
** This now actually shows an error message, so that we know what's up and people don't waste time just sitting there.

Chris_McElligottPark

Oct 2, 2020 10:14 pm

administrator   ~0058935

Okay, good grief:

* Our serialization buffers in general now do a more accurate and efficient job of expanding their capacity when they need to.

* Discovered that Steam networking is only willing to SEND up to 512kb of data at a time, thanks to a const k_cbMaxSteamNetworkingSocketsMessageSizeSend, even though it's willing to take in more data than that and can send messages that are larger than that if not for this one random constant.
** This is... really irritating, to be honest. We'll have to see how to best work around this, but this was not clearly documented at all and in fact seems to have gone against what we thought their networking protocol was.
** Our other two networking libraries don't have this limitation, so building in a workaround for just Steam (admittedly the majority case) is annoying either way.
** This is going to take us a bit to figure out how best to handle; this is literally a limitation purely based on a config setting that is hardcoded into the Steam API at the moment, so that's... not our favorite thing. We may simply be able to recompile the wrapper libraries, though, we're not sure.


So I can't finish this tonight, but I'll try to see what I can find over the weekend. This is purely a filesize thing. Blah!

Chris_McElligottPark

Oct 5, 2020 4:15 pm

administrator   ~0058984

Update:

* Put in changes to the Facepunch Steamworks wrapper to change k_cbMaxSteamNetworkingSocketsMessageSizeSend from 512 kb max send size to instead be 512 MB.
** As feared, this did not work. To override this limitation we would instead have to recompile the C++ client-side files for Steam itself, and we're not sure we could actually do that, since only the networking portion is open-source.
** We could ask Valve about this, but the nearest time a change would be in place that could affect this positively would be "whenever they next do a developer SDK update," which is a very different schedule from their public client update schedule.
** It's possible that we could also use some form of decompiler to alter the existing .dll (windows), .bundle (OSX), and .so (OSX and Linux) files and change the value of this one precompiled constant int32 in the version that we distribute with the game, but that risks messing up other things that are working fine right now. And it's probably a TOS violation for us to do that, though it would likely be a harmless one. Then again if someone started using that in a way that adversely affected Valve's networks, we'd be in major trouble, so best to punt on that entire line of thinking.
** So with that in mind, we're going to just have to work around the 512kb-per-message send limit on Steam as a hard limiter for now. This will take a bit more engineering to do, but it's not the end of the world. We used to have such things for the original AI War because network card drivers were such that this volume of data often swamped their internal buffers, so this is not a new problem. It's just unexpected in this context since we missed this obscure reference in the documentation of one of our three network libraries in use here.

Hopefully I can get a multi-segment approach working tomorrow, knock on wood.

Chris_McElligottPark

Oct 6, 2020 1:07 pm

administrator   ~0059024

Looking promising, though Steam's API is kind of pissing me off:

* Since Steam networking has a 512kb limit per message, kind of arbitrarily baked in from what we can tell, we're working around that by splitting any messages that are larger than (512kb minus 100 bytes) into multiple fragments for reassembly on the other side.
** The only known way to hit this right now is with a sufficiently large savegame, and so this should probably only happen from client to host. But technically it can happen in either direction.
** Our logic for doing the fragmentation and adding extra header information onto each fragment for reassembly on the other end seems to be successful, but we have not yet taken the steps to actually reassemble it and read in the interior message yet on the receiving end.
** Another challenge we ran into before that is that we get a mysterious "limit exceeded" result back when we are sending two large messages (aka the two fragments) too close behind one another. The documentation just has a cutesy "too much of a good thing" note as an explanation, so no idea what is actually happening there, or what the real limits are.
*** To work around THAT, we hang onto whatever the fragment is that it is failing to send, sleep the current thread for 10 milliseconds, and try again. We do this for up to 400 times if required, but so far in our testing it takes between 42 and 46 tries before it actually gives us a proper result.
*** It's important to stress that this would have been an incredibly roadblock if we were actually having during-gameplay code running into this more, but for now this seems to be a minor annoyance in some edge cases.
*** Next step is actually reconstituting the internal message on the receiving end from the fragments.

Chris_McElligottPark

Oct 6, 2020 4:29 pm

administrator   ~0059031

Boom, finally got it! :D

* Added a new setting to the Network section of the personal settings screen: Log Fragmented Messages
** Depending on the networking framework you are using, there may not even BE any fragmented messages (or rather, there are, but they are handled well below our code). In certain cases, such as anything over 512KB being sent on Steam, we have to do our own fragmentation.
** This fragmentation is pretty much limited to just the initial sending of exceptionally large savegames across the network. When this option is on, then any fragmented messages that are sent or reconstituted will be written into the PlayerData/NetworkFragmentedMessages folder. If there is some sort of problem, then by enabling this on all the involved computers and then comparing the files with the same names helps us figure out what is going on.

* The fragmented messages that are able to be sent now are properly reconstituted on the other end.
** Several messages from difference sources can be sent or received at the same time, if needed.
** This really should only apply for basically one message (initial world sync on Steam on some savegames) during a typical session, but it makes it so that those savegames once again finally work.

Thanks!!

Issue History

Date Modified Username Field Change
Oct 1, 2020 4:20 pm Andyroo New Issue
Oct 1, 2020 4:20 pm Andyroo File Added: city.save
Oct 1, 2020 4:20 pm Andyroo File Added: city.savemet
Oct 1, 2020 4:20 pm Andyroo File Added: city2.save
Oct 1, 2020 4:20 pm Andyroo File Added: city2.savemet
Oct 1, 2020 4:20 pm Andyroo File Added: Broken1.save
Oct 1, 2020 4:20 pm Andyroo File Added: broken1.savemet
Oct 1, 2020 4:36 pm Chris_McElligottPark Note Added: 0058875
Oct 1, 2020 5:43 pm Andyroo Note Added: 0058882
Oct 1, 2020 5:47 pm Andyroo File Added: ArcenDebugLogMPsavebreak.txt
Oct 1, 2020 5:47 pm Andyroo Note Added: 0058883
Oct 2, 2020 12:42 pm Andyroo Note Added: 0058900
Oct 2, 2020 6:38 pm Chris_McElligottPark Note Added: 0058928
Oct 2, 2020 9:25 pm Chris_McElligottPark Note Added: 0058934
Oct 2, 2020 10:14 pm Chris_McElligottPark Note Added: 0058935
Oct 2, 2020 10:16 pm Chris_McElligottPark Assigned To => Chris_McElligottPark
Oct 2, 2020 10:16 pm Chris_McElligottPark Status new => assigned
Oct 5, 2020 4:15 pm Chris_McElligottPark Note Added: 0058984
Oct 6, 2020 1:07 pm Chris_McElligottPark Note Added: 0059024
Oct 6, 2020 4:29 pm Chris_McElligottPark Status assigned => resolved
Oct 6, 2020 4:29 pm Chris_McElligottPark Resolution open => fixed
Oct 6, 2020 4:29 pm Chris_McElligottPark Fixed in Version => 2.607 MP Stability, Exo Syncs, And Limbo Mods
Oct 6, 2020 4:29 pm Chris_McElligottPark Note Added: 0059031