View Issue Details
ID | Project | Category | Date Submitted | Last Update | |
---|---|---|---|---|---|
0025690 | AI War 2 | Crash/Exception | Oct 27, 2021 7:48 pm | Nov 12, 2021 12:38 am | |
Reporter | Chris_McElligottPark | Assigned To | Chris_McElligottPark | ||
Status | resolved | Resolution | fixed | ||
Product Version | Beta 3.740 Code Panopticon | ||||
Fixed in Version | Beta 3.750 Smooth Simulator | ||||
Summary | 0025690: A bunch of inactive faction threads? And how to debug that. | ||||
Description | Asked by Tom, figured I'd write down the answer here: It's a problem I've been putting off a bit, because I can't think of a good way to debug it. The place where it SHOULD be debuggable is if you right-click the clock. There's a bunch of information in there, if you scroll down in particular, about each thread's status and why it thinks it has not run. I should clarify a few things if you're going to look at that one (which I am grateful if you do, but don't feel obligated). The background threads "marking themselves as run" and "actually running" are two different things. When you're paused, most of the background threads still go "hey, I'd be running about now, but nope, paused, so I mark myself as 'did what I needed to right now, which was nothing' and check back very soon (1 second or so)" When you're not paused, a thread is validly blocked for all sorts of reasons, but the two largest ones are: 1. It's not time yet. That thread's meant to run every 20 seconds, and it hasn't been that long since it last started its prior run (it times from start to start, not end to start). 2. Or, there are multiple factions that share some interlocking data (aka, multiple AIs, or a Zenith Trader and Zenith Devourer). In those cases, on a given cycle, first one goes, then it finishes and the next goes, and so on through the cycle. The cycle isn't done until they've all gone, but they can't go at once. Most factions are independent of one another, so most of them all run at once, with then a second pass for duplicates of whatever faction type, and a third pass if needed, and then probably done. Obviously if you have 10 AIs, it will take 10 passes, though. There's also a status of "I'm not allowed to run yet," which generally means that the game hasn't started yet, or it's under 3 seconds since the game started. All the factions spot you those first three seconds. A key point: the game is "running" during the lobby. But that's considered second zero, and also IsInSetupPhase or something like that. But the normal game flow is otherwise happening, and is not even paused. However, the setup status basically makes it so that nothing really happens in most of the code, and most ships also do not exist yet. I mention this because sometimes we get some funky data when coming out of the lobby, where if you were sitting there in the lobby for 5 minutes, it says it's been 5 seconds since it last run and freaks out right when you start. I thought I had fixed that one. The one that I see now is typically more sporadic, and often after a second load of a savegame, so I'm assuming some sort of pooled data is not being cleared. It's also possible that if something uses GameSecond to say when something can next run, that's a problem. If you are in savegame A and it's 10 minutes in, and then you exit and reload to 8 minutes in, then that stale data would say "you are not allowed to run for another 2 minutes, all background threads." I am very suspicious that the above one is what is happening. If you go back to minute 1 of a game after being 4 hours in, that's obviously an apocalyptic case. All that said, from what I can tell, everywhere I am actually tracking time on things should be based on ArcenTime, which is a float amount of time (1 is 1 second, 0.5 is half a second) since the program started. So those can never be stale, or too low, because the restart of the program would be required to get a lower value, and that of course clears any cache. So either there's some very funky bug, or I accidentally used GameSecond somewhere. GameSecond is the timer in the bottom left for a campaign, and is the amount of game-time since the campaign was started. This has a very loose relationship to realtime, assuming you ever save, load, pause, fast forward, or whatever. @Tom That's my big info dump on what I know for tracking that thing down. I can't really duplicate it reliably, which is doubly frustrating. | ||||
Tags | No tags attached. | ||||
related to | 0025784 | resolved | Chris_McElligottPark | All threads are idle |
|
I added a bit of debugging code to `AIWarExternalDeepProcessingCode/src/Sim/Planning/LongTerm/SpecialFactionPlanning.cs` and it seems (at least when I'm getting errors about everything not running), that when `GetNeedsToRun` is being called, that both `this.faction?.SpecialFactionData.InternalName` and `this.faction?.DeepInfo` are empty. (And thus it is returning false.) |
|
Thanks! * After attempting a fairly disastrous tasks-based replacement for our thread-based approach to the main background threads, I've instead started looking into improving the thread-based items. ** First issue: if there's a missing faction or deepinfo as it tries to run LRP, it will have an error (if it gets to that part of the code) and will also show the thread name appropriately. Thanks to tom.prince for the info on this avenue. I don't know if this is solved or not, it's hard to tell just yet. ** Second issue: It is possible for all of the faction threads to be unassigned to a faction. I don't yet know what causes this, but it seems to be repeat loads of various games. Daniexpert first reported it today, and I've seen it once so far. I will have to chase this further on Monday. *** Reproduction steps: load one savegame twice. Start a new game. Quit to main menu. Load savegame again. Boom. ** Third issue (solved): I never have been able to figure out why the threads seem to kick off so much more frequently than it seems like they should. That in and of itself seems to be a problem. It turns out that I was having these run at sim-speed, and I often play at 5x, so it was doing waaay too much on the background threads. Now it actually uses real time. This may also help with some theoretical issues with gametime-based "last kicked off" logic. ** Fourth issue: for some unknown reason, various non-faction threads sometimes get stuck. Executioncontext is the main one, but there are others. In these cases, Daniexpert was able to log that we get to the "THREAD Kicked next cycle" code. *** This... is significant, but I'm not quite sure of the nuances why. Essentially, the thread thinks it is done, but has not died or cleaned up. What I now have it doing is killing the thread if the thread has been in this state for at least 4 realtime seconds. I may tighten that down to a substantially smaller margin if it turns out to be messing with the simulation, but I'm not sure on that. *** Related to this, it looks like the Kill method had a bit of an oversight. If the thread was terminal, then it was not nulling it out and just let things go. Now it nulls it out, which makes the logic above do the proper thing. Or should. *** Speaking of, if we try to run a thread and it thinks it is complete but still exists, we now check if it's terminal and then null it out if it is. I suspect that when we kill threads by going to the main menu and back into the game, sometimes it was having issues related to being in the terminal state but never nulled. Why this never mattered until recently is a mystery, as this code has not changed much until now. ** Overall I expect things to be improved, and I'd be very interested in which cases still exist in the new code. |
|
Holy moly, finally fixed. Knock on wood! |
Date Modified | Username | Field | Change |
---|---|---|---|
Oct 27, 2021 7:48 pm | Chris_McElligottPark | New Issue | |
Oct 27, 2021 7:48 pm | Chris_McElligottPark | Status | new => assigned |
Oct 27, 2021 7:48 pm | Chris_McElligottPark | Assigned To | => Chris_McElligottPark |
Nov 1, 2021 1:52 am | tom.prince | Note Added: 0063103 | |
Nov 5, 2021 6:56 pm | Chris_McElligottPark | Relationship added | related to 0025784 |
Nov 5, 2021 9:03 pm | Chris_McElligottPark | Note Added: 0063237 | |
Nov 12, 2021 12:38 am | Chris_McElligottPark | Status | assigned => resolved |
Nov 12, 2021 12:38 am | Chris_McElligottPark | Resolution | open => fixed |
Nov 12, 2021 12:38 am | Chris_McElligottPark | Fixed in Version | => Beta 3.750 Smooth Simulator |
Nov 12, 2021 12:38 am | Chris_McElligottPark | Note Added: 0063268 |