|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0011619||Dwarf Fortress||Technical -- General||public||2020-09-08 16:49||2021-06-08 18:07|
|Platform||PC desktop||OS||Windows||OS Version||10 Pro x64 v2004|
|Target Version||Fixed in Version|
|Summary||0011619: Random crash due to unhandled exception|
|Description||The game crashes randomly and silently, without any error messages. Just - whoop! - and you find yourself at your desktop.|
|Steps To Reproduce||Load the save, unpause, wait for a few in-game days. It's a random crash, after all.|
|Additional Information||Files related to the report:|
Using PeridexisErrant's Starter Pack 0.47.04-r07.
Changes made to default settings of the pack:
- pop and invader caps raised (although wasn't hit yet);
- traffic route cost altered to [0:1:5:15] (although I didn't designate traffic zones in the save yet);
- seed caps raised to 6000 (6k) per plant species and 3000000 (3M) per fortress;
- item stacking in container aggressiveness raised to 1000.
Everything is packed along with the save in zip folder and uploaded to the google drive folder specified above.
The world was generated in this version of DF and of the Pack. Fresh install, fresh everything.
In an attempt to track down the issue and figure out exact reproduction steps, I tried to narrow the time window by saving and reloading before the game crashed.
The first strange thing I noticed is that once I started to save and reload, I got past the day of the crash, which I hit every time if I load from the seasonal autosave and just play on.
The second strange thing is that at some point the game crashed while loading the save, but! it loaded the save without a problem after I relaunched the game.
After some further attempts to find reproduction steps via in-game actions, I ragequit, loaded the save, attached the MS Visual Studio debugger to the game and unpaused.
Finally, MSVS caught unhandled exception 0xC0000005 (access violation) at address 0x00007FF675EC6780, as the process attempted to read address 0x0.
Plus, it said that the code is located in the ucrtbase.dll file.
As I have neither symbol files nor source code, I saved dumps in case they help: one with heap and another without heap. Both are in the google drive folder specified above.
|Tags||No tags attached.|
edited on: 2020-09-08 18:08
Could you please upload just the relevant save folder separately? It's in the data/save folder, e.g. Dwarf Fortress/data/save/region1.
Does this crash occur with all utilities disabled? In particular, try disabling DFHack (from the PyLNP launcher, if you're using it) and see if the crash happens again.
Also, is ucrtbase.dll the third-party library you're referring to? If so, it's not actually a third-party library; it's supporting code that many programs rely on, and is essentially part of the operating system.
Save folder only:
As for the ucrtbase.dll file: Visual Studio specified just the file name, 'ucrtbase.dll', when it caught the exception, not the full path. I checked DF's folder and found a file with that name there. Now, after your words, I also checked C:\Windows and found the file at 'C:\Windows\System32\ucrtbase.dll'. I checked the timestamps: the pack's DF folder contains an obsolete file (from 2015). I downloaded vanilla DF, and its own copy of this obsolete version of the file is there as well.
I'm not sure which one of the files DF should load at startup. I checked the VS log: it says that the debugger loads the system's one when it attaches to DF.
As for further testing: I disabled DFHack via the PyLNP launcher and tried again. Same exception, thread of the same file, slightly different address:
Exception thrown at 0x00007FF754E46780 in Dwarf Fortress.exe: 0xC0000005: Access violation reading location 0x0000000000000000.
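One quick check on the two instruction addresses reported so far: if they are the same instruction in an image that ASLR loaded at a different base, they should differ by a multiple of 64 KiB (the Windows allocation granularity) and so share their low 16 bits. A minimal Python sketch, assuming both crashes come from the same build of the executable; the helper name is made up for illustration:

```python
# Addresses copied from the two exception reports above.
FIRST_CRASH = 0x00007FF675EC6780
SECOND_CRASH = 0x00007FF754E46780

# Windows maps images at 64 KiB-aligned bases, so relocation by ASLR
# leaves the low 16 bits of every code address unchanged.
ALLOCATION_GRANULARITY = 0x10000


def same_offset_modulo_aslr(a: int, b: int) -> bool:
    """Return True if a and b could be the same instruction at different bases."""
    return (a - b) % ALLOCATION_GRANULARITY == 0


print(hex(FIRST_CRASH % ALLOCATION_GRANULARITY))   # 0x6780
print(hex(SECOND_CRASH % ALLOCATION_GRANULARITY))  # 0x6780
print(same_offset_modulo_aslr(FIRST_CRASH, SECOND_CRASH))  # True
```

A match is only consistent with both crashes hitting the same instruction; it does not prove it, and it says nothing about which module owns the code.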
Dump files have been uploaded here:
Oh, ok, I had a hard time figuring out exactly where ucrtbase.dll came from (as I'm not on Windows), but "crt" usually stands for "C runtime", so it's something that's provided by the compiler DF uses. In this case, DF distributes its own copy because any copies present on end-users' systems could be too old or too new to use with DF. DF is intended to use the one present in its own folder (so don't remove it), but it's something that DF needs at a low level, not something it was specifically written to use. I guess my point is that, barring any serious compiler bugs, ucrtbase.dll won't be the cause of any crashes, but DF bugs may end up triggering a crash in code within it.
Anyway, thanks for the save! I'll see if I can reproduce the crash. Sometimes these are system-specific, but having a save that crashes within a few days usually helps, so thanks for that.
> too new to use with DF
Well, this, theoretically, might be the case. According to this
if DF requests DLLs by filename alone ('ucrtbase.dll' in this case), Windows will never pick DF's own copy, because the system's copy has been in use since startup: it is used by system services and is therefore always already loaded whenever DF is launched.
Plus, I remembered that crashes were not this frequent until a certain date, i.e. until one of the Windows updates. Before that date, DF was able to run a fortress for dozens of in-game years without any technical issues.
I'll google whether there is a way to hijack the DLL search order to force DF to use its own copies, and see how that works out.
edited on: 2020-09-09 20:38
Unfortunately I wasn't able to reproduce the issue after 3 attempts, running for around an in-game month each time. It might be Windows-specific, though.
I'm no expert on how the Windows C runtime works, but from what I know about how DFHack works, I would expect an incompatibility like what you're describing to cause DF to crash nearly immediately after startup. If you're able to load any other saves successfully (particularly saves at least as old/large as the crashing one), I think that's probably not the issue. I would certainly be interested to see what happens if you force DF to use its own ucrtbase.dll, though! Thanks for the information about the Windows updates.
Unfortunately, it's not possible to use a local copy of the Universal CRT on Windows 10, as stated here:
"On Windows 10, the Universal CRT in the system directory is always used, even if an application includes an application-local copy of the Universal CRT. It's true even when the local copy is newer, because the Universal CRT is a core operating system component on Windows 10."
Neither hacking in a full path, nor adding a manifest, nor anything else works.
As for the exception, I opened the disassembly at the instruction, and, well, the code is in Dwarf Fortress.exe and it doesn't check the returned value (the rax register), even though it is a pointer.
But anyway, I googled that the upper half of rax is zeroed if any instruction writes to eax, but I'm not sure how this can be the case if Toady doesn't use inline assembly. I'm not familiar enough with navigating disassembly to track all possible call routes and check whether this is the case.
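The register behaviour mentioned above is standard x86-64: any instruction that writes a 32-bit register such as eax clears the upper 32 bits of the full 64-bit register. It does not require inline assembly, since compilers emit such writes whenever a value is handled as a 32-bit integer. A rough Python model of the effect on a pointer-sized value; the example pointer is invented for illustration and is not taken from any dump:

```python
MASK32 = 0xFFFFFFFF


def write_eax(rax: int, value: int) -> int:
    """Model 'mov eax, value': the result is zero-extended into all of rax."""
    # The previous contents of rax are irrelevant; the upper half becomes 0.
    return value & MASK32


def write_ax(rax: int, value: int) -> int:
    """Model 'mov ax, value': only the low 16 bits change, the rest is kept."""
    return (rax & ~0xFFFF) | (value & 0xFFFF)


# Hypothetical 64-bit pointer value, chosen only to make the effect visible.
pointer = 0x000001A2B3C4D5E0

truncated = write_eax(rax=pointer, value=pointer)
print(hex(truncated))  # 0xb3c4d5e0: the upper 32 bits are gone
```

So a pointer that passes through a 32-bit variable anywhere on its way comes back with a zeroed upper half, which is one way such a value could appear without any hand-written assembly.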
Seeing almost exactly the same thing on Windows 10. More than once I have just been dumped back to the desktop with no error message and nothing in the logs after hours of work, though it's getting more frequent. It seems to come up quite a bit when calling quicksave. Exact exception was "Exception thrown at 0x00007FF6C57A1DFA in Dwarf Fortress.exe: 0xC0000005: Access violation reading location 0x00000000CFF62E60." Full core dump from VC2019 at https://dffd.bay12games.com/file.php?id=15329; stack trace:
Dwarf Fortress.exe!00007ff6c57a1dfa() Unknown
Dwarf Fortress.exe!00007ff6c57e4eab() Unknown
Dwarf Fortress.exe!00007ff6c57e674d() Unknown
Dwarf Fortress.exe!00007ff6c5a32a0b() Unknown
Dwarf Fortress.exe!00007ff6c5a56870() Unknown
Dwarf Fortress.exe!00007ff6c5e06a6c() Unknown
Dwarf Fortress.exe!00007ff6c5b5dfa5() Unknown
Dwarf Fortress.exe!00007ff6c5efcc5f() Unknown
Dwarf Fortress.exe!00007ff6c5ba97e6() Unknown
Dwarf Fortress.exe!00007ff6c5baa919() Unknown
Is there a way to use df-structures or similar to generate a partial PDB file for Dwarf Fortress.exe? It seems like if we can generate address to a lot of the variables and methods in memory, we could help a lot with debugging if we can create debug symbols.
The particular instruction from the disassembly that this crashed on was `mov eax, dword ptr [rcx]`, the values of the registers captured by VC were:
RCX contains the address that the instruction was trying to read, and its upper half is zeroed out. That does seem to suggest that this has something to do with implicit register zeroing, though I'm not sure how this could have come about.
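To compare faulting addresses across reports, here is a small hedged sketch in Python that pulls the fields out of a Visual Studio exception line and labels the address that was accessed. The category names and the "upper 32 bits zero" heuristic are assumptions made for illustration, not something the debugger reports:

```python
import re

# Matches lines such as:
# Exception thrown at 0x... in <module>: 0xC0000005: Access violation reading location 0x....
PATTERN = re.compile(
    r"Exception thrown at (0x[0-9A-Fa-f]+) in (.+?): (0x[0-9A-Fa-f]+): "
    r"Access violation (reading|writing) location (0x[0-9A-Fa-f]+)"
)


def classify(fault_address: int) -> str:
    """Label a faulting address using simple, heuristic rules."""
    if fault_address == 0:
        return "null pointer"
    if fault_address == 0xFFFFFFFFFFFFFFFF:
        return "all bits set"
    if fault_address >> 32 == 0:
        return "upper 32 bits zero (possibly truncated pointer)"
    return "other"


def parse(line: str):
    """Return (instruction address, module, access kind, label) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ip, module, _code, kind, location = match.groups()
    return int(ip, 16), module, kind, classify(int(location, 16))


line = ("Exception thrown at 0x00007FF6C57A1DFA in Dwarf Fortress.exe: "
        "0xC0000005: Access violation reading location 0x00000000CFF62E60.")
print(parse(line))
```

The label is only a hint: a low address can also be a legitimate 32-bit value that was never meant to be a pointer.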
I set a breakpoint on this instruction, and this is definitely part of the save code, as it is called multiple times on every save. This truncation doesn't happen every time though, as I have seen runs where the value in RCX has its upper bytes set.
I'm going through the disassembly. The presence of TWBT in the stacktrace is making me think it might be a TWBT bug, but I can't be sure.
There is also quite a lot of pointer arithmetic going on here, so it's possible some value is getting stepped on.
Shirasik: can you reproduce this with TWBT disabled?
edited on: 2020-12-06 06:13
I disabled TWBT and did eventually get a crash while playing, though it took a lot longer than usual. I was able to play for several hours and save many times, whereas before it was getting to the point where I couldn't play for more than a few minutes without a crash. I wondered whether the issue is a memory leak, but Dwarf Fortress was consistently using only about 2GB of RAM while my system has 32GB, so I was nowhere close to running out. The crash here was from a different code location than last time, though I have seen crashes from many locations. This crash looks like a call through a vtable, and the address it is attempting to call is invalid. Of note is that the memory address involved was not truncated as before, so truncation may or may not be the actual issue. Something may be causing memory corruption, though, as the crash always seems to happen when dereferencing pointers recently loaded from memory. Dwarf Fortress does a lot of pointer arithmetic, and we could be seeing the result of a failed heap allocation that wasn't checked for. It also does a lot of looping over arrays that look like they could be susceptible to buffer overruns, so I'm not sure which is happening here. In this case, the particular error was "Exception thrown at 0x00007FF6C60B4DAD in Dwarf Fortress.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF."
The full stack trace was:
> Dwarf Fortress.exe!00007ff6c60b4dad() Unknown
Dwarf Fortress.exe!00007ff6c60b7d55() Unknown
Dwarf Fortress.exe!00007ff6c606d5f8() Unknown
Dwarf Fortress.exe!00007ff6c5b5e254() Unknown
Dwarf Fortress.exe!00007ff6c5efcc5f() Unknown
Dwarf Fortress.exe!00007ff6c5ba97e6() Unknown
Dwarf Fortress.exe!00007ff6c5baa919() Unknown
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>() Unknown
Values of some local registers:
Disassembly of some of the relevant code:
00007FF6C60B4D8E 48 8B F1 mov rsi,rcx
00007FF6C60B4D91 44 8B EA mov r13d,edx
00007FF6C60B4D94 0F 1F 40 00 nop dword ptr [rax]
00007FF6C60B4D98 0F 1F 84 00 00 00 00 00 nop dword ptr [rax+rax]
00007FF6C60B4DA0 48 8B 06 mov rax,qword ptr [rsi]
00007FF6C60B4DA3 4E 8B 34 F8 mov r14,qword ptr [rax+r15*8]
00007FF6C60B4DA7 49 8B 06 mov rax,qword ptr [r14]
00007FF6C60B4DAA 49 8B CE mov rcx,r14
00007FF6C60B4DAD FF 90 E0 06 00 00 call qword ptr [rax+6E0h]
Full core dump at https://dffd.bay12games.com/file.php?id=15331
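The last instruction in the listing, `call qword ptr [rax+6E0h]`, is an indirect call through a table of 8-byte pointers whose base was loaded from `[r14]`, which is the usual shape of a C++ virtual call. A short Python sketch of the arithmetic, assuming 8-byte table entries and a table that starts at slot 0; which virtual method this corresponds to is not known from this thread:

```python
POINTER_SIZE = 8           # bytes per entry on x86-64
CALL_DISPLACEMENT = 0x6E0  # from 'call qword ptr [rax+6E0h]'
ELEMENT_SCALE = 8          # from 'mov r14, qword ptr [rax+r15*8]'


def vtable_slot(displacement: int, pointer_size: int = POINTER_SIZE) -> int:
    """Convert a byte displacement into a zero-based table slot index."""
    if displacement % pointer_size != 0:
        raise ValueError("displacement is not pointer-aligned")
    return displacement // pointer_size


def element_address(array_base: int, index: int) -> int:
    """Address read by 'mov r14, qword ptr [rax+r15*8]' for a given index."""
    return array_base + index * ELEMENT_SCALE


print(vtable_slot(CALL_DISPLACEMENT))  # 220
```

Read this way, the code appears to walk an array of object pointers (`[rsi]` holds the array base, `r15` the index) and call slot 220 on each element. A stale or overwritten element pointer would be consistent with a fault at exactly this call, though the listing alone cannot confirm that.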
I'm using Visual Studio together with IDA and Hex-Rays to explore, disassemble, and decompile some of the code around these errors. Are there any good tools to generate debug symbols usable by either of these from what we know of the Dwarf Fortress memory map? I know codegen_c_hdr.pl in df_misc can generate a C header that is somewhat useful for the structures, though getting IDA to actually recognize the global variables and their types is still a slog. Do we have any tools that can improve this?
lethosor: yes, it happens in my case w/o TWBT
For the record, I suspect this happens because of a mismatch in how the manager code and the stockpile record code count items: when a manager order expects enough items to be available and, for whatever reason, there aren't, the actual workshop job gets a null pointer for at least one of the chosen items.
The suspicion is based on the fact that if I make sure the order conditions always leave a large margin for items being blocked (e.g. by hauling jobs that use containers), these sudden crashes no longer happen.
Are you able to prevent a crash by cancelling manager orders before it occurs? I'm not familiar with how the manager works, but the null pointer theory feels unlikely to me (but not impossible).
edited on: 2021-05-30 10:46
I can't be sure whether I'm able to prevent something that may not happen at all (how do I check that anything was actually prevented?). But after I asked about *that* random crashing with no visible connection to anything, people in local DF communities said that random silent CTDs happen more often the more intensively the player uses the manager to automate production in the fortress. Some people advised not using the manager at all, but a few shared a trick: if the order conditions require several times more items than the order actually needs, CTDs may not happen at all. So this is a clue pointing at something item-related.
Please don't cram multiple issues into a single bug report, and definitely don't add unrelated ones into an existing one.
The reason for this is to ensure that when the bug relating to a bug report is fixed, the report can be closed properly and archived (until something indicates there are still issues relating to the bug, at least). Any "extra" stuff will be lost (and investigators won't look at bug reports to find unrelated things even if the report is open).
Cancellation spam caused by concurrent container access is a known longstanding issue, with many people advocating very restricted use of containers. At least in the past there have been attempts to implement bag less seed stockpiles because of this issue, with varying degrees of success. Theoretically, increasing the number of crops used ought to spread the seed bag access over a larger number of bags, reducing the amount of cancellation spam caused by concurrent bag access (that's a work around action, not a fix, of course).
PatrikLundell: edited the note according to your advice.
I'm hesitant to classify issues based on random guesses of other people, especially if it's unknown whether their issues are related to this one. But if you are able to reproduce this crash (for instance, if you have a save where this crash tends to occur shortly after loading, or you are able to come up with a save that does so), it would be helpful for us to know if measures such as disabling manager orders are able to prevent the crash. If the crash isn't reproducible, it's difficult to know whether any preventative measures are helping.
In this case, I was unable to reproduce the crash on your save. However, if you are able to reproduce the crash consistently by just running the save from 0011619:0040719 for a couple in-game days, it would be very helpful to know whether e.g. removing all manager orders makes it run for significantly longer (multiple times) without crashing.
That's the problem: the CTD seems completely random. It may take not days but a couple of weeks or even months, especially if I start to save and load the fortress trying to get closer to the CTD to provide a meaningful save folder. Once I get a CTD, I load the last save and 'freely' pass the date at which the CTD happened a minute ago. Nevertheless, any of these works:
- removing all orders in the manager's interface;
- if all orders are general, setting workshops to 'general work orders can't task this workshop' in the workshop profiles helps as well;
- suspending workshop tasks added by an order before they become active (i.e. before they are picked up by any dwarf) also works.
For each of the above, I played the fortress for three years in one run, then decided that was enough, replaced the save with one from backup and started again from the same point.
However, I can't provide concrete reproduction steps: dwarves cancel jobs for some time (as they should, if they can't gather the items needed), but eventually the CTD happens. Saving and loading without quitting, or just quicksaving, allows for a longer run.
|2020-09-08 16:49||Shirasik||New Issue|
|2020-09-08 18:05||lethosor||Note Added: 0040718|
|2020-09-08 18:05||lethosor||Assigned To||=> lethosor|
|2020-09-08 18:05||lethosor||Status||new => needs feedback|
|2020-09-08 18:07||lethosor||Note Edited: 0040718||View Revisions|
|2020-09-08 18:08||lethosor||Note Edited: 0040718||View Revisions|
|2020-09-08 22:42||Shirasik||Note Added: 0040719|
|2020-09-08 22:42||Shirasik||Status||needs feedback => assigned|
|2020-09-09 17:22||lethosor||Note Added: 0040720|
|2020-09-09 17:22||lethosor||Summary||Random crash due to unhandled exception in third-party library => Random crash due to unhandled exception|
|2020-09-09 18:15||Shirasik||Note Added: 0040721|
|2020-09-09 20:38||lethosor||Note Added: 0040722|
|2020-09-09 20:38||lethosor||Assigned To||lethosor =>|
|2020-09-09 20:38||lethosor||Status||assigned => new|
|2020-09-09 20:38||lethosor||Note Edited: 0040722||View Revisions|
|2020-09-10 12:36||Shirasik||Note Added: 0040724|
|2020-09-12 17:30||Shirasik||Note Added: 0040728|
|2020-09-12 17:31||Shirasik||Note Edited: 0040728||View Revisions|
|2020-09-12 17:38||Shirasik||Note Deleted: 0040728|
|2020-12-04 00:38||TV4Fun||Note Added: 0040816|
|2020-12-04 00:44||TV4Fun||Note Added: 0040817|
|2020-12-04 02:05||TV4Fun||Note Added: 0040818|
|2020-12-04 05:37||TV4Fun||Note Added: 0040819|
|2020-12-04 05:53||TV4Fun||Note Added: 0040820|
|2020-12-05 08:25||lethosor||Note Added: 0040821|
|2020-12-05 08:25||lethosor||Assigned To||=> lethosor|
|2020-12-05 08:25||lethosor||Status||new => needs feedback|
|2020-12-06 01:50||TV4Fun||Note Added: 0040822|
|2020-12-06 06:13||TV4Fun||Note Edited: 0040822||View Revisions|
|2021-05-27 05:16||Shirasik||Note Added: 0041064|
|2021-05-27 05:16||Shirasik||Status||needs feedback => assigned|
|2021-05-27 13:23||lethosor||Note Added: 0041065|
|2021-05-27 13:23||lethosor||Status||assigned => needs feedback|
|2021-05-28 07:17||Shirasik||Note Added: 0041067|
|2021-05-28 07:17||Shirasik||Status||needs feedback => assigned|
|2021-05-28 07:19||Shirasik||Note Edited: 0041067||View Revisions|
|2021-05-30 01:27||PatrikLundell||Note Added: 0041069|
|2021-05-30 10:46||Shirasik||Note Edited: 0041067||View Revisions|
|2021-05-30 10:49||Shirasik||Note Added: 0041070|
|2021-05-31 07:20||Shirasik||Issue Monitored: Shirasik|
|2021-05-31 07:21||Shirasik||Issue End Monitor: Shirasik|
|2021-06-02 14:20||lethosor||Note Added: 0041071|
|2021-06-02 14:21||lethosor||Status||assigned => needs feedback|
|2021-06-08 18:07||Shirasik||Note Added: 0041080|
|2021-06-08 18:07||Shirasik||Status||needs feedback => assigned|