Dwarf Fortress Bug Tracker - Dwarf Fortress
View Issue Details
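ID: 0010061
Project: Dwarf Fortress
Category: Technical -- General
View Status: public
Date Submitted: 2016-11-02 11:47
Last Update: 2019-02-12 12:32
Reporter: Solra Bizna
Assigned To: lethosor
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: duplicate
Platform: Linux
OS: Debian
OS Version: Stretch
Product Version: 0.43.05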
 
Summary: 0010061: This save is seconds away from crashing DF
Description: I'm running DF on Debian Linux. A thousand or so ticks after loading this save, the game crashes.
Steps To Reproduce:
1. Load save.
2. Wait. DF will crash soon after the yak is slaughtered, if not before.
Additional Information:
Save is here: http://dffd.bay12games.com/file.php?id=12542

64-bit, PRINT_MODE 2D: Xlib abort with a multithreading-related error message... unless I'm running DF through gdb, in which case there's a floating point exception
64-bit, dfstream: Floating point exception
32-bit, PRINT_MODE 2D: Floating point exception

Changing Z-levels seems to slightly alter the timing of the crash.

All of the floating point exceptions take place with a backtrace similar to this:

#0 0x08c749a1 in ?? ()
#1 0x08d87946 in ?? ()
#2 0x0897d591 in ?? ()
#3 0x0897dad5 in ?? ()
#4 0x083f93e2 in ?? ()
#5 0xf7b29ea1 in interfacest::loop() ()
   from /home/sbizna/df_linux32/libs/libgraphics.so
#6 0x08665d4f in mainloop() ()
#7 0xf7b0cb92 in enablerst::async_loop() ()
   from /home/sbizna/df_linux32/libs/libgraphics.so
#8 0xf7b0cf8d in call_loop(void*) ()
   from /home/sbizna/df_linux32/libs/libgraphics.so
#9 0xf7efb155 in ?? () from /usr/lib/i386-linux-gnu/libSDL-1.2.so.0
#10 0xf7f3f048 in ?? () from /usr/lib/i386-linux-gnu/libSDL-1.2.so.0
#11 0xf735e2da in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#12 0xf781691e in clone () from /lib/i386-linux-gnu/libc.so.6
Tags: 0.44.09, 0.44.12
Relationships:
duplicate of 0008410 (resolved, lethosor): Crash due to zero-size weasel
has duplicate 0010859 (resolved, lethosor): Constant Crashes
Issue History
2016-11-02 11:47  Solra Bizna  New Issue
2016-11-03 12:59  Solra Bizna  Note Added: 0036028
2018-04-12 09:22  lethosor  Note Added: 0038154
2018-04-12 09:22  lethosor  Assigned To => lethosor
2018-04-12 09:22  lethosor  Status: new => confirmed
2018-04-12 09:22  lethosor  Tag Attached: 0.44.09
2018-04-13 09:57  Huntthetroll  Issue Monitored: Huntthetroll
2018-04-18 08:54  lethosor  Note Edited: 0038154
2018-08-08 09:38  lethosor  Relationship added: has duplicate 0010859
2018-08-08 09:39  lethosor  Note Added: 0038709
2018-08-08 09:39  lethosor  Tag Attached: 0.44.12
2019-02-06 00:04  risusinf  Note Added: 0039190
2019-02-07 02:19  risusinf  Note Deleted: 0039190
2019-02-07 02:19  risusinf  Note Added: 0039194
2019-02-10 21:36  risusinf  Note Edited: 0039194
2019-02-12 12:27  lethosor  Note Added: 0039208
2019-02-12 12:31  lethosor  Note Edited: 0039208
2019-02-12 12:32  lethosor  Relationship added: duplicate of 0008410
2019-02-12 12:32  lethosor  Status: confirmed => resolved
2019-02-12 12:32  lethosor  Resolution: open => duplicate
2019-02-24 18:42  Huntthetroll  Issue End Monitor: Huntthetroll

Notes
(0036028)
Solra Bizna   
2016-11-03 12:59   
With the help of dwarf_fortress_unfuck I did a little investigating. The exceptions occur within interfacest::loop()'s call to currentscreen->logic(). I hacked in a SIGFPE handler that prints the type of error that occurred (for diagnostic purposes), and throws a catchable exception, then wrapped the currentscreen->logic() call in a try/catch block for that exception. With this in place, if currentscreen->logic() triggers a SIGFPE, it is simply skipped until the next time the loop runs.

From this, I learned two things:

1. As I suspected, the SIGFPE is being caused by an integer divide by zero. (The si_code is FPE_INTDIV.)
2. With this hack in place, instead of crashing, the game... "freezes". The interface is responsive, and it seems to believe it's still processing ticks, but creatures stop moving. Days still seem to pass normally, and at random intervals the game "unfreezes" for a few dozen ticks.

Interestingly, if I stop blocking the river (by pulling the northmost lever in Asob's office), fluid flow continues during the "freeze".

The outpost liaison is visiting the fort. I suspected he might be the cause of the problem, so I ordered my crummy militia to bludgeon him to death. This didn't prevent the divide by zero from eventually occurring, and _did_ cause a loyalty cascade.

Deleting all of my work orders in the new manager didn't change anything either.

If I just power through the "freezes", the SIGFPEs eventually stop happening.
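
For readers who want to try something similar, the handler-and-skip approach described in this note might look roughly like the sketch below. It is not the actual patch: it substitutes sigsetjmp/siglongjmp for the thrown exception (throwing out of a signal handler needs compiler-specific support), and the names install_sigfpe_handler, run_logic_guarded and describe_fpe_code are hypothetical, not part of DF or dwarf_fortress_unfuck.

```cpp
// Sketch only, POSIX-specific. All function names here are hypothetical.
#include <cassert>
#include <cstdio>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_recover_point;               // where the handler jumps back to
static volatile sig_atomic_t g_guard_active = 0; // nonzero only during a guarded call
static volatile sig_atomic_t g_last_si_code = 0; // si_code of the most recent SIGFPE

// Translate a SIGFPE si_code into a readable description.
const char* describe_fpe_code(int code) {
    switch (code) {
        case FPE_INTDIV: return "integer divide by zero";
        case FPE_INTOVF: return "integer overflow";
        case FPE_FLTDIV: return "floating-point divide by zero";
        case FPE_FLTOVF: return "floating-point overflow";
        case FPE_FLTUND: return "floating-point underflow";
        case FPE_FLTRES: return "floating-point inexact result";
        case FPE_FLTINV: return "invalid floating-point operation";
        case FPE_FLTSUB: return "subscript out of range";
        default:         return "other (not a hardware arithmetic fault)";
    }
}

static void on_sigfpe(int, siginfo_t* info, void*) {
    g_last_si_code = info->si_code;
    if (g_guard_active) {
        // Abandon the interrupted call and resume in run_logic_guarded().
        siglongjmp(g_recover_point, 1);
    }
    // No guarded call in progress: fall back to the default (fatal) action.
    signal(SIGFPE, SIG_DFL);
    raise(SIGFPE);
}

// Install the handler with SA_SIGINFO so that si_code is available.
bool install_sigfpe_handler() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, nullptr) == 0;
}

// Runs logic(); returns true if it completed, false if a SIGFPE cut it short.
bool run_logic_guarded(void (*logic)()) {
    assert(logic != nullptr);
    if (sigsetjmp(g_recover_point, 1) != 0) {
        // Reached via siglongjmp from the handler; the saved signal mask
        // has been restored, so later SIGFPEs are still delivered.
        g_guard_active = 0;
        std::fprintf(stderr, "SIGFPE: %s; skipping this logic() call\n",
                     describe_fpe_code(g_last_si_code));
        return false;
    }
    g_guard_active = 1;
    logic();
    g_guard_active = 0;
    return true;
}
```

In the real loop, the guarded call would stand in for the direct call to currentscreen->logic(). Jumping out of a fault like this leaves whatever logic() was doing half-finished, which fits the "freeze" behaviour described above, so it is only suitable as a diagnostic aid.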
(0038154)
lethosor   
2018-04-12 09:22   
(edited on: 2018-04-18 08:54)
Confirmed in vanilla 0.44.09 on OS X.
For what it's worth, it appears to crash consistently at 0x0000000100c966e1 (at least 3 times now), although that's probably not very useful.
This was brought to my attention by https://github.com/DFHack/dfhack/issues/1257, which may be the same issue as this.

(0038709)
lethosor   
2018-08-08 09:39   
Save from 0.44.12, 0010859: http://dffd.bay12games.com/file.php?id=13950
(0039194)
risusinf   
2019-02-07 02:19   
(edited on: 2019-02-10 21:36)
Both the save from the OP and the one from the comment above stop crashing after running
[DFHACK]# exterminate weasel
which is very similar to 0008410. That report says something about zero body size; how do I check that?

Also see 0010253

(0039208)
lethosor   
2019-02-12 12:27   
(edited on: 2019-02-12 12:31)
Highlighting the weasel and running "lua ~unit.body.size_info" in DFHack prints
size_cur               	 = 0
size_base              	 = 1
area_cur               	 = 0
area_base              	 = 1
length_cur             	 = 0
length_base            	 = 21


Killing the weasel with DFHack stops the crash. From running "for _,u in ipairs(world.units.all) do if u.body.size_info.size_cur == 0 then print(u.id) end end" in Lua, this is the only zero-sized unit.

Thanks for investigating! I'll close this as a duplicate of 0008410.