Notes |
|
(0039161)
|
PatrikLundell
|
2019-01-28 00:36
(edited on: 2019-01-28 00:37) |
|
I've tried the save twice, without crashing in either case, so I assume the issue requires a specific environment.
I'm on Windows 10.1 using the LNP r05.
In my first attempt I had DFHack enabled. The merchant ran back and forth like visitors do when trying to leave a fortress whose entrances are all locked up due to a siege, so there's no path out. Eventually the merchant disconnected from the yak in a room in the fortress without any indication of a reason (no fighting or fear, as far as I could see) and eventually left through a cavern together with the swordsman. When I killed the game around the 10th, the yak remained in the fortress labeled as "Friendly" rather than "Merchant".
In my second run I'd disabled DFHack (and changed print mode from TwbT to 2D). In this run the merchant left the fortress on the surface together with the yak at around the 4th. The swordsman turned to "Friendly", but left shortly thereafter.
If it's of interest, I can let the fortress run on its own past the troublesome end of the month and upload the progressed save to allow you to continue (if so, please specify a time you'd like the save to be progressed to).
|
|
|
(0039175)
|
Loci
|
2019-02-02 10:02
(edited on: 2019-02-02 10:05) |
|
Thank you for taking a look at the save. It crashes reliably for me using vanilla windows x32, vanilla linux x64, and vanilla windows x64 under wine. The only common thread I see is an AMD processor.
Given the nature of the failure, I suspect the crash will likely recur at the end of subsequent months, but if you are willing to post a save after the first of Timber I'll test that hypothesis.
|
|
|
|
http://dffd.bay12games.com/file.php?id=14227 progressed to the second of Timber.
I've done nothing except looking at Announcements from time to time to check progress, so no actions have been taken (I haven't even checked on the erratic merchant this time).
My PC has an Intel processor, so that doesn't contradict the hypothesis of the issue being AMD related. |
|
|
(0039183)
|
risusinf
|
2019-02-03 08:35
(edited on: 2019-02-03 21:35) |
|
Does segfault on lin32 old Intel CPU and doesn't on win64 modern Intel CPU (noticed a moment of stuttering though), but that's not the point. The crash isn't tied to a fixed time and can be avoided. It doesn't happen when all squads are disbanded, so I believe the issue is the same as I reported in 0010880:0038940, namely military equip lists being completely fucked up. Illustration: https://i.imgur.com/VKTGasZ.png (it is relatively easy to check any crashy save for this to be the reason)
Probably this should be prioritized due to its unholy ugliness and because the issue is likely to cover several tickets on the tracker, not to mention that it breaks gameplay: you can't have a proper militia anymore.
Next day edit: I'm going to check a few more unresolved crash reports to see how widespread the issue could be.
|
|
|
|
I just had a case of this crash myself a couple of seconds after quicksaving. Disbanding all squads did indeed allow the game to progress, while disbanding any single squad did not. Checking specific Armor showed a bunch of written contents (all "item_bookst") at the bottom (and the one I checked wasn't in my fortress). The only category whose specific item list wasn't cluttered with "item_bookst" was the Weapon one, which doesn't seem to be broken (in my particular case). It can be noted that the junk items at the end differ between the equipment categories, but contain at least one item in common, and probably several. This item also appears multiple times in the lists (at least in most of them). However, progressing the game for a few seconds and attempting to create a squad shows that the categories have not been repaired, but retain the junk items. Since my fortress was built on the idea of conquering goblin sites, this is really a fatal blow to it.
All the "books" I've looked at have had low item IDs, typically around 600-700. |
|
|
|
A few saves that follow the same pattern
All tests were carried out on an Intel i7 lin64 platform (through ssh) with vanilla DF 0.44.12
Read each paragraph line by line as:
- Ticket/comment ID
- Result of the first run
- Actions on the second run to avoid and reproduce crash
- Garbage in equip lists ( [m]->[e]->[A|L|H|G|B|S|W]->"specific ..." )
- Additional notes
0011014
crash time ~1min (win64 same CPU no crash)
reload and disband all squads before unpausing; no crash after 2min; quick crash after creating 1 unit squad with "footwear" equipped
Helm=Ivaklasod x1; Boots=Ivaklasod x4; Shield=Ivaklasod x1
Ivaklasod is a serpentine slab, first item in the legends mode's Codices menu
(Garbage items come and go with time, compare https://i.imgur.com/VKTGasZ.png which was made on the first run before unpausing the game)
0011004
crash time ~2min (win64 same CPU no crash)
reload and disband all squads before unpausing; no crash after 5min; quick crash after creating 1 unit squad with "armor" equipped
Armor=Måmgozôm îbmat Lisid x3; Boots=Måmgozôm îbmat Lisid x100, ≡large smoky quartz≡ x2, Soshornalthish x1
Måmgozôm îbmat Lisid is a sterling silver slab, first item in the legends mode's Codices menu
0010880:0038940 (win64 same CPU does crash)
crash time <1min
reload and disband all squads before unpausing; no crash after 1min; quick crash after creating 1 unit squad with "handwear" equipped
Legs=Sesiwerima Obenirifi x1, The World Without The Wind x1; Gloves=Sesiwerima Obenirifi x2, Classic Fikod Gearedfocused x1, Life With Beakers x1
Sesiwerima Obenirifi is a microcline slab, first item in the legends mode's Codices menu
(this one is mine, crashes the same way on lin32 old Intel CPU platform)
0010880:0038892
crash time <2min (win64 same CPU no crash)
reload and disband all squads before unpausing; no crash after 5min; quick crash after creating 1 unit squad with "headwear" equipped
Helm=Slömodoxxo x1; Gloves=Slömodoxxo x2
Slömodoxxo is a hornblende slab, second item in the legends mode's Codices menu
0010880
crash time <5min (win64 same CPU no crash)
reload and disband all squads before unpausing; no crash after 10min; quick crash after trying to check "specific armor" list
Armor=Thafenelara x1
Thafenelara is a rhyolite slab, first item in the legends mode's Codices menu
0010868
crash time <1min (win64 same CPU no crash)
reload and disband all squads before unpausing; no crash after 1min; quick crash after creating 1 unit squad with "armor" equipped
Armor=Okoshfeb x2; Legs=Okoshfeb x2; Helm=Okoshfeb x3; Gloves=Okoshfeb x1; Shield=Okoshfeb x4; Weapon=Okoshfeb x2
Okoshfeb is a native gold slab, first item in the legends mode's Codices menu
Timings and the crashes themselves (which might not happen at all) depend on CPU performance and architecture (and god knows what else, OS for sure)
All crashes are SEGFAULT
The issue may be responsible for multiple raid-related reports, though I haven't checked those yet (the tested saves have no "travelling" dwarves), and if so this bug grows even bigger in its terrible awfulness (no, I cannot approve of how Windows seemingly tends to sweep flawed code under the rug) |
|
|
(0039187)
|
Loci
|
2019-02-05 13:43
|
|
@PatrikLundell: Thank you for uploading the advanced save! I was able to run it without crash for over a year, so loading and resaving a few days later somehow fixed the corruption, at least temporarily.
@risusinf: Thank you for the detailed analysis! This issue is likely to be one of the first few bug-fixes after the upcoming release, and your efforts will help Toady identify and correct the problem. |
|
|
|
More abominations
0010975
crash time ~1min
reload and disband all squads before unpausing; no crash after 2min; quick crash after trying to check "specific gloves" list
NO visual garbage in equip lists
0010579 might be the same, but it crashed only once in multiple attempts
0010911
crash time <1min
reload and disband all squads before unpausing (including one with travelling dwarves); no crash after 2min; no crash after creating 1 unit squad with "armor" equipped, then adding "legwear", then "helm", but eventually crashed after adding "handwear"
Armor=For The Love Of The Individual x15, For The Love Of The Fortress x1; Legs=For The Love Of The Individual x7; Helm=For The Love Of The Individual x7; Gloves=An Offering To The Truth x1; Boots=For The Love Of The Individual x15, Taremimustteling x2; Shield=For The Love Of The Individual x6
Taremimustteling is a native copper slab, first item in the legends mode's Codices menu
0010894
crash ~2min, soon after a squad returns from mission
disbanding squads except the returning one still crashes, while disbanding it too right on arrival keeps the game running; quick crash after creating 1 unit squad with "armor" equipped
Armor=Asolum âtrid Vetek x2; Legs=Asolum âtrid Vetek x5; Helm=Asolum âtrid Vetek x5; Gloves=Asolum âtrid Vetek x2; Boots=Asolum âtrid Vetek x10; Shield=Asolum âtrid Vetek x5; Weapon=Asolum âtrid Vetek x1
Asolum âtrid Vetek is a diorite slab, first item in the legends mode's Codices menu
0010851
crash time <1min
after disbanding squads game goes past, and goblins arrive; couldn't reproduce crash by adding equip categories for 1 unit squad
Armor=Slatsutosnak x1; Legs=Slatsutosnak x1; Helm=Slatsutosnak x1; Gloves=For The Love Of The Tower x1, Thoughts On The Elf x1; Shield=Slatsutosnak x2; Weapon=Slatsutosnak x2
Slatsutosnak is a bituminous coal slab, first item in the legends mode's Codices menu
That would be all for my crusade against evil, but I guess there is a huge ton of saves affected, as the most popular Windows platform just keeps it low profile, and sometimes the impact is not too harsh |
|
|
(0039193)
|
Gregamesh
|
2019-02-06 22:49
(edited on: 2019-02-07 11:06) |
|
Going to add to this because I realized that the last issue I posted is actually another example of equipment list corruption:
All are run on DF 0.44.12 on Mac OS High Sierra, version 10.13.6. I could try again with an updated OS, but I doubt it will change things.
Original report of mine: 0011025
Crashes in about 1 min (2 in-game days, exactly)
Disbanding all squads does not result in an immediate crash.
Corrupted armors are as follows:
Armor=Snogongstekug x1, Unookal x1; Legs=no corruption; Helm=Unookal x1, Inquires on the Center x1; Gloves=Snogongstekug x2, Unookal x1; Boots=Snogongstekug x1, Unookal x1; Shields=no corruption; Weapons=no corruption.
Unookal is the first entry in Codices, etc; it is an orthoclase slab.
Snogongstekug is the second entry; it is a chalk slab.
A save created by retiring the fort at this point and unretiring does not have any armor list corruption. I have not tested if armor corruption will eventually develop in that version of the save.
By saving repeatedly on the date when a segfault would occur, I was able to push the save until 773-8-11. The advanced version of the save (http://dffd.bay12games.com/file.php?id=14236) is available for download. At this point, saving on or after 8-12 results in the following error:
[DFHack]# /Users/{my directory}/Desktop/df_osx/dfhack: line 15: 5033 Abort trap: 6 DYLD_INSERT_LIBRARIES=./hack/libdfhack.dylib ./dwarfort.exe "$@"
Letting the game run instead, it crashes around 8-17 with the following error:
[DFHack]# /Users/{my directory}/Desktop/df_osx/dfhack: line 15: 4930 Segmentation fault: 11 DYLD_INSERT_LIBRARIES=./hack/libdfhack.dylib ./dwarfort.exe "$@"
Note: the 4-digit number changes between each iteration of the crash, but the rest of the message remains the same.
Disbanding all squads but the eighth (it is leaving to go on a raid, and the commander is off-site) allows normal saving to occur after the 12th. Squads can be recreated after this point and saved normally. Squads were recreated using a premade equipment template with parts from all categories. The game still segfaults around 8-17.
The corrupted armors in this version of the save are:
Armor=Snogongstekug x1, Unookal x1; Legs=no corruption; Helm=Unookal x1, Inquires on the Center x1; Gloves=Snogongstekug x4, Unookal x2; Boots=Snogongstekug x1, Unookal x1; Shields=no corruption; Weapons=no corruption.
|
|
|
(0039207)
|
risusinf
|
2019-02-11 22:56
(edited on: 2019-02-13 03:15) |
|
Regarding possible obstructions of the raid system
0010951
Resaving postpones crashing, and equip lists are heavily corrupted (behaviour similar to 0011014). This resave (made immediately after the conquest of Mighthorror) -- http://dffd.bay12games.com/file.php?id=14246 -- reliably crashes for me in under 3min (44.12@lin64, win64 doesn't seem to crash at all). Military layout is complicated. One squad is out for a mission with two recruits falling behind; one of them has no military equipment and is about to pick it up. Disbanding all other squads (including one with travelling dwarves from the previous mission) makes it crash even faster (<1.5min). Removing the guy without equipment from the squad doesn't change anything; removing both recruits allows the game to show the message about the successful conquest and then crash shortly afterwards. After that my best guess was that the travelling squads can actually trigger the equipment bug, or there is something else, so I kept digging. Weirdly enough, I discovered that terminating all of the yaks at once allows the game to progress for 5min and maybe more. However, killing off yaks one by one didn't make the crash go away. On top of this, a guy from The Eradicators (position 1) and another one from The Crosshoes (position 2) are each assigned to equip two "missing item" entries, and all members of The Golden Legion are assigned to wear socks, which is weird. Related or not, it's a total mess.
Also there is a small inconsistency in viewing departed squads on the military screen: [p]ositions shows members' names with a (travelling) postfix, while [e]quip prints VACANT/AVAILABLE instead.
============
EDIT:
After getting familiar with 0010369 I tend to assume that all these memory corruptions are actually just a further development of some general flaw in the raiding system itself, as that issue wasn't solved but patched out by checking and removing duplicate items.
|
|
|
(0039229)
|
Darxus
|
2019-02-20 16:09
|
|
Gregamesh said mine might be related: 0011035
I load it, save immediately, and it crashes, every time. They mentioned I have armor corruption similar to the stuff mentioned here, on my dwarves out on a raid. |
|
|
(0039342)
|
Galap
|
2019-04-20 23:19
|
|
The workaround I've found that works so far (albeit tediously) is to assign every soldier specific items instead of anything generic, and have it replace clothing, exact matches. Basically, if you tell them exactly what to equip, they will never try to put on the bugged items. |
|
|
|
After a recent small discovery at 0011085, I went back to revisit the save from 0010975, which stands out for having none of the usual symptoms of 0011014 corruption (slabs, books, etc), and indeed there was more to that. The "Weapon" category has (large iron right gauntlet) somewhere in the middle for some reason. It's still unclear what is going on, but catching the moment when equipment goes wild now seems to be theoretically possible, but goddamn I am reluctant to try. |
|
|
(0039356)
|
risusinf
|
2019-05-07 09:03
(edited on: 2019-05-11 20:37) |
|
Started a new fort the other day just to investigate the problem further. Kept it really simple and minimalistic with a low population cap, so no invasion was ever triggered; nothing happened but peaceful economic growth, squad training and equipping and some occasional trading. When the squad was set up, I decided to roll them out for the first time, and checked equipment lists before that just to make a reference point.
https://i.imgur.com/1WBSEDr.png
A few quick notes before I'm too tired:
1. Reds are obviously squad equipment
2. Civilian clothing worn or claimed by fort citizens is not present
3. (+steel mail shirt+) was probably traded by accident
4. Most of the remaining items are "large" but:
a) I have never bought anything "large"
b) I have no human residents, all petitions were denied
c) there are a few human visitors, and it seems they are wearing those items
d) there are a few dwarven visitors, but their clothing is not present
The situation is explosive if I am getting it right. There might be something else as I see some discrepancies in handwear/footwear sections, but I'm leaving it for the next time. Thoughts?
============
"Large" items indeed belong to four human visitors, the rest is just unowned tattered clothing which is expected to be in the lists. Two dwarven visitors don't cause anything unusual. Also, when the squad leaves the map those red lines (i.e. assigned equipment) disappear in every category (except "boots", and that's because of my oversight -- dwarves apparently can't wear high boots with leggings, and never put boots on), and they never reappear even after the squad is back home.
============
Human visitors: all their items are in equip lists and not forbidden. When manually forbidden they don't get listed.
Human merchants: everything in their inventory is initially forbidden, and will be available in equip lists when unforbidden.
Dwarven visitors: all their items are not forbidden, and only military equipment gets listed.
============
It's not like you send out a squad and it breaks down. There are some conditions, but I can't figure those out. A couple of sidenotes:
* Merchants' goods show up in the lists after the "merchants have embarked on their journey" announcement
* (unrelated) Items dumped into magma don't get destroyed until they reach the bottom of the magma sea
* (large iron right gauntlet) from 0010975 that somehow counts as "weapon" can be found in one of the human trading wagons. When forbidden it disappears from equip lists, but the game crashes regardless.
* In my previous fort (0010880:0038940) I found two human bards with completely empty inventories. Exterminating them changes nothing.
* Regarding the situation with high boots, the human merc who came in with both high boots and leggings equipped successfully replaced all of his equipment with the new one I made for him, except for his old high boots, which he refuses to change.
So the bottom line as I see it at the moment is that there is a combination of two issues. Firstly, the filters for equip lists are quite loose; they allow too many items that shouldn't be there. Secondly, somehow traveler units get corrupted (0010939 et al, 0010369 -- units offloading broken?).
|
|
|
|
I faced this bug too, after my one and only raid. Some investigation in the exported legends showed that the item that appeared in every equipment list is the world's first artifact, #0: some slab created by a demon. Maybe memory structures/indexes/boundaries/whatnot start to overlap from there? |
|
|
(0039398)
|
hyndis
|
2019-06-20 13:14
|
|
I'm also seeing this. The game is 100% stable until I start sending squads off to raid, then once the squad gets back it's crash city. Instant crash to desktop. The game just dies immediately and randomly, but also so frequently it's unplayable.
Disbanding my military squads and rebuilding them seems to fix the issue. I can play stable and without problems after that. The cause and fix of this crash bug is pretty much 100% reproducible.
Is there any solution to this other than simply not sending any dwarves off map?
Can I change the uniform once they get home? Immediately switch out the uniform to something else? Will sending naked dwarves with a uniform that replaces all items fix the issue? Dwarves out on raids only seem to use combat stats, not weapons, so equipment might not matter at all. Or do squads sent out need to be one time things? Make a squad, send it out, and then upon return immediately disband it only to make a new squad?
How can I use the world map system in fortress mode without crashing? What should I do as a workaround? |
|
|
|
All we know is that travelling (not only for squads, it might affect fort visitors as well) somehow corrupts equip lists (and visitors' equipment in that case), and once that has happened it is unsalvageable (bugged visitors could just be exterminated though). The ability to rebuild squads might be platform dependent. On Linux the game dies rather quickly if you do so. |
|
|
(0039405)
|
hyndis
|
2019-06-30 01:44
(edited on: 2019-06-30 01:45) |
|
Modding out clothing and setting up military uniforms so they replace the existing outfit and accepting exact matches only seems to have resolved this for me. I also set the armor type's material. So, mithril breastplate, mithril greaves, mithril gauntlets, etc.
I suppose this might also be because I added a new metal to the game; mithril. Only dwarves can make this metal. This means that all mithril helms are all exactly the same size. No large or small mithril helms will ever be generated. Therefore, by specifying mithril helm in the uniform screen there will never be an unwearable helm that a dwarf tries to equip.
By removing clothing and specifying material type and requiring exact matches with the uniform replacing clothes this has fixed the problem. My squads leave on raids and return without any crashes happening anymore. Zero crashes since I did this.
I'm still seeing some oddities where dwarves get very attached to artifact mugs and goblets. Once a dwarf picks up an artifact mug or goblet they never put it down regardless of military uniforms. They even use this artifact mug in combat, smacking goblins over the head with their named granite mug. No idea how to fix that one but at least it doesn't appear to crash the game. The mug is equipped in addition to any uniforms. A dwarf with a mug will use both their mug and battleaxe in combat as if they're dual wielding.
|
|
|
|
|
|
|
I took another crack at this before the recent release, but haven't made progress yet (aside from closing a few unrelated crashes having to do with broken visitors.) Are these slabs and things visible in Windows? I can't get them to show up when I disband the various squads, and then set them to have equipment 'footwear' etc., and they aren't there on load. |
|
|
(0039690)
|
risusinf
|
2020-01-29 01:38
(edited on: 2020-01-29 03:04) |
|
Slabs and stuff should be visible immediately after loading with no additional actions taken. They can be in any military equipment category, sometimes in just one, sometimes in a few, but always at the bottom of the list. Like this: https://imgur.com/VKTGasZ I've checked all those tickets both on Windows and Linux, so they should be there.
|
|
|
|
Ah, cool, I see them now. That is very, very messed up, ha ha ha. The month end is full of various admin tasks, but I'll take a look at this when I get through with that. Preliminary log shows that the slabs/water stacks exist in the fort's unassigned shoe list somehow, or whatever corrupt vestiges of whatever they are. It'll be a good starting point and hopefully I can make some progress. |
|
|
|
Hmm, it all points toward an item being deleted prematurely (and the then-corrupted data being maintained in the equipment lists), but it doesn't yet appear to be anything as simple as a forgotten delete or double push. The symptoms aren't giving me more than this, so I'm going to go for some more logging, and then try a few of the other equipment bugs that may cause this incidentally (odd assignment behaviors and duplication errors.) |
|
|
(0040086)
|
Darxus
|
2020-02-16 19:34
|
|
|
|
(0040087)
|
risusinf
|
2020-02-16 22:34
(edited on: 2020-02-16 22:44) |
|
47.03 changelog:
> There's been some work on the raid crash/corruption problem, though I suspect that is ongoing - hopefully the frequency is lessened.
Previous devlog:
> Hopefully took a chunk out of the raid crashes (there was a large problem with the post-raid equipment manifests), but there are definitely still other issues there.
|
|
|
|
I've just tested this in 47.03 and the bug still persists and crashes the game. |
|
|
(0040184)
|
Orkel
|
2020-02-24 11:37
|
|
Please provide a save @Sethatos if possible. |
|
|
|
I will. I’m just going to cue it up closer to the crash and then submit to the file depot. |
|
|
|
I'm also still getting this bug in 47.03. My shield list is corrupted; there's a repeating name at the bottom of the list, which I'm guessing is the first artifact on the list in legends mode.
http://dffd.bay12games.com/file.php?id=14864
I had several sieges and a ton of visitors come and go, but I never did any raids whatsoever. Tons of dwarves were also expelled, and several came back (to be expelled again). I'm guessing the returning citizens are what caused it. |
|
|
|
Here's my save as well:
http://dffd.bay12games.com/file.php?id=14867
It should crash in about 10 minutes or so. If you go to the military menu as well and try to equip "specific armour" you'll notice that a pair of greaves comes up first instead of a breastplate. |
|
|
(0040380)
|
hyndis
|
2020-03-17 00:04
|
|
I'm still seeing this in 47.03.
In my fortress it seems my dwarves are attempting to wear books as armor: https://i.imgur.com/efG98Qs.png
The same book shows up multiple times for different parts of armor.
My game is now extremely unstable to the point of being unplayable even with seasonal autosave. It is crashing constantly. |
|
|
|
Just to check, that's actually a world created in 47.03, right? Not one which was already corrupted from 47.02 and earlier? |
|
|
(0040385)
|
hyndis
|
2020-03-17 10:58
|
|
Yes, I made the new world and started the new fortress all in 47.03. I never transfer saves between versions. |
|
|
|
Noticed something potentially useful on a new save: A dwarf was in the middle of hauling some wood when I sent the raid order, and when he came back, "willow logs" were listed in the equipment screen under armor.
Whatever random non-equipment items they're hauling when they go out seem to get interpreted as equipment items and stored with the rest of them. |
|
|
|
So, potentially, always ensuring your squad are active and training before sending them out on a raid could minimize equipment corruption? |
|
|
|
Potentially. I've not had corruption since I started ensuring they weren't carrying random things, but that's still anecdotal.
If it is the cause, one would also have to make sure expelled people aren't carrying anything. I've had corrupted saves from forts without any raids being carried out at all but with tons of expelled citizens (who occasionally return to be expelled again). |
|
|
|
It is still here. Got this in 47.04. World was created in this version, fortress 25 years old or so. The corruption is classic: there is a legendary golden slab at the bottom of the armor list. |
|
|
|
I believe I have the same issue. On Linux with DFHack the game consistently crashes a few seconds after unpausing (at the next month tick). There is a cinnabar slab at the bottom of the armor list in the squad equipment screen.
DFHack's `tweak craft-age-wear` script seems to somehow make the crash happen faster:
https://github.com/DFHack/dfhack/issues/1677 |
|
|
|
I've been working on a script on the DFHack side (https://github.com/DFHack/dfhack/issues/1678) to address this, which should be in an upcoming release of DFHack. So far, there appear to be two distinct issues with the equipment vectors, usually in one or more of the "unassigned" vectors - Toady's description of these is at http://www.bay12forums.com/smf/index.php?topic=169696.msg8195533#msg8195533
a) Sometimes, items in a vector have the wrong type - in one extreme case, I've seen over 50 slabs present in the unassigned shoes vector
b) Sometimes, pointers in a vector point to things that are not items - I suspect that these are items that were recently freed, but not properly removed from the list. These are somewhat easy to detect just by looking at the pointers without dereferencing them, because they do not appear in the global vector of all items.
Some general notes:
- (b) seems to occur sometimes when units (e.g. merchants or military dwarves on a raid) leave the map.
- Both (a) and (b) can cause a crash - i.e. I've seen saves where only one issue is present, DF crashes, and removing the problematic items from the equipment vectors prevents the crash
- Sometimes, (a) alone does not crash, but (b) occurs later unless (a) is addressed early. Either addressing (a) early or addressing (a) and (b) before the crash prevents the crash.
By far the fastest reproduction I've come across is https://dffd.bay12games.com/file.php?id=14109 - see https://github.com/DFHack/dfhack/issues/1678#issuecomment-732669641 for a more detailed analysis. |
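For readers following along, here is a minimal sketch of how cases (a) and (b) could be told apart. It is not the DFHack script and does not use DFHack's API: `Item`, `audit_vector`, the integer "handles" standing in for raw pointers, and the dict standing in for the global item vector are all hypothetical. The point it illustrates is that (b) can be detected by membership alone, without dereferencing the suspect pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    kind: str  # hypothetical type tag, e.g. "SHOES" or "SLAB"


def audit_vector(vector, expected_kind, all_items):
    """Classify the entries of one 'unassigned' equipment vector.

    vector        -- list of handles (stand-ins for raw item pointers)
    expected_kind -- the item kind this vector is supposed to hold
    all_items     -- dict handle -> Item, standing in for the global item vector
    """
    wrong_type, dangling = [], []
    for handle in vector:
        item = all_items.get(handle)
        if item is None:
            # Case (b): the handle is not a live item, so it is never dereferenced.
            dangling.append(handle)
        elif item.kind != expected_kind:
            # Case (a): a live item of the wrong type (e.g. a slab among shoes).
            wrong_type.append(handle)
    return wrong_type, dangling


all_items = {
    0x10: Item(601, "SHOES"),
    0x20: Item(602, "SLAB"),
    0x30: Item(603, "SHOES"),
}
# 0x40 stands for an item that was freed but left in the list.
unassigned_shoes = [0x10, 0x20, 0x30, 0x40]
wrong_type, dangling = audit_vector(unassigned_shoes, "SHOES", all_items)
```

In this toy data the slab at handle 0x20 is reported as case (a) and the stale handle 0x40 as case (b).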
|
|
|
Do we know if (a) or (b) happens first? I've taken a look at it again and tried to reproduce issues in the vectors and can't get anything to happen. Once the vector is corrupted the first time, everything that happens afterwards is no longer relevant as far as fixing the bug is concerned (though of course the crashes are the worst symptom of the problem.)
Sadly, crash saves aren't going to help at this point. I need to catch the vector being corrupted or else rewrite the whole system. |
|
|
(0040870)
|
lethosor
|
2021-01-23 10:39
(edited on: 2021-01-23 10:40) |
|
I'm not entirely sure that either causes the other - sometimes only one of them is detected by the script, but not always the same one. (I regret not linking specific saves to my points above, but that's probably less useful if saves with the issue present on load don't help track down the issue.)
One that might be useful: https://dffd.bay12games.com/file.php?id=15264 (from https://github.com/DFHack/dfhack/issues/1678#issuecomment-714384568) is a case where (b) is not present on load, but occurs as soon as the second merchant wagon exits the map, which results in a crash fairly soon after. I don't recall seeing a save like this for (a), unfortunately.
|
|
|
|
It would actually be good if (b) happens first, I think - just a simple failure-to-remove someplace would be nice (of course I've checked the usual suspects there already.) I'll take a look at the save! |
|
|
|
What about a sanity check of things written into the vectors as a way to at least either detect corrupt items being added (and don't add them if corrupted, but generate a report instead), or point the finger at items not being removed properly if no reports are generated, but crashes still happen?
It might be added to 0.47.05 as a safeguard/debug output even if one or more culprits have been found (as the problem has been addressed several times before, so there may always be a "last one" lingering). |
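A rough sketch of what such a guard could look like, purely as an illustration: `Item`, `checked_add`, the `live_ids` set and the `reports` list are hypothetical names, and nothing here reflects how DF actually stores or inserts items. For brevity it appends rather than inserting in id order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    kind: str  # hypothetical type tag


def checked_add(vector, item, expected_kind, live_ids, reports):
    """Add item to vector only if it passes basic sanity checks.

    Rejected items are recorded in reports instead of being added, so a
    later crash with an empty report list points at removal, not insertion.
    """
    problems = []
    if item.item_id not in live_ids:
        problems.append("id not in global item list")
    if item.kind != expected_kind:
        problems.append(f"kind {item.kind!r}, expected {expected_kind!r}")
    if problems:
        reports.append((item.item_id, problems))
        return False
    vector.append(item)
    return True


armor, reports = [], []
live_ids = {101, 102}
ok = checked_add(armor, Item(101, "ARMOR"), "ARMOR", live_ids, reports)
bad = checked_add(armor, Item(102, "SLAB"), "ARMOR", live_ids, reports)
```

Here the breastplate is accepted and the slab is refused and logged, which is the "generate a report instead" behaviour suggested above.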
|
|
(0040878)
|
Toady One
|
2021-01-25 12:58
(edited on: 2021-01-25 19:13) |
|
The problem seems to be that the corruption isn't from the add/remove. The corruption seems like it is happening afterward, to an item, anywhere in the program.
For instance, in the latest save linked by lethosor above, the unassigned armor vector is already corrupted, before anybody leaves. It is neither (a) nor (b), but maybe something closer to the root cause - the id numbers of the unassigned armor items are out of order. The first several are ascending 60000s, but there's a 239xxx that got in there - this breaks binary search, and causes things to fall apart when an item is deleted (since the binary search won't remove it but the ptr is still deleted, causing (b).)
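To make the failure mode concrete, here is a small illustrative sketch (not DF's code) of a removal that relies on binary search over an id-sorted list. The list of plain integers and the helper `remove_by_id` are assumptions for the example; in the game the vector holds pointers, so a missed removal leaves a pointer to a freed item.

```python
import bisect


def remove_by_id(sorted_ids, item_id):
    """Remove item_id using binary search; only correct if sorted_ids is ascending."""
    pos = bisect.bisect_left(sorted_ids, item_id)
    if pos < len(sorted_ids) and sorted_ids[pos] == item_id:
        del sorted_ids[pos]
        return True
    return False  # the search missed; the entry (if present) stays in the list


healthy = [60001, 60002, 60003, 60004, 60005]
removed_from_healthy = remove_by_id(healthy, 60004)

# One out-of-order id in the middle, as in the save described above.
corrupt = [60001, 60002, 239000, 60004, 60005]
removed_from_corrupt = remove_by_id(corrupt, 60004)
```

With the out-of-order entry, the search stops at the 239000 slot and never looks at 60004, so the removal reports a miss and the entry remains. If the item itself is freed anyway, that leftover entry is exactly a case (b) pointer.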
The problem is, I've been over the inserts several times, and I don't think it's just putting in something out of order. The more insidious possibility is that the item pointed to is being overwritten/changed after it is added properly, so in the example it used to be a ~60000 id number item, but the id number itself was changed to ~239000, breaking that vector (and every other vector).
This wouldn't quickly blow things up, because on save/load, the main item vectors are self-repairing, since it rebuilds them in order, rather than assuming they are in order. The unassigned (and assigned and unmanifested) item vectors assume that they are in proper order and just reload that way. That at least I can change.
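The difference between the two load strategies can be sketched as follows. This is an illustration under assumptions, not the actual load code: ids are plain integers, `live_ids` stands in for the set of items that really exist, and both function names are made up.

```python
import bisect


def reload_trusting_order(saved_ids):
    """Reload as-is, assuming the saved order is already ascending."""
    return list(saved_ids)


def reload_rebuilding(saved_ids, live_ids):
    """Drop ids with no matching live item, then re-sort instead of trusting the save."""
    return sorted(i for i in saved_ids if i in live_ids)


def contains(sorted_ids, item_id):
    """Binary-search membership test; only reliable on an ascending list."""
    pos = bisect.bisect_left(sorted_ids, item_id)
    return pos < len(sorted_ids) and sorted_ids[pos] == item_id


saved = [60001, 60002, 239000, 60004, 60005]
live_ids = {60001, 60002, 60004, 60005, 239000}

trusted = reload_trusting_order(saved)
rebuilt = reload_rebuilding(saved, live_ids)
```

After a trusting reload, a binary search for 60004 still misses it; after rebuilding, the list is ascending again and the same search succeeds. As noted above, this repairs the symptom on load but says nothing about what changed the id in the first place.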
But it doesn't get us any closer to what's actually going on, which seems like a pretty nasty corruption bug, anywhere items can be found. If I'm really fortunate, it could just be some add/remove to the unassigned vector I missed, but I've been over it enough times I'm nearly convinced it's something worse.
Still, if the overall idea is to move away more from these pointer vectors where I can, and repair them when I can, it'll be slightly more stable. But hopefully at some point the root corruption can be found. I'm not sure if it helps/is possible over in script land, but checking the unassigned vector for id ordering errors and when they occur would get us closer, I think. Maybe it's on an add/remove, or maybe somewhere else.
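The ordering check itself is simple. Here is a sketch in C++ with a hypothetical `Item` struct and a made-up `first_order_error` function; a script-side check would have to use whatever names the tooling exposes for these vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a game item; only the id matters here.
struct Item {
    int32_t id;
};

// Returns the index of the first element whose id is not strictly greater
// than its predecessor's id, or -1 if the vector is in strict ascending order.
// Note: the element at fault may be the one reported or the one before it.
std::ptrdiff_t first_order_error(const std::vector<Item*>& items) {
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i]->id <= items[i - 1]->id)
            return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}
```

Running a check like this at regular intervals, and logging the first time it returns something other than -1, would narrow down when the ordering breaks.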
edit: I suppose we could also get id numbers out of order with a (b) (bad pointer w/ gibberish for id number) -> save -> load -> item with that id number in the bad pointer slot. In which case a bad item deletion at any point could still be a culprit, though I've checked them all and haven't found anything that looks good. With the upcoming patch, this pathway will reduce from giving us id numbers out of order to giving us just a random item in the vector, but in good id order (which we already see w/ the slabs etc., so it might not present as a new symptom.) Also, if the gibberish id number isn't held by an actual item, there'd be no effect, since it already handles id number misses on load in 0.47.04 (and earlier.)
|
|
|
|
Ah, thanks for the details. I didn't realize these item vectors were also sorted by ID. DFHack can certainly check for that, although it sounds like that wouldn't be necessary in 0.47.05.
For a second I was worried that a faulty DFHack script could have caused that, but DFHack only recently added names to these particular vectors, and I would be surprised if any tools (other than the "fix" script I mentioned above) are modifying them. There is certainly potential for utilities to cause corruption, but given the existence of 0010868, and the fact that this seems to be specific to the equipment vectors in some way, I think that would be unlikely here. I will certainly keep an eye out, in any case. |
|
|
|
A related bug, and possibly the root cause of this bug, is that when units return from a mission, the units in slots 2-10 of a squad can get randomly "unassigned". They still show as assigned in the military screen. However, in Dwarf Therapist, they show as not being in the squad anymore. Their behaviour is also as though they are not in the squad anymore. They will take off their equipment and put on civilian clothes and will not respond to squad orders. When this happens, you start getting equipment corruption and eventually crashes.
Since this bug only happens to slots 2-10 of a squad and not to the militia captains, I tried making them all militia captains and had tens of one-man squads. After tens of raids, each with tens of one-man squads, I have yet to see any equipment corruption or crashes. |
|
|
|
I'm wondering whether data corruption like this bug could happen because a data structure is (accidentally) modified while it is being enumerated/traversed.
Let's see a simple example with a vector/array:
std::vector<item *> items; // "item" stands for whatever the element type is
void blahblah(); // defined further down
void process_items() {
    size_t n = items.size();
    for (size_t i=0; i<n; i++) {
        // ...
        // do something with items[i]
        // ...
        blahblah(); // an innocent-looking function call
        // ...
        // do something with items[i]
        // ...
    }
}
// ...
void blahblah() {
    static int i;
    if (++i % 100 == 99) {
        // new_item is a placeholder; std::vector has no push_front, so insert at begin()
        items.insert(items.begin(), new_item);
    }
}
Do you see a problem in the above code? When "blahblah()" (sometimes) inserts an element into "items", "i" and "n" in "process_items()" are invalidated; they no longer reflect the current state of "items". Both functions look correct in isolation. This is of course the simplest example; finding errors in the DF code is a lot more difficult.
Toady One's impressions so far, "The problem seems to be that the corruption isn't from the add/remove. The corruption seems like it is happening afterward, to an item, anywhere in the program." (http://www.bay12games.com/dwarves/mantisbt/view.php?id=11014#c40878) and "a technical bug" (http://www.bay12games.com/dwarves/mantisbt/view.php?id=10369#c38250), seem to suggest that this kind of problem is happening somewhere in the code: a container being modified while it is traversed, which behaves much like a race condition even though no threads are needed. Note that in the example the insert itself is not faulty; the trouble appears afterwards, in code that runs after the element has been added.
I think Toady One already knows about this possibility, given what I quoted above. I'm writing here in case this is something new. |
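For anyone who wants to see the effect directly, here is a small self-contained variant of the example above. It is a simplification with assumptions: the elements are plain ints, and "blahblah()" is called exactly once at a fixed point rather than every 100th call. The loop records what it visits so the damage can be seen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shared container, as in the example above (ints instead of items).
std::vector<int> items;

// The innocent-looking call: inserts a new element at the front.
void blahblah() {
    items.insert(items.begin(), 0);
}

// Walks `items` using a size captured before the loop and returns the
// values it visited, in order.
std::vector<int> process_items() {
    std::vector<int> visited;
    std::size_t n = items.size(); // goes stale once blahblah() inserts
    for (std::size_t i = 0; i < n; i++) {
        visited.push_back(items[i]);
        if (i == 1) blahblah();   // everything shifts right by one slot
    }
    return visited;
}
```

Starting from the elements 1, 2, 3, 4, the loop visits 1, 2, 2, 3: one element is processed twice and the last one is never reached, even though neither function contains an obvious mistake on its own.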
|