Re: [Wesnoth-dev] 1.12.0 release plans and questions

chris beck Fri, 04 Apr 2014 10:30:06 -0700

Don't get me wrong, I think the whiteboard is a great feature, and many
people use it and greatly appreciate it, esp. with the ability to show your
plans to your allies. I even had an experience once where the whiteboard
made it possible to play and win a 2v2 game with an ally who did not speak
any english, getting by with my broken spanish and using the whiteboard to
give context. So if there was some way we could fix it I think it would be
great. The only thing I want to change is that I think we should disable
the extra parts that have become buggy enough that most players don't use
them anyways. I would expect that there would be a small outcry if we
really just removed the feature completely.


I think that it's possible for us to to pare back the whiteboard as
described without someone necessarily becoming a de facto whiteboard
expert. But if we actually can't manage that, and no one volunteers to
develop the expertise and become a real maintainer, so that it's either
status quo or remove, then I would vote to remove.

---

With regards to the reference counting / possible memory corruption: The
unit_map does indeed have some reference counting built in, but that is
only for unit_map::iterator objects afaict. Many of the wesnoth modules
currently interact with the unit_map by using "extract", which extracts a
naked pointer to the unit and "transfers ownership" to the recipient. From
what I understood, the whiteboard in some circumstances takes pointers like
this and wraps them in boost shared pointers for its own use. But I guess
that this means we have two disjoint reference counting systems, so there
could potentially still be pointer problems between modules. E.g. what
happens if a unit is owned by the unit_map and being animated, but during
the animation it is transferred to the whiteboard / some other module... ?
Since animations don't use the reference counted pointer but rather a raw
pointer I think it may result in segfault if the whiteboard then destroys
the unit. (It's also worth noting that the unit_map has been failing its
unit tests regarding invalidating iterators that are supposed to be dead,
so this might also somehow be a contributing factor in some scenarios.)

A possible solution that I mulled over was to have different modules pass
unit_pods around (the class used internally by unit_map as a
reference-counted pointer struct) or trying to make some of them use the
unit_map iterator type rather than raw pointers to units that they do not
own. But finally I concluded that since I'm not a real C++ programmer,
anything I do here would most likely be flawed and need to be redone --
really IMO we should get a bright GSoC student who knows a lot about STL to
figure out a modern smart pointer system we can use and implement it.

Let me also say that I think the memory corruption may not have had to do
with the unit_map pointers, but rather the fake units owned by
game_display, which the whiteboard holds many of. (I didn't mention the
game_display in my earlier listing of what things can "own" units, that was
an oversight.) I really don't know for sure, but my recollection of seeing
all the units drawn strangely was pretty memorable, it happened a few times
before I stopped using whiteboard attacks, and I don't know what could
explain it other than some kind of memory corruption. Of course it's hardly
a proper bug report anyways.

Cheers,
Chris Beck



On Fri, Apr 4, 2014 at 11:14 AM, Gabriel Morin <gabrielmo...@gmail.com>wrote:

> Just chiming in to say that you have my blessing if you think the
> whiteboard needs to be disabled for the greater good. Don't treat it like
> some kind of sacred cow! I keep meaning to revisit it to enact Crab_'s
> master plan of merging whiteboard + deferred moves (or whatever they're
> called) + some AI simulation code together, but after a day job of coding I
> rarely want to stare at yet more code.
>
> Of course the ideal scenario would be that a maintainer steps up, but if
> no one cares enough, well I guess that's open-source democracy at work.
> Without a maintainer it's inevitably gonna get more buggy as time goes.
>
> Regarding the memory corruption though, I'm pretty sure I was not storing
> raw pointers to units anymore and was using unit IDs instead to get them
> back from the unit map. There might be a unit map bug. Also I remember the
> unit map already having some kind of reference counting built in, something
> to look into before refactoring anything.
>
>
> 2014-04-04 9:53 GMT-04:00 chris beck <beck...@gmail.com>:
>
> An addendum:
>>
>> 5.) Another thing that we should discuss seriously is what to do about
>> the whiteboard. It seems that right now we have no active dev who serves as
>> the whiteboard expert / maintainer, and there are many known / reported
>> bugs of the form "segfault" / "assertion failed" associated to using the
>> whiteboard. From my own experience playing wesnoth, I have personally
>> witnessed data corruption -- i.e. in version 1.10.6, I plan some actions,
>> then disable planning mode and attack some units, then suddenly all the
>> "fake" planned units have changed their unit types and become completely
>> different units. So that drawn on the screen may be my orcish grunt
>> standing in a village, with a red-arrow "planned action" to move to e.g a
>> grass hex, but drawn in that hex is a loyalist cavalryman sprite, seemingly
>> at random, and similarly for all other units planned movements. I have also
>> experienced assertion failures which immediately result in a crash while
>> executing a planned attack action.
>>
>> I have only personally experienced these things in networked mp, but that
>> is also the only time that I have used planning mode. A search on the bug
>> tracker / forums reveals many, many reported errors of this form. Most of
>> the players at this point seem to know better than to use the whiteboard to
>> plan anything other than moves if they use it at all. Unfortunately it
>> seems that almost none of these bug reports may be actionable or precisely
>> reproducible because of the nature of memory corruption problems.
>>
>> I looked into this in the source code once. I barely scratched the
>> surface, but it seems to me that the problem may be related to the problem
>> underlying some of the known animation segfaults, such as those reported by
>> mattsc. That is, graphics such as animations seem to be implemented by a
>> graphics object which holds a naked C pointer to a unit, but the way
>> wesnoth works, animations do not "own" units, instead units are owned by
>> the unit_map, the recall list, or lua... So if a unit being displayed is
>> somehow destroyed unexpectedly, immediate segfault / memory corruption
>> occurs. I believe that in the case of the whiteboard, memory corruption may
>> potentially occur whenever any unit on the map is killed or removed and
>> there exist planned actions.
>>
>> I guess that it would likely be a GSoC-level project to switch over to a
>> system where we use reference counted pointers to share units between
>> different modules where we currently use C pointers, and to then roll the
>> whiteboard / other systems over to this system. In the meantime, without
>> having an expert available, and without really understanding the problem,
>> and now reaching the end of beta, I think it would be appropriate to do
>> both of:
>>
>> 1.) Don't allow the user to attack while in planning mode.
>> 2.) Either don't allow the user to attack while there are any planned
>> actions, or cause attacking to clear all of the currently planned actions.
>>
>> Additionally we might consider
>>
>> 3.) Disabling planned attack / recruit / recall actions, so that only
>> planned moves are possible.
>> 4.) Disallowing the user to disable planning mode while there are planned
>> actions, or cause this to clear all of the currently planned actions.
>> 5.) Whether to do these things in both SP and MP or just MP. (Most likely
>> I think the problems affect both however.)
>>
>> I don't know for sure but I strongly believe that any of these things
>> would qualify as an "easy coding task." If for some reason we couldn't do
>> any of these things before releasing, then I would advocate disabling the
>> whiteboard entirely rather than releasing as it currently stands.
>>
>> Again I don't know what the real problems are but I think this may
>> significantly improve the situation. If you have insight into the situation
>> please share by all means.
>>
>> Cheers,
>> Chris Beck
>>
>>
>> On Thu, Apr 3, 2014 at 10:47 PM, chris beck <beck...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I would advocate 1.11.13 not being a RC1, and having at least one more
>>> round of beta testing. Here are the reasons:
>>>
>>>
>>>
>>> 1.) Right now, replays from the replay server *do not work*.
>>>
>>> Even the simplest scenario 1v1 scenario with no funny business results
>>> in a "Error: The file you have tried to load is corrupt. 'Mandatory WML
>>> child missing, yet untested for. Please Report.'" This is true in master as
>>> well as 1.11.9 -- I don't know when it became true, but it is no secret
>>> that server replays were quite buggy even in 1.10. I would like to have
>>> some time to review how we currently generate save game files and figure
>>> out a server side fix, and then to test this.
>>>
>>> In 1.10 all of "mp campaigns / mp reloads / mp replays" basically work
>>> in the simplest case, but if you do two of them at once you tend to get
>>> bugs. MP campaigns need to be reloaded in a certain way or there are bugs
>>> -- i.e. only the host can save, and players may not rejoin as observers and
>>> take control or controller types become corrupted. MP replays of reloaded
>>> games, esp. from the server, are generally bugged. I guess that many of
>>> these bugs were only detected once 1.10 was released, but anyways we should
>>> fix as many of we can. I'm not sure which we should consider as "blockers"
>>> but I would argue this:
>>>
>>> - Any bug that makes reloading or replaying *impossible* should be
>>> considered a blocker.
>>> - Server replays should work at least for "basic" scenarios and reloads
>>> of "basic" scenarios, or it is blocker. They should also work for mp
>>> campaigns, at least those that proceed without any "funny business". If
>>> there is some class of scenarios that we simply can't get it to work for,
>>> then we should consider disabling the saving of those replays (or mark them
>>> as hidden) rather than serve broken files to the users.
>>> - Regressions from 1.10 quality in any of these regards should be
>>> considered a blocker.
>>> - A few weeks ago, I discovered a bug that simply prevented any observer
>>> from joining a reloaded game, before or after it has started, despite
>>> observers being permitted and the observer icons seeming to say that it was
>>> okay to join. This was considered a blocker and fixed, but the fact is that
>>> this was a pretty crippling and easy to discover bug that I stumbled on,
>>> which was present on master for about 9 months without being reported. This
>>> tells me that there hasn't been very much serious testing of this sort of
>>> basic mp functionality on the 1.11 / 1.12 series. It seems to me that right
>>> now that you are likely to encounter many different bugs if you e.g. create
>>> a networked game with 3 players, pass some turns, have one disconnect and
>>> reassign to another player, rejoin the disconnected player, do this for a
>>> while, save and reload, continue, proceed to the next scenario of a
>>> campaign, ...
>>>
>>> The good news is that some of these have been fixed in the past month or
>>> so because of fixes to controller types in mp games. However this has led
>>> to:
>>>
>>>
>>>
>>> 2.) Recent changes to the controller type system have unfortunately had
>>> a subtle and unexpected effect on SP games. mattsc reports that some ai
>>> test scenarios that used to work fine now end immediately in defeat. I have
>>> looked into this; here is the background:
>>>
>>> - Prior to the changes there were controller types: human, ai, human_ai,
>>> network, network_ai, null.
>>> No one understood what "human_ai" was or how it was different from "ai",
>>> and appeared to give the same behavior, except that some commands used one
>>> rather than the other. Since it caused bugs and confusion esp. in mp code,
>>> we decided to purge "human_ai" by merging it with "ai". We (myself,
>>> Soliton, others on irc) think that this was the right decision.
>>> - However, it turns out that there was a key function which made a
>>> distinction between them, the "is_observer" function. This function used to
>>> return false if there were any "human" or "human_ai" sides, and true
>>> otherwise.
>>> - It turns out that "is_observer" is used in the "check_victory"
>>> function. So the controller type change has a possible impact even on SP
>>> scenarios this way -- in any scenarios with "victory_when_enemies_defeated
>>> = no" and no human controlled sides, only ai sides.
>>>
>>> The only scenario I know that is affected is mattsc's, but it is
>>> possible that there are mainline scenarios with cutscenes which don't work
>>> properly now, I simply don't know. That's not all -- in review, a fix to
>>> this function which would support mattsc's scenario was found to uncover a
>>> different bug: https://github.com/wesnoth/wesnoth/pull/133
>>>
>>> I have been thinking about what is the exact right way to fix this
>>> check_victory function -- I think the right way is to create a new boolean
>>> attribute of [side] named e.g. "fight_to_the_last" which prevents a side
>>> from being defeated until all of it's units are dead, to support mattsc and
>>> replace the spurious check to no_leader() used currently in the function.
>>> My initial thought was also that the check to "is_observer" may simply be
>>> spurious here -- is it not enough to check for "human" or "network" sides
>>> which are not defeated to determine whether to continue?
>>>
>>> I am now leaning against removing is_observer from this function because
>>> I worry that such a change would result in us no longer supporting creating
>>> AI-only battles through the mp_create dialog, or some other edge case
>>> becoming bad. Even if this could be fixed with a flag for scenario and a
>>> new checkbox or something, the goal is just to get things working with as
>>> little impact otherwise as possible.
>>>
>>> My assumption from working with the mp code has always been that
>>> "is_observer" is basically supposed to reflect whether we are an "MP
>>> observer". However it now seems that this function plays a role in sp
>>> games... and I'm just not sure how I should think about it / how the rest
>>> of the code currently thinks about it so it is hard to be sure of how to
>>> proceed, which is why I have taken my time.
>>>
>>> In irc I told AI0867 that I would make a new proposal here and test it.
>>> But if we make such a change, we should really make sure that none of these
>>> scenarios with "victory_when_enemies_defeated" is broken. I can skip
>>> through and test them myself, but realistically having not played all the
>>> campaigns I might not necessarily know if a cutscene has ended early for
>>> example. And it is very possible that one of these scenarios has become
>>> broken now.
>>>
>>>
>>>
>>> 3.) Does anyone know what the status of this is?
>>> https://gna.org/bugs/index.php?20895
>>>
>>> Does anyone have any plans to work on it? If it is indeed bugged and no
>>> one will commit to fixing it, should we consider dropping support for some
>>> of the map generator features in SP, rather than release with known
>>> segfaults associated to some of them?
>>>
>>>
>>>
>>> 4.) gfgtdf's work on rng's and syncrhonization
>>>
>>> gfgtdf recently pushed to master a major overhaul to the synchronization
>>> mechanism, basically following through on this forum post of his:
>>> http://forums.wesnoth.org/viewtopic.php?f=15&t=39203&p=558003#p558003
>>>
>>> As I understand he creates a simple class called "synced_context" which
>>> allows him to synchronize many fundamental events that were previously too
>>> dificult to sync in multiplayer, leading to much to the confusion of /
>>> problems for UMC authors. However, he also has to change some of the rng
>>> classes to do this -- I don't understand all the details. I believe that
>>> this ultimately fixes many OOS bugs, you will have to see the PR for
>>> details though.
>>>
>>> There are some other things in his pull request which were merged into
>>> master which are features and not bug fixes. But I think the basic idea of
>>> his pull request should be cherry-picked into 1.12, since it fixes some
>>> genuine bugs. However after that happens there should be serious testing in
>>> mp and sp to see that nothing has become subtly broken, hence why I suggest
>>> another beta.
>>>
>>>
>>>
>>> ---
>>>
>>> As for why there was not a lot of progress on the changelog -- I guess
>>> that there are just not many people who are trying to work on these things.
>>> Many of them require feedback and collaboration which makes everything take
>>> longer. And I guess also there has just been a lot of people working on
>>> 1.13 instead, there has certainly been a lot of recent movement on that
>>> changelog.
>>>
>>> There may also be some general ignorance about some of these bugs. Some
>>> of them esp. about replay server replays are so longstanding and well-known
>>> that I guess they are not reported anymore, or the reports are very old.
>>> Some of the reports about mp problems are cryptic or confusing esp. when
>>> players make assumptions about exactly what is broken, and a dev is
>>> required to ignore the title and description and just go through the steps
>>> to get a sense of what the problem is.
>>>
>>> Cheers,
>>> Chris "iceiceice / involution" Beck
>>>
>>> On Thu, Apr 3, 2014 at 7:04 PM, Simon Forsyth <alarantal...@gmail.com>wrote:
>>>
>>>> Maiklas3000 mentioned some obsolete WML warnings in EI in the forums.
>>>> I'd like to look into that.
>>>>
>>>>
>>>> On Thursday, April 3, 2014, Andrius Šilinskas <
>>>> silinskas.andr...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I've got some bugs which I want to fix. I don't think that any of them
>>>>> are major, though.
>>>>>
>>>>> I'm planning to work on them since the next week, when my holidays
>>>>> starts.
>>>>>
>>>>> Andrius aka thunderstruck
>>>>>
>>>>>
>>>>> On 3 April 2014 14:25, Nils Kneuper <crazy-ivano...@gmx.net> wrote:
>>>>>
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> Hi folks!
>>>>>>
>>>>>> Initially I planned to get some work on the release announcement for
>>>>>> 1.12.0
>>>>>> done next weekend. Besides I also planned to get 1.11.13 out on April
>>>>>> 12th.
>>>>>> Looks like work prevents me from doing these Wesnoth things.
>>>>>>
>>>>>> Anyway, I got some questions for you:
>>>>>> 1) Looking at the changelog in the 1.12 branch I got to assume that
>>>>>> there is
>>>>>> not much to fix left (since almost nobody does anything). Is this
>>>>>> right or are
>>>>>> the issues just impossible hard to fix?
>>>>>> 2) If the assumption that there is not much to fix is correct: What
>>>>>> do you
>>>>>> think of having 1.11.13 named 1.12.0 RC1 and release it during the
>>>>>> easter weekend?
>>>>>> 3) In case you don't think we are there yet, please make sure to tell
>>>>>> me what
>>>>>> you think the blockers are. Also make sure that someone will actually
>>>>>> work on
>>>>>> fixing those first instead of concentrating on getting new features
>>>>>> implemented in master.
>>>>>>
>>>>>> For the next weeks this are the dates when I will not be reachable
>>>>>> for Wesnoth
>>>>>> stuff:
>>>>>> * April 6th until April 13th
>>>>>> * April 24th until April 30th
>>>>>>
>>>>>> So these are also the timeframes in which I will not get out a new
>>>>>> release. So
>>>>>> the next option for a new beta (or even RC?) release is the weekend
>>>>>> from April
>>>>>> 19th to 20th. If this is the RC, then I could get another RC or
>>>>>> 1.12.0 'final'
>>>>>> out on May 3rd.
>>>>>>
>>>>>> Cheers,
>>>>>> Nils Kneuper aka Ivanovic
>>>>>> -----BEGIN PGP SIGNATURE-----
>>>>>> Version: GnuPG v2.0.22 (GNU/Linux)
>>>>>>
>>>>>> iEYEARECAAYFAlM9YVIACgkQfFda9thizwW+dQCgnlgD5EE8Yb4rWbqmBorBpKUm
>>>>>> NAEAn1RImr3FOXeYZ7oO/3pyyX28N6rk
>>>>>> =lpf1
>>>>>> -----END PGP SIGNATURE-----
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wesnoth-dev mailing list
>>>>>> Wesnoth-dev@gna.org
>>>>>> https://mail.gna.org/listinfo/wesnoth-dev
>>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Wesnoth-dev mailing list
>>>> Wesnoth-dev@gna.org
>>>> https://mail.gna.org/listinfo/wesnoth-dev
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Wesnoth-dev mailing list
>> Wesnoth-dev@gna.org
>> https://mail.gna.org/listinfo/wesnoth-dev
>>
>>
>
> _______________________________________________
> Wesnoth-dev mailing list
> Wesnoth-dev@gna.org
> https://mail.gna.org/listinfo/wesnoth-dev
>
>

_______________________________________________
Wesnoth-dev mailing list
Wesnoth-dev@gna.org
https://mail.gna.org/listinfo/wesnoth-dev

Re: [Wesnoth-dev] 1.12.0 release plans and questions

Reply via email to