Question about to-be-closed methods

Re: Question about to-be-closed methods

Dibyendu Majumdar
On Wed, 22 Jul 2020 at 00:48, Andrew Gierth <[hidden email]> wrote:

> That is nothing at all what I am referring to. I'm referring to the
> database engine getting a malloc failure at some arbitrary point in the
> backend, which could be at any point during query parsing, planning, or
> execution, and recovering from it with nothing worse than an "out of
> memory" error returned to the client. (In particular, no disconnection
> of clients, no processes exiting, no forced recovery of the db.) The
> only pre-allocation of memory is a small reserve for error handling
> purposes.
>

Also just curious - when you say recovering - does it mean abandoning
the request? It would appear so from what you say above?
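
To make the quoted description concrete, here is a minimal sketch of "abandon the request, keep the process" in C. It is not taken from any real database engine: the `Request` type, `req_alloc`, `run_request`, and the simulated memory ceiling (`limit`) are all hypothetical names invented for illustration. It assumes every allocation belongs to one request, so a failure can jump back to the request boundary, release everything the request allocated, and report the error from a small pre-reserved buffer.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header placed in front of every allocation so the request can free
   all of them in one sweep. The union keeps the payload aligned. */
typedef union Block {
    union Block *next;
    max_align_t  align;
} Block;

/* Hypothetical per-request state (not a real engine's structure). */
typedef struct {
    Block  *blocks;      /* allocations owned by the current request */
    size_t  used;        /* bytes handed out so far */
    size_t  limit;       /* simulated memory ceiling (assumption) */
    jmp_buf on_error;    /* request boundary to return to on failure */
    char    errmsg[64];  /* small reserve kept for error reporting */
} Request;

/* Allocate n bytes for the request; on failure, jump to the boundary. */
void *req_alloc(Request *r, size_t n) {
    Block *b = NULL;
    if (r->used + n <= r->limit)
        b = malloc(sizeof *b + n);
    if (b == NULL)
        longjmp(r->on_error, 1);   /* abandon the request, not the process */
    b->next = r->blocks;
    r->blocks = b;
    r->used += n;
    return b + 1;
}

/* Free everything the request allocated. */
void req_release(Request *r) {
    while (r->blocks != NULL) {
        Block *next = r->blocks->next;
        free(r->blocks);
        r->blocks = next;
    }
    r->used = 0;
}

/* Returns 0 on success, -1 if the request ran out of memory. */
int run_request(Request *r, size_t chunks, size_t chunk_size) {
    if (setjmp(r->on_error) != 0) {
        /* Error path: uses only the pre-reserved buffer. */
        strcpy(r->errmsg, "out of memory");
        req_release(r);
        return -1;
    }
    r->errmsg[0] = '\0';
    for (size_t i = 0; i < chunks; i++)
        memset(req_alloc(r, chunk_size), 0, chunk_size);
    req_release(r);
    return 0;
}
```

In this sketch "recovering" does mean abandoning the request: a caller that gets -1 from `run_request` can report `errmsg` to the client and then serve the next request with the same `Request` object.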

Regards
Dibyendu

Re: Question about to-be-closed methods

Philippe Verdy-2
DB engines have rollback facilities: if something goes wrong and the work must be aborted, the transaction is not committed. Even if the process is interrupted abruptly, the database rolls back automatically, so there should be no loss of integrity.

Applications in Lua (or any language) can also implement their own transactional model in order to recover after a failure (even a severe failure where the process crashes, the whole OS crashes, or the PC suddenly loses power or is abruptly reset).

Transaction recovery is worth implementing wherever data or state integrity is needed. The recovery may be done "offline", before the service is made fully active again; while it is running, the service may offer a temporary "degraded working mode" in which data can be displayed but not modified, with a notice that what is displayed may be out of sync.

Transactional mechanisms are not just for database engines, even if most applications depend on a database to implement them safely (because database engines have tested their recovery process extensively, to minimize their "offline" recovery time).

For example, in games you'll want to implement the recovery mechanisms only in the database that coordinates multiple players. For actions made by a single player, that player has to wait for recovery to complete before playing online with other players again. Meanwhile the application may place the player in a "safe" state, where other players cannot attack him or gain a significant advantage because he was forced offline; the application could, for instance, hide that player temporarily. If other players depend on that player's active presence (e.g. a chess game, or a soccer simulation that needs a fair number of active players in each team), the game may be suspended temporarily for everyone in it, without interrupting other unrelated competitions. All players are then informed that the game is suspended for technical reasons (without much detail), along with an estimated time before they can continue (the time needed to recover, estimated by the game developers or the site admins). In some cases the game will restart only if enough players want to continue and have not given up their place to other candidate competitors.

If you don't implement any recovery or transaction model in your app, there's no easy way to recover: a failure is a fatal crash, and whatever the application had done can at best be archived, or simply discarded if its partial results are unusable. (The recovery may be partly automated, with the rest handled manually by an admin who decides what to do, provided this does not require a lot of work: the semi-automatic recovery should analyse the situation and help sort out the cases that need manual treatment, with enough data to make an informed decision.)

Such an unrecoverable app is developed only for optimistic conditions where errors cannot happen. This may be fine for an initial design (a basic evaluation of ideas, on a small known set of devices and a known set of users or data), but it should not be kept as is for the final release or for long-term use. To develop recoverable applications, the first thing to do is to write unit tests with good enough coverage. Good apps are developed in a modular way, where each unit has its own coverage tests. Once the code is covered, you can develop the recovery mechanisms more easily (possibly by adding transactional models where concurrent processes, threads, coroutines, or users require them).

But it's not easy to test the transactional model without first developing coverage unit tests for it. That's why database engines are used, even if they only track transactions and store little or none of the application's own data, in which case the cost of these databases is very small. There are also good alternatives, such as transaction engines that store nothing, or that just maintain a safe log of events that the application can reverse itself. Today there's a plethora of fast transactional engines that are easy to integrate.

Individual transactions should also be small and designed so that they have minimal interdependence. If the application needs to perform a very long and complex task, ask yourself whether you can split the work into multiple subtasks that can run in arbitrary order, sequentially or in parallel, without depending on one another: if one of them has to be cancelled or rolled back, the losses will be minimal and the recovery faster. This is not specific to the language used; it's a question of application and data-model design. If combined results create interdependence, you can divide the task into several layers and split each layer separately into subtasks, each layer using its own transaction/recovery model.
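
As an illustration of the "log of events that the application can reverse itself" idea, here is a minimal in-memory sketch in C. All names (`Txn`, `txn_set`, `transfer`, `MAX_UNDO`) are hypothetical, and the sketch assumes the state being protected is a handful of `int` values. It only covers rolling back a failed operation inside a running process; surviving a process or power failure would additionally require writing the log to stable storage, which is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNDO 16   /* arbitrary capacity chosen for the sketch */

/* One undo record: where a value lives and what it held before the change. */
typedef struct {
    int *slot;
    int  old_value;
} UndoEntry;

typedef struct {
    UndoEntry log[MAX_UNDO];
    size_t    count;
} Txn;

void txn_begin(Txn *t) { t->count = 0; }

/* Record the old value, then apply the change.
   Returns -1 (and modifies nothing) if the log is full. */
int txn_set(Txn *t, int *slot, int value) {
    if (t->count == MAX_UNDO)
        return -1;
    t->log[t->count].slot = slot;
    t->log[t->count].old_value = *slot;
    t->count++;
    *slot = value;
    return 0;
}

/* Keep the changes and forget how to undo them. */
void txn_commit(Txn *t) { t->count = 0; }

/* Restore old values, most recent change first. */
void txn_rollback(Txn *t) {
    while (t->count > 0) {
        t->count--;
        *t->log[t->count].slot = t->log[t->count].old_value;
    }
}

/* A small transaction: move `amount` between two counters, all or nothing. */
int transfer(int *from, int *to, int amount) {
    Txn t;
    txn_begin(&t);
    if (txn_set(&t, from, *from - amount) != 0 ||
        txn_set(&t, to, *to + amount) != 0 ||
        *from < 0) {               /* business rule violated: undo everything */
        txn_rollback(&t);
        return -1;
    }
    txn_commit(&t);
    return 0;
}
```

Keeping each transaction this small is what makes the rollback cheap: a failed `transfer` undoes at most two writes and leaves every other counter untouched.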


On Wed, 22 Jul 2020 at 02:04, Dibyendu Majumdar <[hidden email]> wrote:
> On Wed, 22 Jul 2020 at 00:48, Andrew Gierth <[hidden email]> wrote:
>
> > That is nothing at all what I am referring to. I'm referring to the
> > database engine getting a malloc failure at some arbitrary point in the
> > backend, which could be at any point during query parsing, planning, or
> > execution, and recovering from it with nothing worse than an "out of
> > memory" error returned to the client. (In particular, no disconnection
> > of clients, no processes exiting, no forced recovery of the db.) The
> > only pre-allocation of memory is a small reserve for error handling
> > purposes.
> >
>
> Also just curious - when you say recovering - does it mean abandoning
> the request? It would appear so from what you say above?
>
> Regards
> Dibyendu

Re: Question about to-be-closed methods

Gé Weijers
In reply to this post by Dibyendu Majumdar
On Tue, Jul 21, 2020 at 2:56 PM Dibyendu Majumdar
<[hidden email]> wrote:

>
> On Tue, 21 Jul 2020 at 22:46, Viacheslav Usov <[hidden email]> wrote:
> >
> > On Tue, Jul 21, 2020 at 10:58 PM Dibyendu Majumdar
> > <[hidden email]> wrote:
> >
> > > There was a talk several
> > > years ago by a Google engineer that said essentially there is no point
> > > trying to recover after a memory failure. Fail fast is often a better
> > > approach - because trying to recover in that scenario could cause more
> > > damage because of further failures.
> >
> > Either the interpretation is too naive, or the original statement is
> > nonsensical.
> >
>
> For reference
> https://youtu.be/NOCElcMcFik
> At 38m past


The talk was about C++ code. If you get an out-of-memory issue in C++
there's typically not a whole lot you can do to continue a program in
a meaningful way, because there is no pool of unused but unreclaimed
storage you could go and free up. In Lua, running the GC may well free
up enough memory, especially if you have a lot of objects with __gc
metamethods and you haven't adjusted the garbage collector's settings
to collect faster.
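
The difference can be modelled with a toy heap that only counts bytes. This is a sketch of the idea, not of Lua's collector: `Heap`, `heap_alloc`, `heap_collect`, and `heap_drop` are hypothetical names, and "garbage" stands for storage that is no longer referenced but has not been reclaimed yet. When an allocation does not fit, the allocator collects once and retries before giving up.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap that tracks byte counts only (no real memory is managed). */
typedef struct {
    size_t limit;     /* total bytes available */
    size_t live;      /* bytes still referenced by the program */
    size_t garbage;   /* unreferenced bytes not yet reclaimed */
} Heap;

/* Stand-in for a full collection cycle; returns the bytes reclaimed. */
size_t heap_collect(Heap *h) {
    size_t freed = h->garbage;
    h->garbage = 0;
    return freed;
}

/* Returns 0 on success, -1 if memory is exhausted even after collecting. */
int heap_alloc(Heap *h, size_t n) {
    if (h->live + h->garbage + n > h->limit) {
        heap_collect(h);               /* reclaim, then retry once */
        if (h->live + n > h->limit)
            return -1;                 /* nothing left to free up */
    }
    h->live += n;
    return 0;
}

/* The program stops referencing n live bytes; they become garbage. */
void heap_drop(Heap *h, size_t n) {
    assert(n <= h->live);
    h->live -= n;
    h->garbage += n;
}
```

With `garbage` fixed at zero, which is the manual-memory situation described above for C++, the retry can never help and the first failure is final.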


Re: Question about to-be-closed methods

Dibyendu Majumdar
In reply to this post by Dibyendu Majumdar
On Wed, 22 Jul 2020 at 00:26, Dibyendu Majumdar <[hidden email]> wrote:

>
> On Tue, 21 Jul 2020 at 19:55, Dibyendu Majumdar <[hidden email]> wrote:
> >
> > On Tue, 21 Jul 2020 at 17:58, Roberto Ierusalimschy
> > <[hidden email]> wrote:
>
> > > The upvalue is removed from the open list only after the call. If
> > > there is an error, that sequence is interrupted, and the upvalue
> > > is not removed from the list. It would be easy to remove it
> > > before the call, if we preferred not to do the call again.
> > >
> >
> > Thank you - I think you mean this commit:
> > https://github.com/lua/lua/commit/c220b0a5d099372e58e517b9f13eaa7bb0bec45c
> > ?
> >
> > I did try reverting the commit but it caused some failure in the
> > tests. But that was just my initial look - I will look deeper.
> >
>
> If I just move the following lines to where they used to be prior to
> above commit:
>
>     if (uv->tbc && status != NOCLOSINGMETH) {
>       /* must run closing method, which may change the stack */
>       ptrdiff_t levelrel = savestack(L, level);
>       status = callclosemth(L, uplevel(uv), status);
>       level = restorestack(L, levelrel);
>     }
>
> It crashes with memory error when running the tests in local. (I am
> currently testing on Windows 10, but will also test on Linux later).
> The crash occurs in the first test after printing "testing errors in __close".
> So I am unsure what else needs to change...
> I would expect test to fail rather than crash.
>

It seems I also need the other two lines of change in that commit...

But it still crashes on Windows.
So I switched to Linux.

Going back to the original test case:

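  -- Helper added so the snippet runs on its own (assumption: a simplified
  -- stand-in for the func2close helper used in Lua's test suite).
  local function func2close (f)
    return setmetatable({}, {__close = f})
  end

  local x = 0
  local y = 0
  co = coroutine.wrap(function ()
    local xx <close> = func2close(function () y = y + 1; error("YYY") end)
    local xv <close> = func2close(function () x = x + 1; error("XXX") end)
    coroutine.yield(100)
    return 200
  end)
  assert(co() == 100); assert(x == 0)
  local st, msg = pcall(co)
  print(x)   -- number of times xv's closing function ran
  print(y)   -- number of times xx's closing function ran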

Now x is 1 and y is 0, which means that one of the close methods
wasn't called.

So it seems to me that the commit was to fix this issue - but a
side-effect is that the first __close() method gets called twice. Am I
right?
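
To check my understanding of the ordering, here is a simplified model in C of "unlink before the call" versus "unlink after the call". It is not Lua's code: `Closer`, `close_pass`, `close_after_error`, and `close_all` are hypothetical names, errors are modelled as return codes rather than longjmp, and I am assuming that after an error the remaining entries are closed in a second pass that suppresses further errors.

```c
#include <assert.h>
#include <stddef.h>

/* One pending close handler on a singly linked list (model only). */
typedef struct Closer {
    struct Closer *next;
    int calls;   /* how many times the handler has run */
    int fails;   /* nonzero: the handler "raises an error" */
} Closer;

/* Run the handler; returns -1 if it raised, 0 otherwise. */
int run_handler(Closer *c) {
    c->calls++;
    return c->fails ? -1 : 0;
}

/* Normal-exit pass: an error interrupts the sequence immediately. */
int close_pass(Closer **list, int unlink_before_call) {
    while (*list != NULL) {
        Closer *c = *list;
        if (unlink_before_call)
            *list = c->next;
        if (run_handler(c) != 0)
            return -1;   /* with unlink-after-call, c is still on the list */
        if (!unlink_before_call)
            *list = c->next;
    }
    return 0;
}

/* Error-path pass (assumption): close whatever is still listed and
   suppress any further errors. */
void close_after_error(Closer **list) {
    while (*list != NULL) {
        Closer *c = *list;
        *list = c->next;
        (void)run_handler(c);
    }
}

void close_all(Closer **list, int unlink_before_call) {
    if (close_pass(list, unlink_before_call) != 0)
        close_after_error(list);
}
```

In this model, with two failing handlers, unlinking after the call runs the first handler twice and the second once, while unlinking before the call runs each handler exactly once. Whether that matches what the commit does in Lua is exactly what I am asking.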

Regards
Dibyendu