Heap memory issues for Embedded Systems

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Heap memory issues for Embedded Systems

Glenn edgar
Hi everyone

I have recently incorporated lua into a cable modem settop box.  The effort 
was very successful in demonstrating the usefulness of lua as a test and 
integration environment.  Also, lua applications have the ability of being 
downloaded from the cable head end.

The main problem with the lua environment was the dynamic memory management.  
Dynamic memory for the lua environment was taken from a fixed private heap of 
around 100k - 200k bytes.  The private heap which was used was taken from the 
back of the K&R C book.  Each memory block had a 16 byte overhead.  

Before I looked into redesigning the memory interface and possibly lua's 
interface to dynamic memory,  I wondered if there were any better ways all 
ready out there or better ways incorporated into up comming version of lua.

The problems can be summarized as follows:

1.  The garbage collector thresholds did not function correctly in a fixed 
heap size.  The script writers had to manually force garbage collection.

2.  The utilization of the heap was around 50%.  This implied that there were 
a significantly small objects being allocated???  This is a problem with C++ 
and special heaps are designed to solve this problem.

3.  For soft real time problems, the lua scripts were a successful solutions.  
Being greedy, I would like to extend the lua scripts into more real time 
areas such as 60Hz type applications.  Web postings seem to indicate that the 
game guys are possible interested in the same area.  Are there any garbage 
collectors now or currently planned that will allow lua to operate in that 
area?

Glenn Edgar

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

David Given
On Sunday 03 July 2005 02:00, [hidden email] wrote:
[...]
> The private heap which was used
> was taken from the back of the K&R C book.  Each memory block had a 16 byte
> overhead.

Unless those 16 bytes are mostly used for debugging, you're being ripped off, 
I'm afraid. For an allocated block, the overhead should be exactly four bytes 
--- the size of the block, and depending on your application you might not 
need that (if the app keeps track of the size of blocks itself, for example). 
I don't know the K&R algorithm but you can probably do better.

[...]
> 2.  The utilization of the heap was around 50%.  This implied that there
> were a significantly small objects being allocated???  This is a problem
> with C++ and special heaps are designed to solve this problem.

Having a better allocator might help here, as it definitely sounds like you 
have a memory fragmentation problem. I'd suggest doing a brief search of the 
literature, and then using BSD's libc allocator --- the license should be 
acceptable and the algorithm should be good.

[...]
> Are
> there any garbage collectors now or currently planned that will allow lua
> to operate in that area?

IIRC (and I possibly don't) you *cannot* run a hard realtime system with a 
garbage collector. The problem is that there is no (short) upper limit on how 
long it takes to do a garbage collection and since you can potentially run 
out of memory at any point, your worst-case latencies are going to suck.

That said, there are tricks you can do: partial collecting, incremental 
collecting, background collecting, all that kind of thing. However, they do 
not *solve* the problem, only ameliorate it. If you have a fast enough 
processor, you can get away with it... most of the time. If you require 
*truly* hard real-time, with low latencies, I suspect you're doomed if you 
use Lua.

-- 
"Curses! Foiled by the chilled dairy treats of righteousness!" --- Earthworm 
Jim (evil)

Attachment: pgpouDVsgap8I.pgp
Description: PGP signature

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Klaus Ripke
On Tue, Jul 05, 2005 at 09:51:14PM +0100, David Given wrote:
> IIRC (and I possibly don't) you *cannot* run a hard realtime system with a 
> garbage collector. The problem is that there is no (short) upper limit on how 
> long it takes to do a garbage collection and since you can potentially run 
> out of memory at any point, your worst-case latencies are going to suck.
hum, guess GC need not be that unpredictable.
As "real time" always is about requests/events,
you clearly must not leave garbage from one request
lying yround to spoil the next one.
But when doing a full GC "after" every request (i.e. at the
end of the request, as far as total timing is concerned),
you should find a perfectly predictable environment for
every event, including the time it takes to clean up, no?

> *truly* hard real-time, with low latencies, I suspect you're doomed if you 
> use Lua.
recently I saw "realtime for Java" mentioned.
This is a joke, since java can not really control
it's GC due to MT issues.
But I guess Lua can.


hand

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Ben Sunshine-Hill
On 7/5/05, Klaus Ripke <[hidden email]> wrote:
> recently I saw "realtime for Java" mentioned.
> This is a joke, since java can not really control
> it's GC due to MT issues.
> But I guess Lua can.

The hard-realtime approaches to Java I've seen do some very
interesting things to maintain usability in the face of full GC not
being possible. Basically, you provide the VM with certain invariants
regarding the lifetime and accessibility of an object, and create
objects within pools, references between which are subject to very
restrictive semantics.

I don't think these same approaches would work well with Lua (or
perhaps I just don't want to see Lua uglified with them ;-).

Ben


Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Adam D. Moss
In reply to this post by Glenn edgar
[hidden email] wrote:
1. The garbage collector thresholds did not function correctly in a fixed heap size. The script writers had to manually force garbage collection.

2. The utilization of the heap was around 50%. This implied that there were a significantly small objects being allocated??? This is a problem with C++ and special heaps are designed to solve this problem.

3. For soft real time problems, the lua scripts were a successful solutions. Being greedy, I would like to extend the lua scripts into more real time areas such as 60Hz type applications. Web postings seem to indicate that the game guys are possible interested in the same area. Are there any garbage collectors now or currently planned that will allow lua to operate in that area?

You don't mention which version of Lua you investigated.  Lua 5.1's
incremental GC has some very interesting tuneables.

Lua isn't going to be more than 'soft' real time, if that, without
a rewrite/redesign I believe (and a hard realtime platform, natch).
But I think it's realistically quite good enough for ~60Hz
applications on modern hardware, 5.0 with some care, and 5.1 with
some minor tuning.  YMMV.

--adam
--
Adam D. Moss   -   [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Luiz Henrique de Figueiredo
In reply to this post by David Given
> For an allocated block, the overhead should be exactly four bytes 
> --- the size of the block, and depending on your application you might not 
> need that (if the app keeps track of the size of blocks itself, for example). 

Lua does keep track of block sizes.
--lhf

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Glenn Maynard
On Tue, Jul 05, 2005 at 09:42:51PM -0300, Luiz Henrique de Figueiredo wrote:
> > For an allocated block, the overhead should be exactly four bytes 
> > --- the size of the block, and depending on your application you might not 
> > need that (if the app keeps track of the size of blocks itself, for example). 
> 
> Lua does keep track of block sizes.

I think most C++ allocators take advantage of that; it's one reason C++
allocators can be much more efficient than C malloc/free.  They don't
support realloc, though.

I wonder if it'd be a win, for allocation speed and memory fragmentation,
if Lua could be convinced to hook into C++-style allocation through callbacks.
I havn't profiled memory use or allocation counts recently in my code to see
if it's relevant, and I don't know enough about Lua to know if it makes a lot
of small allocations.

-- 
Glenn Maynard

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Luiz Henrique de Figueiredo
> I wonder if it'd be a win, for allocation speed and memory fragmentation,
> if Lua could be convinced to hook into C++-style allocation through callbacks.

I'm not sure what you mean, but memory allocation in 5.1 is through callbacks.
In 5.0 you can compile Lua to use your own allocation routines, which use
the same interface (ie, get the old size).
--lhf

Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Mark Hamburg-4
Has anyone instrumented Lua to see what block sizes are common? Or is this
likely to vary from project to project? I'm thinking that a sub-allocator
for the common cases (or at least a free list) could be a significant win.

Mark

on 7/7/05 5:43 PM, Luiz Henrique de Figueiredo at [hidden email]
wrote:

>> I wonder if it'd be a win, for allocation speed and memory fragmentation,
>> if Lua could be convinced to hook into C++-style allocation through
>> callbacks.
> 
> I'm not sure what you mean, but memory allocation in 5.1 is through callbacks.
> In 5.0 you can compile Lua to use your own allocation routines, which use
> the same interface (ie, get the old size).
> --lhf


Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Luiz Henrique de Figueiredo
> Has anyone instrumented Lua to see what block sizes are common? Or is this
> likely to vary from project to project? I'm thinking that a sub-allocator
> for the common cases (or at least a free list) could be a significant win.

I think it's likely to depend on the application because string and closure
structs are allocated with a varying tail. However, simple value structs are
likely to be the most frequent.

Here is the result of "lua /dev/null" for Lua 5.1w6. The first column is the
number of requests for a block of the size given in the second column. The
alloc function (to replace the one in lauxlib.c) is given at the end.
Running "lua life.lua" shows a different pattern,  although the top 10 are
the same (in a slightly different order).

    169	20
     39	22
     38	21
     31	23
     27	32
     23	24
     16	28
     14	25
     12	56
     10	224
     10	112
      9	19
      7	27
      6	448
      4	26
      3	896
      2	31
      2	29
      2	16
      2	12
      1	91
      1	88
      1	76
      1	72
      1	540
      1	512
      1	348
      1	34
      1	256
      1	192
      1	18
      1	1792
      1	17
      1	128
      1	1024

static void *l_alloc (void *ud, void *ptr, size_t osize, size_t nsize) {
  static size_t total=0;
  static int time=0;
  void *optr=ptr;
  (void)ud;
  (void)osize;
  ++time;
  if (nsize == 0) {
    free(ptr);
    total-=osize;
    printf(">>> %p %p %8d - T=%d O=%d N=%d\n",ptr,optr,time,total,osize,nsize);
    return NULL;
  }
  else {
    total+=nsize-osize;
    ptr=realloc(ptr, nsize);
    printf(">>> %p %p %8d %c T=%d O=%d N=%d\n",ptr,optr,time,osize==0?'+':'=',total,osize,nsize);
    return ptr;
  }
}

--lhf

Reply | Threaded
Open this post in threaded view
|

RE: Heap memory issues for Embedded Systems

Alex Evans
In reply to this post by Glenn edgar
That's interesting! Given that I'm more interested in speed than memory
tightness, I guess there would be mileage in running out and writing a
super simplistic free-list-allocator-thing of 32 byte blocks used for
any lua allocs between 20 and 32 bytes...
A

-----Original Message-----
From: [hidden email]
[[hidden email]] On Behalf Of Luiz Henrique
de Figueiredo
Sent: 08 July 2005 11:26
To: Lua list
Subject: Re: Heap memory issues for Embedded Systems

> Has anyone instrumented Lua to see what block sizes are common? Or is
this
> likely to vary from project to project? I'm thinking that a
sub-allocator
> for the common cases (or at least a free list) could be a significant
win.

I think it's likely to depend on the application because string and
closure
structs are allocated with a varying tail. However, simple value structs
are
likely to be the most frequent.

Here is the result of "lua /dev/null" for Lua 5.1w6. The first column is
the
number of requests for a block of the size given in the second column.
The
alloc function (to replace the one in lauxlib.c) is given at the end.
Running "lua life.lua" shows a different pattern,  although the top 10
are
the same (in a slightly different order).

    169	20
     39	22
     38	21
     31	23
     27	32
     23	24
     16	28
     14	25
     12	56
     10	224
     10	112
      9	19
      7	27
      6	448
      4	26
      3	896
      2	31
      2	29
      2	16
      2	12
      1	91
      1	88
      1	76
      1	72
      1	540
      1	512
      1	348
      1	34
      1	256
      1	192
      1	18
      1	1792
      1	17
      1	128
      1	1024

static void *l_alloc (void *ud, void *ptr, size_t osize, size_t nsize) {
  static size_t total=0;
  static int time=0;
  void *optr=ptr;
  (void)ud;
  (void)osize;
  ++time;
  if (nsize == 0) {
    free(ptr);
    total-=osize;
    printf(">>> %p %p %8d - T=%d O=%d
N=%d\n",ptr,optr,time,total,osize,nsize);
    return NULL;
  }
  else {
    total+=nsize-osize;
    ptr=realloc(ptr, nsize);
    printf(">>> %p %p %8d %c T=%d O=%d
N=%d\n",ptr,optr,time,osize==0?'+':'=',total,osize,nsize);
    return ptr;
  }
}

--lhf



Reply | Threaded
Open this post in threaded view
|

Re: Heap memory issues for Embedded Systems

Luiz Henrique de Figueiredo
> I guess there would be mileage in running out and writing a super
> simplistic free-list-allocator-thing of 32 byte blocks used for any
> lua allocs between 20 and 32 bytes...

Perhaps, but you'd have to instrument your application. Don't take
the numbers I posted as valid for every application; they were for a
do-nothing run. Also, the numbers did not show the history of allocations.
I mean, there's no point in doing custom allocation for the initialization
phase of your application; you should concentrate on its main phase.
--lhf