io:lines() and \0

René Rebe
Hi all,

I just noticed that io:lines() does not cope with \0 in the lines, and thus just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).

May I suggest replacing the call to fgets in src/liolib.c so that we can read lines with \0 data?

René

-- 
 ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de

Re: io:lines() and \0

steve donovan
On Mon, Feb 17, 2014 at 5:51 PM, René Rebe <[hidden email]> wrote:
> I just noticed that io:lines() does not cope with \0 in the lines, and thus
> just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).

This is not surprising.  The whole idea of 'lines' only really applies
to text files, at least in my head ;)

Re: io:lines() and \0

René Rebe
Hi,

On Feb 17, 2014, at 16:55 , steve donovan wrote:

> On Mon, Feb 17, 2014 at 5:51 PM, René Rebe <[hidden email]> wrote:
>> I just noticed that io:lines() does not cope with \0 in the lines, and thus
>> just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
>
> This is not surprising.  The whole idea of 'lines' only really applies
> to text files, at least in my head ;)

Well, in my opinion library foundations should just work, and not silently discard some bits and bytes. A line is a line, no matter how many \0 are in there until the next \n newline. And the Lua manual points out that Lua strings are \0-safe.

I already provided patches a year or two ago for other pattern-matching \0 fixes, which were merged into 5.2.

One quite simple and obvious use of lines containing binary \0 data is parsing MIME or CGI data.

René

-- 
 ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de

Re: io:lines() and \0

Craig Barnes
> I just noticed that io:lines() does not cope with \0 in the lines, and thus
> just returns truncated lines
> ...
> in my opinion library foundations should just work, and not silently
> discard some bits and bytes. A line is a line, no matter how many \0 are in
> there until the next \n-newline. And the Lua manual points out Lua strings
> are \0-safe.

This seems to be the definitive answer (from ~8 years ago):

    http://lua-users.org/lists/lua-l/2006-01/msg00641.html

I think that's a fair explanation, since it can be replaced with:

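    local file = assert(io.open(filename, "rb"))  -- binary mode: no newline translation
    local text = assert(file:read("*a"))          -- whole file, embedded \0 included
    file:close()
    -- note: a final line without a trailing "\n" is not matched by this pattern
    for line in text:gmatch "([^\n]*)\n" do
        -- print(line)
    end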

...which handles embedded zeros at the cost of reading the entire
thing into memory.

It would be helpful if the limitation were mentioned in the
reference manual, though. I spent quite a while trying to
work out what was happening the first time I encountered it.

Re: io:lines() and \0

Luiz Henrique de Figueiredo
In reply to this post by René Rebe
> I just noticed that io:lines() does not cope with \0 in the lines, and thus just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
>
> May I suggest replacing the call to fgets in src/liolib.c so that we can read lines with \0 data?

At least in Mac OS X and Linux fgets works just fine: it reads bytes
until it sees \n, as promised in its man page. Unfortunately, fgets does
not tell you how many bytes it has read and you're left with having to
call strlen to find this out. I guess we could avoid strlen and use
memchr instead.

How did you propose to replace fgets?
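
To make the strlen problem concrete, here is a minimal C sketch. It is not the liolib.c code; the helper name measure_fgets_line and the 256-byte buffer are made up for illustration. It writes a few bytes to a temporary file, reads the first line back with fgets, and measures the result both with strlen and with memchr. It assumes an fgets that stores every byte up to the newline, as described above for Mac OS X and Linux.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes n bytes of data (n < 256) to a temporary file,
   reads the first line back with fgets, and measures it two ways.
   *by_strlen: length as seen by strlen (stops at the first '\0').
   *by_memchr: length up to and including '\n' as found by memchr, or 0 if
               the buffer holds no '\n' (the EOF-before-newline case).
   Returns 0 on success, -1 on I/O failure. */
int measure_fgets_line(const char *data, size_t n,
                       size_t *by_strlen, size_t *by_memchr) {
    char buf[256];
    const char *nl;
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    if (fwrite(data, 1, n, f) != n) { fclose(f); return -1; }
    rewind(f);
    memset(buf, 0, sizeof buf);  /* so stale bytes cannot look like a newline */
    if (fgets(buf, (int)sizeof buf, f) == NULL) { fclose(f); return -1; }
    fclose(f);
    *by_strlen = strlen(buf);
    nl = memchr(buf, '\n', sizeof buf);
    *by_memchr = (nl != NULL) ? (size_t)(nl - buf) + 1 : 0;
    return 0;
}
```

For the six bytes "ab\0cd\n", strlen sees only the part before the \0, while memchr can still locate the newline. When the data ends without a newline, memchr has nothing to find, so the true length stays unknown.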

Re: io:lines() and \0

Coda Highland
On Mon, Feb 17, 2014 at 9:10 AM, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
>> I just noticed that io:lines() does not cope with \0 in the lines, and thus just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
>>
>> May I suggest replacing the call to fgets in src/liolib.c so that we can read lines with \0 data?
>
> At least in Mac OS X and Linux fgets works just fine: it reads bytes
> until it sees \n, as promised in its man page. Unfortunately, fgets does
> not tell you how many bytes it has read and you're left with having to
> call strlen to find this out. I guess we could avoid strlen and use
> memchr instead.

memchr doesn't work if fgets hits EOF before it hits \n, though.

getline(3) looks like it would work, but it's POSIX, not ISO C99, so I
don't think it's supported by MSVC.

I think the only possible implementation that complies with strict ISO
C99, doesn't lose data, and doesn't walk off the end of the buffer is
to do it all by hand, either buffering blocks of the file in memory,
or reading one character at a time.

/s/ Adam
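
A rough sketch of the "do it all by hand" option, one character at a time, using only ISO C calls. The function name read_line_bytes and its out-parameters are invented for this example and are not part of Lua. The point is that the length is counted as bytes arrive, so embedded \0 bytes and a missing final newline are both handled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from f into a growable buffer, keeping embedded '\0' bytes.
   *buf and *cap describe the buffer (start with NULL and 0); *len receives
   the number of bytes read, including the trailing '\n' if one was seen.
   Returns 1 if anything was read, 0 at end of file, -1 if out of memory. */
int read_line_bytes(FILE *f, char **buf, size_t *cap, size_t *len) {
    int c;
    size_t n = 0;
    while ((c = getc(f)) != EOF) {
        if (n == *cap) {  /* buffer full: grow geometrically */
            size_t newcap = (*cap == 0) ? 128 : *cap * 2;
            char *p = realloc(*buf, newcap);
            if (p == NULL) return -1;
            *buf = p;
            *cap = newcap;
        }
        (*buf)[n++] = (char)c;
        if (c == '\n') break;  /* end of line: stop after storing it */
    }
    *len = n;
    return n > 0;
}
```

The caller owns the buffer and frees it once, after the last call. The result is not NUL-terminated, so it has to be used together with the returned length.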

Re: io:lines() and \0

Francisco Olarte
In reply to this post by Luiz Henrique de Figueiredo
Hi:

On Mon, Feb 17, 2014 at 6:10 PM, Luiz Henrique de Figueiredo
<[hidden email]> wrote:

>> I just noticed that io:lines() does not cope with \0 in the lines, and thus just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
>>
>> May I suggest replacing the call to fgets in src/liolib.c so that we can read lines with \0 data?
>
> At least in Mac OS X and Linux fgets works just fine: it reads bytes
> until it sees \n, as promised in its man page. Unfortunately, fgets does
> not tell you how many bytes it has read and you're left with having to
> call strlen to find this out. I guess we could avoid strlen and use
> memchr instead.
>
> How did you propose to replace fgets?
>

I would propose something like:

int c;
while ((c = getc(f)) != '\n' && c != EOF) {
    luaL_addchar(&b, c);  /* b is the luaL_Buffer */
}
/* Test for errors if needed; no strlen calls are required here. Wrap up the buffer and return. */

One of the strengths of C is the speed at which it can read a char and
act on it (or not act). getc() is normally very fast. luaL_addchar
is very fast too, from what I've seen in lauxlib.h.  The speed loss should
not be noticeable, and IMO the extra complexity in the current
read_line function is not worth it. If I've read the (5.2.2) sources correctly,
the culprit is:

static int read_line (lua_State *L, FILE *f, int chop) {
  luaL_Buffer b;
  luaL_buffinit(L, &b);
  for (;;) {
    size_t l;
    char *p = luaL_prepbuffer(&b);
    if (fgets(p, LUAL_BUFFERSIZE, f) == NULL) {  /* eof? */
      luaL_pushresult(&b);  /* close buffer */
      return (lua_rawlen(L, -1) > 0);  /* check whether read something */
    }
    l = strlen(p);
    if (l == 0 || p[l-1] != '\n')
      luaL_addsize(&b, l);
    else {
      luaL_addsize(&b, l - chop);  /* chop 'eol' if needed */
      luaL_pushresult(&b);  /* close buffer */
      return 1;  /* read at least an `eol' */
    }
  }
}

Which, using getc, would reduce to, more or less:

static int read_line (lua_State *L, FILE *f, int chop) {
  luaL_Buffer b;
  int c;  /* declared before any statement, as C89 requires */
  luaL_buffinit(L, &b);
  while ((c = getc(f)) != '\n' && c != EOF) {
    luaL_addchar(&b, c);
  }
  if (!chop && c == '\n') {
    luaL_addchar(&b, c);  /* add newline if needed */
  }
  luaL_pushresult(&b);  /* close buffer */
  /* check whether we read something (either a newline or a non-empty string before EOF) */
  return (c == '\n') || (lua_rawlen(L, -1) > 0);
}

which I find much clearer.

Francisco Olarte.

Re: io:lines() and \0

William Ahern
In reply to this post by René Rebe
On Mon, Feb 17, 2014 at 05:16:29PM +0100, René Rebe wrote:

> Hi,
>
> On Feb 17, 2014, at 16:55 , steve donovan wrote:
>
> > On Mon, Feb 17, 2014 at 5:51 PM, René Rebe <[hidden email]> wrote:
> >> I just noticed that io:lines() does not cope with \0 in the lines, and thus
> >> just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
> >
> > This is not surprising.  The whole idea of 'lines' only really applies
> > to text files, at least in my head ;)
>
> well, in my opinion library foundations should just work, and not silently
> discard some bits and bytes. A line is a line, no matter how many \0 are
> in there until the next \n-newline. And the Lua manual points out Lua
> strings are \0-safe.
>
> I already provided patches a year or two ago for other pattern matching \0
> fixes, which were merged into 5.2.
>
> One quite simple and obvious use of lines with \0 binary data is parsing
> MIME, CGI data.

Well, in MIME a line ends in \r\n. So if you want to be 8-bit clean you
technically shouldn't be treating a line as simply ending in \n, anyhow.

OTOH, in MIME even "8-bit" encoded entities shouldn't have bare \0 or \n
characters. The "binary" transfer encoding allows those. But even in binary
transfer encoding a line is \r\n.

So there's no simple answer, really.
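
As a small illustration of the point about \r\n, here is a sketch of a scanner that treats only the two-byte sequence as a line terminator. The name find_crlf is made up for this example; it is not taken from any MIME library.

```c
#include <stddef.h>

/* Returns the index of the first "\r\n" pair in buf[0..len), or len if there
   is none. Bare '\n', bare '\r' and '\0' bytes are treated as ordinary data. */
size_t find_crlf(const char *buf, size_t len) {
    size_t i;
    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return i;
    }
    return len;
}
```

With this definition a bare \n inside binary content does not end the line, which a reader that splits on \n alone cannot offer.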

The sockets implementation in my cqueues library has a text-mode translation
feature which translates \r\n sequences to \n, because on Unix (unlike
Windows) this is not done by the underlying stdio implementation. This
allows simple (and in practice mostly correct) implementation of MIME-like
protocols. But of course I had to implement all of the buffering myself
because you simply cannot reliably depend on the underlying implementation
if you want dependable behavior.
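
For readers unfamiliar with this kind of translation, a minimal sketch of the idea follows. It is not the cqueues implementation; the function name crlf_to_lf is hypothetical, and it works on a single in-memory buffer only.

```c
#include <stddef.h>

/* Rewrites buf[0..len) in place, turning every "\r\n" pair into a single
   '\n'. A lone '\r' or '\n' is copied unchanged. Returns the new length. */
size_t crlf_to_lf(char *buf, size_t len) {
    size_t r, w = 0;
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;  /* drop the '\r'; the '\n' is copied on the next pass */
        buf[w++] = buf[r];
    }
    return w;
}
```

A '\r' that happens to be the last byte of one chunk, with its '\n' arriving in the next chunk, is left untouched by this sketch. Handling that case is one reason the buffering ends up in the application's own hands.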

For example, what's your maximum line length? MIME specifies 998, but in
practice lots of implementations allow much larger limits because of broken
clients (like brain-dead PHP scripts). Lua's internal limit is also probably
too small to be production-quality reliable on the open internet (unless you
want endless support calls), and in any event it's not configurable.

Basically, if you want to be serious about this stuff you have to do your
own buffering.


Re: io:lines() and \0

Francisco Olarte
Hi:


On Mon, Feb 17, 2014 at 10:05 PM, William Ahern
<[hidden email]> wrote:
....
> The sockets implementation in my cqueues library has a text-mode translation
> feature which translates \r\n sequences to \n, because on Unix (unlike
> Windows) this is not done by the underlying stdio implementation. This
> allows simple (and in practice mostly correct) implementation of MIME-like
> protocols. But of course I had to implement all of the buffering myself
> because you simply cannot reliably depend on the underlying implementation
> if you want dependable behavior.

AFAIK sockets do not use an underlying stdio; they sit approximately at the
IO level, with open/read/write/close(2). It is the other way round:
you wrap a socket in a stdio object.

Anyway, I cannot speak of Unix in general, but the glibc versions I normally
use have text mode; in fact they describe the default open behaviour
as text (from man 3 fopen):

       r      Open text file for reading.  The stream is positioned at the beginning of the file.

It just happens that text mode is easy on unix.

I've used a lot of C runtimes in CP/M, MSDOS and WIN* , and found that
'text' file handling was at least peculiar. I've found runtimes which
just dropped '\015', others which considered '\012\015', '\015\012'
and '\012' as '\n', others which used '\015*\012', even ones which
also considered naked '\015' as line end. For your purpose of reading
lines, you would be out of luck with all of them, as every single one
I tested considered naked '\012' as '\n', which would make 'A\015\012'
and 'A\012' indistinguishable.

OTOH that is, as you say, mostly correct. If you adhere to the motto
'Be strict in what you send and tolerant in what you accept', they
work quite well.

Francisco Olarte.

Re: io:lines() and \0

Roberto Ierusalimschy
In reply to this post by Francisco Olarte
> I would propose something like:
>
> [...]

ANSI C says this about text files:

  Data read in from a text stream will necessarily compare equal to
  the data that were earlier written out to that stream only if: the
  data consist only of printing characters and the control characters
  horizontal tab and new-line; no new-line character is immediately
  preceded by space characters; and the last character is a new-line
  character.

So, there is no guarantee that a text file with embedded zeros will be
read correctly, no matter how we implement it.
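
For contrast, the C standard does promise that data read from a binary stream compares equal to what was written (an implementation may append null characters at the end of the stream). A minimal sketch of that round trip follows; the helper name binary_roundtrip_ok and the 256-byte limit are chosen only for this example.

```c
#include <stdio.h>
#include <string.h>

/* Writes n bytes to a temporary binary stream, reads them back with fread,
   and returns 1 if the first n bytes match, 0 if not, -1 on I/O failure
   or if n exceeds the local buffer. */
int binary_roundtrip_ok(const void *data, size_t n) {
    unsigned char back[256];
    size_t got;
    FILE *f;
    if (n > sizeof back) return -1;
    f = tmpfile();  /* tmpfile opens in "wb+" mode, i.e. a binary stream */
    if (f == NULL) return -1;
    if (fwrite(data, 1, n, f) != n) { fclose(f); return -1; }
    rewind(f);
    got = fread(back, 1, n, f);
    fclose(f);
    return got == n && memcmp(back, data, n) == 0;
}
```

The data used to try it can include exactly the bytes the text-stream rule excludes: \0, a space before a newline, 0x1a, and no final newline.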


> One of the strengths of C is the speed at which it can read a char and
> act on it ( or not act ). getc() is normally very fast. luaL_addchar
> is very fast, from what I've seen in lauxlib.h.  The speed loss should
> not be noticeable, and IMO not worth the extra complexity in the
> current read_line function. [...]

Just for the record: on my machine, the following program,

  local count = 0
  for l in io.lines() do
    count = count + #l
  end
  print(count)

reading the Bible, takes ~0.07s with the current implementation and
~0.14s with this proposal.

-- Roberto

Re: io:lines() and \0

Enrico Colombini
On 18/02/2014 15.04, Roberto Ierusalimschy wrote:

> ANSI C says this about text files:
>
>    Data read in from a text stream will necessarily compare equal to
>    the data that were earlier written out to that stream only if: the
>    data consist only of printing characters and the control characters
>    horizontal tab and new-line; no new-line character is immediately
>    preceded by space characters; and the last character is a new-line
>    character.
>
> So, there is no guarantee that a text file with embedded zeros will be
> read correctly, no matter how we implement it.

Another example is the "end-of-text" character (0x04 in Unix, 0x1a in
Windows) that terminates text files.

> Just for the record: on my machine, the following program,
>
>    local count = 0
>    for l in io.lines() do
>      count = count + #l
>    end
>    print(count)
>
> reading the Bible, takes ~0.07s with the current implementation and
> ~0.14s with this proposal.

I really can't see any advantage in making text file reading worse, just
to handle the nonstandard case "read a binary file using text file
functions".
Especially because the nonstandard case can be easily handled in other
ways, either in C or in pure Lua.

--
   Enrico

Re: io:lines() and \0

Andrew Starks


On Tuesday, February 18, 2014, Enrico Colombini <[hidden email]> wrote:
> On 18/02/2014 15.04, Roberto Ierusalimschy wrote:
>> ANSI C says this about text files:
>>
>>    Data read in from a text stream will necessarily compare equal to
>>    the data that were earlier written out to that stream only if: the
>>    data consist only of printing characters and the control characters
>>    horizontal tab and new-line; no new-line character is immediately
>>    preceded by space characters; and the last character is a new-line
>>    character.
>>
>> So, there is no guarantee that a text file with embedded zeros will be
>> read correctly, no matter how we implement it.
>
> Another example is the "end-of-text" character (0x04 in Unix, 0x1a in Windows) that terminates text files.
>
>> Just for the record: on my machine, the following program,
>>
>>    local count = 0
>>    for l in io.lines() do
>>      count = count + #l
>>    end
>>    print(count)
>>
>> reading the Bible, takes ~0.07s with the current implementation and
>> ~0.14s with this proposal.
>
> I really can't see any advantage in making text file reading worse, just to handle the nonstandard case "read a binary file using text file functions".
> Especially because the nonstandard case can be easily handled in other ways, either in C or in pure Lua.
>
> --
>   Enrico


The OP did offer up that a clarification in the documentation might also be an improvement. Given that Lua strings are 8-bit clean and that they can contain \0, this seems wise. I agree that strings ~= lines, of course.

Re: io:lines() and \0

Roberto Ierusalimschy
> The OP did offer up that a clarification in the documentation might also be
> an improvement. Given that Lua strings are 8bit clean and that they can
> contain \0, this seems wise. I agree that strings ~= lines, of course.

That part is easy :)

-- Roberto

Re: io:lines() and \0

Francisco Olarte
In reply to this post by Roberto Ierusalimschy
Hi:

On Tue, Feb 18, 2014 at 3:04 PM, Roberto Ierusalimschy
<[hidden email]> wrote:
> Just for the record: on my machine, the following program,
...
> reading the Bible, takes ~0.07s with the current implementation and
> ~0.14s with this proposal.

Just this morning, while working on an unrelated thing, I stumbled upon the
unlocked_stdio(3) man page, which made me realize my experience with
non-thread-aware stdios is totally outdated :(

After receiving this I decided to run a very simple test, as I
realized sync overhead may kill performance. The test program is attached;
the runs were repeated and the timings are quite consistent.

Note, this is not to endorse my solution; after reading this I totally
see it needs rethinking, and I completely withdraw it, but I thought
the results might be useful for someone who needs to do the things I
did in the past with getc (mainly, processing huge files using a
simple loop plus a state machine, or something similar).

I do not know how long the Bible is, or where to grab it, so I just
used a file sitting around on my computer, big enough to give relevant
timings and small enough to ensure full caching (and accessible to
anyone who may want to repeat the tests):

folarte@paqueton:~/tmp$ ls -l ~/Downloads/netbeans-7.4-javase-linux.sh
-rw------- 1 folarte folarte 87140352 Oct 22 12:12 /home/folarte/Downloads/netbeans-7.4-javase-linux.sh

I got these times on the first two runs:
folarte@paqueton:~/tmp$ ./timeit <~/Downloads/netbeans-7.4-javase-linux.sh
Warm disk cache: 0.777915
fgets: 0.086217
getc: 0.817653
fgets_unlocked: 0.080532
getc_unlocked: 0.070354
folarte@paqueton:~/tmp$ ./timeit <~/Downloads/netbeans-7.4-javase-linux.sh
Warm disk cache: 0.080762
fgets: 0.080582
getc: 0.852571
fgets_unlocked: 0.078911
getc_unlocked: 0.069547

I repeated it several more times; the timing was stable enough.

As you can see, LOCKING seems to be killing performance.  fgets is not
too bad (I consistently got about 2-3% more time due to locks), but
getc was always more than 11 times slower, a sync penalty of more than
1000%. Also, in every run I did, the unlocked version of fgets
was noticeably slower than unlocked getc, which matches
my outdated experience with non-thread-aware runtimes.

IIRC Roberto uses Linux, as I do, so I suppose his smaller time
difference is due to all the extra processing done instead of my empty
loops; I was just trying to measure raw read & discard performance.
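
For anyone who wants to reproduce the locked versus unlocked comparison, here is a minimal sketch of the two getc loops. It is not the attached timeit.c, whose contents are not shown in this thread; the function names are made up, and flockfile/getc_unlocked are POSIX rather than ISO C, so the sketch assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* for flockfile and getc_unlocked */
#include <stdio.h>

/* Counts bytes with plain getc, which takes the stream lock on every call
   in a thread-aware stdio. */
long count_bytes_locked(FILE *f) {
    long n = 0;
    while (getc(f) != EOF) n++;
    return n;
}

/* Counts bytes with getc_unlocked, taking the stream lock once around the
   whole loop instead of once per character. */
long count_bytes_unlocked(FILE *f) {
    long n = 0;
    flockfile(f);
    while (getc_unlocked(f) != EOF) n++;
    funlockfile(f);
    return n;
}
```

Both functions return the same count for the same input; only the locking differs. Wrapping each call with clock() from <time.h> on a large cached file would give a comparison in the spirit of the numbers above.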

And I'll repeat myself: this is not to defend my proposal. It's wrong
on current runtimes: the unlocked functions are not ANSI, and Lua is better
served by the current solution. If I needed an ultrafast module I would
possibly just go for raw read(2) for better control (although with
the current fgets speed, real-world usage of normal fgets, or of the
alternate getc, will probably be limited by disk throughput). But I
figured that once I had taken the time to measure, the information could be
useful for the community.

Happy hacking.

Francisco Olarte.

PS: Just for timing comparison, with the file cached:

folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null
170196+0 records in
170196+0 records out
87140352 bytes (87 MB) copied, 0.155959 s, 559 MB/s
folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null bs=16384
5318+1 records in
5318+1 records out
87140352 bytes (87 MB) copied, 0.0259828 s, 3.4 GB/s
folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null bs=32768
2659+1 records in
2659+1 records out
87140352 bytes (87 MB) copied, 0.0238975 s, 3.6 GB/s
folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null bs=65536
1329+1 records in
1329+1 records out
87140352 bytes (87 MB) copied, 0.0230595 s, 3.8 GB/s

Attachment: timeit.c (1K)

Re: io:lines() and \0

Francisco Olarte
In reply to this post by Enrico Colombini
Hi:

On Tue, Feb 18, 2014 at 4:16 PM, Enrico Colombini <[hidden email]> wrote:
> Another example is the "end-of-text" character (0x04 in Unix, 0x1a in
> Windows) that terminates text files.

Both Unix and Windows (Win32 & Win64, and also 'modern' MSDOS, which
provided files to Win16) have exact byte lengths in their files. ^D
/ ^Z are used by the terminal drivers to signal end of 'stream', and
are not transferred. IIRC, you can even feed them via the terminal
using some escaping.

The OS which used ^Z to signal EOF was CP/M, which stored the length
in blocks (128-byte blocks = sectors on the original 77-track / 26-sector
8-inch floppies), so it needed some way to signal the
byte-end. MSDOS files sometimes had a ^Z tucked at the end to be able
to use the same exact code as CP/M (CP/M programs could be machine
translated to MSDOS easily).

But you can find much stranger things in other operating systems, as
some of them distinguish at the OS level between text and binary and
fixed / variable length, and have even other quirks.

Francisco Olarte.

Re: io:lines() and \0

Tim Hill
In reply to this post by Roberto Ierusalimschy

On Feb 18, 2014, at 6:04 AM, Roberto Ierusalimschy <[hidden email]> wrote:

>> I woud propose something like:
>>
>> [...]
>
> ANSI C says this about text files:
>
>  Data read in from a text stream will necessarily compare equal to
>  the data that were earlier written out to that stream only if: the
>  data consist only of printing characters and the control characters
>  horizontal tab and new-line; no new-line character is immediately
>  preceded by space characters; and the last character is a new-line
>  character.
>
> So, there is no guarantee that a text file with embedded zeros will be
> read correctly, no matter how we implement it.
>

Is anyone else uneasy about that “no new-line character is immediately preceded by space characters” bit? I take this to mean trailing space in lines may give unpredictable results in ANSI C, which is pretty eyebrow-raising to me.

If this is true, this reads like the ANSI committee bending the standard to meet a (buggy) implementation, as they did with realloc() and a few others.

—Tim


Re: io:lines() and \0

Francisco Olarte
Hi:

On Tue, Feb 18, 2014 at 7:35 PM, Tim Hill <[hidden email]> wrote:
> Is anyone else uneasy about that "no new-line character is immediately preceded by space characters" bit? I take this to mean trailing space in lines may give unpredictable results in ANSI C, which is pretty eyebrow-raising to me.
> If this is true, this reads like the ANSI committee bending the standard to meet a (buggy) implementation, as they did with realloc() and a few others.

I think they put it there to be able to cover every strange OS out
there. Some of them use fixed-record-length files, space padded, for
text (I used one of them; it did that for Hollerith punch card
compatibility). OSes have evolved a lot, and become more uniform, but I
do not want to think about what kind of things the IBM mainframes with
their backward compatibility are supporting. You are not likely to
find many implementations where the space stuff matters, but if you
meet one you may be lucky to have that provision in the
implementation.

Francisco Olarte.

Re: io:lines() and \0

Tim Hill

On Feb 18, 2014, at 10:50 AM, Francisco Olarte <[hidden email]> wrote:

> Hi:
>
> On Tue, Feb 18, 2014 at 7:35 PM, Tim Hill <[hidden email]> wrote:
>> Is anyone else uneasy about that "no new-line character is immediately preceded by space characters" bit? I take this to mean trailing space in lines may give unpredictable results in ANSI C, which is pretty eyebrow-raising to me.
>> If this is true, this reads like the ANSI committee bending the standard to meet a (buggy) implementation, as they did with realloc() and a few others.
>
> I think they put it there to be able to cover every strange OS out
> there. Some of them use fixed-record-length files, space padded, for
> text (I used one of them; it did that for Hollerith punch card
> compatibility). OSes have evolved a lot, and become more uniform, but I
> do not want to think about what kind of things the IBM mainframes with
> their backward compatibility are supporting. You are not likely to
> find many implementations where the space stuff matters, but if you
> meet one you may be lucky to have that provision in the
> implementation.
>
> Francisco Olarte.
>

OK, except my reading of the ANSI text is that reading lines is *not* guaranteed to work if the newline is preceded by spaces, i.e. it may fail if the line has trailing whitespace. Which is the opposite of allowing for fixed-width punched cards.

Now, I very much doubt if any real-world C runtime would behave like that, but it is a bit odd.

—Tim


Re: io:lines() and \0

Francisco Olarte
Hi:

On Wed, Feb 19, 2014 at 8:53 AM, Tim Hill <[hidden email]> wrote:

> OK except my reading of the ANSI text is that reading lines is *not* guaranteed to work if the newline is preceded by spaces, i.e. it may fail if the line has trailing whitespace. Which is the opposite of allowing for fixed-width punched cards.

I do not remember the text exactly, but what I was trying to say is that some
systems do NOT use newline chars: they use fixed-length records, space
padded, and the runtime has to synthesize the newline, or they have a
synthetic newline at some other position. Anyway, you normally do not
have to worry about this if you use any of the mainline operating
systems.
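
To illustrate why such a runtime cannot preserve trailing spaces, here is a sketch of how one might turn a space-padded record into a text line. It does not describe any specific system; the 80-column width and the name record_to_line are assumptions made for the example.

```c
#include <stddef.h>

#define RECORD_LEN 80  /* assumed card-image width, for illustration only */

/* Converts one space-padded fixed-length record into a text line: trailing
   spaces are dropped and a '\n' is synthesized. out must have room for
   RECORD_LEN + 1 bytes. Returns the number of bytes written to out. */
size_t record_to_line(const char rec[RECORD_LEN], char *out) {
    size_t n = RECORD_LEN, i;
    while (n > 0 && rec[n - 1] == ' ')
        n--;  /* padding and genuine trailing spaces look the same */
    for (i = 0; i < n; i++)
        out[i] = rec[i];
    out[n] = '\n';
    return n + 1;
}
```

A program that wrote "hi  \n" to such a file would read back "hi\n", which is the kind of mismatch the "preceded by space characters" clause allows for.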

> Now, I very much doubt if any real-world C runtime would behave like that, but it is a bit odd.

Not any modern one, as presently nearly every OS treats files as a
byte array, but once you throw exotic device streams and legacy OSes
into the mix, I'm not so sure. Backwards compatibility is a bitch.

Francisco Olarte.

Re: io:lines() and \0

Roberto Ierusalimschy
In reply to this post by Francisco Olarte
> I do not know how long the Bible is

~ 4.3MBytes.

> or where to grab it,

  http://www.gutenberg.org/cache/epub/10/pg10.txt

(I got this idea of using the Bible for file tests from Kernighan:
   http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html)

-- Roberto
