popen read and write?

popen read and write?

Michal Kolodziejczyk-3
Hello,
I can see that popen() can be used either to read or to write data
from/to the process. Is there any hope it would be able to read and
write data at the same time?
How do you use Lua when trying to write to and read from the same
process? (under Linux, if this matters)

Regards,
miko


Re: popen read and write?

Klaus Ripke
Hi

On Mon, Oct 08, 2007 at 04:58:14PM +0200, Michal Kolodziejczyk wrote:
> I can see that popen() can be used either to read or to write data
> from/to the process. Is there any hope it would be able to read and
> write data at the same time?
with a lua popen based on libc's popen: no

> How do you use Lua when trying to write to and read from the same
> process? (under Linux, if this matters)
you have to roll your own popen, based on a socketpair(PF_UNIX),
which is basically a bidirectional pipe.

You may try to attach your end to a FILE* using fdopen,
but I'm not sure whether glibc's stdio handles bidirectional
streams correctly (they need two separate buffers).

(I am using my homegrown stdio replacement anyway,
so I can for example also use TCP sockets with all
of Lua's IO calls.)


regards


Re: popen read and write?

Jerome Vuarand
In reply to this post by Michal Kolodziejczyk-3
2007/10/8, Michal Kolodziejczyk <[hidden email]>:
> I can see that popen() can be used either to read or to write data
> from/to the process. Is there any hope it would be able to read and
> write data at the same time?
> How do you use Lua when trying to write to and read from the same
> process? (under Linux, if this matters)

You can do that with the ex library:

- http://lua-users.org/lists/lua-l/2007-02/msg00328.html
- http://lua-users.org/wiki/ExtensionProposal


Re: popen read and write?

Edgar Toernig
In reply to this post by Michal Kolodziejczyk-3
Michal Kolodziejczyk wrote:
>
> Hello,
> I can see that popen() can be used either to read or to write data
> from/to the process. Is there any hope it would be able to read and
> write data at the same time?
> How do you use Lua when trying to write to and read from the same
> process? (under Linux, if this matters)

This question comes up again and again and there are always
people who show how easy it is to implement it.
Unfortunately, nobody yet has told about the problems that
such a bidirectional popen (let's call it popen2) has.

There is a very good reason why such a popen2 isn't part of
the POSIX standard: it's _very_very_hard_ to use it correctly.
(I would even go as far as saying that it's impossible to
get right with stdio.)  The trivial cases seem to work well
but real work results in a deadlock.

Just take the simplest filter: cat.  It echoes everything it
reads from stdin to stdout.  Let's assume our popen2 uses a
single file handle for reading and writing (using two handles
doesn't really change the problems) and that it's using stdio
(that gives even more problems but still doesn't change the
inherent difficulties).

You think this sequence is ok?

	fp = popen2("cat")
	fp:write("Hello World!")
	x = fp:read("*a")
	fp:close()

No, it's broken - not even this trivial example works.
It hangs.  Why?  Because fp is buffered - you need to
flush the output buffer otherwise "cat" gets nothing
to echo back to your read(). 

Ok, let's add a flush:

	fp:write("Hello World!")
	fp:flush_output()
	x = fp:read("*a")

Heck, still no go.  It hangs in read again.  Still
something wrong.  What's going on?  Easy, cat will
not write such a small amount of data (12 chars).
Even if it were in line buffered mode, there's no
\n at the end of "Hello World!".  It needs a buffer
full of data (whatever that is) or an EOF to process
the data.  So, give it an EOF:

	fp:write("Hello World!")
	fp:flush_output()
	fp:close_output()
	x = fp:read("*a")

Fine.  This trivial example works.  Kind of.  It may work
on your system but not on others.  Actually, it will work
on most systems but only because the implicit assumptions
made by that code (namely that about buffer sizes and the
behaviour of "cat") are satisfied on most systems.

To see what's wrong let's shove bigger chunks of data into
the filter:

	fp:write(about_64k_of_data)
	fp:flush_output()
	fp:close_output()
	x = fp:read("*a")

64k should be big enough to fill all buffers between
the parent and the filter process (the actual value
differs greatly between systems).  What happens?  The
program hangs again!  Why?  Because the parent is still
trying to write data to the filter but the filter is no
longer reading the data.  It is trying to pass already
processed data back to the parent.  Deadlock!

Trying to circumvent the deadlock by guessing proper
chunk sizes is fruitless.  Even if you know the buffer
sizes (of the system and that of the used filter!) you
may not know in advance how much data the filter may
produce and you're lost.
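
The stall described above can be reproduced on paper. The following is only a toy model, not real process I/O: the two capacities are made-up stand-ins for the pipe buffers, and the filter is assumed to behave like "cat" with no internal buffer of its own. It shows that "write everything, then read" only completes while the data fits into the buffers between the two processes.

```lua
-- Toy model of parent -> filter -> parent with two bounded pipe buffers.
-- Nothing here touches real processes; cap_in and cap_out are made-up
-- capacities standing in for the kernel pipe buffers.
local function blocking_roundtrip(total, cap_in, cap_out)
  local in_buf, out_buf = 0, 0   -- bytes currently sitting in each pipe
  local written = 0              -- bytes the parent has written so far
  while written < total do
    -- Parent: a blocking write fills whatever space the input pipe has.
    local n = math.min(total - written, cap_in - in_buf)
    in_buf, written = in_buf + n, written + n
    -- Filter ("cat"): moves bytes across while its output pipe has room.
    local m = math.min(in_buf, cap_out - out_buf)
    in_buf, out_buf = in_buf - m, out_buf + m
    -- No progress on either side: the parent is stuck in its write and
    -- the filter is stuck in its own write.  Nobody reads out_buf.
    if n == 0 and m == 0 then
      return false, written
    end
  end
  return true, written           -- parent may now close its end and read
end

print(blocking_roundtrip(12, 4096, 4096))      --> true    12
print(blocking_roundtrip(65536, 4096, 4096))   --> false   8192
```

With two 4096-byte buffers the model stalls after 8192 bytes. Real systems have larger and less predictable limits, which is exactly why guessing chunk sizes does not help.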


There are usually two ways to handle bidirectional
popens:
 
First: use two different processes/tasks/threads, one
for sending data to the filter and one for reading data
from it.  Of course, these two processes shouldn't block
each other through an additional communication channel
or the deadlock may come again.

This method is the one usually used for typical filters
in Unix pipes.  It works very well and the implementation
gives no big problems.  It is used when the reader and
the writer process are independent of each other, a simple
producer/filter/consumer relationship.


Second: use non-blocking file handles together with
select/poll-like system calls to dispatch reading and
writing yourself.  That prohibits use of stdio.  The
stdio routines are not designed for non-blocking access
and all kinds of magic things happen.  Even if you think
you can get away with select and blocking I/O (you're
wrong btw) stdio won't play nice with you.  You can't
really control when and how stdio performs I/O on the
relevant file pointers and you can bet that it works
differently on a different implementation.  You can't
even query if there's still something in the input
buffer!

This method is preferred when a single app really tries
to communicate with the "filter".  You have to implement
your own buffering with overflow and underflow handling,
timeouts etc.  A generic implementation for all kinds of
"filters" gives a non-trivial API.  Probably not easier
than the POSIX non-blocking I/O API.
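
A rough sketch of what such a dispatch loop might look like follows. Stock Lua has no non-blocking I/O, so the `chan` object and its methods (`wait`, `try_write`, `try_read`, `close_output`) are hypothetical names invented for this illustration and do not belong to any existing library. `fake_cat` is an in-memory stand-in for a child process with small buffers, included only so the loop can be run. Timeouts and error handling are left out.

```lua
-- Sketch of the dispatch loop for the select/poll method.
-- `chan` is a hypothetical object that some non-blocking I/O binding
-- would have to supply; none of these names come from a real library:
--   chan:wait(want_write)  blocks until reading (or writing) is possible
--   chan:try_write(s)      writes what fits, returns the byte count (may be 0)
--   chan:try_read()        returns data, "" if nothing is ready, nil at EOF
--   chan:close_output()    signals EOF to the filter
local function pump(chan, input)
  local out, pos = {}, 1
  local writing = true
  while true do
    chan:wait(writing)
    if writing then
      if pos <= #input then
        pos = pos + chan:try_write(input:sub(pos))
      end
      if pos > #input then               -- everything sent: give it an EOF
        chan:close_output()
        writing = false
      end
    end
    local data = chan:try_read()
    if data == nil then break end        -- filter closed its output
    if data ~= "" then out[#out + 1] = data end
  end
  return table.concat(out)
end

-- In-memory stand-in for a "cat" child with bounded buffers of `cap`
-- bytes, only so the loop above can be exercised without a real process.
local function fake_cat(cap)
  local chan = { inbuf = "", outbuf = "", eof = false }
  local function step()                  -- the "child": copy while there is room
    local n = math.min(#chan.inbuf, cap - #chan.outbuf)
    chan.outbuf = chan.outbuf .. chan.inbuf:sub(1, n)
    chan.inbuf = chan.inbuf:sub(n + 1)
  end
  function chan:wait() step() end
  function chan:try_write(s)
    local n = math.min(#s, cap - #self.inbuf)
    self.inbuf = self.inbuf .. s:sub(1, n)
    return n
  end
  function chan:try_read()
    step()
    if self.outbuf == "" then
      if self.eof and self.inbuf == "" then return nil end
      return ""
    end
    local data = self.outbuf
    self.outbuf = ""
    return data
  end
  function chan:close_output() self.eof = true end
  return chan
end

local big = string.rep("*", 64 * 1024)
print(#pump(fake_cat(4096), big))        --> 65536
```

The point of the sketch is the shape of the loop: it never commits to a write or a read that could block, so data keeps moving in both directions regardless of the buffer size.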


I didn't test Python's popen2 family of functions but I
bet that they won't pass the "cat" test.  The users of
these functions usually ignore the deadlock problem.
They pass tiny packets of data back and forth and hope
that it works.  It does, until one of the packets exceeds
some hard to tell size[1] or the filter does something
unexpected.


Reliable bidirectional communication between two processes
isn't trivial.  Getting it right is hard[2].  Don't let
people think it's easy.


Back to the original poster: if possible, redirect one end
of the stream (input or output) to a file.  I.e.:

	fp = io.popen("foo >/tmp/unique", "w")
	fp:write(anything)
	fp:close()
	fp = io.open("/tmp/unique")
	x = fp:read("*a")
	fp:close()
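
The same workaround can be wrapped into a small helper. This is only a sketch under a few assumptions: a POSIX shell behind io.popen, a temporary path from os.tmpname() that contains no single quotes, and "tr" standing in for the real filter. The function name is made up for this example.

```lua
-- Run `cmd` as a filter: feed it `input` through a pipe and collect its
-- output from a temporary file.  Only one pipe is involved, so there is
-- no second direction to deadlock on.
local function filter_via_tempfile(cmd, input)
  local tmp = os.tmpname()                    -- scratch file for the output
  local fp = assert(io.popen(cmd .. " > '" .. tmp .. "'", "w"))
  fp:write(input)
  fp:close()                                  -- sends EOF and waits for cmd
  local f = assert(io.open(tmp, "rb"))
  local output = f:read("*a")
  f:close()
  os.remove(tmp)
  return output
end

-- 120000 bytes in, well above the 64k used in the "cat" test.
local out = filter_via_tempfile("tr a-z A-Z", string.rep("hello\n", 20000))
print(#out)   --> 120000
```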

If you want to process it further, pipe it into another
Lua instance:

	fp = io.popen("foo | lua part2.lua", "w")
	...

For reliable bidirectional communication there's nothing
at the moment in Lua and, afaik, none of the present
extension libraries provide enough functionality to
implement it (multiple processes method maybe).

Ciao, ET.


[1] Regarding hard to tell size: Linux some time back changed
pipe buffers from a single page-sized buffer to something
like 8 page-sized buffers.  But each one could be partially
full as every write consumed at least one buffer.
Worst case: a pipe could buffer only 8 bytes if you perform
single byte writes!

[2] even ssh had this kind of deadlock - too much data on
stderr and no one was reading it - hang.


Re: popen read and write?

gary ng
--- Edgar Toernig <[hidden email]> wrote:
> Second: use non-blocking file handles together with
> select/poll-like system calls to dispatch reading
> and writing yourself.  That prohibits use of 
> stdio.  The stdio routines are not designed for non-
> blocking access and all kind of magic things 
> happen.  Even if you think you can get away with 
> select and blocking I/O (you're
> wrong btw) stdio won't play nice with you.

You hit the nail on the head in general, though I
would appreciate comments on this.

Can't I just use COPAS and non-blocking io on
stdin/stdout(i.e. fh 0/1) ? As it seems to me it is no
different than a program launched by tcpserver ?

That is, I have two handlers in COPAS, one is the
writer (on 1), the other the reader (on 0). I can write
until it blocks, and if the sub-process has enough to
work on (some kind of filter) that would trigger my reader
on the next select()?



       


Re: popen read and write?

Edgar Toernig
gary ng wrote:
>
> Can't I just use COPAS and non-blocking io on
> stdin/stdout(i.e. fh 0/1) ? As it seems to me it is no
> different than a program launched by tcpserver ?
>
> That is, I have two handlers in COPAS, one is the
> writer (on 1), the other the reader (on 0). I can write
> until it blocks, and if the sub-process has enough to
> work on (some kind of filter) that would trigger my reader
> on the next select()?

I don't know COPAS but as I understand it, it should
work fine.  It's the two-process-method implemented
with coroutines and select.  The coroutines are used
to create userspace non-preemptive threads and select
is used by the scheduler to wake up the right thread.
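
To make that structure concrete, here is a minimal sketch with two coroutines and a tiny scheduler. It is not the COPAS API: every name in it is made up, the bounded in-memory `pipe` stands in for the child process, and `ready_to_write` / `ready_to_read` stand in for what select() would report.

```lua
-- Two cooperative "threads" and a tiny scheduler over a simulated pipe.
local CAP = 4096
local pipe = { inbuf = "", outbuf = "", closed = false }

local function child_step()               -- the "cat" on the other side
  local n = math.min(#pipe.inbuf, CAP - #pipe.outbuf)
  pipe.outbuf = pipe.outbuf .. pipe.inbuf:sub(1, n)
  pipe.inbuf = pipe.inbuf:sub(n + 1)
end

local function ready_to_write() return #pipe.inbuf < CAP end
local function ready_to_read()
  return pipe.outbuf ~= "" or (pipe.closed and pipe.inbuf == "")
end

local function writer(data)
  local pos = 1
  while pos <= #data do
    coroutine.yield("write")              -- sleep until a write won't block
    local n = math.min(#data - pos + 1, CAP - #pipe.inbuf)
    pipe.inbuf = pipe.inbuf .. data:sub(pos, pos + n - 1)
    pos = pos + n
  end
  pipe.closed = true                      -- EOF for the child
end

local result = {}
local function reader()
  while true do
    coroutine.yield("read")               -- sleep until a read won't block
    if pipe.outbuf == "" then return end  -- woken up for EOF
    result[#result + 1] = pipe.outbuf
    pipe.outbuf = ""
  end
end

local function run(tasks)
  local waiting = {}                      -- coroutine -> event it waits for
  local function resume(co, arg)
    local ok, want = coroutine.resume(co, arg)
    assert(ok, want)                      -- propagate errors from the task
    waiting[co] = coroutine.status(co) ~= "dead" and want or nil
  end
  for _, t in ipairs(tasks) do
    resume(coroutine.create(t.fn), t.arg)
  end
  while next(waiting) do
    child_step()
    for co, want in pairs(waiting) do     -- wake only tasks that can proceed
      if (want == "write" and ready_to_write())
         or (want == "read" and ready_to_read()) then
        resume(co)
      end
    end
  end
end

run({ { fn = writer, arg = string.rep("*", 64 * 1024) }, { fn = reader } })
print(#table.concat(result))   --> 65536
```

Neither task ever blocks inside a read or write. It yields instead, and the scheduler resumes it only when the operation it asked for can make progress.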

Ciao, ET.


Re: popen read and write?

steve donovan
In reply to this post by Edgar Toernig
On 10/9/07, Edgar Toernig <[hidden email]> wrote:
> To see what's wrong let's shove bigger chunks of data into
> the filter:
>
>         fp:write(about_64k_of_data)
>         fp:flush_output()
>         fp:close_output()
>         x = fp:read("*a")
>
> 64k should be big enough to fill all buffers between
> the parent and the filter process (the actual value
> differs greatly between systems).  What happens?  The
> program hangs again!  Why?  Because the parent is still
> trying to write data to the filter but the filter is no
> longer reading the data.  It is trying to pass already
> processed data back to the parent.  Deadlock!

popen2 traditionally returns _two_ file objects, one for writing and
one for reading.
How about this implementation?

http://mysite.mweb.co.za/residents/sdonovan/popen.zip

(on Fedora Core 5)
> require 'popen'
> w,r = popen.popen2('cat')
> w:write(string.rep('*',64*1024))
> w:close()
> res = r:read('*a')
> = #res
65536

Have I missed something here?

steve d.


Re: popen read and write?

Edgar Toernig
steve donovan wrote:
>
> popen2 traditionally returns _two_ file objects, one for writing and
> one for reading.

As I said, it doesn't matter.  At the moment you block in
a read or write you are unable to process data in the
other direction and the application may deadlock.

> How about this implementation?
> 
> http://mysite.mweb.co.za/residents/sdonovan/popen.zip

It is prone to deadlocks.

And btw, the unix version leaks fds in error paths and
passes too many fds to the spawned process.

> (on Fedora Core 5)
> > require 'popen'
> > w,r = popen.popen2('cat')
> > w:write(string.rep('*',64*1024))
> > w:close()
> > res = r:read('*a')
> > = #res
> 65536
> 
> Have I missed something here?

Just try something bigger.

It's hard to detect the buffer sizes. Possibly
Linux handles exactly 64k at the moment (8 pages
for each pipe = 8*4k*2 = 64k).  If it were using
sockets the limits may be even bigger, something
like 64k or 128k per direction.  Traditional unix
had 4k.

And, with a strange "cat", one that reads all
data until EOF and only then writes everything
out, you will never experience a deadlock.
Only out of memory conditions ;-)

Ciao, ET.


Re: popen read and write?

steve donovan
On 10/9/07, Edgar Toernig <[hidden email]> wrote:
> It is prone to deadlocks.
>
> And btw, the unix version leaks fds in error paths and
> passes too many fds to the spawned process.
> Just try something bigger.

Thanks, I shall keep hammering it ;)   Duck has also noticed that one
can get zombie processes with it.

steve d.


Re: popen read and write?

Jeff Pohlmeyer

Re: popen read and write?

Duck-2
In reply to this post by Michal Kolodziejczyk-3

> > How about this implementation?
> >
> > http://mysite.mweb.co.za/residents/sdonovan/popen.zip
>
> It is prone to deadlocks.

Anything of Steve's popen2 sort is prone to deadlock, most commonly because its most useful purpose is to allow you to "program" or to script the remote control of an interactive program...so you'd better not make any mistakes. (Same sort of problem you have with |& in gawk, or when using expect, or any templated scripting of an interactive program. After all, if you could solve this problem generically you could pass a Turing Test with ease ;-)

[snip]

> Thanks, I shall keep hammering it ;)   Duck has
> also noticed that one can get zombie processes
> with it.

If the subprocess hangs, or you give up and close the pipes before the process ends...

I had in mind extending Steve's code (and may yet do it) in order to:

1. Provide for "close" and "kill" methods (callable explicitly at any time, or implicitly at __gc time) which attempt to get rid of unfinished subprocesses -- with varying degrees of co-operation from the subprocess.

2. Provide for a "peek" method which portably allows you to see whether a read is going to block. (On Linux I think you can simply use select(). On see if reads on a pipe will block or not.)

(1) will allow zombies to be avoided. (2) will make parent read()s much safer. If the wrong, or no, data comes back you will be able to bail out after a user-specified timeout. A LuaSocket-type timeout for "total subprocess runtime" (LuaSocket lets you have a timeout on individual reads and on total time blocked -- nice) would be handy, but the Linux and Windows code would be irritatingly different.

I just have to get a Round Tuit :-) Don't hold your breath.



Re: popen read and write?

Edgar Toernig
Duck wrote:
>
> Anything of Steve's popen2 sort is prone to deadlock, [...]

Yes and no.  Yes because he has based his popen2 on stdio and
provides nothing but blocking read and write.  No because you
can make it deadlock free by either providing non-blocking
read and write together with a select like function, or you
create threads, one for reading and one for writing.

I.e. glib's popen2 (with that looong name) looks like a solid
base.  Together with its io-queues and the select-based main
loop you have everything to get deadlock-free bidirectional
communication.


> I had in mind extending Steve's code (and may yet do it) in order to:
> [...]
> 2. Provide for a "peek" method which portably allows you to see whether a 
> read is going to block.

And how do you "peek" into stdio's input buffer?

Ciao, ET.


Re: popen read and write?

gary ng
--- Edgar Toernig <[hidden email]> wrote:
> And how do you "peek" into stdio's input buffer?
> 
Would io.stdin:read('*a') be a way to peek? My
understanding is that '*a' is effectively
non-blocking (it just returns as much as is there). Or maybe
not, if there is nothing to read (in which case it would
block)?

But that still doesn't solve the problem that write() can
block.

So I still think in lua, COPAS(effectively cooperative
multithreading as you said) is the natural way to have
deadlock free bi-directional communication for things
like popen2.


       


Re: popen read and write?

Ketmar Dark-2
hello, gary ng <[hidden email]>.

On Wed, 10 Oct 2007 21:16:48 -0700 (PDT)
gary ng <[hidden email]> wrote:

> Would io.stdin:read('*a') be a way to peek? My
> understanding is that '*a' is effectively
> non-blocking (it just returns as much as is there).
"*a" is blocking until 'end of stream' reached, AFAIR.

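A quick way to check this, assuming a POSIX shell with printf and sleep behind io.popen: the child below produces output in two parts with a pause in between, and read("*a") hands back both parts in one string, because it only returns once the child has closed its end of the pipe.

```lua
-- The child prints "a", sleeps for a second, prints "b", then exits.
local fp = assert(io.popen("printf a; sleep 1; printf b", "r"))
local all = fp:read("*a")   -- returns only after the child closes its stdout
fp:close()
print(all)                  --> ab
```

So "*a" cannot serve as a peek: it waits for end of stream rather than returning whatever happens to be available.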

Re: popen read and write?

steve donovan
In reply to this post by Edgar Toernig
On 10/11/07, Edgar Toernig <[hidden email]> wrote:
> I.e. glib's popen2 (with that looong name) looks like a solid
> base.  Together with its io-queues and the select-based main
> loop you have everything to get deadlock-free bidirectional
> communication.

This is probably the only way one can do this and be completely safe.
One could make popen2 return a third value which is a control object,
but this is useless if you're blocking in stdio anyway.

I've used the GTK stuff successfully in a SciTE extension, and also
worked out the Windows equivalent: you spawn the process in a new
thread, and use SendMessage to post output back to the GUI thread.  I
can provide code if anybody is interested.

steve d.


Re: popen read and write?

Duck-2
In reply to this post by Michal Kolodziejczyk-3

> > 2. Provide for a "peek" method which portably allows
> > you to see whether a read is going to block.
>
> And how do you "peek" into stdio's input buffer?

I meant a function to allow the caller of Steve's popen2() to set a timeout on reading data coming back from the subprocess. So I'm not sure why I'd need to do a non-blocking read on stdio to provide the incomplete but nevertheless useful "peeking" I had in mind. (I'm not sure how you tell whether a Windows pipe will block if you _write_ to it, if indeed you can.)

The idea is simply to permit a Lua program waiting for results coming back from a subprocess to exit gracefully if the subprocess hangs, or if it itself unexpectedly blocks waiting for input (a "protocol mismatch", if you will).

This has been the most likely reason for a deadlock in my experience of using something like GAWK's "|&" (which is available on Unix only, implemented in the style of Steve's Linux code). The subprocess pops up an unexpected request for input (e.g. "Oops, do you want to continue" because you failed to make it entirely non-interactive), or simply blocks itself waiting for some other event of its own interest (e.g. waiting on /dev/random). This risk is worth taking for simple scripting, and well worth taking if you can say "give up waiting for the subprocess after, say, 120 seconds."

As for glib's popen2()-flavoured solution -- anything which uses select() isn't portable to Windows.

The "listen" option in Netcat for Windows has a solution which IIRC is like the one Steve himself posed, using several threads and SendMessage() calls to proxy [?] data between a connected socket and a subprocess (inetd style). This code was, in its turn, lifted from a Windows port of rlogind, if I remember the comments correctly.