Patchless modification of Lua source code

Re: Patchless modification of Lua source code

Sean Conner
It was thus said that the Annoying Philippe Verdy once stated:

> I can pretend that you reconfirm once again what I said, by confirming that
> your example is hard to find (if we cannot find and evaluate it, then your
> reply is justifying nothing at all, it just proves that you cannot justify
> your opposition by verifiable arguments you could have invented only to
> give a contradiction...)
>
> Yes there are programming languages that use "late binding" (i.e. where
> binding is not resolved at all immediately, allowing multiple names/symbols
> to coexist and remain unfiltered), but this does not prove that these
> languages are usable for anything, without finally using a final "linker"
> that will resolve the references to names using a well defined and
> predictable order with additional data (or environment) specifying the
> expected ordering rules.

  You know what?  You're right.  I'm wrong.  I'm outta here.

  -spc



Re: Patchless modification of Lua source code

Philippe Verdy
In reply to this post by Sean Conner
Well, this program proves what I said: it uses "late binding", then uses an undocumented "linking" step to try to resolve the references:

- I don't really know what it does, but if it allows any candidate to be used, then it will enumerate all possible solutions by explicitly instantiating all the possible bindings, so this is still effectively a linking step, applied to each instance.

- If it just randomly selects one candidate, then the results are unpredictable, and the language itself has NO practical use; it is fundamentally flawed, insecure, and not portable at all: an implementation of the language will necessarily have to make arbitrary choices... and document them! This will create as many distinct "formal" languages as there are arbitrary choices, even if all of them use the same basic syntax. Such a language is not formal; it is actually a (possibly very large) family of languages, with distinct applications working only for one of its instances.

Still, we are back to the final linking step, which is necessarily deterministic (otherwise it is not implementable at all on any deterministic computer or Turing machine; this informal language would be a "blackbox", an "oracle", something we could refer to as "only God knows").



On Wed, Nov 21, 2018 at 21:23, Sean Conner <[hidden email]> wrote:
It was thus said that the Great Philippe Verdy once stated:
> I cannot even find any reference about INRAC on the net. All I find is the
> acronym and brand in France of the "Institut National pour la Retraite
> ACtive", a training institute for senior people.
>
> I find a reference in GitHub for the name of a module written in Perl for
> reverse engineering of RAC files used in the early 1980s by the RACTER
> program, used for chatting, an ancestor of IRC... But I don't know if you
> refer to this Perl module and its related language. This is an alpha
> module, not tested, not even documented.

  Congratulations!  RACTER was, in fact, written in INRAC and was
commercially available in the mid 80s [1][2].  The RAC files *are* the
program, and the language itself is non-deterministic in resolving names,
which I gave as an example to back up the point Viacheslav Usov was making.

  -spc

[1]     It was even used in the writing of the book _The Policeman's Beard
        is Half Constructed_ (https://www.amazon.com/Policemans-Beard-Half-Constructed-Computer/dp/0446380512/)

[2]     The INRAC compiler was also available commercially, but is hard to
        find.


Re: Patchless modification of Lua source code

Philipp Janda
In reply to this post by Luiz Henrique de Figueiredo
On 21.11.18 at 12:12, Luiz Henrique de Figueiredo wrote:
> [...] A library is not meant to be totally incorporated into the
> program; only modules that provide definitions for the program are
> incorporated. Again, this is reasonable and convenient, and works
> extremely well in practice. It has been like that for many decades.

The manual for GNU ld seems to support that. Extract from the
`-l/--library` option description:

>> The linker will search an archive only once, at the location where it
>> is specified on the command line. If the archive defines a symbol
>> which was undefined in some object which appeared before the archive
>> on the command line, the linker will include the appropriate file(s)
>> from the archive.

And a small test:


     $ cat a.c
     #include <stdio.h>

     void a(void) {
       puts("A");
     }


     $ cat b.c
     #include <stdio.h>

     void a(void) {
       puts("A");
     }

     void b(void) {
       puts("B");
     }


     $ cat main.c
     extern void a(void);
     extern void b(void);

     int main(void) {
       a();
       b();
       return 0;
     }


     $ gcc -c a.c

     $ gcc -c b.c

     $ ar r liba.a a.o

     $ ar r libb.a b.o

     $ gcc -o main main.c a.o b.o
     b.o: In function `a':
     b.c:(.text+0x0): multiple definition of `a'
     a.o:a.c:(.text+0x0): first defined here
     collect2: error: ld returned 1 exit status

     $ gcc -o main main.c b.o a.o
     a.o: In function `a':
     a.c:(.text+0x0): multiple definition of `a'
     b.o:b.c:(.text+0x0): first defined here
     collect2: error: ld returned 1 exit status

     $ gcc -o main main.c a.o -L. -lb
     ./libb.a(b.o): In function `a':
     b.c:(.text+0x0): multiple definition of `a'
     a.o:a.c:(.text+0x0): first defined here
     collect2: error: ld returned 1 exit status

     $ gcc -o main main.c b.o -L. -la


So it seems to work in practice on Linux using my current version of the
GNU linker if the original object file is in an archive, the modified
object file is listed before the archive on the command line, and the
modified object file exports at least all the same symbols as the
original object file.


Philipp




Re: Patchless modification of Lua source code

Philippe Verdy
On Thu, Nov 22, 2018 at 07:53, Philipp Janda <[hidden email]> wrote:
> On 21.11.18 at 12:12, Luiz Henrique de Figueiredo wrote:
> >> The linker will search an archive only once, at the location where it
> >> is specified on the command line. If the archive defines a symbol
> >> which was undefined in some object which appeared before the archive
> >> on the command line, the linker will include the appropriate file(s)
> >> from the archive.
> So it seems to work in practice on Linux using my current version of the
> GNU linker if the original object file is in an archive, the modified
> object file is listed before the archive on the command line, and the
> modified object file exports at least all the same symbols as the
> original object file.

This does not contradict what I said: having a priority order does not mean that a "past" library will necessarily be read again; the linker just needs to keep it (at least partly: the minimum is its table of symbols and the module in which each is defined). It then still resolves the symbols from the set of "active" modules in order to determine the list of modules to keep, and assembles new ones until no unresolved symbols remain. Any module that does not define any symbol needed by the assembly can be dropped from it (only its list of external symbols is kept); if unresolved symbols remain, the linker can only generate a partially linked module (containing all their exported symbols). In all cases, the priority order is preserved. However, not all specified modules will be packed together in the assembly, only what is needed (and notably not the full set of unneeded modules that are packed in a library). The exception is when building a library (all modules are kept) or a shared library (only modules that contain exported symbols are kept, along with the dependent modules defining non-exported symbols needed by other modules).
If the result of linking must be an executable or a shared library, the link will fail if unresolved symbols remain, or else what is produced will be a file in an intermediate object format which is not executable and which contains the list of unresolved symbols (because such a format may still allow later binding in a subsequent pass of the linker).

Re: Patchless modification of Lua source code

Sean Conner
In reply to this post by Viacheslav Usov

  Having consumed an inordinate amount of food earlier today [1], and with
everything fairly quiet and not much going on, I thought I would
investigate this matter a bit further.

  So without further ado ...

It was thus said that the Great Viacheslav Usov once stated:

> On Mon, Nov 19, 2018 at 10:10 AM Dirk Laurie <[hidden email]> wrote:
>
>
> > 1. All external names used by Lua are declared in header files.
> > 2. If such a name has been resolved by an object file or library
> > linked earlier, the linker will be satisfied and will not replace it.
> > 3. Any .o files needed by the linker will be built automatically from
> > .c files by the Makefile rules.
> >
>
> The standard has no concept of "linked earlier" and consequently no primacy
> of "names linked earlier", and the entire program, including all the
> libraries, shall have a function with a given name defined at most once.
> Per clause 4/2, "If a ‘‘shall’’ or ‘‘shall not’’ requirement [...] is
> violated, the behavior is undefined".

  I'm not convinced.  I'm quoting the C99 standard.  First:

        5.1.1.1:

        1 ...  Previously translated translation units may be preserved
          individually or in libraries....  Translation units may be
          separately translated and then later linked to produce an
          executable program.

  All this says is that compiled files (or "translation units" if you want)
can be saved as files, or stored in a "library" (which is not defined in the
standard, but is probably left undefined to give leeway to the
implementation) to avoid unnecessary compilation.

        6.2.2:

        2 In the set of translation units and libraries that constitutes an
          entire program, each declaration of a particular identifier with
          external linkage denotes the same object or function....

  So here we see that a program can consist of individual translation units
and "libraries".  This also says that each external declaration to a given
name will eventually resolve to the same object or function.  Now, onto the
point that appears to have the concept of "linked earlier":

        5.1.1.2:

        8 All external object and function references are resolved.  Library
          components are linked to satisfy external references to functions
          and objects not defined in the current translation.  All such
          translator output is collected into a program image which contains
          information needed for execution in its execution environment.

  I interpret this to mean, "if the given object files do not include an
object or function with external linkage, then said objects or functions can
be pulled from a 'library'."  So let's go over the critical step in the
procedure:

        make linux -e "LUA_O=lua.o myctype.o mylex.o"

  Going through the makefiles for Lua 5.3, I find that LUA_O is defined as:

        LUA_O=  lua.o
 
  Untangling the makefile (and I'll be following the Linux build since
that's what I'm using, but the others are similar), I find that when I do a

        % make linux

the build will make the targets defined in ALL_T, which is defined as LUA_A,
LUA_T and LUAC_T.  These are, respectively:

        LUA_A=  liblua.a
        LUA_T=  lua
        LUAC_T= luac

  These, in turn, depend upon

        lua      : lua.o
        luac     : luac.o
        liblua.a : <object files except for the two listed above>

  This eventually will turn into the command line:

        gcc -std=gnu99 -o lua lua.o liblua.a -lm -Wl,-E -ldl -lreadline -lncurses

  On Unix, files with a '.a' extension are a "library".  The "-l" option is
just a shortcut to specifying other libraries one may link against, and
expanding everything out results in:

        gcc -std=gnu99 -o lua lua.o liblua.a /usr/lib/libm.so -Wl,-E /usr/lib/libdl.so /usr/lib/libreadline.so /usr/lib/libncurses.so

(also on Unix, files with a '.so' extension are also a "library" but the
difference between the two is not important for this disussion)

  Try to compile lua without specifying liblua.a and you get complaints
about missing external references (the first one is lua_sethook() by the
way).  Adding back liblua.a and everything compiles, since the missing
functions are found in a library per 5.1.1.2/8.

  Now, to the command in question:

        % make linux -e "LUA_O=  lua.o myctype.o mylex.o"

  This will be (effectively) expanded out to:

        gcc -std=gnu99 -o lua lua.o myctype.o mylex.o liblua.a /usr/lib/libm.so -Wl,-E /usr/lib/libdl.so /usr/lib/libreadline.so /usr/lib/libncurses.so

The file 'mylex.c' will presumably provide definitions for the following
functions:

        luaX_init()
        luaX_setinput()
        luaX_newstring()
        luaX_next()
        luaX_lookahead()
        luaX_syntaxerror()
        luaX_token2str()

The three "translation units" provide a number of external functions---the
missing ones will therefore be pulled in from liblua.a, so in this case,
yes, there *IS* a form of the concept of "linked earlier".  The key sentence
from 5.1.1.2/8 is:

        Library components are linked to satisfy external references to
        functions and objects not defined in the current translation.

  I would claim that lua.o, myctype.o and mylex.o constitute "the current
translation(s)" and that the functions in mylex.c will be used and the ones
that exist in the liblua.a "library" will not be, given the order in the key
sentence above.

> Note that what you do may fail in weird ways even with toolchains that do
> have the notion of "linked earlier" - precisely because something has been
> linked earlier. Your application may be using the "modified" functions,
> while Lua itself and the other libraries use the "original" functions. This
> would be a special case of generally undefined behaviour.
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  Funny you mention that---I checked the C99 standard and found *nothing*
about this.  It's not unspecified, it's not undefined, it's not
implementation defined, it's not locale specific, *nothing*.  So yes, I
think this might be charitably noted as "generally undefined behavior" but
the C99 standard says nothing else about it.

  But given the wording of 5.1.1.2/8, I don't think your concern is
valid---translation units are searched prior to libraries for external
references, so any definition of a function X found in a translation unit
will be used, regardless of where X is referenced (in the current
translation units or in a library).

  -spc (If at this point you don't think this interpretation holds, I'd like to
        hear the argument against it)

[1] Okay, technically yesterday by the time this is going out, but I
        still consider it the same day.  Also, today is Thanksgiving Day in
        the United States of America, a holiday [2] typically celebrated by
        eating a large meal with friends and family and falling asleep
        shortly thereafter.

[2] It's this weird quasi-religious/secular holiday where the purpose is
        to give thanks for what you have to whatever deity or non-deity of
        your (singular or plural) choice.


Re: Patchless modification of Lua source code

pocomane
In reply to this post by Viacheslav Usov
As I understand it, this is not related to the C standard.

On Wed, Nov 21, 2018 at 1:45 PM Viacheslav Usov <[hidden email]> wrote:
> This is not true. For example, in C89, clause 2.1.1.2 Translation phases, number 8:
>
> All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation.

In fact, there are 2 translation units in the proposed scenario: one for
the executable, one for the shared library. In both, each symbol is
defined just once.

Those references are resolved when the program is loaded into memory for
execution. So the C compiler/linker/whatever is NOT in charge of
resolving the references. On a Linux machine, ld-linux-blabla.so is in
charge, i.e. the OS dynamic linker/loader.

Now, I am not saying that the proposed procedure is good practice,
nor am I saying it is not. It is certainly not portable... I
mean... some platforms may not even have support for shared
libraries!


Re: Patchless modification of Lua source code

Viacheslav Usov
In reply to this post by Sean Conner

On Fri, Nov 23, 2018 at 6:48 AM Sean Conner <[hidden email]> wrote:

> I interpret this to mean, "if the given object files do not include an object or function with external linkage, then said objects or functions can be pulled from a "library".

I cannot see how you reach this conclusion and this is exactly what the rest of your argument hinges on. I'd say this is also exactly where the fundamental disagreement is.

5.1.1.2 #8 and 6.2.2 #2 that you quoted do not say that libraries are only consulted if something was not resolved in some "primary" translation units, and, as I said earlier, no such primacy is assigned to any translation units by the standard.

Nor do they say anything like "parts of libraries". An "entire program" includes full libraries. And here, you dismissed an important distinction between static libraries and shared libraries. Static libraries (archives) are typically just loose collections of object files not linked together. So (we kiss good-bye to portability and conformance at this point) it kind of makes sense to say that one of those object files can be pulled back from the archive "on demand" and linked to the program without pulling in other object files, so those other object files are kind of not really part of the program, so whatever external symbols they have are irrelevant. This behaviour is exactly what you and the other advocates of patchless patching exploit - where it works, and I have not seen any attempt to demonstrate that it works except with the GNU linker.

However, the whole point of patchless patching is about shared libraries, because it is not that difficult to modify a static library and link your application against this modified library. Shared libraries are not loose collections of object files. They are pre-linked executables and they most certainly bring with them all of their external symbols into the program. I explained how this could result in complications previously and won't repeat that. As far as I can see, no one in this entire thread has come up with a showcase of patchless patching for a shared library, while claims like "the difference between [shared and static libraries] is not important for this disussion [sic]" have been made.

> Funny you mention that---I checked the C99 standard and found *nothing* about this.  It's not unspecified, it's not undefined, it's not implementation defined, it's not locale specific, *nothing*.

The very first message of mine in this thread explained how having multiple definitions of external linkage identifiers in an "entire program" is undefined behaviour, quoting the standard. Given your interpretation as to what an "entire program" is, you might indeed have difficulty seeing that.

Cheers,
V.

Re: Patchless modification of Lua source code

Sean Conner
It was thus said that the Great Viacheslav Usov once stated:

> On Fri, Nov 23, 2018 at 6:48 AM Sean Conner <[hidden email]> wrote:
>
> > I interpret this to mean, "if the given object files do not include an
> > object or function with external linkage, then said objects or functions
> > can be pulled from a "library".
>
> I cannot see how you reach this conclusion and this is exactly what the
> rest of your argument hinges on. I'd say this also exactly where the
> fundamental disagreement is.
>
> 5.1.1.2 #8 and 6.2.2 #2 that you quoted do not say that libraries are only
> consulted if something was not resolved in some "primary" translation
> units, and, as I said earlier, no such primacy is assigned to any
> translation units by the standard.

  Again, the key sentence from 5.1.1.2 #8:

        Library components

That is, translation units pre-compiled and stored in a library.  The C
standard (as far as I can see) does not define "library", but from the
language I've read, it can store translation units and these translation
units can be referenced at a later time.

        are linked

Again, "linked" (and "linkage" and "linking") are not defined by the C
standard, but the implication is that different translation units are
somehow combined so that all external references are satisfied.

        to satisfy external references to functions and objects

As in right here.

        not defined in the current translation.

And I think this is where we have our differences.  If the "current
translation" does not contain function foo(), *then* such a function is
looked for in any given libraries [1].  I ask, what does "not defined in the
current translation" mean to you?

> Nor do they say anything like "parts of libraries".

  The first two words of 5.1.1.2 #8---"Library components".  I looked up
"component" in the Oxford English Dictionary and I found:

        A constituent element or part

> An "entire program" includes full libraries.

    Citation please.  The phrase "entire program" appears in 6.2.2#2:

        In the set of translation units and libraries that constitutes an
        entire program ...

I could not find the phrase "full library" (or variations) in the C99
Standard.

> And here, you dismissed an important distinction
> between static libraries and shared libraries. Static libraries (archives)
> are typically just loose collections of object files not linked together.
> So (we kiss good-bye to portability and conformance at this point) it kind
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  Citation please, from the C standard (any one of C89, C99, or C11).  It's
odd to think of GNU C as not being conformant.  Or any of the commercial
compilers (don't worry, I'll address one of those in a bit).

> of makes sense to say that one of those object files can be pulled back
> from the archive "on demand" and linked to the program without pulling in
> other object files, so those other object files are kind of not really part
> of the program, so whatever external symbols they have is irrelevant. This
> behaviour is exactly what you and the other advocates of patchless patching
> exploit - where it works, and I have not seen any attempt to demonstrate
> that it works except with the GNU linker.

  I did some experiments last night on this very subject.  At first I used
Linux but, being curious, I went and did the same experiments on Solaris, so
I'll use the results from that.  On the Solaris system in question, I used the
Sun Compiler Suite, so no GNU tools.  And just to make sure:

        % cc -V ; ld -V
        cc: Sun C 5.12 SunOS_sparc 2011/11/16
        ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.1512

(and again, it would be odd to think of the Sun compiler as not being
conformant)

  The experiment is a very small program---the function main() calls
func1(), which calls func2().  Each function prints that it has been called,
and a sample output would look like:

        Hello from main
        Hello from func1
        Hello from func2

Experiment 1---main() is in one translation unit; func1() and func2() are in
another translation unit, which is stored in a library.  Compile and run:

        % cc -c -o main.o main.c

main() is compiled.

        % cc -c -o func.o func.c
        % ar rv libfuncall.a func.o
        a - func.o
        ar: creating libfuncall.a
        ar: writing libfuncall.a

func1() and func2(), in the same translation unit, are compiled and stored
in a library.

        % cc -o main1 main.o libfuncall.a

The two are linked.

        % ./main1
        Hello from main
        Hello from func1
        Hello from func2

And the output.  This is the status quo.  Next up, the "patchless patching
exploit" (to use your terms)---we have our own version of func2() which,
when called, will print "Hello from myfunc2":

        % cc -c -o main.o main.c
        % cc -c -o myfunc2.o myfunc2.c
        % cc -c -o func.o func.c
        % ar rv libfuncall.a func.o
        a - func.o
        ar: creating libfuncall.a
        ar: writing libfuncall.a
        % cc -o main2 main.o myfunc2.o libfuncall.a
        ld: fatal: symbol 'func2' is multiply-defined:
                (file myfunc2.o type=FUNC; file libfuncall.a(func.o) type=FUNC);
        ld: fatal: file processing errors. No output written to main2

And this result does back your position---that we have two definitions of
the external function func2() (and for the record, I got this result on
Linux as well).  Before I interpret this result, let's go on to experiments
three and four.  In these two cases, func1() and func2() are in separate
translation units and both are stored in a library.  Experiment 3---the base
line:

        % cc -c -o main.o main.c
        % cc -c -o func1.o func1.c
        % cc -c -o func2.o func2.c
        % ar rv libfunc.a func1.o func2.o
        a - func1.o
        a - func2.o
        ar: creating libfunc.a
        ar: writing libfunc.a
        % cc  -o main3 main.o libfunc.a
        % ./main3
        Hello from main
        Hello from func1
        Hello from func2

  Nothing unusual here.  Now, onto the "patchless patching exploit":

        % cc -c -o main.o main.c
        % cc -c -o myfunc2.o myfunc2.c
        % cc -c -o func1.o func1.c
        % cc -c -o func2.o func2.c
        % ar rv libfunc.a func1.o func2.o
        a - func1.o
        a - func2.o
        ar: creating libfunc.a
        ar: writing libfunc.a
        % cc -o main4 main.o myfunc2.o libfunc.a

  That's odd---it compiled!  

        % ./main4
        Hello from main
        Hello from func1
        Hello from myfunc2

  And ran!

  So, how do I interpret these results?  In experiment 2, the translation
unit main had an unsatisfied external function, func1().  func1() is not
defined in the translation unit myfunc2, so we then examine the external
functions stored in the library libfuncall.a.  There is a component of
libfuncall.a that has the external function func1(), but said component
*also* has external function func2().  When it comes time to link the
translation unit myfunc2, it also had an external function func2() and thus,
an error, because 6.9#5 states:

        ... somewhere in the entire program there shall be exactly one
        external definition for the identifier; otherwise, there shall be no
        more than one.
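
  For readers less familiar with the distinction 6.9#5 relies on: an
identifier with external linkage may be declared any number of times, but
the entire program may contain at most one external definition of it.  A
minimal single-file sketch follows; the names counter and bump() are made
up for illustration and appear nowhere in the experiments.

```c
/* Declarations: these may be repeated, in this file or in any other
   translation unit that is part of the same program. */
extern int counter;
extern int counter;
int bump(void);
int bump(void);

/* External definitions: 6.9#5 allows at most one of each in the
   entire program. */
int counter = 0;

int bump(void)
{
    return ++counter;
}
```

A second translation unit that also contained a body for bump() would
break that rule; whether a particular linker actually reports it is what
experiments 2 and 4 probe.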

  But now we get to experiment 4.  This time, the translation unit main had
an unsatisfied external function func1().  func1() is not defined in the
translation unit myfunc2, so we then examine the external functions stored
in the library libfunc.a.  There is a component of libfunc.a that has the
external function func1(), so that component is pulled in.  That component
has an unsatisfied external function func2().  Said external function is not
defined in translation unit main, but it *is* defined in translation unit
myfunc2.  There are no more unresolved external functions (or objects for
that matter) so we get the final program that works as expected (and again,
for the record, I got the same result on Linux).
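
  The walk-through above can be modelled in a few lines of C.  This is only
a sketch of the resolution rule as described in this thread (objects named
on the command line are always linked; an archive member is pulled in only
if it defines a symbol that is still undefined), not the algorithm of any
real linker.  The Object type, link_program() and the LINK_* codes are
hypothetical names invented for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS   4   /* entries per symbol list, NULL-terminated if shorter */
#define MAX_LINKED 16  /* upper bound on objects in one modelled link */

/* Hypothetical model of an object file: the external symbols it
   defines and the external symbols it references. */
typedef struct {
    const char *name;
    const char *defs[MAX_SYMS];
    const char *refs[MAX_SYMS];
} Object;

enum { LINK_OK, LINK_DUPLICATE, LINK_UNRESOLVED, LINK_TOO_BIG };

/* Does the list contain sym? */
static int has_sym(const char *const list[], const char *sym)
{
    for (int i = 0; i < MAX_SYMS && list[i] != NULL; i++)
        if (strcmp(list[i], sym) == 0)
            return 1;
    return 0;
}

/* How many of the n linked objects define sym? */
static int count_defs(const Object *const linked[], int n, const char *sym)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        count += has_sym(linked[i]->defs, sym);
    return count;
}

/* Is sym referenced by a linked object but defined by none of them? */
static int is_undefined(const Object *const linked[], int n, const char *sym)
{
    if (count_defs(linked, n, sym) > 0)
        return 0;
    for (int i = 0; i < n; i++)
        if (has_sym(linked[i]->refs, sym))
            return 1;
    return 0;
}

/* Link the objects named on the command line against one archive. */
int link_program(const Object objs[], int nobjs,
                 const Object archive[], int nmembers)
{
    const Object *linked[MAX_LINKED];
    int pulled[MAX_LINKED] = {0};
    int n = 0, changed;

    if (nobjs + nmembers > MAX_LINKED)
        return LINK_TOO_BIG;

    /* Objects named explicitly are always part of the program. */
    for (int i = 0; i < nobjs; i++)
        linked[n++] = &objs[i];

    /* Pull archive members on demand until nothing new is needed. */
    do {
        changed = 0;
        for (int m = 0; m < nmembers; m++) {
            if (pulled[m])
                continue;
            for (int d = 0; d < MAX_SYMS && archive[m].defs[d] != NULL; d++) {
                if (is_undefined(linked, n, archive[m].defs[d])) {
                    linked[n++] = &archive[m];
                    pulled[m] = changed = 1;
                    break;
                }
            }
        }
    } while (changed);

    /* Check the result: no duplicate definitions, no dangling references. */
    for (int i = 0; i < n; i++)
        for (int d = 0; d < MAX_SYMS && linked[i]->defs[d] != NULL; d++)
            if (count_defs(linked, n, linked[i]->defs[d]) > 1)
                return LINK_DUPLICATE;
    for (int i = 0; i < n; i++)
        for (int r = 0; r < MAX_SYMS && linked[i]->refs[r] != NULL; r++)
            if (count_defs(linked, n, linked[i]->refs[r]) == 0)
                return LINK_UNRESOLVED;
    return LINK_OK;
}

/* The objects from the experiments, reduced to the symbols that matter. */
static const Object cmdline[] = {
    { "main.o",    { "main" },  { "func1" } },
    { "myfunc2.o", { "func2" }, { NULL } },
};
static const Object libfuncall[] = {   /* experiments 1 and 2 */
    { "func.o",  { "func1", "func2" }, { NULL } },
};
static const Object libfunc[] = {      /* experiments 3 and 4 */
    { "func1.o", { "func1" }, { "func2" } },
    { "func2.o", { "func2" }, { NULL } },
};
```

With these tables, link_program(cmdline, 2, libfuncall, 1) yields
LINK_DUPLICATE (experiment 2), while link_program(cmdline, 2, libfunc, 2)
yields LINK_OK (experiment 4), because func2.o is never pulled out of the
archive.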

  In fact, these results seem (in my opinion) to be consistent with the
language in the C99 standard.  I would also conjecture that you will find
the same results for static compilation across all C compilers.

  I will concede that there might exist a C compiler out there that does not
conform to these behaviors, but it would be as rare as coming across a C
compiler for a sign-magnitude or 1's-complement system [2].

> However, the whole point of patchless patching is about shared libraries,
> because it is not that difficult to modify a static library and link your
> application against this modified library. Shared libraries are not loose
> collections of object files. They are pre-linked executables

  They are not pre-linked executables.  Well, mostly.  I know you can
execute libc under Linux:

        [spc]lucy:/lib>./libc.so.6
        GNU C Library stable release version 2.3.4, by Roland McGrath et al.
        Copyright (C) 2005 Free Software Foundation, Inc.
        This is free software; see the source for copying conditions.
        There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
        PARTICULAR PURPOSE.
        Compiled by GNU CC version 3.4.6 20060404 (Red Hat 3.4.6-11).
        Compiled on a Linux 2.4.20 system on 2010-04-18.
        Available extensions:
                GNU libio by Per Bothner
                crypt add-on version 2.1 by Michael Glad and others
                linuxthreads-0.10 by Xavier Leroy
                The C stubs add-on version 2.1.2.
                BIND-8.2.3-T5B
                NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
                Glibc-2.0 compatibility add-on by Cristian Gafton
                GNU Libidn by Simon Josefsson
                libthread_db work sponsored by Alpha Processor Inc
        Thread-local storage support included.
        For bug reporting instructions, please see:
        <http://www.gnu.org/software/libc/bugs.html>.

but that's actually rare.  If you try some other shared object (Linux):

        [spc]lucy:/usr/lib>./libcrypt.so
        Segmentation fault

or Solaris:

        [spc]sol:/usr/lib>./libcrypt.so
        Illegal Instruction

So I politely disagree with them being "pre-linked executables."

> and they most
> certainly bring with them all of their external symbols into the program. I
> explained how this could result in complications previously and won't
> repeat that. As far as I can see, no one in this entire thread has come up
> with a show case of patchless patching for a shared library, while claims
> like "the difference between [shared and static libraries] is not important
> for this disussion [sic]" have been made.

  Let me rectify that then.  The same four experiments as above, in that
order, but this time with shared libraries.  Again, on Solaris:

        % cc -V ; ld -V
        cc: Sun C 5.12 SunOS_sparc 2011/11/16
        ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.1512

Experiment 1:

        % cc -shared -xcode=pic32 -c -o func.ss func.c
        % cc -shared -o libfuncall.so func.ss
        % cc -Wl,-R/lusr/home/spc/foo -o smain1 main.o -L/lusr/home/spc/foo -lfuncall
        % ldd ./smain1
                libfuncall.so =>         ./libfuncall.so
                libc.so.1 =>     /usr/lib/libc.so.1
                libm.so.2 =>     /usr/lib/libm.so.2
                /platform/SUNW,Sun-Fire-T1000/lib/libc_psr.so.1

  This is to ensure we have an executable that loads our library at runtime.
It does, so let's run it:

        % ./smain1
        Hello from main
        Hello from func1
        Hello from func2

  Now experiment two, the "patchless patching exploit."

        % cc -c -o myfunc2.o myfunc2.c
        % cc -Wl,-R/lusr/home/spc/foo -o smain2 main.o myfunc2.o -L/lusr/home/spc/foo -lfuncall
        % ldd ./smain2
                libfuncall.so =>         ./libfuncall.so
                libc.so.1 =>     /usr/lib/libc.so.1
                libm.so.2 =>     /usr/lib/libm.so.2
                /platform/SUNW,Sun-Fire-T1000/lib/libc_psr.so.1

  It compiled, unlike the second experiment with static libraries.  But
let's see how it runs:

        % ./smain2
        Hello from main
        Hello from func1
        Hello from myfunc2

  Wow!  It worked!  Even with func1() and func2() in the same translation
unit, func1() is calling func2() from translation unit myfunc2.  And it's
not Linux! (for the record, it worked under Linux).  And just to be
complete, experiments three and four.  I won't comment much on these as they,
too, work like the static versions (even on Linux):

  Experiment 3:

        % cc -shared -xcode=pic32 -c -o func1.ss func1.c
        % cc -shared -xcode=pic32 -c -o func2.ss func2.c
        % cc -shared -o libfunc.so func1.ss func2.ss
        % cc -Wl,-R/lusr/home/spc/foo -o smain3 main.o -L/lusr/home/spc/foo -lfunc
        % ldd ./smain3
                libfunc.so =>    ./libfunc.so
                libc.so.1 =>     /usr/lib/libc.so.1
                libm.so.2 =>     /usr/lib/libm.so.2
                /platform/SUNW,Sun-Fire-T1000/lib/libc_psr.so.1
        % ./smain3
        Hello from main
        Hello from func1
        Hello from func2

  Experiment 4:

        % cc -Wl,-R/lusr/home/spc/foo -o smain4 main.o myfunc2.o -L/lusr/home/spc/foo -lfunc
        % ldd ./smain4
                libfunc.so =>    ./libfunc.so
                libc.so.1 =>     /usr/lib/libc.so.1
                libm.so.2 =>     /usr/lib/libm.so.2
                /platform/SUNW,Sun-Fire-T1000/lib/libc_psr.so.1
        % ./smain4
        Hello from main
        Hello from func1
        Hello from myfunc2

> > Funny you mention that---I checked the C99 standard and found *nothing*
> > about this.  It's not unspecified, it's not undefined, it's not
> > implementation defined, it's not locale specific, *nothing*.
>
> The very first message of mine in this thread explained how having multiple
> definitions of external linkage identifiers in an "entire program" is
> undefined behaviour, quoting the standard.

  That was 6.9#5, which I quoted a portion of, but here's the full quote:

        5 An external definition is an external declaration that is also a
          definition of a function (other than an inline definition) or an
          object. If an identifier declared with external linkage is used in
          an expression (other than as part of the operand of a sizeof
          operator whose result is an integer constant), somewhere in the
          entire program there shall be exactly one external definition for
          the identifier; otherwise, there shall be no more than one.

  I'm not reading "undefined behavior" there, I see "error" there.  Annex J
of the C99 standard lists all the unspecified, undefined,
implementation-defined and locale-specific behaviors.  Nowhere is this
addressed.

  Look, I recognise you don't like this and think it's a violation of the C
Standard.  I don't see it as a violation of the C Standard, but I'll grant
that it may be an unusual interpretation of the C Standard.

> Given your interpretation as to
> what an "entire program" is, you might indeed have difficulty seeing that.
>
> Cheers,
> V.

  -spc (So did I use two non-conformant compilers for this experiment then?)

[1] In every C compiler I've used over the past 30 years, there is no
        need to specify the Standard C library.  There have been options to
        tell the compiler *not* to reference the Standard C library, but
        again, the C Standard is silent on that point.

[2] They exist, but are so rare that I would be surprised if anyone on
        this list has used such a system.


Re: Patchless modification of Lua source code

Philipp Janda
In reply to this post by Sean Conner
On 23.11.18 at 06:47, Sean Conner wrote:

>
> [...]
>
> 5.1.1.2:
>
> 8 All external object and function references are resolved.  Library
>  components are linked to satisfy external references to functions
>  and objects not defined in the current translation.  All such
>  translator output is collected into a program image which contains
>  information needed for execution in its execution environment.
>
>    I interpret this to mean, "if the given object files do not include an
> object or function with external linkage, then said objects or functions can
> be pulled from a "library".

Which library if there were multiple ones containing the same external
definition?

I interpret it differently. I think "translation" still means the
translation of a single translation unit (i.e. a single object file) as
in all other translation phases before. So you resolve as many undefined
external references in the object file as you can using libraries, and
the remaining ones are resolved from other compiled translation units
when "all such translator output is collected into a program image".

Anyway, the effect is exactly the same as with your interpretation as
long as "In the set of translation units and libraries that constitutes
an entire program, each declaration of a particular identifier with
external linkage denotes the same object or function" (6.2.2) and
"somewhere in the entire program there shall be exactly one external
definition for the identifier; otherwise, there shall be no more than
one" (6.9#5).


Btw., GNU ld is in violation of your interpretation of the C standard as
I have shown in one of my examples earlier (details here[99]):

>>     $ gcc -o main main.c a.o -L. -lb
>>     ./libb.a(b.o): In function `a':
>>     b.c:(.text+0x0): multiple definition of `a'
>>     a.o:a.c:(.text+0x0): first defined here
>>     collect2: error: ld returned 1 exit status

Function `a()` shouldn't have been pulled from `libb.a` because `a.o`
already contains an external definition for that identifier.

GNU ld doesn't pull individual functions or objects from the library. It
just filters out those object files in the library that it thinks it
doesn't need.
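
The member-selection rule described above can be sketched as a toy model in C. This is only an illustration under stated assumptions: the structs, `toy_link()` and the two demo functions are hypothetical names, not anything from GNU ld, and the model ignores unresolved symbols, libc and everything else a real linker does. It only captures two ideas: objects named on the command line are always loaded, and an archive member is loaded whole, but only if it defines something that is still undefined.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 8    /* slots per symbol list, NULL-terminated */
#define MAX_OBJS 16   /* objects in one toy link */

/* A toy "object file": the symbols it defines and the ones it references. */
struct object {
    const char *name;
    const char *defs[MAX_SYMS];
    const char *refs[MAX_SYMS];
};

/* The objects loaded so far. */
struct link {
    const struct object *loaded[MAX_OBJS];
    int nloaded;
};

static int has_def(const struct object *obj, const char *sym)
{
    for (int i = 0; obj->defs[i] != NULL; i++)
        if (strcmp(obj->defs[i], sym) == 0)
            return 1;
    return 0;
}

static int is_defined(const struct link *lk, const char *sym)
{
    for (int i = 0; i < lk->nloaded; i++)
        if (has_def(lk->loaded[i], sym))
            return 1;
    return 0;
}

/* Load a whole object: every definition in it enters the link. */
static int load(struct link *lk, const struct object *obj)
{
    for (int i = 0; obj->defs[i] != NULL; i++)
        if (is_defined(lk, obj->defs[i])) {
            printf("%s: multiple definition of `%s'\n", obj->name, obj->defs[i]);
            return -1;
        }
    lk->loaded[lk->nloaded++] = obj;
    return 0;
}

/* Does `member` define a symbol that is referenced but still undefined? */
static int needed(const struct link *lk, const struct object *member)
{
    for (int o = 0; o < lk->nloaded; o++)
        for (int r = 0; lk->loaded[o]->refs[r] != NULL; r++) {
            const char *sym = lk->loaded[o]->refs[r];
            if (!is_defined(lk, sym) && has_def(member, sym))
                return 1;
        }
    return 0;
}

/* Returns 0 if the toy link succeeds, -1 on a multiple definition. */
int toy_link(const struct object *objs, int nobjs,
             const struct object *archive, int nmembers)
{
    struct link lk = { { NULL }, 0 };
    int pulled[MAX_OBJS] = { 0 };

    if (nobjs + nmembers > MAX_OBJS)
        return -1;
    for (int i = 0; i < nobjs; i++)        /* command-line objects: always loaded */
        if (load(&lk, &objs[i]) != 0)
            return -1;
    for (int progress = 1; progress; ) {   /* rescan until nothing new is pulled */
        progress = 0;
        for (int m = 0; m < nmembers; m++)
            if (!pulled[m] && needed(&lk, &archive[m])) {
                if (load(&lk, &archive[m]) != 0)
                    return -1;
                pulled[m] = progress = 1;
            }
    }
    return 0;
}

/* Objects given on the command line in both demos. */
static const struct object objs[] = {
    { "main.o",    { "main" },  { "func1" } },
    { "myfunc2.o", { "func2" }, { NULL } },
};

/* Archive with one member that defines both func1 and func2. */
int link_single_member(void)
{
    static const struct object lib[] = {
        { "func.o", { "func1", "func2" }, { "func2" } },
    };
    return toy_link(objs, 2, lib, 1);
}

/* Archive with func1 and func2 in separate members. */
int link_split_members(void)
{
    static const struct object lib[] = {
        { "func1.o", { "func1" }, { "func2" } },
        { "func2.o", { "func2" }, { NULL } },
    };
    return toy_link(objs, 2, lib, 2);
}
```

In this model `link_single_member()` fails, because loading `func.o` for the sake of `func1` drags in a second `func2`, while `link_split_members()` succeeds, because `func2.o` is never needed and so never loaded. That matches the pattern reported in this thread for static libraries, but the model says nothing about whether either outcome is sanctioned by the C standard.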

>
> [...]
>
>    -spc (If at this point you don't think this interpretation holds, I'd like to
> hear the argument against it)
>

Philipp

   [99]:  http://lua-users.org/lists/lua-l/2018-11/msg00258.html




Re: Patchless modification of Lua source code

Philipp Janda
In reply to this post by Sean Conner
On 24.11.18 at 00:28, Sean Conner wrote:

> It was thus said that the Great Viacheslav Usov once stated:
>>
>> The very first message of mine in this thread explained how having multiple
>> definitions of external linkage identifiers in an "entire program" is
>> undefined behaviour, quoting the standard.
>
>    That was 6.9#5, which I quoted a portion of, but here's the full quote:
>
> 5 An external definition is an external declaration that is also a
>  definition of a function (other than an inline definition) or an
>  object. If an identifier declared with external linkage is used in
>  an expression (other than as part of the operand of a sizeof
>  operator whose result is an integer constant), somewhere in the
>  entire program there shall be exactly one external definition for
>  the identifier; otherwise, there shall be no more than one.
>
>    I'm not reading "undefined behavior" there, I see "error" there.  Annex J
> of the C99 standard lists all the unspecified, undefined,
> implementation-defined and locale-specific behaviors.  Nowhere is this
> addressed.

This one is easy. Very first bullet point in Annex J.2.1:
        J.2 Undefined behavior
        1 The behavior is undefined in the following circumstances:
        — A ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside
        of a constraint is violated (clause 4).

Which refers to 4.2 in the normative part of the standard:

        2 If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears
        outside of a constraint is violated, the behavior is undefined.
        Undefined behavior is otherwise indicated in this International
        Standard by the words ‘‘undefined behavior’’ or by the omission
        of any explicit definition of behavior. There is no difference
        in emphasis among these three; they all describe ‘‘behavior that
        is undefined’’.

>
>    -spc (So did I use two non-conformant compilers for this experiment then?)
>

Philipp





Re: Patchless modification of Lua source code

Sean Conner
It was thus said that the Great Philipp Janda once stated:

> On 24.11.18 at 00:28, Sean Conner wrote:
> >It was thus said that the Great Viacheslav Usov once stated:
> >>
> >> The very first message of mine in this thread explained how having
> >> multiple definitions of external linkage identifiers in an "entire
> >> program" is undefined behaviour, quoting the standard.
> >
> >   That was 6.9#5, which I quoted a portion of, but here's the full quote:
> >
> > 5 An external definition is an external declaration that is also a
> >  definition of a function (other than an inline definition) or an
> >  object. If an identifier declared with external linkage is used in
> >  an expression (other than as part of the operand of a sizeof
> >  operator whose result is an integer constant), somewhere in the
> >  entire program there shall be exactly one external definition for
> >  the identifier; otherwise, there shall be no more than one.
> >
> >   I'm not reading "undefined behavior" there, I see "error" there. Annex
> > J of the C99 standard lists all the unspecified, undefined,
> > implementation-defined and locale-specific behaviors.  Nowhere is this
> > addressed.
>
> This one is easy. Very first bullet point in Annex J.2.1:
> J.2 Undefined behavior
>
> 1 The behavior is undefined in the following circumstances: — A
> ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a
> constraint is violated (clause 4).
>
> Which refers to 4.2 in the normative part of the standard:
>
> 2 If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside
> of a constraint is violated, the behavior is undefined. Undefined
> behavior is otherwise indicated in this International Standard by
> the words ‘‘undefined behavior’’ or by the omission of any explicit
> definition of behavior. There is no difference in emphasis among
> these three; they all describe ‘‘behavior that is undefined’’.

  Fair enough, I stand corrected on that point.

> >   -spc (So did I use two non-conformant compilers for this experiment
> >   then?)

> Philipp

  No answer to my question though?

  -spc



Re: Patchless modification of Lua source code

Sean Conner
In reply to this post by Philipp Janda
It was thus said that the Great Philipp Janda once stated:

> On 23.11.18 at 06:47, Sean Conner wrote:
> >
> >[...]
> >
> > 5.1.1.2:
> >
> > 8 All external object and function references are resolved.  Library
> >  components are linked to satisfy external references to functions
> >  and objects not defined in the current translation.  All such
> >  translator output is collected into a program image which contains
> >  information needed for execution in its execution environment.
> >
> >   I interpret this to mean, "if the given object files do not include an
> > object or function with external linkage, then said objects or functions
> > can be pulled from a "library".
>
> Which library if there were multiple ones containing the same external
> definition?

  I don't think that's covered in the C Standard.  I know that the C tool
chains I've used over the years have all defaulted to "search each library,
in order specified on the command line" which seems a reasonable thing to
do.

> I interpret it differently. I think "translation" still means the
> translation of a single translation unit (i.e. a single object file) as in
> all other translation phases before. So you resolve as many undefined
> external references in the object file as you can using libraries, and the
> remaining ones are resolved from other compiled translation units when
> "all such translator output is collected into a program image".

  Okay, given this example:

        % cc -o program main.o other.o -lsomelibrary

  So, are you saying that during the linking phase, main.o is scanned first,
and any unresolved external references are scanned for in somelibrary to be
resolved, and then we move on to other.o and repeat the process?

  Huh ... I suppose that is a valid interpretation as well, even if I
haven't encountered a C compiler that does that (at least eight if I
recalled them all correctly).

> Btw., GNU ld is in violation of your interpretation of the C standard as
> I have shown in one of my examples earlier (details here[99]):

  In my other email I covered this with both static and dynamic libraries.
It was only in the case of a single translation unit containing all
functions where the link failed, which I think you would take as support for
your interpretation.  But the link succeeded when each function was in its
own translation unit in the library, which supports my interpretation (for
the two different tool chains I used).

  So who is correct here?

> Function `a()` shouldn't have been pulled from `libb.a` because `a.o`
> already contains an external definition for that identifier.
>
> GNU ld doesn't pull individual functions or objects from the library. It
> just filters out those object files in the library that it thinks it
> doesn't need.

  I think most linkers work this way.

  -spc

>   [99]:  http://lua-users.org/lists/lua-l/2018-11/msg00258.html


Re: Patchless modification of Lua source code

Philipp Janda
In reply to this post by Sean Conner
On 24.11.18 at 03:19, Sean Conner wrote:
>
>    No answer to my question though?

I'm not sure which question you mean, so I'll try to answer all
questions I can find in your last message.


> I ask, what does "not defined in the
> current translation" mean to you?

Probably the same as you, with the exception that I think "current
translation" refers to a single translation unit as in the descriptions
of the other translation phases.
But then we do agree on the fact that unresolved external symbols may be
looked up in a library. What we don't agree on is what is supposed to
happen if a library contains an external symbol that already is defined
somewhere else (either in a standalone translation unit or in a library).

> So, how do I interpret these results?

You answered this one yourself.

> So did I use two non-conformant compilers for this experiment
> then?

Your compilers are probably fine. As soon as undefined behavior is
involved (which I claim is the case if you link multiple external
definitions), a compiler can do almost anything, and that includes
ignoring some symbols from libraries if there are definitions for those
symbols in other translation units.

>
>    -spc

Philipp




Re: Patchless modification of Lua source code

Andrew Gierth
In reply to this post by Sean Conner
>>>>> "Sean" == Sean Conner <[hidden email]> writes:

 Sean>   -spc (So did I use two non-conformant compilers for this
 Sean>   experiment then?)

If you invoke undefined behavior, then the compiler is allowed to do
anything, where "anything" includes things like:

 - the compiler did exactly what you expected
 - the compiler did exactly what you expected, but only because of
   defaults you don't know about, and changing those defaults to make
   something else work will cause this to fail
 - the compiler did something close enough to what you expected to
   fool you into thinking it worked, but in fact it inserted code
   to make it fail on alternate full moons
 - the compiler made the thing you were trying to do work as you
   expected, but subtly broke something somewhere else
 - error messages
 - nasal demons

 Sean> The experiment is a very small program---the function main()
 Sean> calls func1(), which calls func2(). Each function prints it's
 Sean> been called and a sample output would look like:

 Sean> Hello from main
 Sean> Hello from func1
 Sean> Hello from func2

 [snip various experiments]

 Sean>   Now experiment two, the "patchless patching exploit."

 Sean> % cc -c -o myfunc2.o myfunc2.c
 Sean> % cc -Wl,-R/lusr/home/spc/foo -o smain2 main.o myfunc2.o -L/lusr/home/spc/foo -lfuncall
 Sean> % ldd ./smain2
 Sean>        libfuncall.so =>         ./libfuncall.so
 Sean>        libc.so.1 =>     /usr/lib/libc.so.1
 Sean>        libm.so.2 =>     /usr/lib/libm.so.2
 Sean>        /platform/SUNW,Sun-Fire-T1000/lib/libc_psr.so.1

 Sean>   It compiled, unlike the second experiment with static
 Sean> libraries. But let's see how it runs:

 Sean> % ./smain2
 Sean> Hello from main
 Sean> Hello from func1
 Sean> Hello from myfunc2

 Sean> Wow! It worked! Even with func1() and func2() in the same
 Sean> translation unit, func1() is calling func2() from translation
 Sean> unit myfunc2. And it's not Linux! (for the record, it worked
 Sean> under Linux).

It's worth noting here that Linux and Solaris have very similar shared
library implementations, because the Solaris model was the one adopted
by GNU ld. However, other models do exist (for example the Windows
model), and not all of them will behave this way.

Furthermore, the ability to override functions in libraries this way is
often seen as a bug rather than a feature, especially when doing dynamic
loading. Accordingly, there are ways to change it: either by linking the
shared library with -Bsymbolic, or providing a link map, symbol version
file, exported symbol list, or explicit declaration that hides func2()
in the library build.
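
A toy model of the difference, using a function pointer to stand in for the indirection (a PLT/GOT slot on ELF systems) through which a shared library normally reaches its own exported functions. Everything here is hypothetical and lives in one file: `func2_slot`, `enum binding` and `call_func1()` are made-up names, and the "dynamic linker" is a single assignment. It is a sketch of the idea, not of any real loader.

```c
/* Which definition of func2() ended up being called. */
enum impl { FROM_LIBRARY, FROM_EXECUTABLE };

/* How the library's own reference to func2() gets bound. */
enum binding {
    BIND_GLOBAL_SEARCH,  /* ELF default: first definition in search order,
                            and the executable is searched first */
    BIND_SYMBOLIC        /* like linking the library with -Bsymbolic:
                            prefer the library's own definition */
};

/* The definition that ships inside the library. */
static enum impl lib_func2(void) { return FROM_LIBRARY; }

/* The "replacement" definition linked into the executable. */
static enum impl exe_func2(void) { return FROM_EXECUTABLE; }

/* Stand-in for the slot through which func1() calls func2(). */
static enum impl (*func2_slot)(void);

/* Library code: it never names a particular func2, it goes through the slot. */
static enum impl lib_func1(void) { return func2_slot(); }

/* Stand-in for the dynamic linker filling the slot, followed by the call. */
enum impl call_func1(enum binding mode)
{
    func2_slot = (mode == BIND_SYMBOLIC) ? lib_func2 : exe_func2;
    return lib_func1();
}
```

With `BIND_GLOBAL_SEARCH` the call made by `lib_func1()` lands in the executable's `func2`, which is the override Sean observed. With `BIND_SYMBOLIC` it stays inside the library, so the same "patch" silently stops affecting calls made from within the library.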

(now go and look at the definition of LUAI_FUNC in luaconf.h and
consider what that means for your approach)
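
For readers who don't have luaconf.h at hand: in the Lua releases current at the time of this thread, LUAI_FUNC expands to a hidden-visibility `extern` when building with GCC for an ELF target, and to a plain `extern` otherwise. A minimal sketch of the same idea follows; `MYLIB_INTERNAL`, `func1` and `func2` are hypothetical names for illustration, not Lua's.

```c
/* Hypothetical macro modelled on the hidden-visibility idea.
   The attribute is a GCC extension; elsewhere fall back to plain extern. */
#if defined(__GNUC__) && defined(__ELF__)
#define MYLIB_INTERNAL __attribute__((visibility("hidden"))) extern
#else
#define MYLIB_INTERNAL extern
#endif

/* Internal helper: external linkage between the library's own translation
   units, but left out of the shared object's exported symbols. */
MYLIB_INTERNAL int func2(void);

/* Public entry point: exported as usual. */
extern int func1(void);

int func2(void) { return 2; }

/* Inside a shared object this call binds to the func2 above. */
int func1(void) { return func2() + 1; }
```

When such a file is built into a shared object, `func2` is not available for an executable to interpose, so a replacement `func2` linked into the program would not change what `func1()` calls. In a single-file build like this one the attribute has no visible effect; the point is only what the declaration looks like.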

--
Andrew.


Re: Patchless modification of Lua source code

Philipp Janda
In reply to this post by Sean Conner
On 24.11.18 at 03:36, Sean Conner wrote:

> It was thus said that the Great Philipp Janda once stated:
>> On 23.11.18 at 06:47, Sean Conner wrote:
>>>
>>> [...]
>>>
>>> 5.1.1.2:
>>>
>>> 8 All external object and function references are resolved.  Library
>>>  components are linked to satisfy external references to functions
>>>  and objects not defined in the current translation.  All such
>>>  translator output is collected into a program image which contains
>>>  information needed for execution in its execution environment.
>>>
>>>    I interpret this to mean, "if the given object files do not include an
>>> object or function with external linkage, then said objects or functions
>>> can be pulled from a "library".
>>
>> Which library if there were multiple ones containing the same external
>> definition?
>
>    I don't think that's covered in the C Standard.

Well, the C standard does (un-)define it for an entire program (the set
of translation units and libraries). You are the one who wants to have
certain parts of those libraries excluded from the entire program based
on a linking order.

> I know that the C tool
> chains I've used over the years have all defaulted to "search each library,
> in order specified on the command line" which seems a reasonable thing to
> do.

This order is actually required by POSIX[88], so most UNIX compilers
will behave this way. One exception is apparently clang[89]. But,
AFAICS, unfortunately even POSIX does not define whether multiple
definitions are allowed or not.
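
The "first library on the command line wins" rule can be sketched in a few lines of C. This is a toy model under obvious assumptions: `struct library`, `first_provider()` and the two library names are hypothetical, and real linkers work on archive members and symbol tables rather than on lists of strings.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEFS 4

/* A toy library: a name plus the external symbols it defines. */
struct library {
    const char *name;
    const char *defs[MAX_DEFS];  /* NULL-terminated unless all slots are used */
};

/* Return the name of the first library, in command-line order, that
   defines `sym`, or NULL if none does.  Libraries after the first match
   are never consulted. */
const char *first_provider(const struct library *libs, size_t nlibs,
                           const char *sym)
{
    for (size_t i = 0; i < nlibs; i++)
        for (size_t d = 0; d < MAX_DEFS && libs[i].defs[d] != NULL; d++)
            if (strcmp(libs[i].defs[d], sym) == 0)
                return libs[i].name;
    return NULL;
}

/* Two hypothetical libraries that both define func2, searched in
   either order depending on `swap_order`. */
const char *demo_provider(int swap_order)
{
    const struct library a = { "liba", { "func1", "func2" } };
    const struct library b = { "libb", { "func2" } };
    struct library order[2];

    order[0] = swap_order ? b : a;
    order[1] = swap_order ? a : b;
    return first_provider(order, 2, "func2");
}
```

Because the search stops at the first match, the second definition of `func2` is never even looked at in this model, which is one way a duplicate definition can go undiagnosed.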

>
> [...]
> >    -spc

Philipp


   [88]:  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/c99.html#tag_20_11_04
   [89]:  https://lld.llvm.org/NewLLD.html#key-concepts




Re: Patchless modification of Lua source code

Sean Conner
It was thus said that the Great Philipp Janda once stated:
> On 24.11.18 at 03:36, Sean Conner wrote:
> >It was thus said that the Great Philipp Janda once stated:
> >>On 23.11.18 at 06:47, Sean Conner wrote:
> >>>
>
> Well, the C standard does (un-)define it for an entire program (the set
> of translation units and libraries). You are the one who wants to have
> certain parts of those libraries excluded from the entire program based
> on a linking order.

  I'm not alone in this, Dirk Laurie (who started this thread) mentioned a
technique originally given by Luiz Henrique de Figueiredo (one of the
maintainers of Lua).  So I feel like I'm in good company here.

> >I know that the C tool
> >chains I've used over the years have all defaulted to "search each library,
> >in order specified on the command line" which seems a reasonable thing to
> >do.
>
> This order is actually required by POSIX[88], so most UNIX compilers
> will behave this way. One exception is apparently clang[89]. But,
> AFAICS, unfortunately even POSIX does not define whether multiple
> definitions are allowed or not.

>   [88]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/c99.html#tag_20_11_04
>   [89]: https://lld.llvm.org/NewLLD.html#key-concepts

  Thank you for the references.

  -spc (Who will readily admit to writing C code assuming it will run on
        byte-oriented, 2's complement machines ... )



Re: Patchless modification of Lua source code

Viacheslav Usov
In reply to this post by Sean Conner
On Sat, Nov 24, 2018 at 12:28 AM Sean Conner <[hidden email]> wrote:

> And I think this is where we have our differences.  If the "current translation" does not contain function foo(), *then* such a function is looked for in any given libraries [1].

Correct. As long as there is exactly one definition of foo() in the entire program, this is well-defined.

Having bar() defined in the "current translation" and some other library would cause undefined behaviour, where your reasoning would no longer apply.

>  Again, the key sentence from 5.1.1.2 #8:  Library components

This is not where an "entire program" is referenced.

> I could not find the phrase "full library" (or variations) in the C99 Standard.

I used the word "full" to emphasise that libraries are not considered piece-wise by the standard when it talks about an "entire program".

> Citation please, from the C standard (any one of C89, C99, or C11).

Given that neither the notion of static vs shared libraries, nor their organisation, is addressed by the standard, any discussion at this level is not portable, and any use of implementation details, in situations that cause undefined behaviour in the standard, even if it is consistent within the implementation, is non-conformant. I stress the "use", not the "details".

> It's odd to think of GNU C as not being conformant.

I did not say GNU C was non-conformant (even though it may be). As I explained in earlier messages in this thread, it is the "program", not the "tool", that is non-conformant if the one definition requirement is violated.

> I did some experiments last night on this very subject. 

And, as far as I can tell, they were consistent with my description, wherein the linker pulls previously archived object files from a static library, ignoring other object files whenever it can. Which is perfectly fine when there is no more than one definition of any external symbol in the entire program, and is a trivially acceptable form of undefined behaviour otherwise.

> At first, I used Linux but curious, I went and did the same experiments on Solaris

As mentioned by others, Linux and Solaris have a common Unix heritage and adhere to other common standards, so one should expect similar behaviours. So as a demonstration of portability, this is quite weak.

We have an elephant in the room here, and let me just name it: Windows. Once somebody demonstrates similar techniques on Windows, we could talk about real-world portability.

> They are not pre-linked executables.  Well, mostly.  I know you can execute libc under Linux:

To execute means more than just run something like a program in a shell. And an executable means more than the main executable program file. And, anyway, the emphasis should have been on "pre-linked" rather than "executable".

> Wow!  It worked!  Even with func1() and func2() in the same translation unit, func1() is calling func2() from translation unit myfunc2.  And it's not Linux! (for the record, it worked under Linux).

Great. Now you might go back to my response to Luiz and address concerns "First" and "Second" there.

> I'm not reading "undefined behavior" there, I see "error" there.

There is no "error" in the standard. What you call an "error", is "undefined behaviour", because "If a ‘‘shall’’ or ‘‘shall not’’ requirement [...] is violated, the behavior is undefined", which I said in my first message in this thread.

Cheers,
V.

Re: Patchless modification of Lua source code

Philippe Verdy
consider these:

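main.c:
#include <stdio.h>
void func1(void);  /* defined in func1.c */
void func2(void);  /* defined in func2.c or myfunc2.c */
int main(void){printf("Hello from main\n");func1();printf("Back to main\n");func2();printf("Back to main\n");return 0;}

func1.c:
#include <stdio.h>
void func2(void);  /* defined in func2.c or myfunc2.c */
void func1(void){printf("\tHello from func1\n");func2();printf("\tBack to func1\n");}

func2.c:
#include <stdio.h>
void func2(void){printf("\t\tHello from func2\n");}

myfunc2.c:
#include <stdio.h>
void func2(void){printf("\t\tHello from myfunc2\n");}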

Try creating a non-shared library containing func1.o and func2.o: OK, no problem, there's no duplicate symbol. You can then link main.o with it to create the main program.
Try creating a non-shared library containing func1.o, func2.o and myfunc2.o: error, duplicate symbol (func2). You cannot link main.o with it to create the main program.

Try creating a **shared** library "mylib.so" containing func1.o and func2.o: OK, no problem, there's no duplicate symbol. You can then link main.o with it to create the main program.
Now try linking main.o with myfunc2.o and mylib.so. What you get is this:

Hello from main
    Hello from func1
        Hello from func2 (sic!)
    Back to func1
Back to main
        Hello from myfunc2 (sic!)
Back to main

In other words, the program is linked with two distinct versions of func2(), which exist simultaneously! One version is used statically by func1() from inside the shared library, where func1() is called by main. The second version (myfunc2.o) is now used by main but does **not** replace the version used internally by func1() inside the shared library (even if mylib.so also exports the symbol "func2", this export is not used when linking main because you've specified myfunc2.o **before** mylib.so when linking main).

You cannot then replace the internal use made inside the shared library, because the binding from func1 to func2 is already resolved internally in the shared library and the dependency of func1 on func2 is not exposed: the shared library only exposes a set of exported symbols.

You'd have the same situation with DLLs on Windows.

DLLs or shared libraries are much more flexible and predictable than simple libraries, where no early binding occurs between units. Basically, a simple library is just an archive containing multiple units, in an unspecified order, so they cannot export the same symbol multiple times. If the archive format allows it, it's only because it completely ignores the symbols exported by each unit and distinguishes only the unit names packed in the library. The situation is different with shared libraries, whose internal links between units are already resolved by early binding, and these early bindings cannot be replaced.

What is not predictable is the order of units inside **non-shared** libraries (that's why a shared library cannot be made containing the same symbol multiple times from distinct units in the library). But there's no problem with the order of units inside a call to a linker. And there's no problem inside shared libraries, because they are guaranteed to export the same symbol only once and have their dependencies already resolved inside them, so they cannot export the same symbol multiple times.

A shared library (or executable), however, can be built to contain the same symbol multiple times, but only one of these symbols is really exported: the first one specified when linking it. This does not mean that there are not multiple implementations. The shared library (or program), however, becomes an unbreakable new unit. Windows calls this a "module" if it is executable, but this also applies to shared libraries on Linux using the ELF binary format or similar, where early binding of internal symbols is already done inside it and where the sets of exported symbols are already merged into a single set containing no duplicates, even if there are multiple internal implementations.





Le sam. 24 nov. 2018 à 13:07, Viacheslav Usov <[hidden email]> a écrit :
On Sat, Nov 24, 2018 at 12:28 AM Sean Conner <[hidden email]> wrote:

> And I think this is where we have our differenes.  If the "current translation" does not contain function foo(), *then* such a function is looked for in any given libraries [1].

Correct. As long as there is exactly one definition of foo() in the entire program, this is well-defined.

Having bar() defined in the "current translation" and some other library would cause undefined behaviour, where your reasoning would no longer apply.

>  Again, the key sentence from 5.1.1.2 #8:  Library components

This is not where an "entire program" is referenced.

> I could not find the phrase "full library" (or variations) in the C99 Standard.

I used the word "full" to emphasise that libraries are not considered piece-wise by the standard when it talks about an "entire program".

> Citation please, from the C standard (any one of C89, C99, or C11).

Given that neither the notion of static vs shared libraries, not their organisation, is addressed by the standard, any discussion at this level is not portable, and any use of implementation details, in situations that cause undefined behaviour in the standard, even if it is consistent within the Implementation, is non-conformant. I stress the "use" not the "details".

> It's odd to think of GNU C as not being conformant.

I did not say GNU C was non-coformant (even though it may be). As I explained in earlier messages in this thread, it is the "program", not the "tool", that is non-conformant if the one definition requirement is violated.

> I did some experiments last night on this very subject. 

And, as far as I can tell, they were consistent with my description, wherein the linker pulls previously archived object files from a static library, ignoring other object files whenever it can. Which is perfectly fine when there is no more than one definition of any external symbol in the entire program, and is a trivially acceptable form of undefined behaviour otherwise.

> At first, I used Linux but curious, I went and did the same experiments on Solaris

As mentioned by others, Linux and Solaris have a common Unix heritage and adhere to other common standards, so one should expect similar behaviours. So as a demonstration of portability, this is quite weak.

We have an elephant in the room here, and let me just name it: Windows. Once somebody demonstrates similar techniques on Windows, we could talk about real-world portability.

> They are not pre-linked executables.  Well, mostly.  I know you can execute libc under Linux:

To execute means more than just run something like a program in a shell. And an executable means more than the main executable program file. Any, anyway, the emphasis should have been on "pre-linked" rather than "executable".

> Wow!  It worked!  Even with func1() and func2() in the same translation unit, func1() is calling func2() from translation unit myfunc2.  And it's not Linux! (for the record, it worked under Linux).

Great. Now you might go back to my response to Luiz and address concerns "First" and "Second" there.

> I'm not reading "undefined behavior" there, I see "error" there.

There is no "error" in the standard. What you call an "error", is "undefined behaviour", because "If a ‘‘shall’’ or ‘‘shall not’’ requirement [...] is violated, the behavior is undefined", which I said in my first message in this thread.

Cheers,
V.

Re: Patchless modification of Lua source code

Ivan Krylov
In reply to this post by Viacheslav Usov
Hi!

On Sat, 24 Nov 2018 13:04:50 +0100
Viacheslav Usov <[hidden email]> wrote:

> We have an elephant in the room here, and let me just name it:
> Windows. Once somebody demonstrates similar techniques on Windows, we
> could talk about real-world portability.

For what it's worth, Sean's experiment can be reproduced on MSVC 2010:

 >cl /c main.c
 >cl /c func1.c
 >cl /c func2.c
 >cl /c myfunc2.c
 >lib /out:func.lib func1.obj func2.obj
 >link /out:main.exe main.obj myfunc2.obj func.lib
 >main.exe
 main
 func1
 myfunc2

> First, other libraries that the executable depends on will not have
> been linked statically to the same "replacement" functions, so they
> will have to resort to the "original" dependencies. Your program, as
> whole, may end up using different versions of the same set of
> functions simultaneously.

I've certainly encountered such nasty cases in the past when I was
learning C, but right now I cannot produce such an example that would
also be concise and easy to reproduce. If you have one handy, I would be
glad to test it on Windows, too.

--
Best regards,
Ivan


Re: Patchless modification of Lua source code

Ivan Krylov
In reply to this post by Philippe Verdy
On Sat, 24 Nov 2018 14:18:07 +0100
Philippe Verdy <[hidden email]> wrote:

> You'd have the same situation with DLLs on Windows.

Thank you for your example. I added two __declspec(dllexport) specifiers
and reproduced the behaviour you described:

 >cl /c main.c
 >cl /c func1.c
 >cl /c func2.c
 >cl /c myfunc2.c
 >link /dll /out:mylib.dll func1.obj func2.obj
   Creating library mylib.lib and object mylib.exp
 >link /out:main.exe main.obj myfunc2.obj mylib.lib
 >main
 Hello from main
         Hello from func1
                 Hello from func2
         Back to func1
 Back to main
                 Hello from myfunc2
 Back to main

--
Best regards,
Ivan
