[...] If it finds any invalid byte sequence, returns a false
value plus the position of the first invalid byte.
This wording (to me) does not seem to specify what the first invalid
byte of an invalid byte sequence is. Is it the byte that first
invalidates the sequence, or the first byte of the whole invalid sequence?
Can an interested Lua user who has not carefully studied the Unicode
specs (an external resource) safely infer the output of
$ lua -e "print(utf8.len('\xc3\xc4'))"
from the Lua manual alone?
The given byte string is the UTF-8 encoding of the German umlaut
character Ä, 0xc3 0x84, except that the second byte's second-most
significant bit is flipped: the continuation byte 0x84 (0b10000100)
becomes the lead byte 0xc4 (0b11000100). The first byte, 0xc3, now
introduces a two-byte sequence but is not followed by a continuation
byte; the second byte, 0xc4, likewise starts a two-byte sequence that
the string cuts short. As given, both byte positions are invalid. But
which is the first invalid one?
Without giving a spoiler (hopefully): in this case Lua seems to be in
line with the official Unicode/UTF-8 specs, which are clearer about the
handling of invalid UTF-8 material. But a slight change in the Lua
manual could make it more self-contained.
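For readers who want to check the Unicode-recommended answer for
themselves without reading the spec: Python's UTF-8 codec follows the
standard's "maximal subpart" recommendation when reporting ill-formed
sequences, so it can serve as a cross-check (a sketch for comparison
only; this is of course not Lua's implementation):

```python
# Feed the same two bytes to a strict UTF-8 decoder and record
# where it reports the first error.
data = b"\xc3\xc4"
try:
    data.decode("utf-8")
    first_invalid = None  # would mean the bytes were valid UTF-8
except UnicodeDecodeError as e:
    # e.start is the index of the first invalid byte; e.end is one
    # past the ill-formed subsequence the decoder rejected.
    first_invalid = e.start
    print("first invalid byte at index", first_invalid)
    print("rejected bytes:", data[e.start:e.end])
```

The index in `UnicodeDecodeError.start` is what the Unicode
recommendation treats as the position of the first invalid byte;
comparing it with the second return value of `utf8.len` shows whether
Lua agrees.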