utf8 library may cause heap corruption


云风 Cloud Wu
I found that lutf8lib.c has a function " static const char *utf8_decode (const char *o, int *val) ". This function cannot know the boundary of the string o, so it may cause heap corruption.

For example, I can build a corrupted UTF-8 string; when a utf8 function calls utf8_decode to read it, it may read memory beyond the end of the string, or it may fail to detect that the string is invalid.

Re: utf8 library may cause heap corruption

Dirk Laurie-2
2017-02-09 11:15 GMT+02:00 云风 Cloud Wu <[hidden email]>:

> I found that lutf8lib.c has a function " static const char *utf8_decode
> (const char *o, int *val) ". This function cannot know the boundary of the
> string o, so it may cause heap corruption.
>
> For example, I can build a corrupted UTF-8 string; when a utf8 function
> calls utf8_decode to read it, it may read memory beyond the end of the
> string, or it may fail to detect that the string is invalid.

Your subject proclaims that a call to the utf8 library may cause heap
corruption, but your argument merely shows that a call to utf8_decode
could.

The function is not exported. All the calls to it that can ever happen
are in lutf8lib.c. Moreover, at most four bytes of string o are examined,
so there is no possibility of an indefinite loop.

Please show us that corrupted string that you can build, and the
call to the utf8 library that then corrupts the heap. I don't believe you.

Re: utf8 library may cause heap corruption

云风 Cloud Wu
On Thu, Feb 9, 2017 at 5:46 PM, Dirk Laurie <[hidden email]> wrote:

> Please show us that corrupted string that you can build, and the
> call to the utf8 library that then corrupts the heap. I don't believe you.


My fault. Lua appends \0 to every string, so it's safe. Sorry.
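
A minimal sketch of why the trailing \0 bounds the read, assuming a simplified decoder. This is not the code in lutf8lib.c; the name sketch_decode is hypothetical, and overlong forms and surrogates are not checked. The point is only that '\0' is not a continuation byte (10xxxxxx), so a truncated sequence is rejected at the terminator and nothing past it is read.

```c
#include <stddef.h>

/* Hypothetical simplified decoder, NOT the code in lutf8lib.c.
   Returns a pointer just past the decoded sequence, or NULL if the
   sequence is malformed. Assumes s is NUL-terminated, as Lua strings are. */
static const char *sketch_decode(const char *s, unsigned *val) {
  const unsigned char *p = (const unsigned char *)s;
  unsigned c = p[0];
  int extra;  /* number of continuation bytes the lead byte promises */
  int i;
  if (c < 0x80) { extra = 0; }
  else if ((c & 0xE0) == 0xC0) { extra = 1; c &= 0x1F; }
  else if ((c & 0xF0) == 0xE0) { extra = 2; c &= 0x0F; }
  else if ((c & 0xF8) == 0xF0) { extra = 3; c &= 0x07; }
  else return NULL;  /* stray continuation byte or invalid lead byte */
  for (i = 1; i <= extra; i++) {
    unsigned cc = p[i];
    if ((cc & 0xC0) != 0x80)  /* '\0' fails this test, so reading stops here */
      return NULL;
    c = (c << 6) | (cc & 0x3F);
  }
  if (val) *val = c;
  return (const char *)p + extra + 1;
}
```

With this sketch, decoding "\xE4\xBA" reads the two data bytes plus the terminator and returns NULL, while "\xE4\xBA\x91" decodes to U+4E91.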


Re: utf8 library may cause heap corruption

云风 Cloud Wu
On Thu, Feb 9, 2017 at 5:53 PM, 云风 Cloud Wu <[hidden email]> wrote:

> On Thu, Feb 9, 2017 at 5:46 PM, Dirk Laurie <[hidden email]> wrote:
>
> > Please show us that corrupted string that you can build, and the
> > call to the utf8 library that then corrupts the heap. I don't believe you.
>
> My fault. Lua appends \0 to every string, so it's safe. Sorry.


But there is another problem.

local s = "\xE4\xBA"
assert(utf8.len(s, 1, 2) == utf8.len(s .. "\x91",1,2)) -- failed



Re: utf8 library may cause heap corruption

Dirk Laurie-2
2017-02-09 14:05 GMT+02:00 云风 Cloud Wu <[hidden email]>:

> But there is another problem.
>
> local s = "\xE4\xBA"
> assert(utf8.len(s, 1, 2) == utf8.len(s .. "\x91",1,2)) -- failed

Why is this a problem? It should fail. s is not a valid UTF-8 encoding
of a codepoint ("\xE4" promises three bytes, but there are only two).
When you supply the extra byte, there is one valid codepoint, starting
between characters 1 and 2.

> utf8.len(s, 1, 2)
nil    1
> utf8.len(s .. "\x91",1,2)
1

Re: utf8 library may cause heap corruption

Kim Alvefur
On Thu, Feb 09, 2017 at 02:30:39PM +0200, Dirk Laurie wrote:

> 2017-02-09 14:05 GMT+02:00 云风 Cloud Wu <[hidden email]>:
>
> > But there is another problem.
> >
> > local s = "\xE4\xBA"
> > assert(utf8.len(s, 1, 2) == utf8.len(s .. "\x91",1,2)) -- failed
>
> Why is this a problem? It should fail. s is not a valid UTF-8 encoding
> of a codepoint ("\xE4" promises three bytes, but there are only two).
> When you supply the extra byte, there is one valid codepoint, starting
> between characters 1 and 2.
The manual says:
> Returns the number of UTF-8 characters in string s that **start**
> between positions i and j (both inclusive).

Extra emphasis on **start**. The 3-byte sequence does start within the
range given.
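
The "start between positions i and j" rule can be sketched in C as follows. This is an illustration under stated assumptions, not the library's implementation: sketch_seqlen and sketch_len are hypothetical names, the string is assumed NUL-terminated, and the caller is assumed to pass 1 <= i and j no greater than the string length.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 sequence starting at s,
   or 0 if it is malformed. Relies on s being NUL-terminated. */
static size_t sketch_seqlen(const char *s) {
  const unsigned char *p = (const unsigned char *)s;
  size_t n, k;
  if (p[0] < 0x80) n = 1;
  else if ((p[0] & 0xE0) == 0xC0) n = 2;
  else if ((p[0] & 0xF0) == 0xE0) n = 3;
  else if ((p[0] & 0xF8) == 0xF0) n = 4;
  else return 0;
  for (k = 1; k < n; k++)
    if ((p[k] & 0xC0) != 0x80) return 0;  /* includes hitting '\0' early */
  return n;
}

/* Counts characters that START at byte positions i..j (1-based, inclusive).
   Returns the count, or -1 with *bad set to the 1-based position of the
   first invalid sequence. */
static long sketch_len(const char *s, size_t i, size_t j, size_t *bad) {
  size_t pos = i - 1;  /* 0-based position of the next character start */
  long n = 0;
  while (pos < j) {    /* only the start has to lie inside the range */
    size_t k = sketch_seqlen(s + pos);
    if (k == 0) { *bad = pos + 1; return -1; }
    pos += k;          /* the character itself may extend past j */
    n++;
  }
  return n;
}
```

Under these assumptions, sketch_len("\xE4\xBA", 1, 2, &bad) returns -1 with bad set to 1, and sketch_len("\xE4\xBA\x91", 1, 2, &bad) returns 1, which matches the two results shown in the transcript above.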


--
Zash
