Adding utf-8 handling support to Lua 5.1 in 2017

Vadi
Hi,

I'd like to add support for dealing with UTF-8 text in Lua 5.1, for functions such as string.gsub, string.len, etc. I'm aware that 5.3 comes with a utf8 library, but I'm not considering breaking compatibility with the huge ecosystem of code that's been developed for 5.1 for my application just yet.

I know about starwing/luautf8; are there any other options I should be considering?

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Luiz Henrique de Figueiredo
> I'd like to add support for dealing with utf-8 text in Lua 5.1 - for
> functions such as string.gsub, string.len and etc. I'm aware 5.3 comes with
> a utf8 library but I'm not considering breaking capability for the huge
> ecosystem of code that's been developed for 5.1 for my application just yet.
>
> I know there's starwing/luautf8, are there any other options that I should
> be considering?

With the simple changes below, lutf8lib.c from 5.3 compiles fine in 5.1.
I haven't tested it though.

    10d9
    < #include "lprefix.h"
    250,251c249,250
    < LUAMOD_API int luaopen_utf8 (lua_State *L) {
    <   luaL_newlib(L, funcs);
    ---
    > LUALIB_API int luaopen_utf8 (lua_State *L) {
    >   luaL_register(L, "utf8", funcs);

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Vadi
That's great to know as well! Thanks!

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Philipp Janda
In reply to this post by Luiz Henrique de Figueiredo
On 13.04.2017 at 16:28, Luiz Henrique de Figueiredo wrote:

> With the simple changes below, lutf8lib.c from 5.3 compiles fine in 5.1.
> I haven't tested it though.

It won't work because of `lua_pushfstring(L, "%U", ?)`: Lua 5.1's
`lua_pushfstring` has no `%U` conversion. You can #define
it to something reasonable, though. Compat-5.3[1] has lutf8lib.c
backported to Lua 5.1, btw.

Philipp

   [1]: https://github.com/keplerproject/lua-compat-5.3

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Vadi

Hey, thanks for that link; this kind of thing is why I came to ask here. I read that your module makes changes to the global environment (when loaded the default way). Are any of those changes backwards-incompatible with code written for plain 5.1?

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Philipp Janda
On 13.04.2017 at 19:01, Vadim Peretokin wrote:
> Hey, thanks for that link, this kind of stuff is why I came to ask here. I
> read that your module does changes to the global environment (in the
> default load manner), are any of those changes backwards incompatible with
> code written for plain 5.1?

It's mostly extensions/enhancements and the old functions like
`getfenv`, `loadstring`, `unpack`, etc. are still there, but if you look
hard enough you'll find some incompatibilities (e.g. the table library
in 5.3 respects metamethods while the library in Lua 5.1 uses raw
accesses, or the return values of `os.execute`). You can
`require"compat53.module"` to avoid this, or, if you are only interested in
utf8, there's also `require"compat53.utf8"` which only loads the utf8
backport. The last two options don't modify the global environment.


Philipp

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Vadi
I've had a closer look at utf8 in 5.3, and unfortunately it does not make all of string.* work with UTF-8, which is what I need, so that is a no-go. I think the alternatives are starwing/luautf8, Stepets/utf8.lua and MediaWiki's ustring.

Has anyone had experience with any of those, or with other libraries I've missed, that provide equivalent UTF-8 support for the string.* library?

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Paul E. Merrell, J.D.
On Thu, Apr 13, 2017 at 9:17 PM, Vadim Peretokin <[hidden email]> wrote:
> I've had a closer look at utf8 in 5.3 and unfortunately it does not enable
> all of string.* to work with utf-8 which is what I need, so that is a no-go.
> I think the alternatives are starwing/luautf8, Stepets/utf8.lua and
> Mediawiki's ustring.
>
> Has anyone had experience with any of those or other libraries I've missed
> to provide equivalent utf-8 support of the string.* library?

We've used starwing/luautf8 with Lua 5.2 and 5.3 embedded in
NoteCase Pro [1] without any reported issues [2]. We did have to give it
its own namespace (we use "uf8ex") to avoid a function name clash with
Lua 5.3's utf8.len. Fortunately, when we implemented starwing's
code, we already knew from this list that 5.3 would have that naming
conflict, so we were able to avoid it before it arose.

Caveat: to my knowledge, those of us who do a lot of scripting in
NoteCase Pro [3] only use the luautf8 equivalents of the Lua string
library functions that consume or return offsets. We have scant
experience with lua-utf8's other functions.

Hope this helps.

Paul


1. <http://notecasepro.com/> (the program embeds Lua on a wide variety
of operating systems; see <http://notecasepro.com/download.php>).

2. I did notice that utf8.title apparently returns the same result as
utf8.upper. From my rudimentary understanding of Unicode, this is
correct. But I think it represents a poor choice of nomenclature in
the Unicode world: "title case" in English does not mean
that all alphabetical characters are uppercased.

3. I've written upward of 600 scripts for NoteCase Pro, most of which
use one or more of the starwing/utf8 string functions.

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Vadi
Thanks for the vote of confidence, appreciate it!

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Daurnimator
In reply to this post by Vadi
I wrote some bindings to libunistring, which is good for this sort of thing:
https://github.com/daurnimator/lua-unistring
If you find it useful, I can make a release.

Re: Adding utf-8 handling support to Lua 5.1 in 2017

Vadi

Hmm, what advantages does it have over starwing's library?