Windows CE


Windows CE

Johan Liseborn
Hi all,

I am pretty new to this list, so sorry if my question is kind of obvious or
something. I did search the history at egroups, but came up short.

My question is this: Is there an ongoing effort to get Lua running on
Windows CE?

I found some references in the history, but they all dated back to 98-99.

I see two main problems with running on windows ce:

1. Windows CE is not fully ANSI compliant (missing libs).
2. UNICODE and wide characters.

None of these should be impossible to overcome, and it would be so great to
have a slick, lean, extensible scripting language on a WinCE device.

I actually got it to "almost" compile on WinCE out of the box; the ANSI
functions I found missing are:

- freopen
- system
- remove
- rename
- tmpnam
- getenv
- clock
- CLOCKS_PER_SEC
- time
- localtime
- strftime
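
For what it's worth, a few of these can be faked with trivial stand-ins;
here is a minimal sketch of the idea (names and choices are my guesses,
not a tested port):

/* wince_compat.h -- hypothetical; rough substitutes, assuming approximate
   semantics are good enough to get a first build running */
#include <windows.h>

#define CLOCKS_PER_SEC 1000
#define clock()      ((long)GetTickCount())  /* ms since boot, not CPU time */
#define getenv(name) ((char *)0)             /* CE has no environment block */
#define system(cmd)  (-1)                    /* no command shell to invoke */
/* remove/rename/time/localtime/strftime would need the wide-string Win32
   calls (DeleteFile, MoveFile, GetLocalTime, ...) plus string conversions */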

/johan

--
Johan Liseborn
CTO and Co-Founder
Hotsip AB



Re: Windows CE

Luiz Henrique de Figueiredo
>I actually got it to "almost" compile on WinCE out of the box; the ANSI
>functions I found missing are:
>
>- freopen
>- system
>- remove
>- rename
>- tmpnam
>- getenv
>- clock
>- CLOCKS_PER_SEC
>- time
>- localtime
>- strftime

Except for freopen, all these are only used in libraries, so you can just remove
the functions that call them. For 4.1, we have replaced freopen by fclose+fopen.
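
Roughly (a sketch of the idea, not the actual 4.1 code):

/* instead of: f = freopen(name, mode, f); */
if (f != NULL) fclose(f);
f = fopen(name, mode);
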
--lhf


Re: Windows CE

Johan Liseborn
> Except for freopen, all these are only used in libraries, so you can just
> remove the functions that call them. For 4.1, we have replaced freopen by
> fclose+fopen.

OK, thanks! I got it to compile, haven't done anything more useful yet,
though :-)

Is it possible to access the 4.1 release? Could not find a public CVS
repository or something like that... Do you have a preliminary release
schedule?


And what about the UNICODE/wchar_t issue? I saw comments on that in postings
from 98-99. It should be possible to handle that as well, shouldn't it,
using something like the scheme Microsoft is using in their code... If there
is something I can do to help, I will be glad to do so!

/johan

--
Johan Liseborn
CTO and Co-Founder
Hotsip AB



Re: Windows CE

Luiz Henrique de Figueiredo
In reply to this post by Luiz Henrique de Figueiredo
>Is it possible to access the 4.1 release? Could not find a public CVS
>repository or something like that... Do you have a preliminary release
>schedule?

4.1 is not ready, not even for alpha. No, there is no CVS access.
I think 4.1 alpha will be released sometime around April, but no promises.

>And what about the UNICODE/wchar_t issue? I saw comments on that in postings
>from 98-99.

This issue is complicated by the absence of platforms accessible to us that
implement UNICODE and wchar_t. Not even Linux does.
So, it's not in our immediate plans, but we'd like to hear what should be
changed so that we can plan our code for easy change. That would be a very
good topic for a Lua Technical Note.

>It should be possible to handle that as well, shouldn't it,
>using something like the scheme Microsoft is using in their code... If there
>is something I can do to help, I will be glad to do so!

I'm not familiar with Microsoft's scheme.
--lhf


Re: Windows CE

Roberto Ierusalimschy
In reply to this post by Johan Liseborn
> And what about the UNICODE/wchar_t issue? I saw comments on that in postings
> from 98-99. It should be possible to handle that as well, shouldn't it,
> using something like the scheme Microsoft is using in their code... If there
> is something I can do to help, I will be glad to do so!

It would be nice if we could make it easy in Lua to change the "char" type
to support Unicode. But I think there are many details that are difficult
to handle only through macros.

The naive approach seems to be to define a type Char (or lua_char), which 
can be either char or wchar_t. We must make sure that we do not assume that 
sizeof(Char) == 1, that we have some macro to convert all 'char' and 
"string" literals to wide (something that creates the `L'), and other such 
details. Then, all functions in the API work with this new type; not only 
lua_pushstring/lua_tostring, but also lua_getglobal, etc. Most functions 
that work over char types (such as ctype.h) can be redefined with macros, 
too. That is the easy part ;-)

But there are other problems. All format strings should be parameterized
between '%s' and '%ls'; how to do that in an easy way? Also, I think
some functions do not have a wide equivalent (for instance, fopen).
The swprintf function takes one argument more than sprintf; we cannot
handle that with macros, because they have variable number of
arguments. How to handle those things?
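
One conceivable way around the variadic-macro problem is to use a small
function wrapper instead, and to make the buffer size an explicit argument
in both variants, so a single call signature covers sprintf and swprintf.
A sketch, with made-up names:

#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>

#ifdef LUA_UNICODE
typedef wchar_t l_char;
#define LCS(s) L##s
static int l_sprintf(l_char *buf, size_t n, const l_char *fmt, ...) {
  va_list ap; int r;
  va_start(ap, fmt);
  r = vswprintf(buf, n, fmt, ap);  /* the size argument is always there */
  va_end(ap);
  return r;
}
#else
typedef char l_char;
#define LCS(s) s
static int l_sprintf(l_char *buf, size_t n, const l_char *fmt, ...) {
  va_list ap; int r;
  (void)n;  /* unused in the char build (or use vsnprintf where available) */
  va_start(ap, fmt);
  r = vsprintf(buf, fmt, ap);
  va_end(ap);
  return r;
}
#endif

That still leaves the '%s' vs '%ls' problem inside the format strings
themselves, of course.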

-- Roberto


Re: Windows CE

Jean-Claude Wippler
Roberto Ierusalimschy <[hidden email]> wrote:

[Unicode]
>It would be nice if we could make easy in Lua to change the "char" type
>to support Unicode. But I think there are many details that are difficult
>to handle only through macros.

Please don't go this route.  It makes storing and exchanging data hell.

The way both Tcl and Perl address this issue is to use UTF-8 to
represent Unicode data.  UTF-8 maps 1:1 onto 7-bit ASCII, and uses the
upper 128 chars to create a multi-byte encoding.  The beauty of it is
that a lot of existing code keeps on working as is (even Lua's lexer
would, I expect); the main trade-off is that character-wise (Unicode,
that is) indexing becomes less straightforward, and that the "length" of
a string, in terms of counting Unicode chars, is no longer
equivalent to the length of its byte representation.

A few more properties of UTF-8:
 - zero-byte delimiters can continue to work (there may be minor issues)
 - can be exchanged as strings, even with non-Unicode-aware machines
 - no endian-ness issues, UTF-8 is basically a byte-sized string

Python decided to go for a 2-byte internal representation instead, BTW.

Lua could use UTF-8, since it does not have "str[i]" type indexing and is
8-bit clean.  Evidently, the str* functions are affected - but these are
outside the core, and therefore neatly replaceable.  The basic idea would
be to cover all the incoming and outgoing cases where strings are involved,
and to keep the Lua core mostly as is.
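
To make the length trade-off concrete: counting Unicode chars means skipping
UTF-8 continuation bytes (which always look like 10xxxxxx); a sketch:

#include <stddef.h>

/* Unicode length of a UTF-8 string; the byte length is still strlen() */
static size_t utf8_len(const char *s) {
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char)*s & 0xC0) != 0x80)  /* not a continuation byte */
      n++;
  return n;
}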

Having said all this, I must add that I know enough about Unicode to know
that I know hardly anything (go ahead, read that sentence again).  It's
tricky stuff, and too easy to overlook implications (capitalization, word
delimiting, ...).  Those considering dealing with this better make sure
they have an expert at hand.

-jcw


Re: Windows CE

Alan Watson-4
> The way both Tcl and Perl address this issue, is to use UTF-8 to
> represent Unicode data.  

I know nothing about how WinCE handles Unicode, but UTF-8
has some very nice properties. A useful discussion of using
UTF-8 can be found in "Hello World", by Rob Pike and Ken
Thompson, which describes how Plan 9 handles Unicode and
UTF-8. See http://plan9.bell-labs.com/sys/doc/.

Regards,

Alan
-- 
Dr Alan Watson
Instituto de Astronomía UNAM


Re: Windows CE

John Belmonte-2
In reply to this post by Roberto Ierusalimschy
Hi,

Roberto wrote:
> But there are other problems. All format strings should be parameterized
> between '%s' and '%ls'; how to do that in an easy way?

I had experience with this when I tweaked Lua to control the number type
completely.  I don't think you'll like my solution but anyway it's there as
a reference.  (Item #2 in the email "Tightening Lua's type use",
2000-Sep-26.)

-John




RE: Windows CE

Vincent Penquerc'h-3
In reply to this post by Jean-Claude Wippler
> The way both Tcl and Perl address this issue is to use UTF-8 to
> represent Unicode data.  UTF-8 maps 1:1 onto 7-bit ASCII, and uses the
> upper 128 chars to create a multi-byte encoding.  The beauty of it is
> that a lot of existing code keeps on working as is (even Lua's lexer
> would, I expect); the main trade-off is that character-wise (Unicode,
> that is) indexing becomes less straightforward, and that the "length" of
> a string, in terms of counting Unicode chars, is no longer
> equivalent to the length of its byte representation.

I'd like to say that I've used UTF-8 encoding, and found it extremely easy
and straightforward to use. As Jean-Claude Wippler says, most string.h
routines will work unchanged. I used to consider this encoding a big hack
for those who can't switch to 16-bit Unicode or other encodings, but it is
actually a very viable solution (and it also means that 'normal' strings
don't suddenly take up twice as much storage space, as they would with a
16-bit encoding). Also, I think Lua's hashing would take a performance hit
from having to use 16-bit characters.
Moreover, the character indexing problem can be neatly hidden behind a
bunch of convenience functions that increment a pointer along the string.
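
Something like this, say (a sketch of one such function):

/* advance to the start of the next Unicode character in a UTF-8 string */
static const char *utf8_next(const char *s) {
  if (*s != '\0') {
    s++;
    while (((unsigned char)*s & 0xC0) == 0x80)  /* skip continuation bytes */
      s++;
  }
  return s;
}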

--
Vincent Penquerc'h



Re: Windows CE

Roberto Ierusalimschy
> Lua could use UTF-8, since it does not have "str[i]" type indexing and is
> 8-bit clean.  Evidently, the str* functions are affected - but these are
> outside the core, and therefore neatly replaceable.  The basic idea would
> be to cover all the incoming and outgoing cases where strings are involved,
> and to keep the Lua core mostly as is.

Yes, this seems the best option. I think we would need only a new strlib 
(that lib would change a lot); but everything else should work without 
any changes.

But, then, my other question: what is the relationship between Windows CE 
and Unicode? Why did everybody that tried to port Lua to Windows CE come up 
with this subject? Why can't they just use this approach (UTF-8)?
(this is pure ignorance on my part; I know nothing about Windows CE...) 

-- Roberto



RE: Windows CE

Vincent Penquerc'h-3
> But, then, my other question: what is the relationship between Windows CE
> and Unicode? Why did everybody that tried to port Lua to Windows CE come up
> with this subject? Why can't they just use this approach (UTF-8)?
> (this is pure ignorance on my part; I know nothing about Windows CE...)

Because CE knows only the 16-bit encoding variant of Unicode. Every time you
want to pass a string to the system, or get a string from it, it needs to be
in the 16-bit encoding. Hence the need either to program with the 16-bit
encoding throughout or to do endless conversions.
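
The conversion itself is one Win32 call each way; a sketch (the API is real,
but whether a given CE version accepts CP_UTF8 is an assumption worth
checking -- CP_ACP works for plain ASCII):

#include <windows.h>

/* 8-bit string in, 16-bit string out, for the system-call boundary */
static wchar_t *to_wide(const char *s, wchar_t *buf, int bufchars) {
  return MultiByteToWideChar(CP_UTF8, 0, s, -1, buf, bufchars) ? buf : NULL;
}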

--
Vincent Penquerc'h



Re: Windows CE

Johan Liseborn
In reply to this post by Roberto Ierusalimschy
Comments below...

> > Lua could use UTF-8, since it does not have "str[i]" type indexing and is
> > 8-bit clean.  Evidently, the str* functions are affected - but these are
> > outside the core, and therefore neatly replaceable.  The basic idea would
> > be to cover all the incoming and outgoing cases where strings are involved,
> > and to keep the Lua core mostly as is.
>
> Yes, this seems the best option. I think we would need only a new strlib
> (that lib would change a lot); but everything else should work without
> any changes.
>
> But, then, my other question: what is the relationship between Windows CE
> and Unicode? Why did everybody that tried to port Lua to Windows CE come up
> with this subject? Why can't they just use this approach (UTF-8)?
> (this is pure ignorance on my part; I know nothing about Windows CE...)

The problem, as I understand it, is that Windows CE simply requires all
strings to be UNICODE (the type shall be wchar_t), which means that even
though UTF-8 encoded strings would be nice, it simply won't work for Windows
CE... :-(

The way Microsoft solves this (this is the "Microsoft scheme" I mentioned in
an earlier posting) is that they have a series of typedefs and defines
along the following lines (this is *very* simplified):

#ifdef _UNICODE
typedef wchar_t TCHAR;      /* characters are 16-bit code units */
#define _T(x) L##x          /* _T("abc") expands to L"abc" */
#define _tcslen wcslen
...
#else
typedef char TCHAR;         /* plain 8-bit characters */
#define _T(x) x             /* literals stay narrow */
#define _tcslen strlen
...
#endif

This basically means that if you define the preprocessor symbol '_UNICODE'
and use the macros, everything will be set up for wchar_t-type strings
(presumably in UNICODE format); otherwise you will get regular or multi-byte
strings. This also means that all library functions must exist in two
versions, one for standard char and one for wchar_t (I made a stupid mistake
here, where I for a moment thought that the wide-character versions were
also part of the standard C library, but I believe they are not... or?).

You can then write code like:

int i = _tcslen( _T("hello world") );

and this will expand to either

int i = wcslen( L"hello world" );

or

int i = strlen( "hello world" );

depending on the definition of _UNICODE.
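
Incidentally, this scheme also covers the fopen question Roberto raised:
tchar.h has a _tfopen macro which, if I remember correctly (worth
double-checking against the CE SDK), maps to Microsoft's _wfopen under
_UNICODE and to plain fopen otherwise:

#include <tchar.h>
#include <stdio.h>

FILE *f = _tfopen(_T("data.txt"), _T("r"));  /* _wfopen or fopen */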

/johan




Re: Windows CE

Jean-Claude Wippler
In reply to this post by Roberto Ierusalimschy
Roberto Ierusalimschy <[hidden email]> wrote:

[UTF-8]
>But, then, my other question: what is the relationship between Windows CE 
>and Unicode? Why did everybody that tried to port Lua to Windows CE come up 
>with this subject? Why can't they just use this approach (UTF-8)?
>(this is pure ignorance on my part; I know nothing about Windows CE...) 

The machine is based on wchar_t as the way to pass strings in and out, so
people tend to think it needs to be that way inside their code as well. 
IMO, this is not the case - it's a conversion issue just like converting
numbers to/from printable form is.  Or closer to home: just like Lua
converts everything back and forth from doubles when it needs to
interface with things outside it.  If conversions are used only for
information going to/from the user interface, and things like file names,
then they need not become a bottleneck.  As I said before, storing data
on file with anything other than UTF-8 would IMHO be a mistake.

I'd say that if WinCE is considered the main universe, then wchar_t makes
sense, but in a broader perspective less so.  The choice of
encoding things as 16-bit shorts already causes trouble with >65k char
codes.  UTF-8 is compact, portable, endian-neutral, and capable of
storing unlimited char sets.  It's the equivalent of people writing words
by stringing characters together.

-jcw



RE: Windows CE

Anna Hester
In reply to this post by Johan Liseborn
I disagree with the statement that UTF-8 is the best option for supporting
Unicode characters. It is more accurate to say that UTF-8 is a good choice
for systems that cannot (or don't want to) be modified to use 16-bit values
for a character.

I will quote Mark Davis of IBM, who is also the president of the Unicode
Consortium, to support my point of view:

"Ultimately, the choice of which encoding format to use will depend heavily
on the programming environment. For systems that only offer 8-bit strings
currently, but are multi-byte enabled, UTF-8 may be the best choice. For
systems that do not care about storage requirements, UTF-32 may be best. For
systems such as Windows, Java, or ICU that use UTF-16 strings already,
UTF-16 is the obvious choice. Even if they have not yet upgraded to fully
support surrogates, they will be before long. 

If the programming environment is not an issue, UTF-16 is recommended as a
good compromise between elegance, performance, and storage."

(more info about the differences between UTF-16 and UTF-8 at
http://www-106.ibm.com/developerworks/library/utfencodingforms/)

Microsoft's OS APIs are natively Unicode, as they use the wchar_t type to
represent characters. wchar_t directly supports Unicode through UCS-2 and
UTF-16. The Win32 functions currently use UCS-2 internally because this is
more convenient from a programming point of view (i.e., characters are
fixed-length).

I think that the statement "storing data on file with anything other than
UTF-8 would IMHO be a mistake" only holds if you're thinking of storing text
in English or Latin-script languages. If you go to Japanese or Chinese, then
you're talking about 3 bytes per character, which is more than UTF-16 and
UCS-2.
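
(A worked example: U+65E5, the character 日, is 0110010111100101 in binary;
UTF-8 packs those 16 bits into the three-byte pattern 1110xxxx 10xxxxxx
10xxxxxx, giving E6 97 A5 -- three bytes, versus the single 16-bit code
unit 65E5 in UTF-16/UCS-2.)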

It makes a lot of sense for an operating system to prefer a fixed-length
scheme, because of the several issues introduced with variable-length
schemes, such as the performance impact and the required changes to
low-level algorithms (which rely on a known character size). This also seems
to be the case for other classes of systems, e.g. DB servers (MS SQL Server
2000 uses UCS-2 internally, and this approach provides many advantages for a
db system. A non-MS reference is:
http://www-106.ibm.com/developerworks/library/unicode-db-process/index.html).

I think it would be better for Lua to support Unicode via wchar_t if all the
target underlying systems could support this. Because this is not the case,
the use of UTF-8 sounds like a reasonable approach. 

Thanks,
-- Anna





RE: Windows CE

Vincent Penquerc'h-3
> I think that the statement "storing data on file with anything other than
> UTF-8 would IMHO be a mistake" only holds if you're thinking of storing
> text in English or Latin-script languages. If you go to Japanese or
> Chinese, then you're talking about 3 bytes per character, which is more
> than UTF-16 and UCS-2.

While this is indeed true, and often annoying for people who use non-Latin
character sets, a good part of the strings that Lua uses are program code,
which is usually made up of Latin characters.
This is blurred somewhat by the fact that Lua can accept identifiers using
non-ASCII characters, depending on the locale in use. Still, the proportion
of Latin characters is likely to be high enough for the space gain to be a
worthwhile tradeoff.
I agree that string-intensive programs using non-Latin characters will
diminish this ratio, though.

> I think it would be better for Lua to support Unicode via wchar_t if all
> the target underlying systems could support this. Because this is not the
> case, the use of UTF-8 sounds like a reasonable approach.

From what I've read on the list, one of the key advantages of Lua over other
languages is its small size. Indiscriminate use of wchar_t for all strings
would be a waste of space. I don't deny that wchar_t has advantages, but I
thought I would underline this.

Thanks

--
Vincent Penquerc'h



Re: Windows CE

Michael T. Richter-2
In reply to this post by Roberto Ierusalimschy
> But, then, my other question: what is the relationship between Windows CE
> and Unicode? Why did everybody that tried to port Lua to Windows CE come up
> with this subject? Why can't they just use this approach (UTF-8)?
> (this is pure ignorance on my part; I know nothing about Windows CE...)

File names et al. in Win32 are all pure Unicode and, to keep the WinCE size
down, Microsoft, in their near-infinite wisdom and judgement, decided to do
away with the <foo>A versions of the APIs which would accept ASCII (or
UTF-8, presumably) names and convert to Unicode behind the scenes.  This
means that if I want to open a file, I have to manually convert the string
(UTF-8 or ASCII, same difference in this case) to Unicode before using it.
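
In code, the manual dance looks something like this (a sketch; error
handling omitted, and CP_ACP assumed for the source encoding):

#include <windows.h>

/* open a file on CE given an 8-bit name: convert, then call the W API */
static HANDLE open_for_read(const char *name) {
  wchar_t wname[MAX_PATH];
  MultiByteToWideChar(CP_ACP, 0, name, -1, wname, MAX_PATH);
  return CreateFileW(wname, GENERIC_READ, FILE_SHARE_READ, NULL,
                     OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}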

--
Michael T. Richter
"Be seeing you."




Re: Windows CE

rje
On Fri, Feb 23, 2001 at 12:21:00AM -0500, Michael T. Richter wrote:
> > But, then, my other question: what is the relationship between Windows CE
> > and Unicode? Why did everybody that tried to port Lua to Windows CE come
> > up with this subject? Why can't they just use this approach (UTF-8)?
> > (this is pure ignorance on my part; I know nothing about Windows CE...)
> 
> File names et al. in Win32 are all pure Unicode and, to keep the WinCE size
> down, Microsoft, in their near-infinite wisdom and judgement, decided to do
> away with the <foo>A versions of the APIs which would accept ASCII (or
> UTF-8, presumably) names and convert to Unicode behind the scenes.  This
> means that if I want to open a file, I have to manually convert the string
> (UTF-8 or ASCII, same difference in this case) to Unicode before using it.

Forgive me for being dense, for I know very little about Unicode.  Doesn't
this mean that all strings in WinCE take up twice as much room as they would
normally require?  If so, in normal use, is it possible that they actually
waste more memory than they save by removing the ASCII-compatible calls?
(I'm no Windows person, either)

-- 
Rob Kendrick - http://www.digital-scurf.org/
Your reasoning powers are good, and you are a fairly good planner.


RE: Windows CE

Philippe Lhoste-2
> > File names et al. in Win32 are all pure Unicode and, to keep the WinCE
> > size down, Microsoft, in their near-infinite wisdom and judgement,
> > decided to do away with the <foo>A versions of the APIs which would
> > accept ASCII (or UTF-8, presumably) names and convert to Unicode behind
> > the scenes.  This means that if I want to open a file, I have to
> > manually convert the string (UTF-8 or ASCII, same difference in this
> > case) to Unicode before using it.
>
> Forgive me for being dense, for I know very little about Unicode.  Doesn't
> this mean that all strings in WinCE take up twice as much room as they
> would normally require?  If so, in normal use, is it possible that they
> actually waste more memory than they save by removing the ASCII-compatible
> calls?  (I'm no Windows person, either)

I don't know much about Unicode myself, and have no WinCE devices, but I
think you are mixing things up here.
I think the ASCII-compatible functions take up space in the ROM (or on the
hard disk, if the system has one and stores the OS on it).
Using UTF-16 (or similar) does waste RAM, and hard disk space too (cheaper
than RAM, though).
At least from the Western point of view. After all, 7-bit storage should be
enough for English-speaking people, so 8-bit ASCII is a waste of space too
(but packing 7-bit characters, rather than using 8-bit memory cells, would
cost a lot of processing time). Remember that telex used to transmit 6-bit
characters (all capitals).
It's always a tale of compromises between space -- we tend to have plenty
of it, less on hand-held devices, though this is quickly changing -- and
processing time -- not a major issue today either. I guess no one has the
perfect solution. As someone stated here, it also depends on your needs.

Regards.

--._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.--
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
http://jove.prohosting.com/~philho/
--´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`--



Re: Windows CE

Michael T. Richter-2
In reply to this post by rje
>> File names et al. in Win32 are all pure Unicode and, to keep the WinCE
>> size down, Microsoft, in their near-infinite wisdom and judgement,
>> decided to do away with the <foo>A versions of the APIs which would
>> accept ASCII (or UTF-8, presumably) names and convert to Unicode behind
>> the scenes.  This means that if I want to open a file, I have to manually
>> convert the string (UTF-8 or ASCII, same difference in this case) to
>> Unicode before using it.

> Forgive me for being dense, for I know very little about Unicode.  Doesn't
> this mean that all strings in WinCE take up twice as much room as they
> would normally require?  If so, in normal use, is it possible that they
> actually waste more memory than they save by removing the ASCII-compatible
> calls?  (I'm no Windows person, either)

Yes, quite possibly.  The trade-off is, however, that Windows CE is
trivially internationalised.  If you want internationalisation and have
chosen Unicode as your mechanism to do so, removing the ASCII calls is the
only way left to save memory.  (And keep in mind that Win32 has
hundreds, if not thousands, of API calls.  Throwing away half of them
translates to a significant saving.)

I guess they could have used UTF-8 as their internationalisation mechanism,
but UTF-8 carries significant processor overhead, makes picking specific
characters out of a string harder, and in non-Latin alphabets actually
takes up more space than UTF-16.

--
Michael T. Richter
"Be seeing you."