UTF-8 on windows terminal


aryajur
Hi,
I was trying to find an equivalent of Python's encode function in Lua. For example, when I start Python 3.8 in the Windows cmd shell and run:

a = "pythön!"
a.encode("utf-8")

it generates:
b'pyth\xc3\xb6n!'
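
For reference, here is a minimal sketch of what that encode call does per character: each code point is mapped to one to four bytes. The helper names utf8_encode_codepoint and utf8_encode are made up for illustration and are not part of any library; the string uses a \u00f6 escape so the example does not depend on how the terminal delivers the "ö".

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def utf8_encode(text: str) -> bytes:
    """Encode a whole string by concatenating the per-code-point bytes."""
    return b"".join(utf8_encode_codepoint(ord(ch)) for ch in text)


print(utf8_encode("pyth\u00f6n!"))  # b'pyth\xc3\xb6n!'
```

The "ö" is U+00F6, which falls in the two-byte range and becomes the bytes C3 B6.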

I then looked into the utf8 library in Lua. In the same Windows cmd shell, if I run:
utf8.codepoint("pythön!",1,-1)

I get: 
stdin:1: invalid UTF-8 code
stack traceback:
        [C]: in function 'utf8.codepoint'
        stdin:1: in main chunk
        [C]: in ?

After some searching I found the chcp command. I started a new cmd window and ran:
chcp 65001
then started Lua and ran:
utf8.codepoint("pythön!",1,-1)

Lua treats this as an incomplete statement and reports:
stdin:1: unfinished string near '"pyth'

How can I emulate the python encode function? 
Why does that work on the windows cmd and utf8.codepoint does not? 

Thanks,
Milind
 

Re: UTF-8 on windows terminal

Scott Morgan
On 12/07/2020 21:04, Milind Gupta wrote:
> How can I emulate the python encode function? 
> Why does that work on the windows cmd and utf8.codepoint does not? 
>

tl;dr CMD.EXE isn't fully UTF8 compliant. No idea what tricks Python is
pulling (recoding from a local codepage? Possible if you didn't chcp
65001 first)
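
To illustrate the codepage guess, here is a rough Python sketch. It assumes the console is on cp437 or cp1252; those two are picked only as examples, I don't know which code page is actually active on your machine. The helper is_valid_utf8 is a made-up name. The point is that the byte a legacy code page uses for "ö" is not valid UTF-8 on its own, which would be consistent with the "invalid UTF-8 code" error from utf8.codepoint.

```python
text = "pyth\u00f6n!"

# Bytes a program might receive for the same text under different encodings.
as_cp437 = text.encode("cp437")    # assumed OEM code page, for illustration
as_cp1252 = text.encode("cp1252")  # assumed ANSI code page, for illustration
as_utf8 = text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, data in [("cp437", as_cp437), ("cp1252", as_cp1252), ("utf-8", as_utf8)]:
    print(label, data, "valid UTF-8:", is_valid_utf8(data))
```

Only the last line reports valid UTF-8; the single byte produced by either legacy code page is rejected by a strict UTF-8 decoder.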

AFAIK, this is the last word on the issue:

https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
> The current changes also don’t cover what is required for our “processed input mode” that presents an editable input line for applications like CMD.exe.

Re: UTF-8 on windows terminal

Lorenzo Donati-3
On 13/07/2020 02:59, Scott Morgan wrote:
> On 12/07/2020 21:04, Milind Gupta wrote:
>> How can I emulate the python encode function?
>> Why does that work on the windows cmd and utf8.codepoint does not?
>>
>
> tl;dr CMD.EXE isn't fully UTF8 compliant. No idea what tricks Python is
> pulling (recoding from a local codepage? Possible if you didn't chcp
> 65001 first)
>

As far as I remember, cmd.exe has no UTF-8 support at all (besides
that MS effort for Win10 you quote below - BTW, thanks, I didn't know
anything about that - I'm still on Win7 most of the time).

cmd.exe can run in a UTF-16 compliant mode, but it does no UTF-8
processing. CHCP 65001 only changes the active code page, and thus some
of the console handling for console applications (which must still do
the right thing themselves).

cmd.exe /is/ a console application, but doesn't handle UTF-8, i.e. it
doesn't do the right thing.

See this:

https://ss64.com/nt/chcp.html

and this SO answer in particular:

https://stackoverflow.com/a/47843552/2633423


> AFAIK, this is the last word on the issue:
>
> https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
>> The current changes also don’t cover what is required for our “processed input mode” that presents an editable input line for applications like CMD.exe.


Re: UTF-8 on windows terminal

Lorenzo Donati-3
In reply to this post by Scott Morgan
On 13/07/2020 02:59, Scott Morgan wrote:
> On 12/07/2020 21:04, Milind Gupta wrote:
>> How can I emulate the python encode function?
>> Why does that work on the windows cmd and utf8.codepoint does not?
>>
>
> tl;dr CMD.EXE isn't fully UTF8 compliant. No idea what tricks Python is
> pulling (recoding from a local codepage? Possible if you didn't chcp
> 65001 first)
>

BTW, the previous SO link I posted contains a link to the Python docs,
which apparently answers your "what tricks Python is pulling" question:

https://www.python.org/dev/peps/pep-0528/
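
If I read the PEP correctly, since 3.6 Python talks to the Windows console through the wide-character (UTF-16) API and presents the result to the program as UTF-8, regardless of the active code page. Below is a rough sketch of the transcoding idea only. The real implementation calls the Windows console API, which is not shown here; console_utf16_to_utf8 is a made-up name and the "typed" bytes are simulated.

```python
def console_utf16_to_utf8(raw_utf16le: bytes) -> bytes:
    """Hypothetical stand-in for the transcoding step: UTF-16-LE in, UTF-8 out."""
    return raw_utf16le.decode("utf-16-le").encode("utf-8")


# Simulated wide-character console input for the text "pythön!".
typed = "pyth\u00f6n!".encode("utf-16-le")

print(typed)
print(console_utf16_to_utf8(typed))
```

With this route the code page never enters the picture, which would explain why the Python session gives the expected bytes without chcp.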

> AFAIK, this is the last word on the issue:
>
> https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
>> The current changes also don’t cover what is required for our “processed input mode” that presents an editable input line for applications like CMD.exe.