Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Gregg Reynolds-2


On Tue, Jul 10, 2018, 4:44 PM Gregg Reynolds <[hidden email]> wrote:

> (e.g. numbers in ltr scripts).

Correction: numbers in rtl scripts. Unicode says that numbers in e.g. Arabic are ltr. This is complete BS, but it is also a fact on the ground that cannot be fixed. Extra credit: estimate the cost of this very fundamental mistake.

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Sean Conner
In reply to this post by Gregg Reynolds-2
It was thus said that the Great Gregg Reynolds once stated:

> On Tue, Jul 10, 2018, 4:44 PM Dirk Laurie <[hidden email]> wrote:
> ...
>
> >
> > I. Am. Not. Asking. For. Unicode.
> >
> > I am merely asking for extra functions along the lines of what the
> > utf8 library already does.
> > E.g. Sam's examples:
> >
> > > s1 = "Hélène"
> > > s2 = "Hélène"

  They look similar, but they are constructed differently.

> FYI these look identical on Android.
>
> > > utf8.len(s1)
> > 6
> > > utf8.len(s2)
> > 7
> >
> > If you really not understand what I mean, I can elaborate.
>
> Please do.
>
> What does "len" mean? Number of Unicode chars or number of bytes?

  The number of Unicode code points.  The second one has a letter 'e'
followed by a combining accent (I'm not sure which accent is the combining
one), thus the different number of Unicode code points.

  -spc
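
The difference between the two strings can be made visible with explicit escapes. This is a minimal sketch assuming Lua 5.3 or later; which accent is decomposed in the original s2 is not stated in the thread, so the choice of the first 'é' below is an assumption.

```lua
-- Assumes Lua 5.3+ (utf8 library and "\u{XXX}" escapes).
local s1 = "H\u{E9}l\u{E8}ne"    -- precomposed é (U+00E9) and è (U+00E8)
local s2 = "He\u{301}l\u{E8}ne"  -- 'e' + COMBINING ACUTE ACCENT (U+0301); assumed split

print(utf8.len(s1), #s1)  -- code points, bytes
print(utf8.len(s2), #s2)
print(s1 == s2)           -- plain byte-wise comparison, no normalization

-- List the code points of s2 to show the extra combining mark.
for _, cp in utf8.codes(s2) do
  io.write(string.format("U+%04X ", cp))
end
print()
```

Here `utf8.len` counts code points while `#` counts bytes, so the two strings differ on both measures even though they render alike.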



Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Alysson Cunha
In reply to this post by Gregg Reynolds-2
There are 3 entities with Unicode strings:

1 - The bytes, according to the encoding used (UTF-8, UTF-16 big-endian, UTF-16 little-endian, UTF-32)
2 - The Unicode code points. The union of one or more bytes composes a code point.
3 - And the trickiest of them, the glyphs. One or more Unicode code points compose a single glyph.

Example: This flag "🏴󠁧󠁢󠁥󠁮󠁧󠁿" is composed of 7 unicode code points, these code-points encoded as UTF-8 occupies 14 bytes.
A single glyph (the flag) is composed by 7 unicode code points, or 14 UTF-8 bytes.
Many emojis are the union of more than 1 code point. And there are the combining code points: A + ´ (2 Unicode code points), which may be presented as "Á" by text editors/text presenters.

I think utf8.len() returns the number of Unicode code points, not glyphs...

PS: In Delphi, I made a library myself to handle glyphs, code points and bytes.

--
Alysson Cunha / AlyssonRPG
http://www.rrpg.com.br - Jogue o tradicional RPG de mesa online

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Alysson Cunha
PS: The flag "🏴󠁧󠁢󠁥󠁮󠁧󠁿" is the union of the following unicode code points: 1F3F4 E0067 E0062 E0065 E006E E0067 E007F

A + ´  , (2 unicode code points)  that ****must***** be presented as "Á" by text editors/text presenters.


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Alysson Cunha
Erratum 2: that flag occupies 14 code units in UTF-16 (28 bytes). In UTF-8 the flag occupies more than I said: if my memory is correct, 28 bytes.
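
The sizes can be checked directly from the code points listed earlier in the thread. This is a sketch assuming Lua 5.3 or later; `utf16_units` is a hypothetical helper written for this illustration, not part of the utf8 library.

```lua
-- Code points given in the thread for the flag of England.
local flag = utf8.char(0x1F3F4, 0xE0067, 0xE0062, 0xE0065,
                       0xE006E, 0xE0067, 0xE007F)

-- Count UTF-16 code units: code points above U+FFFF need a surrogate pair.
local function utf16_units(s)
  local units = 0
  for _, cp in utf8.codes(s) do
    units = units + (cp > 0xFFFF and 2 or 1)
  end
  return units
end

print(utf8.len(flag))         -- number of code points
print(#flag)                  -- number of UTF-8 bytes
print(utf16_units(flag))      -- number of UTF-16 code units
print(utf16_units(flag) * 2)  -- number of UTF-16 bytes
```

All seven code points lie above U+FFFF, so each takes 4 bytes in UTF-8 and 2 code units (4 bytes) in UTF-16.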


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Gregg Reynolds-2
In reply to this post by Sean Conner


On Tue, Jul 10, 2018, 5:17 PM Sean Conner <[hidden email]> wrote:
>   The number of Unicode code points.  The second one has a letter 'e'
> followed by a combining accent (I'm not sure which accent is the combining
> one), thus the different number of Unicode code points.

Ok, we have "codepoints", "chars", bytes, and heaven knows what else. Is a Unicode "codepoint" a byte? No. Is "Unicode codepoint" even meaningful?




Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Gregg Reynolds-2
In reply to this post by Alysson Cunha


On Tue, Jul 10, 2018, 5:20 PM Alysson Cunha <[hidden email]> wrote:
> There are 3 entities with Unicode strings:
>
> 1 - The bytes, according to the encoding used (UTF-8, UTF-16 big-endian, UTF-16 little-endian, UTF-32)
> 2 - The Unicode code points. The union of one or more bytes composes a code point.

Union? I don't think so.

> 3 - And the trickiest of them, the glyphs. One or more Unicode code points compose a single glyph.

Unicode does not traffic in glyphs. (Except when it must for backwards compatibility.)

> Example: This flag "🏴󠁧󠁢󠁥󠁮󠁧󠁿" is composed of 7 unicode code points, these code-points encoded as UTF-8 occupies 14 bytes.
> A single glyph (the flag) is composed by 7 unicode code points, or 14 UTF-8 bytes..

Please try to be precise. If you mean 

Character 'BLACK FLAG' (U+2691)


Then you have a problem. That is exactly one code point and one char. If you mean some other Unicode char, then tell us what it is, in hex.


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Alysson Cunha
Gregg, you're mistaken.


That flag is not the BLACK FLAG... it is the England Flag

[attached image: image.png]

Another good example is the glyph "1️⃣"

It is the product of the sequence of Unicode code points 0031 FE0F 20E3. Note that the first code point in that sequence is the ordinary '1' character, with the same value as in ASCII, but it is followed by 2 more code points that change the glyph to be rendered.
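
As an illustration, the sequence can be built and taken apart again with the utf8 library. This is a small sketch assuming Lua 5.3 or later; the variable name `keycap` is only for this example.

```lua
-- DIGIT ONE, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP
local keycap = utf8.char(0x0031, 0xFE0F, 0x20E3)

-- Print the byte position and value of each code point.
for pos, cp in utf8.codes(keycap) do
  print(pos, string.format("U+%04X", cp))
end

-- The first byte is the same as the ASCII digit '1'.
print(keycap:byte(1) == string.byte("1"))
print(utf8.len(keycap), #keycap)  -- code points, bytes
```

The first code point is a single byte identical to ASCII '1'; the two that follow take 3 bytes each in UTF-8.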




Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Sean Conner
In reply to this post by Gregg Reynolds-2
It was thus said that the Great Gregg Reynolds once stated:

> On Tue, Jul 10, 2018, 5:20 PM Alysson Cunha <[hidden email]> wrote:
>
> > there are 3 entities with unicode strings::
> >
> > 1 - The bytes according to the encoding used (UTF-8, UTF-16 Big Endian,
> > UTF-16 Little endian, UTF-32)
> > 2 - The unicode code points - The union of one or more bytes compose the
> > code points
> >
>
> Union? I don't think so.

  Union as in United States, not union as in C.

  -spc (Please check a dictionary)


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Axel Kittenberger
In reply to this post by Alysson Cunha
> I think utf8.len() returns the quantity of Unicode Code Points, not glyphs...

It simply cannot be glyphs: ligatures are part of the font used, so without a specific font len() cannot know the number of glyphs. "fi", for example, is a very common ligature: two characters but one glyph. Conversely, one character can be rendered as a union of two glyphs; in many fonts, ö is o plus ¨.

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Sam Putman
In reply to this post by Alysson Cunha


On Wed, Jul 11, 2018 at 2:01 AM, Alysson Cunha <[hidden email]> wrote:
> Gregg, you're mistaken.
>
> That flag is not the BLACK FLAG... it is the England Flag



🏴󠁧󠁢󠁥󠁮󠁧󠁿 Somehow your string was rendering the black flag, not the flag of England.


Goes to show: Unicode is tough. 

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Sam Putman



...Did it to me as well! SMTP is also challenging ^_^

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Alysson Cunha
In reply to this post by Axel Kittenberger
"Logical glyph" may be the correct term, then. Think about the many software keyboards on Android. They don't know which font the app is rendering with, but when you press the virtual backspace button to delete an emoji from the content, they understand that the glyph you are trying to delete occupies more than 1 UTF-16 Java char.

Even without any information about the font, the virtual keyboard sends commands to delete several UTF-16 Java chars at once just to delete one glyph.

So the concept of a glyph exists in Unicode without the font. Maybe the term should be something like "logical glyph", or "well-known glyph sequences"...
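
A font-independent backspace of this kind can be sketched as follows. This is a deliberately simplified illustration, assuming Lua 5.3 or later and valid UTF-8 input; `is_extender` and `backspace` are hypothetical names, and the ranges checked are a small hand-picked subset. Unicode's own segmentation rules (grapheme clusters) are considerably more involved.

```lua
-- Hypothetical, heavily simplified rule: these code points attach to
-- the code point before them.
local function is_extender(cp)
  return (cp >= 0x0300 and cp <= 0x036F)    -- combining diacritical marks
      or (cp >= 0xFE00 and cp <= 0xFE0F)    -- variation selectors
      or cp == 0x20E3                       -- combining enclosing keycap
      or (cp >= 0xE0020 and cp <= 0xE007F)  -- emoji tag characters
end

-- Remove the last base code point together with any extenders after it.
local function backspace(s)
  local i = #s + 1
  while i > 1 do
    i = utf8.offset(s, -1, i)  -- start of the code point before position i
    if not is_extender(utf8.codepoint(s, i)) then break end
  end
  return s:sub(1, i - 1)
end

print(backspace("He\u{301}"))            -- drops 'e' and its accent together
print(#backspace("a1\u{FE0F}\u{20E3}"))  -- only "a" is left
```

The loop walks backwards one code point at a time with `utf8.offset` and stops at the first code point that is not in the extender set, so several bytes are removed in one step without consulting any font.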


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Axel Kittenberger
> So, the concept of glyph in unicode exists without the font. Maybe the term should be something like logic glyph, or well known glyphs sequences...

What you are referring to is called a "character" in typesetting (the lookup index in the font table) and has nothing to do with a "Java char". "Glyph" as a concept only makes sense when typesetting with a font.

Please familiarize yourself more with the horrible depths of typesetting before trying to invent new terminology.

When an editor treats modifying characters together with the character being modified as one "character" to be edited, that is a feature of said editor.

Typesetting is a really complicated thing that I also only dabble in (a little while ago I wrote the TrueType font hinting engine for opentype.js, where I encountered a good part of it). Honestly, and I repeat: people suggesting that it be included in this or that way more often than not underestimate the size of the can of worms they want to open, or are just happy with their local hack, which ignores many of the issues that will certainly be encountered by other people.

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Dirk Laurie-2
2018-07-11 9:24 GMT+02:00 Axel Kittenberger <[hidden email]>:

> people suggesting that it be included in this or that way more often
> than not underestimate the size of the can of worms they want to
> open, or are just happy with their local hack, which ignores many
> of the issues that will certainly be encountered by other people.

The Lua 5.3 utf8 library deals with an abstraction that happens
to make the writing of such local hacks easier but actually does
not have anything to do with Unicode. I have forked this thread
under a new subject to emphasize that the utf8 library is agnostic
about Unicode and all its intricacies.


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

云风 Cloud Wu
In reply to this post by Lorenzo Donati-3
On Tue, Jul 10, 2018 at 9:30 PM, Lorenzo Donati <[hidden email]> wrote:
> I know my view is a bit "western-centric" (or "Latin-centric") and
> people speaking languages who need thousands of symbols to be written
> might think differently (especially Asian languages).
>
> Anyway I'm curious to know how, say, Chinese programmers view the thing.
> Would they find coding more "easy" if they could write programs using
> ideograms or do they think using transliteration of their words in a
> Latin alphabet.

As a Chinese programmer, I hate non-standard, non-ASCII space characters. U+3000 is another space used in Asian languages; it's a common typo to enter U+3000 in code when you are using a Chinese input method. So I always set my editor to highlight non-ASCII spaces, and use special colors to distinguish tab/space/LF/CR.

Using Chinese words in code would be more trouble than it's worth. If we can't find a proper English word for a variable/function name, we can use pinyin (https://en.wikipedia.org/wiki/Pinyin) instead.

BTW, I don't like using non-ASCII characters in filenames either, because of Windows. I agree it's a nightmare.
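
The same kind of check that an editor highlight performs can be sketched as a small scan over source text. This assumes Lua 5.3 or later and valid UTF-8 input (`utf8.codes` raises an error otherwise); `odd_spaces` and `find_odd_spaces` are hypothetical names, and the table lists only a few of the space characters Unicode defines.

```lua
-- A few non-ASCII space characters; the list is illustrative, not complete.
local odd_spaces = {
  [0x00A0] = "NO-BREAK SPACE",
  [0x2003] = "EM SPACE",
  [0x3000] = "IDEOGRAPHIC SPACE",
}

-- Return the byte position, code point and name of every listed space in src.
local function find_odd_spaces(src)
  local hits = {}
  for pos, cp in utf8.codes(src) do
    if odd_spaces[cp] then
      hits[#hits + 1] = { pos = pos, cp = cp, name = odd_spaces[cp] }
    end
  end
  return hits
end

local line = "local x\u{3000}= 1\u{A0}-- note"
for _, h in ipairs(find_odd_spaces(line)) do
  print(string.format("byte %d: U+%04X %s", h.pos, h.cp, h.name))
end
```

Reporting byte positions rather than character counts keeps the output usable with `string.sub` and similar byte-oriented functions.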




Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Lorenzo Donati-3
On 11/07/2018 10:50, 云风 Cloud Wu wrote:

> On Tue, Jul 10, 2018 at 9:30 PM, Lorenzo Donati <[hidden email]> wrote:
>
>> I know my view is a bit "western-centric" (or "Latin-centric") and
>> people speaking languages who need thousands of symbols to be written
>> might think differently (especially Asian languages).
>>
>> Anyway I'm curious to know how, say, Chinese programmers view the thing.
>> Would they find coding more "easy" if they could write programs using
>> ideograms or do they think using transliteration of their words in a
>> Latin alphabet.
>
>
> As a Chinese programmer, I hate non-standard, non-ASCII space characters.
> U+3000 is another space used in Asian languages; it's a common typo to
> enter U+3000 in code when you are using a Chinese input method. So I always
> set my editor to highlight non-ASCII spaces, and use special colors to
> distinguish tab/space/LF/CR.
>
> Using Chinese words in code would be more trouble than it's worth. If we
> can't find a proper English word for a variable/function name, we can use
> pinyin (https://en.wikipedia.org/wiki/Pinyin) instead.
>

Thank you for the info. Very interesting!


> BTW, I don't like using non-ASCII characters in filenames either, because
> of Windows. I agree it's a nightmare.
>




Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Alysson Cunha
In reply to this post by Axel Kittenberger

You are probably correct about the term "character".
   There are, however, the combining characters: you must parse 1 base character followed by a sequence of one or more combining characters, which modify it and form a single, distinct typographic character.

According to the Unicode definition, good Unicode-aware string comparisons/searches, like the ones in SQL databases, should treat these two cases as equal:
  - "Á" character (single code point: \u00C1)
  - "Á" character in decomposed form (combination of 2 code points: \u0041 + \u0301)

The second one, "Á", is a single character that is the result of 2 other characters (a base character and a combining character) and has a length of 2 Unicode code points.
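
Lua's standard library does not normalize strings, so a comparison of this kind has to compose (or decompose) the text first. The sketch below assumes Lua 5.3 or later; `compose` is a hypothetical two-entry table and `compose_pairs` a hypothetical helper. Real canonical composition needs the full Unicode data tables and also has to handle mark reordering, which this sketch ignores.

```lua
-- Hypothetical two-entry table: base code point -> combining mark -> precomposed.
local compose = {
  [0x41] = { [0x301] = 0xC1 },  -- A + combining acute -> Á
  [0x65] = { [0x301] = 0xE9 },  -- e + combining acute -> é
}

-- Merge each (base, combining) pair found in the table into one code point.
local function compose_pairs(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    local prev = out[#out]
    local merged = prev and compose[prev] and compose[prev][cp]
    if merged then
      out[#out] = merged
    else
      out[#out + 1] = cp
    end
  end
  return utf8.char(table.unpack(out))
end

local a = "\u{C1}"    -- single code point
local b = "A\u{301}"  -- decomposed form
print(a == b)                                -- raw byte comparison
print(compose_pairs(a) == compose_pairs(b))  -- comparison after composing
```

Note that `table.unpack` limits this to fairly short strings; a production version would build the result in chunks.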

Many programmers associate a typographic character with the programming data type "char" or "wchar_t", but that is not entirely true with Unicode, because:
* With wchar_t/UTF-16, there are surrogate pairs.
* There are combining characters.
* There are well-known, defined code point sequences.
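
The surrogate-pair case from the list above can be worked through for one code point. This is a sketch assuming Lua 5.3 or later (integer shift and mask operators); `to_surrogates` is a hypothetical helper name.

```lua
-- Split a supplementary code point into its UTF-16 high and low surrogates.
local function to_surrogates(cp)
  assert(cp > 0xFFFF and cp <= 0x10FFFF,
         "only code points above U+FFFF need a surrogate pair")
  local v = cp - 0x10000                            -- 20-bit value
  return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)   -- top 10 bits, low 10 bits
end

-- U+1F3F4 is the first code point of the flag sequence discussed in the thread.
local hi, lo = to_surrogates(0x1F3F4)
print(string.format("%04X %04X", hi, lo))
```

A single code point therefore occupies two 16-bit units, which is why one wchar_t or Java char cannot be equated with one character.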


Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Frédéric van der Plancke
In reply to this post by Gregg Reynolds-2
On 10/07/2018 23:56, Gregg Reynolds wrote:


> On Tue, Jul 10, 2018, 4:44 PM Gregg Reynolds <[hidden email]> wrote:
>
> > (e.g. numbers in ltr scripts).
>
> Correction: numbers in rtl scripts. Unicode says that numbers in e.g. Arabic are ltr. This is complete BS, but it is also a fact on the ground that cannot be fixed. Extra credit: estimate the cost of this very fundamental mistake.

I'm not sure it's a mistake; it may be a well-thought-out design compromise.

In Arabic, numbers are written in the same orientation as in European languages, because of a double inversion: going from right to left, they first write the units, then the 10s, then the 100s... The end result is that in both writing systems the units go to the right and the heavier digits go to the left.

See https://ar.wikipedia.org/wiki/%D8%AE%D8%B7_%D8%B2%D9%85%D9%86%D9%8A_%D9%84%D8%AA%D8%A7%D8%B1%D9%8A%D8%AE_%D8%A7%D9%84%D8%B9%D8%A7%D9%84%D9%85 for examples. (It's the Arabic counterpart to https://en.wikipedia.org/wiki/Timelines_of_world_history with plenty of numbers in a non-mathematical context.)

Frederic

