mlterm

Tested software version 3.9.3 on Linux. Full results are available in the ucs-detect repository at path data/linux-mlterm-3.9.3.yaml.

Wide character support

The best wide Unicode table version for mlterm appears to be 15.0.0. This is taken from a summary of the following results:

version   n_errors  n_total  pct_success
‘5.1.0’          0       26       100.0%
‘5.2.0’         78      269        71.0%
‘6.0.0’          0       13       100.0%
‘9.0.0’          0     5000       100.0%
‘10.0.0’        73      735        90.1%
‘11.0.0’         6       62        90.3%
‘12.0.0’         6       62        90.3%
‘12.1.0’         0        1       100.0%
‘13.0.0’        55      541        89.8%
‘14.0.0’         4       41        90.2%
‘15.0.0’         1       15        93.3%
‘15.1.0’         5        5         0.0%
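The pct_success column is consistent with the share of tested sequences that produced no alignment error. The following is a minimal sketch of that arithmetic, assuming pct_success = (n_total - n_errors) / n_total; the function name pct_success and the choice of rows are illustrative, not taken from ucs-detect itself.

```python
def pct_success(n_errors: int, n_total: int) -> float:
    """Percentage of tested sequences without an alignment error (assumed formula)."""
    return (n_total - n_errors) / n_total * 100.0


# (version, n_errors, n_total) rows copied from the wide character table above.
rows = [
    ("5.2.0", 78, 269),
    ("10.0.0", 73, 735),
    ("15.0.0", 1, 15),
    ("15.1.0", 5, 5),
]

for version, n_errors, n_total in rows:
    # One decimal place, matching the precision shown in the table.
    print(f"{version}: {pct_success(n_errors, n_total):.1f}%")
```

Rounded to one decimal place, these values agree with the pct_success column for the same rows.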

Sequence of a WIDE character from Unicode Version 15.0.0, from midpoint of alignment failure records:

Codepoint   Python        Category  wcwidth  Name
U+0001FABC  ‘\U0001fabc’  So              2  JELLYFISH

Total codepoints: 1

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xf0\x9f\xaa\xbc|\\n12|\\n"
    🪼|
    12|
    
  • python wcwidth.wcswidth() measures width 2, while mlterm measures width 0.
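The shell test above can be rebuilt from the codepoints in the table. The sketch below shows one way to derive the printf(1) argument: non-ASCII characters are written as \xNN escapes of their UTF-8 bytes, and the second line is a ruler of digits as wide as the expected width. The helper names printf_escape and shell_test are hypothetical, and characters that are special to the shell or to printf (such as %, \ or ") are not handled.

```python
def printf_escape(text: str) -> str:
    """Write non-ASCII characters as \\xNN escapes of their UTF-8 bytes."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable():
            out.append(ch)  # plain ASCII is kept literally
        else:
            out.extend(f"\\x{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


def shell_test(text: str, expected_width: int) -> str:
    """Build a printf command whose two '|' align when the width is as expected."""
    # Ruler digits 1..9, then 0, repeating: 15 columns give "123456789012345".
    ruler = "".join(str(i % 10) for i in range(1, expected_width + 1))
    return f'printf "{printf_escape(text)}|\\\\n{ruler}|\\\\n"'


# U+1FABC JELLYFISH with an expected width of 2, as in the table above.
print(shell_test("\U0001fabc", 2))
# prints: printf "\xf0\x9f\xaa\xbc|\\n12|\\n"
```

Running the printed command in the terminal under test shows the misalignment directly: if the terminal advances the cursor by a different number of cells than expected, the two '|' characters land in different columns.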

Emoji ZWJ support

The best Emoji ZWJ table version for mlterm appears to be None: every tested version scored 0.0% in the following results:

version  n_errors  n_total  pct_success
‘2.0’          22       22         0.0%
‘4.0’         500      500         0.0%
‘5.0’         100      100         0.0%
‘11.0’         73       73         0.0%
‘12.0’        112      112         0.0%
‘12.1’        165      165         0.0%
‘13.0’         51       51         0.0%
‘13.1’         83       83         0.0%
‘14.0’         20       20         0.0%
‘15.0’          1        1         0.0%
‘15.1’        109      109         0.0%

Sequence of an Emoji ZWJ Sequence from Emoji Version 15.1, from midpoint of alignment failure records:

Codepoint   Python        Category  wcwidth  Name
U+0001F9D1  ‘\U0001f9d1’  So              2  ADULT
U+200D      ‘\u200d’      Cf              0  ZERO WIDTH JOINER
U+0001F9BC  ‘\U0001f9bc’  So              2  MOTORIZED WHEELCHAIR
U+200D      ‘\u200d’      Cf              0  ZERO WIDTH JOINER
U+27A1      ‘\u27a1’      So              1  BLACK RIGHTWARDS ARROW
U+FE0F      ‘\ufe0f’      Mn              0  VARIATION SELECTOR-16

Total codepoints: 6

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xf0\x9f\xa7\x91\xe2\x80\x8d\xf0\x9f\xa6\xbc\xe2\x80\x8d\xe2\x9e\xa1\xef\xb8\x8f|\\n12|\\n"
    🧑‍🦼‍➡️|
    12|
    
  • python wcwidth.wcswidth() measures width 2, while mlterm measures width 7.
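The expected width of 2 is not the plain sum of the wcwidth column above, which is 5. The sketch below is one simplified rule that reproduces the expected values in this report. It assumes that a ZERO WIDTH JOINER and the codepoint following it contribute no width, and that VARIATION SELECTOR-16 widens a preceding narrow character by one cell. The function sketch_width is hypothetical and is not the wcwidth library's implementation; in particular, treating every narrow character before VS-16 as widened is an assumption made for brevity.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16


def sketch_width(cells):
    """Sum widths of (character, per-codepoint width) pairs with ZWJ and VS-16 rules."""
    total = 0
    last_width = None  # width of the last character that occupied a cell
    skip_next = False
    for ch, width in cells:
        if skip_next:
            skip_next = False  # codepoint joined by ZWJ: not measured
            continue
        if ch == ZWJ:
            skip_next = True
            continue
        if ch == VS16:
            if last_width == 1:
                total += 1  # assumed: narrow emoji becomes wide
            last_width = None
            continue
        if width > 0:
            last_width = width
        total += width
    return total


# Rows copied from the Emoji ZWJ table above.
zwj_sequence = [
    ("\U0001f9d1", 2), (ZWJ, 0), ("\U0001f9bc", 2),
    (ZWJ, 0), ("\u27a1", 1), (VS16, 0),
]
print(sum(width for _, width in zwj_sequence))  # plain per-codepoint sum
print(sketch_width(zwj_sequence))               # width under the sketched rules
```

Under these assumptions the whole sequence occupies the two cells of its first emoji, which matches the expected width reported for this record.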

Variation Selector-16 support

Emoji VS-16 results for mlterm are 100 errors out of 100 total codepoints tested, 0.0% success.

Sequence of a NARROW Emoji made WIDE by Variation Selector-16, from midpoint of alignment failure records:

Codepoint   Python        Category  wcwidth  Name
U+0001F325  ‘\U0001f325’  So              1  WHITE SUN BEHIND CLOUD
U+FE0F      ‘\ufe0f’      Mn              0  VARIATION SELECTOR-16

Total codepoints: 2

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xf0\x9f\x8c\xa5\xef\xb8\x8f|\\n12|\\n"
    🌥️|
    12|
    
  • python wcwidth.wcswidth() measures width 2, while mlterm measures width 1.

Language Support

The following 82 languages were tested with 100% success:

Adyghe, Aja, Amarakaeri, Arabic, Standard, Assyrian Neo-Aramaic, Baatonum, Bamun, Bhojpuri, Bora, Burmese, Chakma, Cherokee (cased), Chickasaw, Chinantec, Chiltepec, Dagaare, Southern, Dangme, Dendi, Dinka, Northeastern, Ditammari, Dzongkha, Evenki, Fon, Fur, Ga, Gen, Gilyak, Gujarati, Gumuz, Hindi, Idoma, Kabardian, Khmer, Central, Khün, Lamnso’, Lao, Lingala (tones), Magahi, Maithili, Maldivian, Mazahua Central, Mixtec, Metlatónoc, Mon, Mòoré, Nanai, Navajo, Nuosu, Orok, Otomi, Mezquital, Panjabi, Eastern, Pashto, Northern, Picard, Pular (Adlam), Sanskrit, Sanskrit (Grantha), Secoya, Seraiki, Serer-Sine, Shan, Siona, South Azerbaijani, Tagalog (Tagalog), Tai Dam, Tamang, Eastern, Tamazight, Central Atlas, Tamazight, Central Atlas (Tifinagh), Tamazight, Standard Morocan, Tamil, Tamil (Sri Lanka), Telugu, Tem, Thai, Thai (2), Ticuna, Uduk, Vai, Veps, Vietnamese, Vietnamese (Han nom), Waama, Yoruba, Yukaghir, Northern, Éwé.

The following 16 languages are not fully supported:

lang                         n_errors  n_total  pct_success
Malayalam                         357     1630        78.1%
Javanese (Javanese)               242     1453        83.3%
Mongolian, Halh (Mongolian)         3       33        90.9%
Sinhala                           107     1655        93.5%
Bengali                            80     1413        94.3%
Farsi, Western                     39     1822        97.9%
Dari                               36     1872        98.1%
Tibetan, Central                    2      260        99.2%
Marathi                             9     1614        99.4%
Yaneshaʼ                            6     2536        99.8%
Nepali                              3     1385        99.8%
Kannada                             1     1080        99.9%
Panjabi, Western                    2     2419        99.9%
Yiddish, Eastern                    1     1775        99.9%
Urdu                                1     2237       100.0%
Urdu (2)                            1     2251       100.0%

Malayalam

Sequence of language Malayalam from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0D38     ‘\u0d38’  Lo              1  MALAYALAM LETTER SA
U+0D4D     ‘\u0d4d’  Mn              0  MALAYALAM SIGN VIRAMA
U+0D25     ‘\u0d25’  Lo              1  MALAYALAM LETTER THA
U+0D3E     ‘\u0d3e’  Mc              0  MALAYALAM VOWEL SIGN AA
U+0D2A     ‘\u0d2a’  Lo              1  MALAYALAM LETTER PA
U+0D28     ‘\u0d28’  Lo              1  MALAYALAM LETTER NA
U+0D2E     ‘\u0d2e’  Lo              1  MALAYALAM LETTER MA
U+0D3E     ‘\u0d3e’  Mc              0  MALAYALAM VOWEL SIGN AA
U+0D23     ‘\u0d23’  Lo              1  MALAYALAM LETTER NNA
U+0D4D     ‘\u0d4d’  Mn              0  MALAYALAM SIGN VIRAMA
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER

Total codepoints: 11

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xb4\xb8\xe0\xb5\x8d\xe0\xb4\xa5\xe0\xb4\xbe\xe0\xb4\xaa\xe0\xb4\xa8\xe0\xb4\xae\xe0\xb4\xbe\xe0\xb4\xa3\xe0\xb5\x8d\xe2\x80\x8c|\\n123456|\\n"
    സ്ഥാപനമാണ്‌|
    123456|
    
  • python wcwidth.wcswidth() measures width 6, while mlterm measures width 7.
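In the table above, the wcwidth column follows the Category column: Lo is 1, while Mn, Mc, and Cf are 0. The sketch below reproduces the expected width from that observation alone, using Python's standard unicodedata module. The function names are hypothetical, and the rule is deliberately narrow: it ignores East Asian wide characters and any special handling of joiners, so it is an illustration for sequences like this one rather than a general width function.

```python
import unicodedata

# Categories shown with wcwidth 0 in the language tables of this report.
ZERO_WIDTH_CATEGORIES = {"Mn", "Mc", "Cf"}


def sketch_cell_width(ch: str) -> int:
    """Return 0 for combining marks and format characters, else 1 (simplified)."""
    return 0 if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES else 1


def sketch_string_width(text: str) -> int:
    """Sum the simplified per-codepoint widths of a string."""
    return sum(sketch_cell_width(ch) for ch in text)


# The Malayalam sequence from the table above, as Python escapes.
malayalam = "\u0d38\u0d4d\u0d25\u0d3e\u0d2a\u0d28\u0d2e\u0d3e\u0d23\u0d4d\u200c"
for ch in malayalam:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {sketch_cell_width(ch)}")
print("total:", sketch_string_width(malayalam))
```

The total agrees with the expected width of 6 for this record; mlterm's measurement of 7 is one cell more.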

Javanese (Javanese)

Sequence of language Javanese (Javanese) from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+A9A5     ‘\ua9a5’  Lo              1  JAVANESE LETTER PA
U+A9B1     ‘\ua9b1’  Lo              1  JAVANESE LETTER SA
U+A9AB     ‘\ua9ab’  Lo              1  JAVANESE LETTER RA
U+A9BA     ‘\ua9ba’  Mc              0  JAVANESE VOWEL SIGN TALING
U+A98F     ‘\ua98f’  Lo              1  JAVANESE LETTER KA
U+A9A0     ‘\ua9a0’  Lo              1  JAVANESE LETTER TA

Total codepoints: 6

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xea\xa6\xa5\xea\xa6\xb1\xea\xa6\xab\xea\xa6\xba\xea\xa6\x8f\xea\xa6\xa0|\\n12345|\\n"
    ꦥꦱꦫꦺꦏꦠ|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 6.

Mongolian, Halh (Mongolian)

Sequence of language Mongolian, Halh (Mongolian) from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+1828     ‘\u1828’  Lo              1  MONGOLIAN LETTER NA
U+1821     ‘\u1821’  Lo              1  MONGOLIAN LETTER E
U+1837     ‘\u1837’  Lo              1  MONGOLIAN LETTER RA
U+180E     ‘\u180e’  Cf              0  MONGOLIAN VOWEL SEPARATOR
U+1821     ‘\u1821’  Lo              1  MONGOLIAN LETTER E

Total codepoints: 5

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe1\xa0\xa8\xe1\xa0\xa1\xe1\xa0\xb7\xe1\xa0\x8e\xe1\xa0\xa1|\\n1234|\\n"
    ᠨᠡᠷ᠎ᠡ|
    1234|
    
  • python wcwidth.wcswidth() measures width 4, while mlterm measures width 5.

Sinhala

Sequence of language Sinhala from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0DB4     ‘\u0db4’  Lo              1  SINHALA LETTER ALPAPRAANA PAYANNA
U+0DCA     ‘\u0dca’  Mn              0  SINHALA SIGN AL-LAKUNA
U+200D     ‘\u200d’  Cf              0  ZERO WIDTH JOINER
U+0DBB     ‘\u0dbb’  Lo              1  SINHALA LETTER RAYANNA
U+0D9A     ‘\u0d9a’  Lo              1  SINHALA LETTER ALPAPRAANA KAYANNA
U+0DCF     ‘\u0dcf’  Mc              0  SINHALA VOWEL SIGN AELA-PILLA
U+0DC1     ‘\u0dc1’  Lo              1  SINHALA LETTER TAALUJA SAYANNA
U+0DB1     ‘\u0db1’  Lo              1  SINHALA LETTER DANTAJA NAYANNA
U+0DBA     ‘\u0dba’  Lo              1  SINHALA LETTER YAYANNA

Total codepoints: 9

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xb6\xb4\xe0\xb7\x8a\xe2\x80\x8d\xe0\xb6\xbb\xe0\xb6\x9a\xe0\xb7\x8f\xe0\xb7\x81\xe0\xb6\xb1\xe0\xb6\xba|\\n12345|\\n"
    ප්‍රකාශනය|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 7.

Bengali

Sequence of language Bengali from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+09B8     ‘\u09b8’  Lo              1  BENGALI LETTER SA
U+09CD     ‘\u09cd’  Mn              0  BENGALI SIGN VIRAMA
U+09AC     ‘\u09ac’  Lo              1  BENGALI LETTER BA
U+09C0     ‘\u09c0’  Mc              0  BENGALI VOWEL SIGN II
U+0995     ‘\u0995’  Lo              1  BENGALI LETTER KA
U+09C3     ‘\u09c3’  Mn              0  BENGALI VOWEL SIGN VOCALIC R
U+09A4     ‘\u09a4’  Lo              1  BENGALI LETTER TA
U+09BF     ‘\u09bf’  Mc              0  BENGALI VOWEL SIGN I
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER
U+0987     ‘\u0987’  Lo              1  BENGALI LETTER I

Total codepoints: 10

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xa6\xb8\xe0\xa7\x8d\xe0\xa6\xac\xe0\xa7\x80\xe0\xa6\x95\xe0\xa7\x83\xe0\xa6\xa4\xe0\xa6\xbf\xe2\x80\x8c\xe0\xa6\x87|\\n12345|\\n"
    স্বীকৃতি‌ই|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 6.

Farsi, Western

Sequence of language Farsi, Western from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0648     ‘\u0648’  Lo              1  ARABIC LETTER WAW
U+062D     ‘\u062d’  Lo              1  ARABIC LETTER HAH
U+0634     ‘\u0634’  Lo              1  ARABIC LETTER SHEEN
U+06CC     ‘\u06cc’  Lo              1  ARABIC LETTER FARSI YEH
U+0627     ‘\u0627’  Lo              1  ARABIC LETTER ALEF
U+0646     ‘\u0646’  Lo              1  ARABIC LETTER NOON
U+0647     ‘\u0647’  Lo              1  ARABIC LETTER HEH
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER
U+0627     ‘\u0627’  Lo              1  ARABIC LETTER ALEF
U+06CC     ‘\u06cc’  Lo              1  ARABIC LETTER FARSI YEH

Total codepoints: 10

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xd9\x88\xd8\xad\xd8\xb4\xdb\x8c\xd8\xa7\xd9\x86\xd9\x87\xe2\x80\x8c\xd8\xa7\xdb\x8c|\\n123456789|\\n"
    وحشیانه‌ای|
    123456789|
    
  • python wcwidth.wcswidth() measures width 9, while mlterm measures width 10.

Dari

Sequence of language Dari from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0648     ‘\u0648’  Lo              1  ARABIC LETTER WAW
U+062D     ‘\u062d’  Lo              1  ARABIC LETTER HAH
U+0634     ‘\u0634’  Lo              1  ARABIC LETTER SHEEN
U+06CC     ‘\u06cc’  Lo              1  ARABIC LETTER FARSI YEH
U+0627     ‘\u0627’  Lo              1  ARABIC LETTER ALEF
U+0646     ‘\u0646’  Lo              1  ARABIC LETTER NOON
U+0647     ‘\u0647’  Lo              1  ARABIC LETTER HEH
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER
U+06CC     ‘\u06cc’  Lo              1  ARABIC LETTER FARSI YEH
U+06CC     ‘\u06cc’  Lo              1  ARABIC LETTER FARSI YEH

Total codepoints: 10

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xd9\x88\xd8\xad\xd8\xb4\xdb\x8c\xd8\xa7\xd9\x86\xd9\x87\xe2\x80\x8c\xdb\x8c\xdb\x8c|\\n123456789|\\n"
    وحشیانه‌یی|
    123456789|
    
  • python wcwidth.wcswidth() measures width 9, while mlterm measures width 10.

Tibetan, Central

Sequence of language Tibetan, Central from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0F7C     ‘\u0f7c’  Mn              0  TIBETAN VOWEL SIGN O
U+0F66     ‘\u0f66’  Lo              1  TIBETAN LETTER SA
U+0F0B     ‘\u0f0b’  Po              1  TIBETAN MARK INTERSYLLABIC TSHEG
U+0F54     ‘\u0f54’  Lo              1  TIBETAN LETTER PA
U+0F60     ‘\u0f60’  Lo              1  TIBETAN LETTER -A
U+0F72     ‘\u0f72’  Mn              0  TIBETAN VOWEL SIGN I
U+0F0B     ‘\u0f0b’  Po              1  TIBETAN MARK INTERSYLLABIC TSHEG
U+0F50     ‘\u0f50’  Lo              1  TIBETAN LETTER THA
U+0F7C     ‘\u0f7c’  Mn              0  TIBETAN VOWEL SIGN O
U+0F56     ‘\u0f56’  Lo              1  TIBETAN LETTER BA
U+0F0B     ‘\u0f0b’  Po              1  TIBETAN MARK INTERSYLLABIC TSHEG
U+0F51     ‘\u0f51’  Lo              1  TIBETAN LETTER DA
U+0F56     ‘\u0f56’  Lo              1  TIBETAN LETTER BA
U+0F44     ‘\u0f44’  Lo              1  TIBETAN LETTER NGA
U+0F0B     ‘\u0f0b’  Po              1  TIBETAN MARK INTERSYLLABIC TSHEG
U+0F61     ‘\u0f61’  Lo              1  TIBETAN LETTER YA
U+0F7C     ‘\u0f7c’  Mn              0  TIBETAN VOWEL SIGN O
U+0F51     ‘\u0f51’  Lo              1  TIBETAN LETTER DA
U+0F0D     ‘\u0f0d’  Po              1  TIBETAN MARK SHAD

Total codepoints: 19

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xbd\xbc\xe0\xbd\xa6\xe0\xbc\x8b\xe0\xbd\x94\xe0\xbd\xa0\xe0\xbd\xb2\xe0\xbc\x8b\xe0\xbd\x90\xe0\xbd\xbc\xe0\xbd\x96\xe0\xbc\x8b\xe0\xbd\x91\xe0\xbd\x96\xe0\xbd\x84\xe0\xbc\x8b\xe0\xbd\xa1\xe0\xbd\xbc\xe0\xbd\x91\xe0\xbc\x8d|\\n123456789012345|\\n"
    ོས་པའི་ཐོབ་དབང་ཡོད།|
    123456789012345|
    
  • python wcwidth.wcswidth() measures width 15, while mlterm measures width 16.

Marathi

Sequence of language Marathi from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+091C     ‘\u091c’  Lo              1  DEVANAGARI LETTER JA
U+094D     ‘\u094d’  Mn              0  DEVANAGARI SIGN VIRAMA
U+092F     ‘\u092f’  Lo              1  DEVANAGARI LETTER YA
U+093E     ‘\u093e’  Mc              0  DEVANAGARI VOWEL SIGN AA
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER
U+0905     ‘\u0905’  Lo              1  DEVANAGARI LETTER A
U+0930     ‘\u0930’  Lo              1  DEVANAGARI LETTER RA
U+094D     ‘\u094d’  Mn              0  DEVANAGARI SIGN VIRAMA
U+0925     ‘\u0925’  Lo              1  DEVANAGARI LETTER THA
U+0940     ‘\u0940’  Mc              0  DEVANAGARI VOWEL SIGN II

Total codepoints: 10

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xa4\x9c\xe0\xa5\x8d\xe0\xa4\xaf\xe0\xa4\xbe\xe2\x80\x8c\xe0\xa4\x85\xe0\xa4\xb0\xe0\xa5\x8d\xe0\xa4\xa5\xe0\xa5\x80|\\n12345|\\n"
    ज्या‌अर्थी|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 6.

Yaneshaʼ

Sequence of language Yaneshaʼ from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0303     ‘\u0303’  Mn              0  COMBINING TILDE
U+0061     ‘a’       Ll              1  LATIN SMALL LETTER A
U+006E     ‘n’       Ll              1  LATIN SMALL LETTER N
U+0061     ‘a’       Ll              1  LATIN SMALL LETTER A
U+0072     ‘r’       Ll              1  LATIN SMALL LETTER R
U+0065     ‘e’       Ll              1  LATIN SMALL LETTER E
U+0074     ‘t’       Ll              1  LATIN SMALL LETTER T

Total codepoints: 7

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xcc\x83anaret|\\n123456|\\n"
    ̃anaret|
    123456|
    
  • python wcwidth.wcswidth() measures width 6, while mlterm measures width 7.

Nepali

Sequence of language Nepali from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+092A     ‘\u092a’  Lo              1  DEVANAGARI LETTER PA
U+0941     ‘\u0941’  Mn              0  DEVANAGARI VOWEL SIGN U
U+0930     ‘\u0930’  Lo              1  DEVANAGARI LETTER RA
U+094D     ‘\u094d’  Mn              0  DEVANAGARI SIGN VIRAMA
U+200D     ‘\u200d’  Cf              0  ZERO WIDTH JOINER
U+092F     ‘\u092f’  Lo              1  DEVANAGARI LETTER YA
U+093E     ‘\u093e’  Mc              0  DEVANAGARI VOWEL SIGN AA
U+0907     ‘\u0907’  Lo              1  DEVANAGARI LETTER I
U+090F     ‘\u090f’  Lo              1  DEVANAGARI LETTER E
U+0915     ‘\u0915’  Lo              1  DEVANAGARI LETTER KA
U+094B     ‘\u094b’  Mc              0  DEVANAGARI VOWEL SIGN O

Total codepoints: 11

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xa4\xaa\xe0\xa5\x81\xe0\xa4\xb0\xe0\xa5\x8d\xe2\x80\x8d\xe0\xa4\xaf\xe0\xa4\xbe\xe0\xa4\x87\xe0\xa4\x8f\xe0\xa4\x95\xe0\xa5\x8b|\\n12345|\\n"
    पुर्‍याइएको|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 7.

Kannada

Sequence of language Kannada from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0CB5     ‘\u0cb5’  Lo              1  KANNADA LETTER VA
U+0CBE     ‘\u0cbe’  Mc              0  KANNADA VOWEL SIGN AA
U+0C95     ‘\u0c95’  Lo              1  KANNADA LETTER KA
U+0CCD     ‘\u0ccd’  Mn              0  KANNADA SIGN VIRAMA
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER
U+0CB8     ‘\u0cb8’  Lo              1  KANNADA LETTER SA
U+0CCD     ‘\u0ccd’  Mn              0  KANNADA SIGN VIRAMA
U+0CB5     ‘\u0cb5’  Lo              1  KANNADA LETTER VA
U+0CBE     ‘\u0cbe’  Mc              0  KANNADA VOWEL SIGN AA
U+0CA4     ‘\u0ca4’  Lo              1  KANNADA LETTER TA
U+0C82     ‘\u0c82’  Mc              0  KANNADA SIGN ANUSVARA
U+0CA4     ‘\u0ca4’  Lo              1  KANNADA LETTER TA
U+0CCD     ‘\u0ccd’  Mn              0  KANNADA SIGN VIRAMA
U+0CB0     ‘\u0cb0’  Lo              1  KANNADA LETTER RA
U+0CCD     ‘\u0ccd’  Mn              0  KANNADA SIGN VIRAMA
U+0CAF     ‘\u0caf’  Lo              1  KANNADA LETTER YA

Total codepoints: 16

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe0\xb2\xb5\xe0\xb2\xbe\xe0\xb2\x95\xe0\xb3\x8d\xe2\x80\x8c\xe0\xb2\xb8\xe0\xb3\x8d\xe0\xb2\xb5\xe0\xb2\xbe\xe0\xb2\xa4\xe0\xb2\x82\xe0\xb2\xa4\xe0\xb3\x8d\xe0\xb2\xb0\xe0\xb3\x8d\xe0\xb2\xaf|\\n12345678|\\n"
    ವಾಕ್‌ಸ್ವಾತಂತ್ರ್ಯ|
    12345678|
    
  • python wcwidth.wcswidth() measures width 8, while mlterm measures width 9.

Panjabi, Western

Sequence of language Panjabi, Western from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0628     ‘\u0628’  Lo              1  ARABIC LETTER BEH
U+06D2     ‘\u06d2’  Lo              1  ARABIC LETTER YEH BARREE
U+200C     ‘\u200c’  Cf              0  ZERO WIDTH NON-JOINER
U+0631     ‘\u0631’  Lo              1  ARABIC LETTER REH
U+0648     ‘\u0648’  Lo              1  ARABIC LETTER WAW
U+0632     ‘\u0632’  Lo              1  ARABIC LETTER ZAIN
U+06AF     ‘\u06af’  Lo              1  ARABIC LETTER GAF
U+0627     ‘\u0627’  Lo              1  ARABIC LETTER ALEF
U+0631     ‘\u0631’  Lo              1  ARABIC LETTER REH
U+06CC     ‘\u06cc’  Lo              1  ARABIC LETTER FARSI YEH
U+060C     ‘\u060c’  Po              1  ARABIC COMMA

Total codepoints: 11

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xd8\xa8\xdb\x92\xe2\x80\x8c\xd8\xb1\xd9\x88\xd8\xb2\xda\xaf\xd8\xa7\xd8\xb1\xdb\x8c\xd8\x8c|\\n1234567890|\\n"
    بے‌روزگاری،|
    1234567890|
    
  • python wcwidth.wcswidth() measures width 10, while mlterm measures width 11.

Yiddish, Eastern

Sequence of language Yiddish, Eastern from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+202E     ‘\u202e’  Cf              0  RIGHT-TO-LEFT OVERRIDE
U+0041     ‘A’       Lu              1  LATIN CAPITAL LETTER A
U+202C     ‘\u202c’  Cf              0  POP DIRECTIONAL FORMATTING

Total codepoints: 3

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xe2\x80\xaeA\xe2\x80\xac|\\n1|\\n"
    ‮A‬|
    1|
    
  • python wcwidth.wcswidth() measures width 1, while mlterm measures width 3.

Urdu

Sequence of language Urdu from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0601     ‘\u0601’  Cf              0  ARABIC SIGN SANAH
U+06F1     ‘\u06f1’  Nd              1  EXTENDED ARABIC-INDIC DIGIT ONE
U+06F9     ‘\u06f9’  Nd              1  EXTENDED ARABIC-INDIC DIGIT NINE
U+06F4     ‘\u06f4’  Nd              1  EXTENDED ARABIC-INDIC DIGIT FOUR
U+06F8     ‘\u06f8’  Nd              1  EXTENDED ARABIC-INDIC DIGIT EIGHT
U+0621     ‘\u0621’  Lo              1  ARABIC LETTER HAMZA

Total codepoints: 6

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xd8\x81\xdb\xb1\xdb\xb9\xdb\xb4\xdb\xb8\xd8\xa1|\\n12345|\\n"
    ؁۱۹۴۸ء|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 6.

Urdu (2)

Sequence of language Urdu (2) from midpoint of alignment failure records:

Codepoint  Python    Category  wcwidth  Name
U+0601     ‘\u0601’  Cf              0  ARABIC SIGN SANAH
U+06F1     ‘\u06f1’  Nd              1  EXTENDED ARABIC-INDIC DIGIT ONE
U+06F9     ‘\u06f9’  Nd              1  EXTENDED ARABIC-INDIC DIGIT NINE
U+06F4     ‘\u06f4’  Nd              1  EXTENDED ARABIC-INDIC DIGIT FOUR
U+06F8     ‘\u06f8’  Nd              1  EXTENDED ARABIC-INDIC DIGIT EIGHT
U+0621     ‘\u0621’  Lo              1  ARABIC LETTER HAMZA

Total codepoints: 6

  • Shell test using printf(1), '|' should align in output:

    $ printf "\xd8\x81\xdb\xb1\xdb\xb9\xdb\xb4\xdb\xb8\xd8\xa1|\\n12345|\\n"
    ؁۱۹۴۸ء|
    12345|
    
  • python wcwidth.wcswidth() measures width 5, while mlterm measures width 6.