Written by Ian Elliot
Thursday, 17 January 2019
You can also treat a string as if it was an array of characters - for example "ABC"[1] returns "B". However, string indexing was only introduced in ECMAScript 5 and it is read-only, i.e. you cannot assign to a string element.
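A quick sketch of this behavior, using console.log to show the results:

s = "ABC";
console.log(s[1]);  // displays "B"
s[1] = "Z";         // the assignment is silently ignored (it throws a TypeError in strict mode)
console.log(s);     // still displays "ABC"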
Of course, characters are represented by character codes and the day of the simple ASCII code has long gone. The charCodeAt method returns the Unicode code unit of the character at a given position. For example, using the string "ABC" again:
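console.log("ABC".charCodeAt(1));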
displays 66 - which is of course the ASCII code for B.
For the ASCII characters the two codes are the same, so if you want to you can simply ignore Unicode and continue as if you were still using ASCII.
The inverse function to charCodeAt is the static method String.fromCharCode, which returns a string composed of the characters specified by the given Unicode parameters. For example:
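String.fromCharCode(65, 66, 67)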
returns the string "ABC". You can, however, go beyond the usual ASCII characters.
For example, to create a string consisting of the single Unicode character 09B2 (hex), which is the Bengali LA character, you would use:
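var s = String.fromCharCode(0x09B2);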
(recall that 0x is the prefix for hexadecimal numbers).
You can also enter Unicode characters within a string using escape codes. If you include \uHHHH in a string, where HHHH is four hex digits, then the Unicode character corresponding to that code is inserted. For example, to insert the Bengali LA character you would use:
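var s = "\u09B2";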
You can also use \DDD, with DDD in octal, or \xDD, with DD in hex, for any Latin-1 encoded character. Note that octal escapes are a legacy feature and are not allowed in strict mode. For example, both of the following display "ABC":
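console.log("\x41\x42\x43");
console.log("\101\102\103");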
There are also escape characters to allow you to enter a range of non-printing control codes:

\b backspace
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
There are also Unicode equivalents of these old ASCII escape codes:

\u0008 backspace
\u000C form feed
\u000A new line
\u000D carriage return
\u0009 horizontal tab
\u000B vertical tab
The big problem with Unicode when used outside of the standard ASCII characters is that the browser, or other display device, has to support Unicode and there has to be an appropriate Unicode font available. Most modern browsers do support Unicode, but there is still the problem of Unicode input.
There is also the problem of what to do if you want to go outside of the BMP, the Basic Multilingual Plane.
For example, the "grinning cat face with smiling eyes" emoji, U+1F638, needs two 16-bit words in UTF-16: \uD83D\uDE38. However, if you try:
s = "\uD83D\uDE38";
console.log(s.length);
you will find that the length of the string is reported as two, even though only one character is coded.
Similarly, if you try:

s = "\uD83D\uDE38\uD83D\uDE38";
alert(s.charAt(1));
then charAt doesn't give you the second cat emoji, but the character corresponding to the first \uDE38, which on its own is not a valid Unicode character. That is, it returns the 16-bit code unit corresponding to the second 16-bit word of the string rather than the second character.
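If you need to work in terms of whole characters rather than 16-bit words, later versions of JavaScript (ES2015 onwards) provide code-point-aware alternatives. A minimal sketch, assuming an ES2015 environment:

s = "\uD83D\uDE38\uD83D\uDE38";
console.log(Array.from(s).length);           // displays 2 - counts code points, not 16-bit words
console.log(s.codePointAt(0).toString(16));  // displays "1f638" - the full code point
for (var c of s) {
  console.log(c);                            // each iteration yields a complete emoji
}

Array.from and for...of both iterate a string by code point, so surrogate pairs are kept together.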