JavaScript Data Structures - the String Object
Written by Ian Elliot   
Monday, 30 April 2012

The String is the most basic and most useful of the JavaScript data structures. It can also be used as a starting point for, or be incorporated into, many other data structures. Let's see how it all works.

This is part of a series on implementing data structures in JavaScript.

 

 

You can create a string literal using either " or ' and there is no difference between the two delimiters. That is

"hello string"

and

'hello string'

are treated identically.

The only reason for allowing two different delimiters is that it allows you to write things like:

"The String's Color"

or

'I can "quote" something'
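As a quick check of the point about delimiters, the following sketch compares the two forms and also shows the alternative of escaping a quote with a backslash. The variable names are purely illustrative, and console.log is used for output on the assumption that a console is available.

```javascript
// Both delimiters produce exactly the same string value.
var a = "hello string";
var b = 'hello string';
var sameValue = (a === b);

// Using one delimiter lets the other appear unescaped inside the string.
var possessive = "The String's Color";
var quoted = 'I can "quote" something';

// The alternative is to escape the delimiter with a backslash.
var escaped = 'The String\'s Color';
var sameAsEscaped = (possessive === escaped);

console.log(sameValue, sameAsEscaped); // true true
```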

To create a String object you use new in the usual way:

var s=new String("hello string");

You can leave the new out if you want to, but if you do then you don't create a String object. Calling String("hello string") without new returns a primitive string, which behaves just like a string literal.

JavaScript takes a very pragmatic approach to strings. It allows you to define string literals, which are stored as simple character data, and it has a String object, which is less efficient but has all of the methods and properties that you need to make strings useful.

What this means is that when you write:

var s1 = "hello string";

you create a string literal, which is simply a sequence of character codes stored in memory, but when you write:

var s2 = new String("hello string");

you create a string object complete with properties and methods. (Note: the new is essential as without it you just create a string literal.)
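One way to see the distinction, sketched here using typeof and instanceof, is to ask JavaScript what each value is. The variable s3 is an extra illustration, not part of the example above, and shows what happens when new is left out.

```javascript
var s1 = "hello string";              // string literal
var s2 = new String("hello string");  // String object
var s3 = String("hello string");      // no new, so not an object

console.log(typeof s1);            // "string"
console.log(typeof s2);            // "object"
console.log(typeof s3);            // "string"
console.log(s2 instanceof String); // true
console.log(s1 instanceof String); // false
```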

This might sound complicated but you don't really have to worry about it too much. The reason is that any time you make use of a string property or method on a string literal JavaScript "wraps" or "boxes" the literal in a String object. Because of this there is very little difference in practice between a string literal and a String Object, even in terms of efficiency.
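A small sketch of the boxing behavior: a method can be called directly on a literal, and afterwards the value is still a plain string, because the wrapper object is only temporary. The method toUpperCase is used here simply as a convenient example of a String method.

```javascript
var s1 = "hello string";

// The literal is wrapped in a temporary String object for the call...
var upper = s1.toUpperCase();

// ...but s1 itself is still a primitive string afterwards.
var typeAfter = typeof s1;

console.log(upper);     // "HELLO STRING"
console.log(typeAfter); // "string"
```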

For example, the length property gives the number of characters in the string and you can use it on a literal or on a String object in exactly the same way:

s1.length
s2.length

or even

"hello string".length

There are two fairly obvious and well-known places where the distinction between a string literal and a String object matters.

The first is in equality testing. If you test for equality between two string literals then it all works as you might expect. However, if you treat String objects as if they were string literals then things seem to go wrong.

For example if you compare two seemingly identical strings:

var s1 = "1+2";
var s2 = new String("1+2");

then s1==s2 is true but s1===s2 is false. The == operator converts the String object to its primitive value before comparing, but strict equality performs no conversion, and a string literal and a String object are of different types. What might seem worse is that two String objects with the same value are not considered equal. That is, if you define:

var s2 = new String("1+2");
var s3 = new String("1+2");

Then s2==s3 and s2===s3 are both false, because two objects are only equal if they are one and the same object.
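The four comparisons can be run together as a check. This is just the article's own example collected into one place, with console.log assumed for output.

```javascript
var s1 = "1+2";              // string literal
var s2 = new String("1+2");  // String object
var s3 = new String("1+2");  // a second, separate String object

console.log(s1 == s2);   // true:  s2 is converted to its primitive value
console.log(s1 === s2);  // false: different types, no conversion
console.log(s2 == s3);   // false: two distinct objects
console.log(s2 === s3);  // false: two distinct objects
console.log(s2 === s2);  // true:  the very same object
```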

The second place the difference matters is when you use eval to work out the value of a string as an expression or even as a JavaScript program. Eval only evaluates a string literal, not a String object. If it is passed an object it simply returns that object unchanged. For example:

var s1 = "1+2";
var s2 = new String("1+2");

then eval(s1) returns 3 but eval(s2) returns the String object s2 unevaluated, which displays as 1+2.
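A short sketch makes the eval behavior visible. What comes back for the object is the String object itself, untouched, which is why it displays as 1+2. The names r1 and r2 are illustrative only.

```javascript
var s1 = "1+2";
var s2 = new String("1+2");

var r1 = eval(s1); // the literal is evaluated as an expression
var r2 = eval(s2); // the object is handed back unchanged

console.log(r1, typeof r1); // 3 "number"
console.log(r2 === s2);     // true
console.log(typeof r2);     // "object"
```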

There may be other differences but if so they are less obvious.

All of this sounds as if strings could be a real problem in JavaScript, but in practice most strings are derived from string literals and the use of the String object is rare.

If you want to make sure that string comparisons and eval work in the same way with literals and objects then simply always use valueOf. For example:

s1.valueOf() === s2.valueOf();
s1.valueOf() == s2.valueOf();

are both true, even though s1 is a literal and s2 an object. Similarly:

eval(s1.valueOf());
eval(s2.valueOf());

are both fully evaluated to 3.
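If you find yourself doing this often, the valueOf idea can be wrapped up in a small function. The sketch below is one possible way to do it, and sameString is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper: compares two strings by value, whether each one
// is a string literal or a String object.
function sameString(a, b) {
  return a.valueOf() === b.valueOf();
}

var s1 = "1+2";
var s2 = new String("1+2");
var s3 = new String("1+2");

console.log(sameString(s1, s2));  // true
console.log(sameString(s2, s3));  // true
console.log(eval(s2.valueOf()));  // 3
```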

Characters

The JavaScript String is a fundamental type in the sense that there is no character/char data type. If you want to work with a single character then you use a string of length 1. To access a single character in a string it is best to use the charAt(i) method which will return the character at the ith position. That is, "ABC".charAt(1) returns "B" as the first character in a string is character zero.

You can also treat a string as if it were an array of characters - for example, "ABC"[1] returns "B". However, array-style indexing of strings was only standardized in ECMAScript 5 and hence is only supported in modern browsers.
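The two ways of getting at a character agree for valid positions, but it is worth knowing that they differ when the position is out of range. The following sketch shows both cases, with index 5 picked simply because it is beyond the end of the string.

```javascript
var s = "ABC";

// For a valid position both forms give the same one-character string.
var viaMethod = s.charAt(1); // "B"
var viaIndex = s[1];         // "B"

// Beyond the end of the string the results differ.
var methodPastEnd = s.charAt(5); // "" (empty string)
var indexPastEnd = s[5];         // undefined

console.log(viaMethod, viaIndex);
console.log(methodPastEnd === "", indexPastEnd === undefined); // true true
```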

Of course characters are represented by character codes and the day of the simple ASCII code has long gone. The charCodeAt method returns the Unicode code of the character at a given position. For example:

var s1 = "ABC";
alert(s1.charCodeAt(1));

displays 66 - which is of course the ASCII code for B.

For the ASCII characters the two codes are the same, so if you want to you can simply ignore Unicode and continue as if you were still using ASCII.

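It is also worth noting that JavaScript strings are made up of 16-bit units, so charCodeAt and fromCharCode work with codes from 0 to hex FFFF and not the full Unicode range of 0 to hex 10FFFF. A character beyond FFFF has to be stored as a pair of 16-bit codes, known as a surrogate pair.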
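To see the 16-bit limit in action, here is a sketch using the musical G clef symbol, code 1D11E (hex), which lies outside the 16-bit range. The choice of character is arbitrary. It has to be written as two 16-bit codes, and the string reports a length of 2 even though it displays as one character.

```javascript
// U+1D11E written as two 16-bit codes (a surrogate pair).
var clef = "\uD834\uDD1E";

console.log(clef.length);                     // 2
console.log(clef.charCodeAt(0).toString(16)); // "d834"
console.log(clef.charCodeAt(1).toString(16)); // "dd1e"

// The original code can be recovered from the pair by arithmetic.
var hi = clef.charCodeAt(0);
var lo = clef.charCodeAt(1);
var code = (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
console.log(code.toString(16));               // "1d11e"
```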

The inverse function to charCodeAt is the static method String.fromCharCode, which is called on String itself rather than on a particular string, and returns a string composed of the characters specified by the given Unicode codes. For example:

String.fromCharCode(65,66,67)

returns the string "ABC". You can go beyond the usual ASCII characters, however. For example, to create a string consisting of the single Unicode character 09B2 (hex), which is the Bengali LA character, you would use:

var s1 = String.fromCharCode(0x09B2);

(recall that 0x is the prefix for hexadecimal numbers).
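Putting charCodeAt and fromCharCode together gives a round trip from string to codes and back. The sketch below is one way to do it, and toCodes is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: returns an array of the character codes in s.
function toCodes(s) {
  var codes = [];
  for (var i = 0; i < s.length; i++) {
    codes.push(s.charCodeAt(i));
  }
  return codes;
}

var codes = toCodes("ABC");                        // [65, 66, 67]
var back = String.fromCharCode.apply(null, codes); // "ABC"

// The same round trip works for the Bengali LA character.
var la = String.fromCharCode(0x09B2);
var laCode = la.charCodeAt(0).toString(16);        // "9b2"

console.log(codes, back, laCode);
```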

Unicode characters are only supported in JavaScript from version 1.3 onward, so some care is needed with older browsers.

You can also enter Unicode characters within a string using escape codes. If you include \uhexcode in a string then the Unicode character corresponding to the code is inserted. For example to insert the Bengali LA character you would use:

alert("\u09B2");

You can use \DDD with DDD in octal or \xDD with DD in hex for any Latin-1 encoded character. Note that the octal form is not allowed in ECMAScript 5 strict mode.
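The different escape forms are just alternative spellings of the same character, as the following sketch checks. The octal form is left out here so that the code also runs in strict mode, and the letter A and the e-acute character are arbitrary choices.

```javascript
// \u takes four hex digits, \x takes two.
var fromU = "\u0041"; // "A"
var fromX = "\x41";   // "A"

// A Latin-1 character written both ways (e with an acute accent).
var eAcuteX = "\xE9";
var eAcuteU = "\u00E9";

// The escape and fromCharCode produce the same Bengali LA character.
var laEscape = "\u09B2";
var laCode = String.fromCharCode(0x09B2);

console.log(fromU === "A", fromX === "A"); // true true
console.log(eAcuteX === eAcuteU);          // true
console.log(laEscape === laCode);          // true
```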

There are also escape sequences that let you enter quote characters, the backslash and a range of non-printing control codes:

\' single quote
\" double quote
\\ backslash
\n new line
\r carriage return
\t tab
\b backspace
\f form feed

 

There are also Unicode equivalents of these old ASCII escape codes:

\u0009 Horizontal Tab
\u000B Vertical Tab
\u000C Form Feed
\u0020 Space
\u000A Line Feed
\u000D Carriage Return
\u0008 Backspace
\u0022 Double Quote
\u0027 Single Quote
\u005C Backslash
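That the short escapes and their Unicode forms really are the same characters can be checked with a simple loop. This is only a sketch, and the pairs array is set up just for the test.

```javascript
// Each entry pairs a short escape with its Unicode escape equivalent.
var pairs = [
  ["\t", "\u0009"], // tab
  ["\v", "\u000B"], // vertical tab
  ["\f", "\u000C"], // form feed
  ["\n", "\u000A"], // line feed
  ["\r", "\u000D"], // carriage return
  ["\b", "\u0008"], // backspace
  ["\"", "\u0022"], // double quote
  ["\'", "\u0027"], // single quote
  ["\\", "\u005C"]  // backslash
];

var allMatch = true;
for (var i = 0; i < pairs.length; i++) {
  if (pairs[i][0] !== pairs[i][1]) {
    allMatch = false;
  }
}

console.log(allMatch); // true
```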

 

The big problem with Unicode when used outside of the standard ASCII characters is that the browser, or other display device, has to support Unicode and there has to be an appropriate Unicode font available. Most modern browsers do support Unicode, but there is still the problem of Unicode input.



Last Updated ( Tuesday, 15 January 2013 )
 
 

   