Swift Strings

It’s been a while, but in the last post we looked at Unicode, the international standard for encoding, representing and handling text. In this article, we build on that knowledge and look at how we can handle, process and manipulate characters and strings in Swift. Let’s start by looking at the smallest building block of most text processing – the character.

String and Character Basics

What is a Character?

In Swift, characters are represented using the Character type.

When rendered, each Character value represents a single perceived character, or grapheme.

The Character type in Swift is fully compliant with the Unicode standard and conceptually represents an extended grapheme cluster.

You may remember extended grapheme clusters from the previous post. As we learnt, in Unicode, certain accented characters such as é can be represented in multiple forms: either as a single code point, such as é (U+00E9 LATIN SMALL LETTER E WITH ACUTE), or as a sequence of two or more code points, such as e (U+0065 LATIN SMALL LETTER E) followed by ́ (U+0301 COMBINING ACUTE ACCENT). This latter form is known as a grapheme cluster; when rendered in a Unicode-aware text rendering system, the two code points are combined graphically so that the separate e and ́ characters appear as a single é. Despite their differences under the hood, the Unicode standard defines these two forms to be canonically equivalent, which essentially means they represent the same grapheme. Whichever form is chosen, both are represented by a single Character value in Swift. This means that an individual Character may consist of one or more code points behind the scenes and, unlike characters in many other languages, does not necessarily occupy a fixed amount of storage.
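To see this for yourself, here is a small sketch comparing the two forms of é. The constant names are my own; the only APIs used are the Character type, the == operator and a string’s unicodeScalars view.

```swift
// é written as a single precomposed code point (U+00E9)
let precomposed: Character = "\u{00E9}"
// é written as e (U+0065) followed by a combining acute accent (U+0301)
let decomposed: Character = "\u{0065}\u{0301}"

// Both literals fit in a single Character and compare as equal,
// because the two forms are canonically equivalent.
print(precomposed == decomposed)                 // prints "true"

// Under the hood, they are built from a different number of code points.
print(String(precomposed).unicodeScalars.count)  // prints "1"
print(String(decomposed).unicodeScalars.count)   // prints "2"
```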

Ok, that’s the basic theory of Characters, what about strings?

What is a String?

We’ve all seen strings before. A string in the generic sense is nothing more than a sequence of characters such as those in the phrase "Hello World" or the name "Apple". In Swift, strings are represented using the String type.

String values in Swift consist of a sequence of encoding-independent Unicode code points. What might surprise you, though, is that this does not mean they are simply a collection of Character values (though at one point this was in fact true).

Instead, as of Swift 2.0, the String type in Swift doesn’t represent its constituent characters directly but provides several views onto those characters through a set of properties. These views can then be used to retrieve the string’s components in different forms: either as a collection of Character values or in one of a number of Unicode encoding forms such as UTF-8, UTF-16 and UTF-32. We’ll look at the Unicode encoding forms in more detail later in this article.
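As a rough sketch of what these views look like, the snippet below measures the same string through each of them. It assumes Swift 4 or later, where a String can be counted directly (in the Swift 2 syntax used in this article, the first count would be written cafe.characters.count). The name cafe is just for illustration.

```swift
// "café" with the é built from e + U+0301 COMBINING ACUTE ACCENT
let cafe = "cafe\u{0301}"

print(cafe.count)                 // Characters: prints "4"
print(cafe.unicodeScalars.count)  // Unicode scalars (one per UTF-32 code unit): prints "5"
print(cafe.utf16.count)           // UTF-16 code units: prints "5"
print(cafe.utf8.count)            // UTF-8 code units (U+0301 takes two bytes): prints "6"
```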

Note: If you’re interested, this post on the Apple Swift blog provides more detail on the rationale for moving away from strings being simple collection types.

Value Types

Another thing to note about strings in Swift is that they are value types. We’ve not really covered the difference between value types and reference types here on the blog yet, but all this really means is that when you pass a String value into a function or method, or assign it to a variable or constant, Swift creates a copy of that String and passes the copy rather than a reference to the original.

We can see this at work in the following example:

let firstString = "Hello"
var secondString = firstString
secondString += " World!"
print(firstString)
print(secondString)
// prints "Hello"
// prints "Hello World!"

See how even though we’ve modified the secondString variable, the firstString constant is unchanged? The assignment in the second line of the example actually creates a copy of the string. This behaviour is known as copy-by-default; it clarifies the ownership of String values in Swift and helps ensure that individual strings are not changed unexpectedly.
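The same copy-by-default behaviour applies when a String is passed into a function. In this sketch (written for a current Swift toolchain), shout(_:) is a hypothetical helper of my own: it works on its own copy of the argument, so the caller’s string is left untouched.

```swift
// Hypothetical helper: the parameter is a copy of the caller's string.
func shout(_ text: String) -> String {
    var loud = text   // a local, mutable copy
    loud += "!!!"
    return loud
}

let original = "Hello"
let shouted = shout(original)
print(original)  // prints "Hello"
print(shouted)   // prints "Hello!!!"
```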

Now, you might be worried by the idea of all this copying. After all, copying strings everywhere must have an impact on the performance of your code, and you’re right, it does, but not as big an impact as you may think.

In order to be as performant as possible, the Swift compiler optimises things under the hood so that String values are only copied when absolutely necessary. In practice, this means a string is only actually copied when one of the copies is first modified or mutated (for the performance-minded amongst you, this is an O(N) operation, where N is the length of the string’s unspecified underlying representation). This approach minimises the performance impact of copying whilst maintaining the additional protection that the copy-by-default behaviour brings.

String Mutability

When it comes to mutability of the Character and String types in Swift, you’ll be glad to hear there are no surprises.

As with most other data types in Swift, the mutability of Character and String values is governed by whether they are assigned to variables (in which case they can be mutated) or constants (in which case they can’t). It’s no more complicated than that!
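A quick sketch of the difference; the names are illustrative only. The commented-out line is the one the compiler would reject.

```swift
// A variable can be mutated...
var mutableGreeting = "Hello"
mutableGreeting += ", World"
print(mutableGreeting)  // prints "Hello, World"

// ...but a constant cannot. Uncommenting the next line
// produces a compile-time error.
let constantGreeting = "Hello"
// constantGreeting += ", World"

// The same rule applies to Character values.
var letter: Character = "a"
letter = "b"  // reassigning a Character variable is fine
```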

Ok, so that’s the theory, let’s now have a look at some of the practice. Let’s start by looking at how we actually create Character and String values in Swift.

Creating Strings and Characters

Creating a String

String Literals

In Swift, string literals are predefined String values that you can use directly within your source code. Their name stems from the fact that they are literally written within the source code.

In general, a string literal in Swift consists of a fixed sequence of zero or more textual characters surrounded by a pair of double quotes ("") and can contain any of the printable characters you may expect (letters, numbers, punctuation, etc.) along with a number of escaped special characters and Unicode scalar values that we’ll look at in a moment.

Creating Empty Strings

The simplest form of string literal that you can write is an empty string literal. An empty string literal is nothing more than a pair of double quotes ("") (i.e. a string literal containing no characters).

So why talk about empty strings? In and of themselves, empty strings aren’t the most interesting of things but they can be pretty useful and are often needed as a starting point for constructing larger, longer, more complex strings.

In Swift, there are two ways in which we can declare and initialise an empty string – using initialiser syntax or by using an empty string literal.

To declare and initialise a string using initialiser syntax all we have to do is write the String type followed by a set of empty parentheses:

var emptyString = String() // Initializer Syntax

The bonus here is that Swift’s type inference mechanism will automatically infer the variable (in this case emptyString) to be of type String.

In addition to initialiser syntax though, we also have the option of initialising a string by assigning an empty string literal:

var emptyString1 = "" // String literal

Which you choose is up to you, as the outcome is exactly the same.
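If you want to convince yourself that the two approaches really are interchangeable, a minimal check (constant names are my own) is to compare the results directly:

```swift
let viaInitializer = String()
let viaLiteral = ""

// Both produce an empty String, so they compare as equal.
print(viaInitializer == viaLiteral)  // prints "true"
print(viaInitializer.isEmpty)        // prints "true"
```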

Creating Non-Empty String Literals

If we move beyond empty strings, we can also use string literals to initialise a String value with a non-empty string.

Instead of using the empty string literal all we have to do is assign the string literal we want to initialise the string with:

var nonEmptyString = "Hello World!"

And hey presto, we have a String value that actually contains a string! It’s that easy. But that’s not all we can do…

Special Characters in String and Character Literals

In addition to string literals being able to contain normal printable characters (as in the previous example), string literals can also contain a number of escaped special characters.

All escaped means in this context is that these characters must be prefixed with a backslash (\) in order to be included within the string literal.

There are seven escaped characters we can choose from in Swift. These are:
– \0 (null character)
– \\ (backslash)
– \t (horizontal tab)
– \n (line feed)
– \r (carriage return)
– \" (double quote)
– \' (single quote).

Including them within a string literal is super simple. All we do is include the backslash and the required character between the double quotes of our string literal and that’s it:

var secondNonEmptyString = "\t\'Hello\'\tWorld!"
print(secondNonEmptyString)
// prints "    'Hello'    World!"
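Here are a few more of the escape sequences from the list above in action. This sketch assumes Swift 4 or later for the count and split(separator:) calls; the constant names are illustrative.

```swift
// \\ produces a single backslash character
let backslash = "\\"
print(backslash.count)  // prints "1"

// \" lets a double quote appear inside a string literal
let quoted = "She said \"Hi\""
print(quoted)           // prints: She said "Hi"

// \n inserts a line feed, splitting the text across two lines
let twoLines = "Line 1\nLine 2"
print(twoLines.split(separator: "\n").count)  // prints "2"
```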

Using Unicode Scalars

Beyond including escaped characters, string literals in Swift also support the inclusion of arbitrary Unicode scalar values.

When included within string literals, Unicode scalar values are written in the form \u{xxxx}, where xxxx is a hexadecimal number of between one and eight digits that corresponds to the Unicode code point we want to include.

For example, we could define a string literal containing the Unicode Scalar U+2615 HOT BEVERAGE as follows:

let hotBeverage = "\u{2615}" // "☕"

That all looks good, but one thing you might not notice here is that, despite the hotBeverage constant being initialised with just a single Unicode scalar value, Swift’s type inference mechanism infers hotBeverage to be of type String rather than Character. This is great if we want a string, but what if we wanted a Character value instead? Let’s look at that next.

Creating a Character

To create a Character value in Swift, we have a similar range of options to those we just looked at for creating String values – basically initialiser syntax or string literals.

The Character type in Swift has a range of initialisation functions that accept either a String value, a Unicode scalar value or a sequence of Unicode scalar values that collectively form a single grapheme cluster.

For example, in the following code snippet I’m using the initialisation function that accepts a String value (in this case a string literal containing a single character):

let exclamationMark = Character("!")
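The Unicode scalar initialiser works in a similar way. This sketch is written against a current Swift toolchain, where the scalar type is spelled Unicode.Scalar. Its numeric initialiser is failable because not every number is a valid Unicode scalar, which is why the result is unwrapped before use.

```swift
// U+2615 is a valid scalar, so force-unwrapping here is safe.
let scalar = Unicode.Scalar(UInt32(0x2615))!
let hotBeverage = Character(scalar)  // a Character built from a single scalar
print(hotBeverage)                   // prints "☕"

// Values in the surrogate range (U+D800 to U+DFFF) are not valid scalars.
let surrogate = Unicode.Scalar(UInt32(0xD800))
print(surrogate == nil)              // prints "true"
```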

In addition to initialisation syntax, we can also create a Character value using a string literal. However, with this route things are a little more tricky.

As we’ve seen, by default, Swift’s type inference mechanism infers any text between a pair of double quotes to be a String value (hence the term string literal). If we were to just assign a string literal to a new variable or constant directly (as we did in the previous section), Swift would automatically infer the constant or variable to be of type String.

To force Swift to create a Character value instead, we therefore have to include the Character type annotation to explicitly tell Swift that we want a Character rather than a String:

let constantCharacter : Character = "A"
var variableCharacter : Character = "b"
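One way to check which type Swift settled on is to print the dynamic type. This sketch uses type(of:), which assumes Swift 3 or later; the constant names are illustrative.

```swift
let inferredBeverage = "\u{2615}"              // no annotation: inferred as String
let annotatedBeverage: Character = "\u{2615}"  // annotated: a Character

print(type(of: inferredBeverage))   // prints "String"
print(type(of: annotatedBeverage))  // prints "Character"
```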

When initialising a Character value using a string literal, as well as using a literal containing a single printable character as we saw above, we also have the option of assigning a string literal that contains one or more Unicode scalar values. We saw how to include Unicode scalar values within a string literal just now. There is one big caveat here though.

When assigning a string literal that contains multiple Unicode scalar values, we have to be really careful to ensure that the value the Character is initialised with is actually a single grapheme cluster.

As I’ve said, the Character type in Swift can only contain a single grapheme cluster and if we try to initialise it with something that isn’t (say two grapheme clusters instead of one), the compiler will raise an error.

When using string literals containing printable characters, checking that we assign only a single character is relatively simple. When using a string literal that contains multiple Unicode scalar values though, it’s easy to fall foul of this little trap. The following couple of examples illustrate the problem.

This first example below is actually a valid declaration and initialisation. Here, I use two Unicode scalar values (U+2615 HOT BEVERAGE and U+FE0F VARIATION SELECTOR-16), which combine to produce a single grapheme that is then used to initialise the hotBeverage constant:

let hotBeverage : Character = "\u{2615}\u{FE0F}" // "☕️"

But the following example (U+2615 HOT BEVERAGE, U+1F1EC REGIONAL INDICATOR SYMBOL LETTER G and U+1F1E7 REGIONAL INDICATOR SYMBOL LETTER B) would be invalid, as these Unicode scalars actually result in two grapheme clusters:

let invalidCharacter : Character = "\u{2615}\u{1F1EC}\u{1F1E7}" 
// Raises a compiler error.

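The only difference between the two examples is the Unicode scalars that follow U+2615, but that difference is all-important: U+FE0F attaches to the preceding scalar, whereas the pair of regional indicators forms a grapheme cluster (the GB flag) of its own.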

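Here’s another example – again a valid one. Why it is valid might surprise you, but I gave you a hint in the previous post:

let combinedFlags : Character = "\u{1F1FA}\u{1F1F8}\u{1F1EC}\u{1F1E7}" // "🇺🇸🇬🇧"

Bear in mind that this reflects the grapheme cluster rules in use when Swift 2 was released. Unicode 9.0 changed those rules so that regional indicators pair up into individual flags, so newer Swift versions treat this sequence as two separate flags, 🇺🇸 and 🇬🇧.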
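When the text isn’t a literal, the compiler can’t perform this check for us, and Character’s String initialiser traps at runtime if the string holds anything other than exactly one grapheme cluster. A defensive sketch, assuming Swift 4 or later (where a String’s count is its number of Characters), is to check the count first. The helper name singleCharacter(from:) is hypothetical.

```swift
// Hypothetical helper: returns a Character only when the string holds
// exactly one extended grapheme cluster, instead of trapping at runtime.
func singleCharacter(from text: String) -> Character? {
    guard text.count == 1 else { return nil }
    return Character(text)
}

// U+2615 HOT BEVERAGE + U+FE0F VARIATION SELECTOR-16: one grapheme cluster
let coffee = singleCharacter(from: "\u{2615}\u{FE0F}")
// U+2615 followed by the two regional indicators for GB: two grapheme clusters
let coffeeAndFlag = singleCharacter(from: "\u{2615}\u{1F1EC}\u{1F1E7}")

print(coffee != nil)         // prints "true"
print(coffeeAndFlag == nil)  // prints "true"
```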

Ok, let’s move on.

Creating a String from An Array of Characters

In addition to the String initialisation syntax we saw earlier, the String type also has one more initialisation method I wanted to mention – that of using an array of Character values to initialise the String.

I’m not sure this approach particularly falls into the convenience camp, as string literals are normally more convenient, but it’s there should you need it:

let worldCharacters: [Character] = ["W", "o", "r", "l", "d"]
let worldString = String(worldCharacters)
print(worldString)
// Prints "World"

Ok, so we’ve looked at how to declare and initialise Character and String values. Let’s next look at how we can combine characters and strings to create new values, starting with how to append a character to an existing string.

Appending a Character to a String

To append a Character value to an existing String value we use the append(_:) method.

The append(_:) method is pretty simple. It takes a single parameter of type Character and appends it to the end of the String value it is called on, modifying that string in place (so the string must be stored in a variable):

var greeting = "Hello World"
let exclamationMark : Character = "!"
greeting.append(exclamationMark)
print(greeting)
// prints "Hello World!"

One thing to note here is that there is no corresponding append(_:) method on the Character type. As we’ve discussed, a Character value in Swift can only contain a single character, so appending a Character to another Character doesn’t really make sense. If you want to join two Character values, you have to turn them into a String instead.
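Two ways of doing that are sketched below: converting each Character to a String and concatenating, or building a String from an array of Characters as we saw earlier. The constant names are illustrative.

```swift
let first: Character = "O"
let second: Character = "K"

// Option 1: convert each Character to a String, then concatenate.
let joined1 = String(first) + String(second)
// Option 2: build a String from an array of Characters.
let joined2 = String([first, second])

print(joined1)  // prints "OK"
print(joined2)  // prints "OK"
```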

Concatenating Strings and Characters

In addition to being able to append a Character value to a String value, we can also combine two String values. This is called string concatenation.

In Swift, there are two ways in which we can concatenate (or join) strings together: the + operator and the += operator.

Firstly, we can use the + operator along with two String values to create a new String value containing the concatenation of the two strings:

let string1 = "Hello"
let string2 = " World"
let welcome = string1 + string2
print(welcome)
// Prints "Hello World"

We can also do this with a single character string literal to almost mirror the behaviour of the append(_:) method we just looked at:

let welcome2 = string1 + "!"
print(welcome2)
// prints "Hello!"

In addition to the + operator, we can also use the += operator to modify an existing String value in place:

var string3 = "Hello"
let string4 = " World"
string3 += string4
print(string3)
// prints "Hello World"

Remember though that String values in Swift are value types, so += only changes the value held in the string3 variable. Any other constant or variable that was previously assigned from string3 keeps its own, unmodified copy of the original string. It’s a subtle difference but an important one to remember.

String Interpolation

Along with the ability to append a character to a string, or concatenate two strings together, Swift also supports the concept of string interpolation.

String interpolation allows us to construct new String values from a combination of constants, variables, expressions and other string literals by incorporating their values within the body of another string literal.

The basic syntax for string interpolation is simple. All we have to do is include a set of parentheses prefixed with a backslash in the body of a string literal and then place the item we want to include within the parentheses. For example:

let score = 130
let message = "Congratulations, you scored \(score)!"
print(message)
// Prints "Congratulations, you scored 130!"

But string interpolation doesn’t stop at just being able to include simple constant or variable values within a string literal; the parentheses can also contain expressions:

let message2 = "That's an average of \(score / 5) per turn."
print(message2)
// Prints "That's an average of 26 per turn."
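Interpolated expressions can be anything that produces a value, including function calls. This small sketch, built around a hypothetical average(of:over:) helper of my own, also shows that the expression’s type matters: score / 4 is integer division, whereas converting to Double keeps the fractional part.

```swift
// Hypothetical helper used only for this illustration.
func average(of total: Int, over turns: Int) -> Double {
    return Double(total) / Double(turns)
}

let score = 130
let intAverage = "Integer average: \(score / 4)"
let exactAverage = "Exact average: \(average(of: score, over: 4))"

print(intAverage)    // prints "Integer average: 32"
print(exactAverage)  // prints "Exact average: 32.5"
```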

And they can also contain other string literals:

let interpolatedString = "This is an \("interpolated") string"
print(interpolatedString)
// prints 'This is an interpolated string'

The only rule for the expressions you place within the parentheses is that they can’t contain an unescaped backslash (\), a carriage return (\r) or a line feed (\n), but after that, the world is your oyster.

Anyway, it’s time to move on. Let’s next take a look at how to determine the number of characters in a string.

Checking the Length of a String

Checking If a String is Empty

To test whether a string is empty or not we use the isEmpty property of the String type. The isEmpty property returns a boolean value to (unsurprisingly) indicate whether the string is empty or not:

let emptyString1 = ""
if emptyString1.isEmpty {
    print("The string was empty")
}
// The string was empty.
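Note that isEmpty checks for zero characters, not for "nothing visible". A quick sketch (names illustrative):

```swift
let empty = ""
let space = " "

print(empty.isEmpty)  // prints "true"
// A string containing only whitespace still holds one Character,
// so it is not considered empty.
print(space.isEmpty)  // prints "false"
```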

When it comes to checking the actual number of characters in a String though, things are a little more complicated.

The characters Property

As we’ve learnt previously, each Character value in Swift is the equivalent of a Unicode extended grapheme cluster and, as such, may consist of one or more Unicode scalar values. We also learnt that, due to the different potential representations of a single character, there may be one, two or a whole bunch of Unicode scalar values used to represent exactly the same character. As you can probably imagine, taken together these factors make determining the boundaries of individual characters in a string (and therefore the number of characters in that string) a non-trivial task. To make things a little simpler though, the String type in Swift has a built-in property called characters.

The characters property allows us to access a view (of type String.CharacterView) onto the contents of the String. The good news here is that in order to present this view, Swift beavers away behind the scenes to determine where the different Character boundaries are, so we don’t have to:

let welcome = "Hello"
for character in welcome.characters {
    print(character, terminator: " ")
}
// H e l l o

Note: As I mentioned earlier, the characters property is just one of a number of views that are available on the String type. We’ll look at the others later in this article.

Now, although the availability of this characters property may be convenient, there is one thing to note about accessing this property and that is performance.

Performance Implications of the characters Property

As I’ve just mentioned, there is no easy way of determining the boundaries of the characters in a given string. To work out these boundaries, whenever we walk through the characters view (for example, to iterate over or count the characters), Swift has to step through each of the underlying Unicode scalar values that make up the string to determine where the boundaries actually are, and in short, that activity is not cheap.

From a performance standpoint, this is a linear operation (also known as O(N)), meaning that the time it takes grows linearly with the length of the string. More characters (or more precisely, more scalar values) means more time.

Most of the time, at least with short strings, this isn’t an issue but it’s definitely something you should consider if you start working with much longer strings in your code.

Counting the Number of Characters In a String

Ok, let’s get back to our problem of counting the characters in a String.

We’ve just looked at the characters property and seen how it can give us a view onto the string with all the character boundaries already identified. The good news is that the String.CharacterView type also has a count property built in that we can now use to count the Character values in the String:

var word = "touche"
print("Number of characters in \(word) is \(word.characters.count)")
// Number of characters in touche is 6

Another interesting thing to note about string length is that string concatenation and modification may not always affect the number of characters in the string:

word += "\u{301}" // COMBINING ACUTE ACCENT, U+0301
print("Number of characters in \(word) is \(word.characters.count)")
// Number of characters in touché is 6

Remember that just because we’ve added a new Unicode scalar value doesn’t mean we’ve necessarily added a new character. As the example above shows, depending on the Unicode scalar involved, that scalar may simply modify the last grapheme (or character) in the string rather than adding a new one, so the character count may remain unchanged.
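To see what actually changed, we can compare the character count with the number of Unicode scalars before and after the append. This sketch assumes Swift 4 or later, so word.count stands in for the word.characters.count used above.

```swift
var word = "touche"
print(word.count, word.unicodeScalars.count)  // prints "6 6"

word += "\u{301}"  // COMBINING ACUTE ACCENT, U+0301
// The new scalar merges with the final "e", so there are still 6 Characters,
// but the string now contains 7 Unicode scalars.
print(word.count, word.unicodeScalars.count)  // prints "6 7"
```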

Retrieving Characters from a String

Character Indices and Ranges

As we’ve seen, the variable-width boundaries of graphemes in Swift strings add complications that you may not be used to from other programming languages.

Another implication of these variable boundaries though, is that unlike in other languages, the Character values that make up a String value can’t be indexed by simple integer values. Instead, the String type in Swift has an associated index type (String.Index) which we have to use instead.

Accessing Characters Using Subscript Notation

Built into the String type are a number of convenience properties through which we can access certain key String.Index values.

To access the index of the first Character in a String we use the startIndex property.

The startIndex property returns a value of type String.Index which can then be used in conjunction with normal subscript notation to access the first character in a string:

let welcome = "Hello World!"
welcome[welcome.startIndex]
// H

Once we have retrieved the index of the first Character in the String value, we can also access the index of the next character. Due to the variable width of characters, this is not as simple as adding 1 to the start index though. Instead, we use the successor() method on the String.Index value which shelters us from all the Unicode complications under the hood and returns the index of the next character:

welcome[welcome.startIndex.successor()]
// e

If we move our attention to the end of the string, it won’t surprise you to hear that there is also a similar set of properties and methods that we can use to access the index of the last character in a string. There is a little gotcha here though.

The endIndex property returns the position immediately after the last Character in a given String value.

The key word here is ’after’. The endIndex property can’t be used as a subscript directly (due to it being beyond the end of the string) but we can use it in conjunction with another method, the predecessor() method, that retrieves the index of the immediately preceding character:

welcome[welcome.endIndex.predecessor()]
// !

So that’s the first and last characters in a string, but what about other characters? Well, here we have a couple of options.

The first option is to chain together the successor() or predecessor() calls we just saw until we get to the index we want:

welcome[welcome.startIndex.successor().successor()]
// l
welcome[welcome.endIndex.predecessor().predecessor()]
// d

But that’s not particularly elegant. Instead, we can use the advancedBy(_:) method. The advancedBy(_:) method on the String.Index type takes a single integer parameter that represents the number of characters we want to advance by:

let index1 = welcome.startIndex.advancedBy(6)
welcome[index1]
// W

It can also accept negative numbers if we’re working from the end of a string:

let index2 = welcome.endIndex.advancedBy(-3)
welcome[index2]
// l

We need to be careful though. If we try to access an index outside the bounds of the string, it will trigger a runtime error:

welcome[welcome.endIndex] // error
welcome.startIndex.predecessor() // error

We can protect ourselves when using the advancedBy(_:) method by using an alternative form, advancedBy(_:limit:), that also takes a limit:

let index3 = welcome.startIndex.advancedBy(30, limit: welcome.endIndex.predecessor())
print(welcome[index3]) // "!"

This form ensures that advancedBy never moves beyond the specified limit: if the requested offset would take it past the limit, the limit itself is returned.
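Note that successor(), predecessor() and advancedBy(_:) belong to Swift 2. If you are following along with Swift 3 or later, index movement moved onto the string itself. The sketch below is my translation of the examples above into that newer API: index(after:), index(before:), index(_:offsetBy:) and index(_:offsetBy:limitedBy:). One behavioural difference is that the limited form returns nil when the offset would pass the limit, rather than stopping at the limit.

```swift
let welcome = "Hello World!"

// startIndex.successor()  ->  index(after:)
print(welcome[welcome.index(after: welcome.startIndex)])        // prints "e"
// endIndex.predecessor()  ->  index(before:)
print(welcome[welcome.index(before: welcome.endIndex)])         // prints "!"
// startIndex.advancedBy(6)  ->  index(_:offsetBy:)
print(welcome[welcome.index(welcome.startIndex, offsetBy: 6)])  // prints "W"
// endIndex.advancedBy(-3)  ->  index(_:offsetBy:) with a negative offset
print(welcome[welcome.index(welcome.endIndex, offsetBy: -3)])   // prints "l"

// advancedBy(_:limit:)  ->  index(_:offsetBy:limitedBy:), which returns nil
// instead of clamping when the offset would pass the limit.
let lastIndex = welcome.index(before: welcome.endIndex)
let tooFar = welcome.index(welcome.startIndex, offsetBy: 30, limitedBy: lastIndex)
print(tooFar == nil)  // prints "true"
```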

Another thing we can do with String.Index values is construct Range values. This allows us to retrieve ranges of characters from a string, which are returned as a new String:

let range1 = welcome.startIndex.advancedBy(6)..<welcome.endIndex
print(welcome[range1])
// prints "World!"

One final thing I want to cover before we move on is the indices property.

The indices property is a property on the String.CharacterView data type that returns a Range of indices that spans all characters in the string and can in turn be used to access each individual Character value in a given String by iterating over the range:

for index in welcome.characters.indices {
    print("\(welcome[index]) ", terminator: "")
}
// H e l l o   W o r l d !

Prefix and Suffix Convenience Methods

So, using indices and subscript syntax is one way of accessing the characters in a String, but there are also a few others.

In addition to the startIndex and endIndex properties that we talked about earlier, the String.CharacterView type also implements a number of convenience functions that allow us to access ranges of characters at either the start or the end of a string.

At the start of the string we have the prefix(_:) method. The prefix(_:) method is used to access the characters at the start of a String.CharacterView and takes a single parameter (the maximum number of characters you want returned) and returns another String.CharacterView containing up to the specified number of characters:

let exampleString = "Hello World"
let firstHellCharacterView : String.CharacterView = exampleString.characters.prefix(4)
print(String(firstHellCharacterView)) // Hell

Note: In this example, and the subsequent examples in this section, I’m going to include the type annotations for each of the variables and constants so that you can see what type of values are being returned.

We can also achieve a similar outcome using the second of our prefix operations, the prefixUpTo(_:) method. Where the prefix(_:) method takes an integer value, the prefixUpTo(_:) method accepts a String.Index value and, when called on a String.CharacterView, returns another String.CharacterView containing the Characters from the start of the string up to, but not including, the supplied index:

let end = exampleString.characters.startIndex.advancedBy(4, limit: exampleString.characters.endIndex)
let secondHellCharacterView : String.CharacterView = exampleString.characters.prefixUpTo(end)
print(String(secondHellCharacterView)) // Hell

This is essentially the same as calling:

exampleString.characters[exampleString.characters.startIndex..<end]

There is also a slightly different variant, the prefixThrough(_:) method. This method again returns a String.CharacterView but this time the view contains the characters from the start of the string up to and including the supplied index:

let helloCharacterView : String.CharacterView = exampleString.characters.prefixThrough(end)
print(String(helloCharacterView)) // Hello

In this case, it’s essentially the same as:

print(String(exampleString.characters[exampleString.characters.startIndex...end] )) // Hello

It won’t surprise you that when it comes to accessing the characters at the end of a string we also have a couple of options.

The suffix(_:) method is the counterpart to the prefix(_:) method and takes a single integer parameter that specifies the maximum number of characters that you want to return from the end of the string:

let firstWorldCharacterView = exampleString.characters.suffix(5)
print(String(firstWorldCharacterView)) // World

There is also a second method, the suffixFrom(_:) method. This method accepts a String.Index parameter and returns a new String.CharacterView containing the characters starting at the given index up to and including the last character in the string:

let startIndex = exampleString.characters.endIndex.advancedBy(-5, limit: exampleString.characters.startIndex)
let secondWorldCharacterView = exampleString.characters.suffixFrom(startIndex)
print(String(secondWorldCharacterView)) // World

In this case, it’s essentially the same as:

print(String(exampleString.characters[startIndex..<exampleString.characters.endIndex]))
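Similarly, in Swift 3 and later these methods were renamed to prefix(upTo:), prefix(through:) and suffix(from:), and since Swift 4 they can be called on the String directly, returning a Substring rather than a String.CharacterView. A sketch of the examples above in that form, assuming a current toolchain (the index names are my own):

```swift
let exampleString = "Hello World"

// prefix(_:) and suffix(_:) take a maximum number of Characters.
let hell = exampleString.prefix(4)   // Substring containing "Hell"
let world = exampleString.suffix(5)  // Substring containing "World"

// prefix(upTo:), prefix(through:) and suffix(from:) take a String.Index.
let indexOfO = exampleString.index(exampleString.startIndex, offsetBy: 4)
let upToO = exampleString.prefix(upTo: indexOfO)        // "Hell"
let throughO = exampleString.prefix(through: indexOfO)  // "Hello"
let indexOfW = exampleString.index(exampleString.endIndex, offsetBy: -5)
let fromW = exampleString.suffix(from: indexOfW)        // "World"

// Substrings can be converted back into independent String values.
print(String(hell), String(world), String(upToO), String(throughO), String(fromW))
// prints "Hell World Hell Hello World"
```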

Ok, now that we know how to access the characters in a string, let’s take a look at how we can compare either whole String values or parts of those strings.

String Comparison

In Swift, there are three ways that we can compare String values:
– String prefix and suffix equality
– String and Character equality
– Order comparison

Checking for String Prefixes and Suffixes

To check that a string starts with a given set of characters we use the hasPrefix(_:) method of the String type.

The hasPrefix(_:) method takes a single String parameter, returns a boolean value and checks whether the string starts with the given prefix or not:

let message = "Hello World"
if message.hasPrefix("Hello") {
    print("Has prefix \'Hello\'")
}
// Prints "Has prefix 'Hello'"

In determining whether the String has a given prefix, a character-by-character canonical equivalence comparison is performed between the extended grapheme clusters in each String value. If they match, the hasPrefix method returns true, otherwise it returns false.
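Because the comparison is based on canonical equivalence rather than raw code points, a prefix written with a decomposed é still matches a string that uses the precomposed form. A small sketch, assuming a current Swift toolchain (the word étude is just an example):

```swift
// "étude" with a precomposed é (U+00E9)
let word = "\u{00E9}tude"
// The prefix written as e + U+0301 COMBINING ACUTE ACCENT
let decomposedPrefix = "\u{0065}\u{0301}"

print(word.hasPrefix(decomposedPrefix))  // prints "true"
// A plain "e" is a different grapheme from é, so this does not match.
print(word.hasPrefix("e"))               // prints "false"
```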

It’s a similar story when it comes to the hasSuffix(_:) method. The hasSuffix(_:) method is used to check whether a String value ends with a given set of characters:

if message.hasSuffix("World") {
    print("Has suffix \'World\'")
}
// prints "Has suffix 'World'"

String and Character Equality

Because the String type conforms to the Comparable protocol (which in turn inherits from the Equatable protocol), we also have a whole load of options when it comes to testing complete String values for equality and ordering.

By default, the rules for testing equality and ordering of Unicode-based String values are defined within the Unicode collation algorithm, and Swift pretty much follows these rules to the letter.

In terms of equality comparison between two String values (or two Character values), Swift uses the “equal to” operator (==) and the “not equal to” operator (!=):

let firstWelcome = "Hello There"
let secondWelcome = "Hello There"
let thirdWelcome = "Hello There!"

if firstWelcome == secondWelcome {
   print("Greetings are considered equal")
}
// prints "Greetings are considered equal"

if firstWelcome != thirdWelcome {
   print("Greetings aren't equal.")
}
// prints "Greetings aren't equal."

Two String values (or two Character values) are considered equal in Swift if their associated extended grapheme clusters are canonically equivalent.

As we saw in the last post, canonical equivalence means that they have the same appearance and same linguistic meaning, even if they are composed of different Unicode Scalar values behind the scenes:

let string1 = "\u{00E9}" // é U+00E9 LATIN SMALL LETTER E WITH ACUTE
let string2 = "\u{0065}\u{0301}" // e U+0065 LATIN SMALL LETTER E +  ́ U+0301 COMBINING ACUTE ACCENT
if string1 == string2 {
    print("They are canonically equivalent.")
}
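The two strings compare equal even though their underlying scalars differ. A small sketch, again in current Swift and with variable names chosen for illustration, that makes the difference visible by counting both Characters and Unicode scalars:

```swift
let precomposed = "\u{00E9}"        // é as one scalar (U+00E9)
let decomposed = "\u{0065}\u{0301}" // e followed by U+0301 COMBINING ACUTE ACCENT

// == uses canonical equivalence, so these compare equal...
let areEqual = precomposed == decomposed

// ...each is a single Character (one extended grapheme cluster)...
let characterCounts = (precomposed.count, decomposed.count)

// ...but they are built from a different number of Unicode scalars.
let scalarCounts = (precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)

print(areEqual)        // true
print(characterCounts) // (1, 1)
print(scalarCounts)    // (1, 2)

// The same rule applies when comparing Character values.
let sameCharacter = Character("\u{00E9}") == Character("\u{0065}\u{0301}")
print(sameCharacter)   // true
```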

Sorting Strings

In addition to checking two strings for equality, we can also order strings using the <, <=, > and >= operators. The rules for sorting String and Character values are also defined as part of the Unicode collation algorithm. I'm not going to go into them here as they are a little complicated, but essentially they do what you would expect:

let a = "a"
let b = "b"
let A = "A"
let otherA = A
let one = "1"

a < b // true
b > A // true
one < a // true
otherA <= A // true
otherA > b // false

Note: Because Swift has been open-sourced, if you want more detail on how String equality and sorting work under the hood, we can actually look for ourselves. To save you some time hunting around, you'll find some of the underlying implementations of the < operator for the String type here and here. Only == and < are implemented there; the other comparison operators used in the examples above are defined automatically in terms of those two.
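One caveat if you try the ordering examples in a recent version of Swift: as I understand it, current releases (Swift 5.x) no longer order strings with the Unicode collation algorithm but by the Unicode scalar values of the strings' canonically normalized forms. Equality is still based on canonical equivalence, but one visible consequence is that every uppercase ASCII letter now sorts before every lowercase one. A quick check you can run on your own toolchain (the word list is made up):

```swift
// Ordering in current Swift follows Unicode scalar values,
// not locale-aware collation.
let words = ["banana", "Apple", "cherry", "apple"]
let sortedWords = words.sorted()
print(sortedWords) // ["Apple", "apple", "banana", "cherry"]

// "Z" is U+005A and "a" is U+0061, so "Z" sorts first.
let upperBeforeLower = "Z" < "a"
print(upperBeforeLower) // true

// Only == and < need real implementations; >= is derived from them by Comparable.
print("b" >= "a") // true
```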

Ok, so we’ve looked at how to create strings, access a string's contents and compare different strings. The next thing to look at is how to modify and manipulate the strings we have created. Let’s do that next.

String Modification and Manipulation

Modifying Character Case

Built into the String type in Swift are a couple of properties that allow us to change the case of the characters that make up a String value: the uppercaseString and lowercaseString properties.

As you might expect, the uppercaseString property returns a new String value with all the characters changed to uppercase:

var name = "Andy"
print(name.uppercaseString)
// prints "ANDY"

We can also use the lowercaseString property to return a new string with all the characters converted to lowercase:

print(name.lowercaseString)
// prints "andy"
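In Swift 3 and later these properties became the methods uppercased() and lowercased(). A minimal sketch in current syntax; because both return new strings, name can be a constant:

```swift
let name = "Andy"

// Both methods return a new String; `name` itself is not modified.
let shouted = name.uppercased()
let whispered = name.lowercased()

print(shouted)   // ANDY
print(whispered) // andy
print(name)      // Andy
```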

Reversing the Characters in a String

We can also reverse the characters in a string.

Reversing a string in Swift is actually super simple. All we have to do is call the reverse() method on the String.CharacterView that we get from the string's characters property. This returns a reversed collection of the string's Character values, which we can then use to initialise a new String value:

print(String("Forward".characters.reverse()))
// "drawroF"
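Since Swift 4, String is itself a collection of Characters, so the characters step disappears and reverse() is spelled reversed(). The sketch below (current Swift, example words made up) also shows why reversing Characters rather than raw Unicode scalars matters: a decomposed é stays in one piece, while reversing the scalars leaves the combining accent stranded at the front.

```swift
let forward = "Forward"
let backward = String(forward.reversed())
print(backward) // drawroF

// "cafe\u{0301}" ends with a decomposed é (e + COMBINING ACUTE ACCENT).
let word = "cafe\u{0301}"

// Reversing Characters keeps each grapheme cluster intact.
let reversedWord = String(word.reversed())
print(reversedWord) // éfac

// Reversing the raw scalars instead puts U+0301 first, detached from its "e".
let scalarReversed = String(String.UnicodeScalarView(word.unicodeScalars.reversed()))
```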

Splitting a String

We also have options for splitting a string into an array of substrings.

Thanks to its conformance to the CollectionType protocol, the String.CharacterView type implements both the split(_:maxSplit:allowEmptySlices:) and split(_:allowEmptySlices:isSeparator:) methods. Both methods return an array of String.CharacterView values representing slices of the original String.CharacterView.

The first method, split(_:maxSplit:allowEmptySlices:), takes three parameters. The first is the Character value that marks the points at which the String.CharacterView will be split; this Character is omitted from the resulting slices. The second (maxSplit) is an integer that specifies the maximum number of splits to perform, so at most maxSplit + 1 slices are returned (it has a default value of Int.max, so you normally don’t have to specify it). The last parameter (allowEmptySlices) is a Boolean that specifies whether empty slices are allowed in the returned array (this also has a default value of false, so again you don’t normally need to specify it):

var title = "Star Wars"
var charactersInTitle = title.characters
let components = charactersInTitle.split(" ")
for component in components {
    print(String(component))
}
// Star
// Wars

The second method, split(_:allowEmptySlices:isSeparator:), also takes three parameters.

The first parameter is the maxSplit parameter we just saw, and again it has a default value of Int.max. The second parameter is the same allowEmptySlices parameter we just saw and again this defaults to false. The last parameter however is different.

Instead of taking a Character value to split on, split(_:allowEmptySlices:isSeparator:) takes a closure. The closure accepts a single Character parameter and returns a Boolean value indicating whether or not that Character should be treated as a separator:

let components2 = charactersInTitle.split{ $0 == " " }
for component in components2 {
    print(String(component))
}
// Star
// Wars
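In current Swift the equivalents are split(separator:maxSplits:omittingEmptySubsequences:) and split(maxSplits:omittingEmptySubsequences:whereSeparator:), which you can call on a String directly and which return Substring slices. This sketch, using a made-up comma-separated string, also shows that maxSplits limits the number of splits rather than the number of slices:

```swift
let csv = "a,b,,c"

// Default behaviour: split on every comma and drop empty slices.
let parts = csv.split(separator: ",").map(String.init)
print(parts) // ["a", "b", "c"]

// Keep the empty slice between the two adjacent commas.
let withEmpty = csv.split(separator: ",", omittingEmptySubsequences: false).map(String.init)
print(withEmpty) // ["a", "b", "", "c"]

// At most one split, so at most two slices.
let firstSplitOnly = csv.split(separator: ",", maxSplits: 1).map(String.init)
print(firstSplitOnly) // ["a", "b,,c"]

// Closure-based variant, matching the Star Wars example above.
let titleWords = "Star Wars".split(whereSeparator: { $0 == " " }).map(String.init)
print(titleWords) // ["Star", "Wars"]
```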

Inserting a Character Into a String

To insert a character into a String value, we can use the insert(_:atIndex:) method. The method takes two parameters: the Character value you want to insert and the index at which you want to insert it:

var welcome2 = "Hello World"
welcome2.insert("!", atIndex:welcome2.endIndex)
print(welcome2)
// prints "Hello World!"

Notice that we have to declare welcome2 as a variable. insert(_:atIndex:) is a mutating method: rather than returning a new string, it changes the value stored in welcome2 itself, and Swift only allows that on variables.
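From Swift 3 onwards the method is named insert(_:at:). A short sketch in current syntax; the extra insertion at startIndex is just an added illustration:

```swift
var welcome = "Hello World"

// insert(_:at:) is mutating, so `welcome` must be a var.
welcome.insert("!", at: welcome.endIndex)
print(welcome) // Hello World!

// Inserting at startIndex puts the character at the front.
welcome.insert("¡", at: welcome.startIndex)
print(welcome) // ¡Hello World!
```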

Inserting a String Into a String

We can also insert a String value into an existing String value using the insertContentsOf(_:at:) method.

The insertContentsOf(_:at:) method is almost identical to the insert(_:atIndex:) method, except that it accepts a collection of Character values (here, the characters view of a String) along with the index at which to insert them:

welcome2.insertContentsOf(" to the entire".characters, at: welcome2.startIndex.advancedBy(5))
print(welcome2)
// prints "Hello to the entire World!"

Again, the same caveats apply about this being a mutating method.
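In current Swift the method is insert(contentsOf:at:), it accepts a String directly, and index arithmetic goes through index(_:offsetBy:) instead of advancedBy(_:). A sketch of the same insertion; the inserted text has no trailing space because the character at offset 5 is already the space before World:

```swift
var greeting = "Hello World!"

// Offsets must be converted to String.Index values; offset 5 is the
// space after "Hello".
let afterHello = greeting.index(greeting.startIndex, offsetBy: 5)

// insert(contentsOf:at:) accepts any collection of Characters, including a String.
greeting.insert(contentsOf: " to the entire", at: afterHello)
print(greeting) // Hello to the entire World!
```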

Replacing a Range of Characters in a String

When it comes to replacing a range of characters in a string we use the replaceRange(_:with:) method.

As with most of the other string modification methods we’ve looked at, the method accepts two parameters. The first is the range of characters we want to replace (again a Range of type String.Index) and the second is the String value we want to replace those characters with:

var anotherMessage = "A message"
// "A mess": from the start up to and including the fourth-from-last character
let range = anotherMessage.startIndex...anotherMessage.endIndex.advancedBy(-4)
anotherMessage.replaceRange(range, with:"Bagg")
print(anotherMessage)
// prints "Baggage"
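The same replacement in current Swift, where replaceRange(_:with:) became replaceSubrange(_:with:); a sketch:

```swift
var anotherMessage = "A message"

// From the first character up to and including the fourth-from-last ("A mess").
let start = anotherMessage.startIndex
let end = anotherMessage.index(anotherMessage.endIndex, offsetBy: -4)

anotherMessage.replaceSubrange(start...end, with: "Bagg")
print(anotherMessage) // Baggage
```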

Removing a Character from a String

Removing characters or ranges of characters from a string is just as simple, and we have a number of options at our disposal thanks to the String.CharacterView type conforming to the RangeReplaceableCollectionType protocol (which in turn conforms to the CollectionType protocol). There is a small complication, however: the characters property of a String is a read-only property.

In order to use them then, we must first create a copy of the String.CharacterView before we can use these methods:

let alphabet = "abcdefghijklmnop"
var characters = alphabet.characters

With that set up, let's look at the first pair of methods: popFirst() and popLast().

If the String.CharacterView is not empty (i.e. the isEmpty property returns false) these methods remove (and return) either the first, or the last, character of a String.CharacterView. If the String.CharacterView is empty, these functions return nil instead (making the return type of these methods a Character optional):

characters.popFirst() // "a"
print(String(characters)) // "bcdefghijklmnop"

characters.popLast() // "p"
print(String(characters)) // "bcdefghijklmno"
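If you're on Swift 4 or later there is no CharacterView to copy, because String is itself a mutable collection of Characters. String supports popLast() directly, but popFirst() is only available on types that are their own SubSequence, such as Substring. A sketch in current Swift with illustrative variable names:

```swift
let alphabet = "abcdefghijklmnop"

// String has popLast(): it removes and returns the last Character,
// or returns nil (rather than crashing) when the string is empty.
var letters = alphabet
let last = letters.popLast()
print(letters) // abcdefghijklmno

// popFirst() needs a Substring, so take a slice of the whole string first.
var slice = letters[...]
let first = slice.popFirst()
print(slice) // bcdefghijklmno

// Popping from an empty string just yields nil.
var empty = ""
let nothing = empty.popLast()
```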

We also have the removeFirst() and removeLast() methods. These work in a similar way, removing and returning either the first or the last character in the String.CharacterView. They differ from popFirst() and popLast(), though, in that they trigger a runtime error (crashing the program) if the String.CharacterView is empty:

characters.removeFirst() // "b"
print(String(characters)) // cdefghijklmno

characters.removeLast() // "o"
print(String(characters)) // cdefghijklmn

var emptyCharacters = "".characters
emptyCharacters.removeFirst() // runtime error: cannot remove from an empty collection

Where things get more interesting is that there are also variants of the removeFirst() and removeLast() methods that accept an integer parameter. The integer parameter specifies the number of characters to remove from the String.CharacterView. These methods don’t, however, return the Character values that were removed:

characters.removeFirst(2)
print(String(characters)) // efghijklmn

characters.removeLast(2)
print(String(characters)) // efghijkl
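The current-Swift equivalents work directly on a String. This sketch (variable names illustrative) mirrors the sequence above and shows one way to avoid the empty-string trap by checking isEmpty first:

```swift
var letters = "bcdefghijklmno"

// removeFirst() and removeLast() remove and return a single Character.
let removedFirst = letters.removeFirst() // "b"
let removedLast = letters.removeLast()   // "o"

// The Int variants remove k characters and return nothing.
letters.removeFirst(2)
letters.removeLast(2)
print(letters) // efghijkl

// removeFirst() traps on an empty string, so check isEmpty when in doubt.
var maybeEmpty = ""
if !maybeEmpty.isEmpty {
    maybeEmpty.removeFirst()
}
print(maybeEmpty.isEmpty) // true
```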

The next set of methods we have are the dropFirst(_:) and dropLast(_:) methods. These methods also take an integer parameter to indicate the number of characters to remove. Unlike the other methods, instead of modifying the original String.CharacterView and (potentially) returning the characters they’ve removed, they instead return a new String.CharacterView containing the characters that are remaining:

print(String(characters.dropFirst(2))) // ghijkl
print(String(characters.dropLast(2))) // efghij
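On a current String, dropFirst(_:) and dropLast(_:) return a Substring that shares storage with the original. A brief sketch; it also shows that asking to drop more characters than exist simply yields an empty result rather than an error:

```swift
let letters = "efghijkl"

// Both return a Substring; `letters` itself is unchanged.
let withoutFirstTwo = letters.dropFirst(2)
let withoutLastTwo = letters.dropLast(2)
print(withoutFirstTwo) // ghijkl
print(withoutLastTwo)  // efghij

// Dropping more characters than exist returns an empty Substring.
let dropTooMany = letters.dropFirst(100)
print(dropTooMany.isEmpty) // true

// Convert to String before storing the result long-term.
let stored = String(withoutFirstTwo)
```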

So that’s pretty much it for the methods on the String.CharacterView but there are a couple of methods that we can call on String values directly.

The first is the removeAtIndex(_:) method. The removeAtIndex(_:) method takes a single parameter, the index of a character that you want to remove and modifies the string by removing the character at the given index:

var alphabetSlice = String(characters) // "efghijkl"
alphabetSlice.removeAtIndex(alphabetSlice.startIndex.successor())
// alphabetSlice now contains "eghijkl"

Here I’ve used it to remove the character f from the alphabetSlice variable.
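Swift 3 renamed removeAtIndex(_:) to remove(at:) and replaced successor() with index(after:). The same removal in current syntax:

```swift
var alphabetSlice = "efghijkl"

// remove(at:) removes and returns the Character at the given index.
let secondIndex = alphabetSlice.index(after: alphabetSlice.startIndex)
let removed = alphabetSlice.remove(at: secondIndex)

print(removed)       // f
print(alphabetSlice) // eghijkl
```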

Removing a Range of Characters from a String

In addition to removing a single character from a String value using the removeAtIndex(_:) method, we can also remove a range of characters in a given String value using the removeRange(_:) method.

The removeRange(_:) method accepts a single parameter, a Range of indices of type String.Index:

var anotherWelcome = "Hello World!"
let multiCharacterRange = anotherWelcome.startIndex..<anotherWelcome.startIndex.advancedBy(6)
anotherWelcome.removeRange(multiCharacterRange)
// anotherWelcome contains "World!"

And if we’re a little sneaky, we can also use this to remove a single character:

let singleCharacterRange = anotherWelcome.endIndex.predecessor()...anotherWelcome.endIndex.predecessor()
anotherWelcome.removeRange(singleCharacterRange)
// anotherWelcome contains "World"
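removeRange(_:) is removeSubrange(_:) in Swift 3 and later, and it accepts both half-open and closed ranges. A sketch of both removals from the example above, in current syntax:

```swift
var anotherWelcome = "Hello World!"

// A half-open range covering "Hello " (six characters, including the space).
let helloEnd = anotherWelcome.index(anotherWelcome.startIndex, offsetBy: 6)
anotherWelcome.removeSubrange(anotherWelcome.startIndex..<helloEnd)
print(anotherWelcome) // World!

// A closed range containing just the last character.
let lastIndex = anotherWelcome.index(before: anotherWelcome.endIndex)
anotherWelcome.removeSubrange(lastIndex...lastIndex)
print(anotherWelcome) // World
```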

Removing All Characters in a String

To remove all the characters from a string we can use the removeAll(keepCapacity:) method:

var note = "A message"
note.removeAll(keepCapacity:false)
if note.isEmpty {
    print("The string is empty")
}
// prints "The string is empty"
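In Swift 3 and later the parameter label is keepingCapacity, and it defaults to false, so a plain removeAll() is usually all you need. Passing true keeps the string's storage allocated, which may help if you are about to refill it. A minimal sketch:

```swift
var note = "A message"

// keepingCapacity: true keeps the allocated storage for reuse.
note.removeAll(keepingCapacity: true)
print(note.isEmpty) // true

// The variable can be refilled as normal afterwards.
note += "Another message"
print(note) // Another message
```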

So that about covers the basics of the String types and Character types in Swift but there are a couple of other topics that I wanted to mention before we wrap up today, the first of which is Unicode String Representations.

Unicode String Representations

So far in this article we’ve talked about the String and Character types and 99% of the time, these are the data types you’ll be working with when working with text in Swift. Sometimes though it is necessary to drop down to a lower level of detail and work directly with the Unicode code points that make up those strings.

There are a number of reasons why you may want to do this.

Maybe you're rendering your text into a UTF-8 encoded web page. Maybe you want to work with some of the NSString APIs (which use UTF-16 code units under the hood). Maybe performance is your thing and you want to work with code units directly, avoiding the cost of grapheme cluster processing. Whatever the reason, the good news is that the String type in Swift has you covered.

In addition to the characters property we looked at earlier, the String type in Swift also supports three other properties, each of which provides a different view on to the underlying unicode scalar values of a Swift string. These properties are the utf8, utf16 and unicodeScalars properties.

UTF-8 Representation

The first of these, the utf8 property, provides access to the Unicode scalar values of a Swift String as a collection of UTF-8 code units. The property returns a value of type String.UTF8View, a collection of unsigned 8-bit (UInt8) values, one for each byte of the String value's UTF-8 representation:

let myString = "Hello \u{2615}" // Hello ☕
for codeUnit in myString.utf8 {
    print("\(codeUnit) ", terminator:"")
}
print()
// 72 101 108 108 111 32 226 152 149
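Because each view counts a different unit, comparing counts is a quick way to see the difference between them. A sketch in current Swift (where count is available on String directly) that also copies the UTF-8 bytes into an array and decodes them back:

```swift
let myString = "Hello \u{2615}" // Hello ☕

// One Character for the coffee cup, but three UTF-8 code units.
print(myString.count)      // 7
print(myString.utf8.count) // 9

// Copy the code units into an array, e.g. to hand raw bytes to another API.
let bytes = Array(myString.utf8)
print(bytes.suffix(3)) // [226, 152, 149]

// Decoding the bytes as UTF-8 gives back the same string.
let decoded = String(decoding: bytes, as: UTF8.self)
```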

UTF-16 Representation

Next we have the utf16 property. The utf16 property returns a value of type String.UTF16View, a collection of 16-bit (UInt16) values, one for each of the 16-bit code units of the String value's UTF-16 representation:

for codeUnit in myString.utf16 {
    print("\(codeUnit) ", terminator:"")
}
print()
// 72 101 108 108 111 32 9749

Note: As I’ve mentioned, conceptually NSString objects are UTF-16 encoded strings with platform endianness under the hood so becoming familiar with the UTF-16 encoding and how it works can be pretty useful.

Working with UTF-16 code units directly can therefore avoid conversion costs when you hand strings to NSString-based APIs. Be careful, though: like UTF-8, UTF-16 is a variable-width encoding. Characters outside the Basic Multilingual Plane (many emoji, for example) take two code units (a surrogate pair), and a character such as the decomposed é we saw earlier spans more than one code unit, so a single UTF-16 code unit does not necessarily correspond to a single Character.
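To see that a single UTF-16 code unit does not always correspond to a Character, compare a character inside the Basic Multilingual Plane with one outside it. A sketch in current Swift; the emoji chosen is arbitrary:

```swift
// U+2615 HOT BEVERAGE lies in the Basic Multilingual Plane: one code unit.
let coffee = "\u{2615}"
print(coffee.utf16.count) // 1

// U+1F600 GRINNING FACE lies outside it and needs a surrogate pair.
let grin = "\u{1F600}"
print(grin.count)       // 1
print(grin.utf16.count) // 2
print(grin.utf16.map { String($0, radix: 16) }) // ["d83d", "de00"]

// A decomposed é is also one Character but two UTF-16 code units.
let decomposedE = "e\u{0301}"
print(decomposedE.count, decomposedE.utf16.count) // 1 2
```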

Unicode Scalar Representation

The final view property on the Swift String type is the unicodeScalars property. It provides access to the string as a collection of 21-bit Unicode scalar values, which is equivalent to the string's UTF-32 encoding form.

The unicodeScalars property returns a value of type String.UnicodeScalarView, which is a collection of UnicodeScalar values. The UnicodeScalar type differs slightly from the raw numeric types we’ve seen in the other views in that it is a struct, and it can be used directly to create new String values:

for scalar in myString.unicodeScalars {
    print(scalar, terminator: "")
}
print()
// Hello ☕

And if it is the numeric scalar value you’re after, we use the value property of the UnicodeScalar type instead, which provides access to the underlying 21-bit scalar value as a UInt32:

for scalar in myString.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
print()
// 72 101 108 108 111 32 9749
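UnicodeScalar (spelled Unicode.Scalar since Swift 4, with UnicodeScalar kept as an alias) can also go the other way, turning numeric values back into text. A sketch in current Swift with arbitrary values; note that the UInt32 initialiser is failable because not every number is a valid scalar:

```swift
// Build a String from numeric scalar values.
let values: [UInt32] = [72, 105, 32, 0x2615]
var scalars = String.UnicodeScalarView()
for value in values {
    // Unicode.Scalar(_: UInt32) returns nil for invalid values.
    if let scalar = Unicode.Scalar(value) {
        scalars.append(scalar)
    }
}
let rebuilt = String(scalars)
print(rebuilt) // Hi ☕

// A lone surrogate such as 0xD800 is not a valid scalar and is rejected.
let invalid = Unicode.Scalar(UInt32(0xD800))
print(invalid == nil) // true
```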

Summary

So I’m going to leave it there for today. As we have seen, there is a whole load of functionality built into the String and Character types in Swift. In this article, we’ve seen how to create Character and String values, how to test the number of characters that make up a String, and how to manipulate a string's contents by adding, inserting and removing Character values and substrings. There is, however, a whole load that I haven’t covered.

Through Swift’s automatic bridging between the String type and the NSString class in the Foundation framework, we also have access to a whole range of additional functionality beyond that covered in this article. This includes the ability to search inside a string for a given set of characters, the ability to draw a string at a point on the screen and the ability to perform advanced natural language processing through classes such as NSLinguisticTagger, to name but a few. With that said, I hope this article, along with the previous one, has given you a much better foundation from which to explore that functionality and has opened your eyes to just how powerful the String and Character types are in Swift. I do like the Swift language.
