It’s been a while, but in the last post we looked at Unicode, the international standard for encoding, representing and handling text. In this article, we build on that knowledge and look at how we can handle, process and manipulate characters and strings in Swift. Let’s start by looking at the smallest building block of most text processing – the character.
String and Character Basics
What is a Character?
In Swift, characters are represented using the Character type.
When rendered, each Character value represents a single perceived character or grapheme.
The Character type in Swift is fully compliant with the Unicode standard and conceptually represents an extended grapheme cluster.
You may remember extended grapheme clusters from the previous post. As we learnt, in Unicode, certain accented characters such as é can be represented in multiple forms – either as a single code point such as é (U+00E9 LATIN SMALL LETTER E WITH ACUTE) or as a sequence of two or more code points such as e (U+0065 LATIN SMALL LETTER E) followed by ́ (U+0301 COMBINING ACUTE ACCENT). This latter form is known as a grapheme cluster. When it is rendered in a Unicode-aware text rendering system, the two code points are graphically combined so that the separate e and ́ characters appear as a single é. Despite their representational differences under the hood, the Unicode standard defines these two forms to be canonically equivalent, which essentially means they represent the same grapheme. Whichever form is chosen, it is represented by a single Character value in Swift. This means that an individual character may consist of one or more code points behind the scenes. Unlike characters in many other languages, it therefore does not necessarily occupy a fixed amount of storage.
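To make this concrete, here is a small sketch written for a current Swift toolchain (Swift 5). It builds é in both of the forms described above. It then checks that the two Character values compare equal even though they contain different numbers of code points. The constant names are only for illustration.

```swift
// é as a single precomposed code point (U+00E9)
let precomposed: Character = "\u{00E9}"
// e (U+0065) followed by COMBINING ACUTE ACCENT (U+0301)
let decomposed: Character = "\u{0065}\u{0301}"

// Both literals form one extended grapheme cluster, so both fit in a Character,
// and Character equality uses canonical equivalence.
print(precomposed == decomposed)                 // true

// Under the hood, the number of code points differs.
print(String(precomposed).unicodeScalars.count)  // 1
print(String(decomposed).unicodeScalars.count)   // 2
```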
Ok, that’s the basic theory of characters. What about strings?
What is a String?
We’ve all seen strings before. A string in the generic sense is nothing more than a sequence of characters, such as those in the phrase "Hello World" or the name "Apple". In Swift, strings are represented using the String type.
String values in Swift consist of a sequence of encoding-independent Unicode code points. What might surprise you, though, is that this does not mean they are simply a collection of Character values (though at one point this was in fact true).
Instead, as of Swift 2.0, the String type doesn’t represent its constituent characters directly. It provides several views onto those characters through a set of properties. These views can be used to retrieve the string’s components in different forms – either as a collection of Character values or in one of a number of Unicode encoding forms such as UTF-8, UTF-16 and UTF-32. We’ll look at the Unicode encoding forms in more detail later in this article.
Note: If you’re interested, this post on the Apple Swift blog provides more details on the rationale behind the move away from strings being simple collection types.
Value Types
Another thing to note about strings in Swift is that they are value types. We’ve not really covered the difference between value types and reference types here on the blog yet. All this really means is that when you pass a String value into a function or method, or assign it to a variable or a constant, Swift works with a copy of that String rather than a reference to the original.
We can see this at work in the following example:
let firstString = "Hello"
var secondString = firstString
secondString += " World!"
print(firstString)
print(secondString)
// prints "Hello"
// prints "Hello World!"
See how, even though we’ve modified the secondString variable, we haven’t actually modified the firstString constant? The assignment in the second line of the example creates a copy of the string. This behaviour is known as copy-by-default. It helps to clarify the ownership of String values in Swift and ensures that individual strings are not changed unexpectedly.
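The same copy-by-default behaviour applies when a String is passed into a function. In this sketch for current Swift, addExclamation(_:) is a hypothetical helper. It appends to its own local copy, so the caller's constant is left untouched.

```swift
// Hypothetical helper: the parameter is a copy of the caller's String.
func addExclamation(_ text: String) -> String {
    var copy = text   // parameters are constants, so take a mutable local copy
    copy += "!"
    return copy
}

let original = "Hello"
let excited = addExclamation(original)
print(original)  // prints "Hello"
print(excited)   // prints "Hello!"
```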
Now, you might be worried by the idea of all this copying. After all, copying strings everywhere must have an impact on the performance of your code. You’re correct, it does, but not in as big a way as you may think.
In order to be as performant as possible, the Swift compiler optimises things under the hood so that String values are only copied when absolutely necessary. In practice, a copy is only made when one of the strings sharing the same underlying storage is first modified or mutated. For the performance-minded amongst you, this is an O(N) operation where ’N’ is the length of the string’s unspecified underlying representation. This approach minimises the performance impact of these copies while keeping the additional protection that the copy-by-default behaviour brings.
String Mutability
When it comes to the mutability of the Character and String types in Swift, you’ll be glad to hear there are no surprises.
As with most other data types in Swift, the mutability of a Character or String value is governed by whether it is assigned to a variable (in which case it can be mutated) or a constant (in which case it can’t). It’s no more complicated than that!
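Here is a quick sketch of those rules in practice, using current Swift syntax. The names are illustrative. The commented-out line shows the kind of mutation the compiler rejects for a constant.

```swift
var mutableGreeting = "Hello"   // a variable: can be mutated
mutableGreeting += " there"

let fixedGreeting = "Hi"        // a constant: cannot be mutated
// fixedGreeting += "!"         // uncommenting this line causes a compile-time error

var letter: Character = "a"     // Character variables can be reassigned too
letter = "b"
print(mutableGreeting, fixedGreeting, letter)  // prints "Hello there Hi b"
```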
Ok, so that’s the theory. Now let’s look at the practice, starting with how we actually create Character and String values in Swift.
Creating Strings and Characters
Creating a String
String Literals
In Swift, string literals are predefined String values that you can use directly within your source code. Their name stems from the fact that they are literally written within the source code.
In general, a string literal in Swift consists of a fixed sequence of zero or more textual characters surrounded by a pair of double quotes (""). It can contain any of the printable characters you may expect (letters, numbers, punctuation etc.), along with a number of escaped special characters and Unicode scalar values that we’ll look at in a moment.
Creating Empty Strings
The simplest form of string literal that you can write is an empty string literal. An empty string literal is nothing more than a pair of double quotes ("") (i.e. a string literal containing no characters).
So why talk about empty strings? In and of themselves, empty strings aren’t the most interesting of things but they can be pretty useful and are often needed as a starting point for constructing larger, longer, more complex strings.
In Swift, there are two ways in which we can declare and initialise an empty string – using initialiser syntax or by using an empty string literal.
To declare and initialise a string using initialiser syntax, all we have to do is write the String type followed by a set of empty parentheses:
var emptyString = String() // Initializer Syntax
The bonus here is that Swift’s type inference mechanism will automatically infer the variable (in this case emptyString) to be of type String.
In addition to initialiser syntax though, we also have the option of initialising a string by assigning an empty string literal:
var emptyString1 = "" // String literal
Which one you use is up to you, as the outcome is exactly the same.
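We can confirm that both approaches produce equivalent values with a quick check. This is a minimal sketch, and the constant names are illustrative:

```swift
let viaInitializer = String()   // initialiser syntax
let viaLiteral = ""             // empty string literal

print(viaInitializer == viaLiteral)  // true
print(viaInitializer.isEmpty)        // true
```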
Creating Non-Empty String Literals
If we move beyond empty strings, we can also use string literals to initialise a String value with a non-empty string.
Instead of using the empty string literal all we have to do is assign the string literal we want to initialise the string with:
var nonEmptyString = "Hello World!"
And hey presto, we have a String value that actually contains a string! It’s that easy. But that’s not all we can do…
Special Characters in String and Character Literals
In addition to string literals being able to contain normal printable characters (as in the previous example), string literals can also contain a number of escaped special characters.
In this context, "escaped" simply means that these characters must be prefixed with a backslash (\) in order to include them within the string literal.
There are seven escaped characters we can choose from in Swift. These are:
– \0 (null character)
– \\ (backslash)
– \t (horizontal tab)
– \n (line feed)
– \r (carriage return)
– \" (double quote)
– \' (single quote)
Including them within a string literal is super simple. All we do is include the backslash and the required character between the double quotes of our string literal and that’s it:
var secondNonEmptyString = "\t\'Hello\'\tWorld!"
print(secondNonEmptyString)
// prints a tab, then 'Hello', another tab and then World!
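As a quick sanity check, each escape sequence in the list produces exactly one character. This sketch uses the count property from current Swift (Swift 4 and later), which plays the role of characters.count in this article's Swift 2 examples. It also shows a subtlety worth knowing: a carriage return followed by a line feed forms a single extended grapheme cluster.

```swift
// Each escape sequence here contributes exactly one Character.
let escapes = "\0\\\t\n\r\"\'"
print(escapes.count)                 // 7
print(escapes.unicodeScalars.count)  // 7

// A carriage return followed by a line feed is a single extended
// grapheme cluster, so it counts as one Character but two scalars.
let crlf = "\r\n"
print(crlf.count)                 // 1
print(crlf.unicodeScalars.count)  // 2
```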
Using Unicode Scalars
Beyond including escaped characters, string literals in Swift also support the inclusion of arbitrary Unicode scalar values.
When included within string literals, Unicode scalar values are written in the form \u{xxxx}, where xxxx is a hexadecimal number of between 1 and 8 digits whose value is the Unicode code point we want to include.
For example, we could define a string literal containing the Unicode scalar U+2615 HOT BEVERAGE as follows:
let hotBeverage = "\u{2615}" // "☕️"
That all looks good, but there is one thing you might not notice here. The hotBeverage constant is initialised with just a single Unicode scalar value, yet Swift’s type inference mechanism infers it to be of type String rather than Character. This is great if we want a string, but what if we wanted a Character value instead? Let’s look at that next.
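Here's a short sketch showing both the 1 to 8 digit rule and the String inference described above. It uses current Swift; type(of:) is available from Swift 3 onwards.

```swift
let shortForm = "\u{41}"          // 1 to 8 hex digits are allowed...
let paddedForm = "\u{00000041}"   // ...so this is the same scalar, U+0041
print(shortForm == "A", paddedForm == "A")  // prints "true true"

let hotBeverage = "\u{2615}"
print(type(of: hotBeverage))      // prints "String": inferred, not Character
```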
Creating a Character
To create a Character value in Swift, we have a similar range of options to those we just looked at for creating String values – basically initialiser syntax or string literals.
The Character type in Swift has a range of initialisation functions that accept a String value, a Unicode scalar value or a sequence of Unicode scalar values that collectively form a single grapheme cluster.
For example, in the following code snippet I’m using the initialisation function that accepts a String value (in this case a string literal containing a single character):
let exclamationMark = Character("!")
In addition to initialisation syntax, we can also create a Character value using a string literal. However, with this route things are a little more tricky.
As we’ve seen, Swift’s type inference mechanism infers any text between a pair of double quotes to be a String value by default (hence the term string literal). If we just assign a string literal directly to a new variable or constant (as we did in the previous section), Swift automatically infers that constant or variable to be of type String.
To force Swift to create a Character value instead, we have to include the Character type annotation. This explicitly tells Swift that we want a Character rather than a String:
let constantCharacter : Character = "A"
var variableCharacter : Character = "b"
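This sketch pulls the Character creation options together. It uses the String and Unicode scalar initialisers and contrasts an inferred String with an annotated Character. It is written for current Swift, where the Unicode scalar type is spelled Unicode.Scalar.

```swift
let exclamationMark = Character("!")       // initialiser that takes a String
let cupScalar: Unicode.Scalar = "\u{2615}"
let cup = Character(cupScalar)             // initialiser that takes a Unicode scalar

let inferred = "A"                         // no annotation: inferred as String
let annotated: Character = "A"             // annotation: a Character

print(type(of: inferred))    // prints "String"
print(type(of: annotated))   // prints "Character"
print(exclamationMark, cup)  // prints "! ☕"
```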
When initialising a Character value using a string literal, we aren’t limited to a literal containing a single printable character as we saw above. We can also assign a string literal that contains one or more Unicode scalar values (we saw how to include Unicode scalar values within a string literal just now). There is one big caveat here, though.
When assigning a string literal that contains multiple Unicode scalar values, we have to be really careful to ensure that the value the Character is initialised with is actually a single grapheme cluster.
As I’ve said, the Character type in Swift can only contain a single grapheme cluster. If we try to initialise it with something that isn’t one (say, two grapheme clusters instead of one), the compiler will raise an error.
When using string literals containing printable characters, checking that we assign only a single character is relatively simple. When using a string literal that contains multiple Unicode scalar values, though, it’s relatively easy to fall foul of this little trap. The following couple of examples illustrate the problem.
This first example is actually a valid declaration and initialisation. Here, I use two Unicode scalar values (U+2615 HOT BEVERAGE and U+FE0F VARIATION SELECTOR-16). These combine to produce a single grapheme, which is then used to initialise the hotBeverage constant:
let hotBeverage : Character = "\u{2615}\u{FE0F}" // "☕️"
But the following example (U+2615 HOT BEVERAGE, U+1F1EC REGIONAL INDICATOR SYMBOL LETTER G and U+1F1E7 REGIONAL INDICATOR SYMBOL LETTER B) would be invalid, as the Unicode scalars actually result in two grapheme clusters:
let invalidCharacter : Character = "\u{2615}\u{1F1EC}\u{1F1E7}"
// Raises a compiler error.
There’s only a small difference between the two literals, but that difference is all-important.
Here’s another example – again a valid one. You might be surprised that this one is valid, but I gave you a hint as to why in the previous post:
let combinedFlags : Character = "\u{1F1FA}\u{1F1F8}\u{1F1EC}\u{1F1E7}" // "🇺🇸🇬🇧"
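An invalid Character literal is only caught at compile time. One way to experiment with a literal first is to store it in a String and count its characters, as in this sketch. The sketch targets current Swift, which follows the newer Unicode segmentation rules (Unicode 9 onwards). Under those rules, regional indicators pair up into individual flags, so the four-scalar combinedFlags literal above counts as two characters rather than one.

```swift
let coffee = "\u{2615}\u{FE0F}"                   // ☕ + VARIATION SELECTOR-16
let coffeeAndFlag = "\u{2615}\u{1F1EC}\u{1F1E7}"  // ☕ followed by the GB flag
let twoFlags = "\u{1F1FA}\u{1F1F8}\u{1F1EC}\u{1F1E7}"

// A String can hold any number of grapheme clusters, so counting is safe here.
print(coffee.count)         // 1: usable as a Character literal
print(coffeeAndFlag.count)  // 2: not usable as a Character literal
print(twoFlags.count)       // 2 in current Swift: regional indicators pair up
```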
Ok, let’s move on.
Creating a String from An Array of Characters
In addition to the String initialisation syntax we saw earlier, the String type has one more initialiser I wanted to mention. It uses an array of Character values to initialise the String.
I’m not sure this approach particularly falls into the convenience camp, as it is normally more convenient to use string literals, but it’s there should you need it:
let worldCharacters: [Character] = ["W", "o", "r", "l", "d"]
let worldString = String(worldCharacters)
print(worldString)
// Prints "World"
Ok, so we’ve looked at how to declare and initialise Character and String values. Next, let’s look at how we can combine characters and strings to create new values, starting with how to append a character to an existing string.
Appending a Character to a String
To append a Character value to an existing String value, we use the append(_:) method.
The append(_:) method is pretty simple. It takes a single parameter of type Character and appends it to the end of the String value, modifying that value in place (so the string must be stored in a variable):
var greeting = "Hello World"
let exclamationMark : Character = "!"
greeting.append(exclamationMark)
print(greeting)
// prints "Hello World!"
Note that there is no corresponding append(_:) method for the Character type. As we’ve discussed, Character values in Swift can only contain a single character, so appending a Character to another Character doesn’t really make sense. If you want to join two Character values, you have to turn them into a String instead.
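For example, here are two ways of joining a pair of Character values. This is a minimal sketch, and the names are illustrative:

```swift
let first: Character = "O"
let second: Character = "K"

// Either convert each Character to a String and concatenate...
let joined = String(first) + String(second)
// ...or build a String from an array of Characters.
let alsoJoined = String([first, second])

print(joined, alsoJoined)  // prints "OK OK"
```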
Concatenating Strings and Characters
In addition to being able to append a Character value to a String value, we can also combine two String values. This is called string concatenation.
In Swift, there are two ways in which we can concatenate (or join) strings together: the + operator and the += operator.
Firstly, we can use the + operator along with two String values to create a new String value containing the concatenation of the two strings:
let string1 = "Hello"
let string2 = " World"
let welcome = string1 + string2
print(welcome)
// Prints "Hello World"
We can also do this with a single-character string literal to almost mirror the behaviour of the append(_:) method we just looked at:
let welcome2 = string1 + "!"
print(welcome2)
// prints "Hello!"
In addition to the + operator, we can also use the += operator to modify an existing String value in place:
var string3 = "Hello"
let string4 = " World"
string3 += string4
print(string3)
// prints "Hello World"
Remember, though, that String values in Swift are value types. Although += modifies the value held in the string3 variable, any other variable or constant that was previously assigned from string3 keeps its original value. As we saw earlier, assignment creates an independent copy. It’s a subtle difference but an important one to remember.
String Interpolation
Along with the ability to append a character to a string or concatenate two strings together, Swift also supports the concept of string interpolation.
String interpolation allows us to construct new String values from a combination of constants, variables, expressions and other string literals by incorporating their values within the body of another string literal.
The basic syntax for string interpolation is simple. All we have to do is include a set of parentheses prefixed with a backslash in the body of a string literal and then place the item we want to include within the parentheses. For example:
let score = 130
let message = "Congratulations, you scored \(score)!"
print(message)
// Prints "Congratulations, you scored 130!"
But string interpolation doesn’t stop at just being able to include simple constant or variable values within a string literal, the parentheses can also contain expressions:
let message2 = "That's an average of \(score / 5) per turn."
print(message2)
// Prints "That's an average of 26 per turn."
And they can also contain other string literals:
let interpolatedString = "This is an \("interpolated") string"
print(interpolatedString)
// prints "This is an interpolated string"
The only rule for including expressions within the parentheses is that they can’t contain an unescaped backslash (\), a carriage return (\r) or a line feed (\n). Beyond that, the world is your oyster.
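Because the parentheses can hold any expression, we can also convert and compute inline. This is a minimal sketch, and the constants are illustrative:

```swift
let score = 130
let turns = 5

// Any expression can go inside \( ), including type conversions and arithmetic.
let summary = "Score: \(score), turns: \(turns), average: \(Double(score) / Double(turns))"
print(summary)  // prints "Score: 130, turns: 5, average: 26.0"
```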
Anyway, it’s time to move on. Next, let’s take a look at how to determine the number of characters in a string.
Checking the Length of a String
Checking If a String is Empty
To test whether a string is empty, we use the isEmpty property of the String type. The isEmpty property returns a Boolean value that (unsurprisingly) indicates whether the string is empty or not:
let emptyString1 = ""
if emptyString1.isEmpty {
print("The string was empty")
}
// prints "The string was empty"
When it comes to checking the actual number of characters in a String, though, things are a little more complicated.
The characters Property
As we’ve learnt previously, each Character value in Swift is the equivalent of a Unicode extended grapheme cluster, and as such may consist of one or more Unicode scalar values. We also learnt that, because a single character can have different representations, there may be one, two or a whole bunch of Unicode scalar values representing exactly the same character. Taken together, these factors make determining the boundaries of individual characters in a string (and therefore the number of characters in that string) a non-trivial task. To make things a little simpler, though, the String type in Swift has a built-in property called characters.
The characters property gives us access to a view (of type String.CharacterView) onto the contents of the String. The good news is that, in order to present this view, Swift beavers away behind the scenes to determine where the different Character boundaries are so we don’t have to:
let welcome = "Hello"
for character in welcome.characters {
print(character, terminator: " ")
}
// prints "H e l l o "
Note: As I mentioned earlier, the characters property is just one of a number of views that are available on the String type. We’ll look at the others later in this article.
Now, although the characters property is convenient, there is one thing to note about using it: performance.
Performance Implications of the characters Property
As I’ve just mentioned, there is no easy way of determining the boundaries of the characters in a given string. Each time we iterate over or count the characters property of a String, Swift has to work through the underlying Unicode scalar values that make up the string to find where those boundaries are. In short, that activity is not cheap.
From a performance standpoint this is a linear operation (also known as O(N)) meaning that there is a linear relationship between the number of characters in the string and the time it takes to determine the number of characters it contains. More characters (or more precisely, more scalar values) means more time.
Most of the time, at least with short strings, this isn’t an issue but it’s definitely something you should consider if you start working with much longer strings in your code.
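The usual mitigation is to compute the count once and reuse it. This sketch is written for current Swift, where the count property on String takes the place of characters.count and likewise has to walk the string. The string is hypothetical test data built from e followed by U+0301 COMBINING ACUTE ACCENT, so it contains twice as many Unicode scalars as characters.

```swift
// 10,000 characters, each made of two Unicode scalars (e + U+0301).
let longText = String(repeating: "e\u{301}", count: 10_000)

// Counting walks the whole string (O(N)), so do it once...
let characterCount = longText.count
// ...and then reuse the stored result as often as needed.
let isLong = characterCount > 5_000
let isVeryLong = characterCount > 50_000

print(characterCount)                 // 10000
print(longText.unicodeScalars.count)  // 20000
print(isLong, isVeryLong)             // prints "true false"
```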
Counting the Number of Characters In a String
Ok, let’s get back to our problem of counting the characters in a String.
We’ve just looked at the characters property and seen how it gives us a view onto the string with all the character boundaries already identified. The good news is that the String.CharacterView type also has a built-in count property that we can use to count the Character values in the String:
var word = "touche"
print("Number of characters in \(word) is \(word.characters.count)")
// Number of characters in touche is 6
Another interesting thing to note about string length is that string concatenation and modification may not always affect the number of characters in the string:
word += "\u{301}" // COMBINING ACUTE ACCENT, U+0301
print("Number of characters in \(word) is \(word.characters.count)")
// Number of characters in touché is 6
Remember that adding a new Unicode scalar value doesn’t necessarily add a new character. As the example above shows, depending on the Unicode scalar involved, it may simply modify the last grapheme (or character) in the string. In that case, the character count remains unchanged.
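Here is the same experiment in current Swift (Swift 4 and later), where String exposes count directly. It also shows that the underlying scalar count does go up, and that the result is canonically equivalent to the precomposed spelling. This is a minimal sketch:

```swift
var word = "touche"
print(word.count)                 // 6
word += "\u{301}"                 // COMBINING ACUTE ACCENT, U+0301
print(word)                       // prints "touché"
print(word.count)                 // still 6
print(word.unicodeScalars.count)  // 7
print(word == "touch\u{E9}")      // true: canonically equivalent to precomposed é
```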
Retrieving Characters from a String
Character Indices and Ranges
As we’ve seen, the variable-width boundaries of graphemes in Swift strings add complications that you may not be used to from other programming languages.
Another implication of these variable boundaries is that, unlike in other languages, the Character values that make up a String value can’t be indexed by simple integer values. Instead, the String type in Swift has an associated index type (String.Index), which we have to use.
Accessing Characters Using Subscript Notation
Built into the String type are a number of convenience properties through which we can access certain key String.Index values.
To access the index of the first Character in a String, we use the startIndex property.
The startIndex property returns a value of type String.Index, which can then be used with normal subscript notation to access the first character in a string:
let welcome = "Hello World!"
welcome[welcome.startIndex]
// H
Once we have retrieved the index of the first Character in the String value, we can also access the index of the next character. Due to the variable width of characters, this is not as simple as adding 1 to the start index. Instead, we use the successor() method on the String.Index value. It shelters us from all the Unicode complications under the hood and returns the index of the next character:
welcome[welcome.startIndex.successor()]
// e
If we move our attention to the end of the string, it won’t surprise you to hear that there is a similar set of properties and methods for accessing the index of the last character in a string. There is a little gotcha here, though.
The endIndex property gives us the position immediately after the last Character in a given String value.
The key word here is ’after’. The endIndex property can’t be used as a subscript directly, because it points beyond the end of the string. We can, however, combine it with another method, predecessor(), which retrieves the index of the immediately preceding character:
welcome[welcome.endIndex.predecessor()]
// !
So that’s the first and last characters in a string, but what about other characters? Well, here we have a couple of options.
The first option is to chain together the successor() or predecessor() calls we just saw until we get to the index we want:
welcome[welcome.startIndex.successor().successor()]
// l
welcome[welcome.endIndex.predecessor().predecessor()]
// d
But that’s not particularly elegant. Instead, we can use the advancedBy(_:) method on the String.Index type. It takes a single integer parameter that represents the number of characters we want to advance by:
let index1 = welcome.startIndex.advancedBy(6)
welcome[index1]
// W
It can also accept negative numbers if we’re working from the end of a string:
let index2 = welcome.endIndex.advancedBy(-3)
welcome[index2]
// l
We need to be careful, though. Trying to access an index outside the bounds of the string (at or beyond endIndex, or before startIndex) will trigger a runtime error:
welcome[welcome.endIndex] // error
welcome.startIndex.predecessor() // error
We can protect ourselves when using the advancedBy(_:) method by using an alternative form that also takes a limit:
let index3 = welcome.startIndex.advancedBy(30, limit: welcome.endIndex.predecessor())
print(welcome[index3]) // "!"
This form ensures that the advancedBy method never goes beyond the specified index.
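For reference, the index methods above were renamed in Swift 3. successor() became index(after:), predecessor() became index(before:), and advancedBy(_:) became index(_:offsetBy:). Here is a sketch of the same lookups in current Swift. Note one behavioural difference: the limited form, index(_:offsetBy:limitedBy:), returns nil rather than clamping to the limit, so we have to supply the fallback ourselves.

```swift
let welcome = "Hello World!"

let first = welcome[welcome.startIndex]                          // "H"
let second = welcome[welcome.index(after: welcome.startIndex)]   // "e"
let last = welcome[welcome.index(before: welcome.endIndex)]      // "!"
let w = welcome[welcome.index(welcome.startIndex, offsetBy: 6)]  // "W"
let l = welcome[welcome.index(welcome.endIndex, offsetBy: -3)]   // "l"

// limitedBy: returns nil when the offset would pass the limit,
// so we fall back to the last valid index explicitly.
let lastValid = welcome.index(before: welcome.endIndex)
let farIndex = welcome.index(welcome.startIndex, offsetBy: 30, limitedBy: lastValid) ?? lastValid
print(welcome[farIndex])  // prints "!"
```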
Another thing we can do with String.Index values is construct Range values. Subscripting a string with a range retrieves that range of characters as a new String:
let range1 = welcome.startIndex.advancedBy(6)..<welcome.endIndex
print(welcome[range1])
// prints "World!"
One final thing I want to cover before we move on is the indices property.
The indices property on the String.CharacterView type returns a Range of indices spanning all the characters in the string. By iterating over this range, we can access each individual Character value in a given String:
for index in welcome.characters.indices {
print("\(welcome[index]) ", terminator: "")
}
// H e l l o W o r l d !
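In current Swift (Swift 4 and later), slicing a String with a range produces a Substring rather than a String, and indices can be used directly on the string. Here is a minimal sketch of the range and indices examples above:

```swift
let welcome = "Hello World!"

// A half-open range of String.Index values slices out "World!".
let worldStart = welcome.index(welcome.startIndex, offsetBy: 6)
let worldSlice = welcome[worldStart..<welcome.endIndex]  // a Substring
let world = String(worldSlice)                            // convert back to a String

// indices yields every valid index in order, one per Character.
var spaced = ""
for index in welcome.indices {
    spaced += "\(welcome[index]) "
}
print(world)   // prints "World!"
print(spaced)  // prints "H e l l o   W o r l d ! "
```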
Prefix and Suffix Convenience Methods
So, using indices and subscript syntax is one way of accessing the characters in a String, but there are a few others.
In addition to the startIndex and endIndex properties that we talked about earlier, the String.CharacterView type implements a number of convenience methods for accessing ranges of characters at the start or the end of a string.
At the start of the string, we have the prefix(_:) method, which accesses the characters at the start of a String.CharacterView. It takes a single parameter (the maximum number of characters you want returned) and returns another String.CharacterView containing up to that number of characters:
let exampleString = "Hello World"
let firstHellCharacterView : String.CharacterView = exampleString.characters.prefix(4)
print(String(firstHellCharacterView)) // Hell
Note: In this example, and the subsequent examples in this section, I’m going to include the type annotations for each of the variables and constants so that you can see what type of values are being returned.
We can achieve a similar outcome using the second of our prefix operations, the prefixUpTo(_:) method. Where prefix(_:) takes an integer value, prefixUpTo(_:) accepts a String.Index value. When called on a String.CharacterView, it returns another String.CharacterView containing the Character values from the start of the string up to, but not including, the supplied index:
let end = exampleString.characters.startIndex.advancedBy(4, limit: exampleString.characters.endIndex)
let secondHellCharacterView : String.CharacterView = exampleString.characters.prefixUpTo(end)
print(String(secondHellCharacterView)) // Hell
This is essentially the same as calling:
exampleString.characters[exampleString.characters.startIndex..<end]
There is also a slightly different variant, the prefixThrough(_:) method. This method again returns a String.CharacterView, but this time the view contains the characters from the start of the string up to and including the supplied index:
let helloCharacterView : String.CharacterView = exampleString.characters.prefixThrough(end)
print(String(helloCharacterView)) // Hello
In this case, it’s essentially the same as:
print(String(exampleString.characters[exampleString.characters.startIndex...end] )) // Hello
It won’t surprise you that when it comes to accessing the characters at the end of a string we also have a couple of options.
The suffix(_:) method is the counterpart to the prefix(_:) method. It takes a single integer parameter that specifies the maximum number of characters to return from the end of the string:
let firstWorldCharacterView = exampleString.characters.suffix(5)
print(String(firstWorldCharacterView)) // World
There is also a second method, suffixFrom(_:). This method accepts a String.Index parameter and returns a new String.CharacterView containing the characters from the given index up to and including the last character in the string:
let startIndex = exampleString.characters.endIndex.advancedBy(-5, limit: exampleString.characters.startIndex)
let secondWorldCharacterView = exampleString.characters.suffixFrom(startIndex)
print(String(secondWorldCharacterView)) // World
In this case, it’s essentially the same as:
print(String(exampleString.characters[startIndex..<exampleString.characters.endIndex]))
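These methods survive in current Swift with slightly different spellings. prefixUpTo(_:) is now prefix(upTo:), prefixThrough(_:) is prefix(through:) and suffixFrom(_:) is suffix(from:). Since String.CharacterView no longer exists, they are called on the String itself and return Substring values. Here is a sketch of the examples above in that form:

```swift
let exampleString = "Hello World"

let hell = exampleString.prefix(4)                       // "Hell"
let end = exampleString.index(exampleString.startIndex, offsetBy: 4)
let hellAgain = exampleString.prefix(upTo: end)          // "Hell" (excludes end)
let hello = exampleString.prefix(through: end)           // "Hello" (includes end)

let world = exampleString.suffix(5)                      // "World"
let worldStart = exampleString.index(exampleString.endIndex, offsetBy: -5)
let worldAgain = exampleString.suffix(from: worldStart)  // "World"

// All of these are Substrings; wrap them in String(...) to get a String back.
print(String(hell), String(hello), String(world))        // prints "Hell Hello World"
```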
Ok, now that we know how to access the characters in a string, let’s take a look at how we can compare either whole String values or parts of those strings.
String Comparison
In Swift, there are three ways that we can compare String values:
– String prefix and suffix equality
– String and Character equality
– Order comparison
Checking for String Prefixes and Suffixes
To check that a string starts with a given set of characters, we use the hasPrefix(_:) method of the String type.
The hasPrefix(_:) method takes a single String parameter and returns a Boolean value indicating whether the string starts with the given prefix:
let message = "Hello World"
if message.hasPrefix("Hello") {
print("Has prefix \'Hello\'")
}
// Prints "Has prefix 'Hello'"
To determine whether the String has a given prefix, Swift performs a character-by-character canonical equivalence comparison between the extended grapheme clusters in each String value. If they match, the hasPrefix method returns true; otherwise it returns false.
It’s a similar story with the hasSuffix(_:) method, which checks whether a String value ends with a given set of characters:
if message.hasSuffix("World") {
print("Has suffix \'World\'")
}
// prints "Has suffix 'World'"
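Because the comparison works on extended grapheme clusters, the way a prefix is encoded doesn't matter. However, a plain e does not match an é. Here is a small sketch in current Swift; the strings are illustrative:

```swift
let precomposed = "caf\u{E9} au lait"   // é as U+00E9
let decomposedPrefix = "cafe\u{301}"    // e + COMBINING ACUTE ACCENT

print(precomposed.hasPrefix(decomposedPrefix))  // true: canonically equivalent
print(precomposed.hasPrefix("cafe"))            // false: "é" is not "e"
print(precomposed.hasSuffix("lait"))            // true
```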
String and Character Equality
Because the String type conforms to the Comparable protocol (which in turn inherits from the Equatable protocol), we have a whole load of options when it comes to testing complete String values for equality and ordering.
By default, the rules for testing equality and ordering of Unicode-based String values are defined within the Unicode collation algorithm, and Swift pretty much follows these rules to the letter.
To compare two String values (or two Character values) for equality in Swift, we use the “equal to” operator (==) and the “not equal to” operator (!=):
let firstWelcome = "Hello There"
let secondWelcome = "Hello There"
let thirdWelcome = "Hello There!"
if firstWelcome == secondWelcome {
print("Greetings are considered equal")
}
// prints "Greetings are considered equal"
if firstWelcome != thirdWelcome {
print("Greetings aren't equal.")
}
// prints "Greetings aren't equal"
Two String values (or two Character values) are considered equal in Swift if their extended grapheme clusters are canonically equivalent.
As we saw in the last post, canonical equivalence means that they have the same appearance and same linguistic meaning, even if they are composed of different Unicode Scalar values behind the scenes:
let string1 = "\u{00E9}" // é U+00E9 LATIN SMALL LETTER E WITH ACUTE
let string2 = "\u{0065}\u{0301}" // e U+0065 LATIN SMALL LETTER E + ́ U+0301 COMBINING ACUTE ACCENT
if string1 == string2 {
print("They are canonically equivalent.")
}
// prints "They are canonically equivalent."
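The point about "different Unicode Scalar values behind the scenes" can be checked directly. The sketch below is written for current Swift (5.x), where `count` on a String counts Characters. It assumes nothing beyond the two strings from the example above: the strings compare equal and both hold one Character, but their scalar counts differ:

```swift
let precomposed = "\u{00E9}"        // é as a single scalar (U+00E9)
let combined = "\u{0065}\u{0301}"   // e followed by U+0301 COMBINING ACUTE ACCENT

// Equal as Strings, and each holds exactly one Character...
print(precomposed == combined)               // true
print(precomposed.count, combined.count)     // 1 1

// ...and the single Characters are equal too.
let sameCharacter = Character(precomposed) == Character(combined)
print(sameCharacter)                         // true

// ...but the underlying Unicode scalars differ.
print(precomposed.unicodeScalars.count, combined.unicodeScalars.count)  // 1 2
```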
Sorting Strings
In addition to being able to check two strings for equality, we can also order strings using the <, <=, > and >= operators. The rules for sorting string and character values are also defined as part of the Unicode collation algorithm. I’m not going to go into them here as they are a little complicated, but essentially they do what you would expect:
let a = "a"
let b = "b"
let A = "A"
let otherA = A
let one = "1"
a < b // true
b > A // true
one < a // true
otherA <= A // true
otherA > b // false
Note: Because the Swift language has been open-sourced, if you do want more details of how String equality and sorting work under the hood, we can actually look for ourselves. To save you some time hunting around, you’ll find some of the underlying implementations of the < operator for the String type here and here. Only implementations of the == and < operators are provided, as the other comparison operators used in the examples above are defined automatically in terms of those two.
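One thing worth knowing if you try this in a current toolchain: in Swift 5.x, String's `<` compares strings by their canonically normalized Unicode scalar values rather than by locale-aware collation, so every uppercase ASCII letter sorts before every lowercase one. The sketch below shows that behaviour with `sorted()` and one common workaround, comparing lowercased copies. The word list is illustrative, and this is only one way to get a case-insensitive order:

```swift
let fruits = ["banana", "Cherry", "apple", "10"]

// Default ordering uses String's < operator.
// "C" (U+0043) has a lower scalar value than "a" (U+0061).
let byScalar = fruits.sorted()
print(byScalar)       // ["10", "Cherry", "apple", "banana"]

// A simple case-insensitive ordering: compare lowercased copies.
let ignoringCase = fruits.sorted { $0.lowercased() < $1.lowercased() }
print(ignoringCase)   // ["10", "apple", "banana", "Cherry"]
```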
Ok, so we’ve looked at how to create strings, access a string’s contents and compare different strings. The next thing to look at is how to manipulate and modify the strings that we have created. Let’s do that next.
String Modification and Manipulation
Modifying Characters Case
Built into the String type in Swift are a couple of properties that allow us to change the case of the characters that make up a String value. These are the uppercaseString and lowercaseString properties.
As you might expect, the uppercaseString
property returns a new String
value with all the characters changed to uppercase:
let name = "Andy"
print(name.uppercaseString)
// prints "ANDY"
We can also use the lowercaseString
property to return a new string with all the characters converted to lowercase:
print(name.lowercaseString)
// prints "andy"
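In current Swift (5.x) these properties have become the methods `uppercased()` and `lowercased()`. The sketch below uses those newer names. It also shows that case mapping works on the whole string and can change its length: the German ß uppercases to the two letters SS. The example words are illustrative:

```swift
let name = "Andy"
print(name.uppercased())   // ANDY
print(name.lowercased())   // andy

// Full Unicode case mapping: "ß" becomes "SS", so the result is one Character longer.
let street = "straße"
let shouted = street.uppercased()
print(shouted, street.count, shouted.count)   // STRASSE 6 7
```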
Reversing the Characters in a String
We can also reverse the characters in a string.
Reversing a string in Swift is actually super-simple. All we have to do is call the reverse() method on the String.CharacterView that we obtain from the string’s characters property. This gives us the characters in reverse order as a collection that we can then use to initialise a new String value:
print(String("Forward".characters.reverse()))
// prints "drawroF"
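Because the reversal works on Characters, a grapheme cluster such as e + U+0301 stays together. The sketch below, written for current Swift (5.x) where you call `reversed()` on the String itself, contrasts that with reversing the raw Unicode scalars, which detaches the accent from its e. The variable names are illustrative:

```swift
let word = "cafe\u{0301}"   // "café" with a combining accent

// Reversing Characters keeps "e" + U+0301 together as one grapheme.
let byCharacter = String(word.reversed())
print(byCharacter == "\u{00E9}fac")   // true

// Reversing Unicode scalars puts U+0301 first, where it no longer combines with the "e".
let byScalar = String(String.UnicodeScalarView(word.unicodeScalars.reversed()))
print(byScalar == "\u{00E9}fac")      // false
```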
Splitting a String
We also have options for splitting a string into an array of substrings.
Thanks to its conformance to the CollectionType protocol, the String.CharacterView type implements both the split(_:maxSplit:allowEmptySlices:) and split(maxSplit:allowEmptySlices:isSeparator:) methods. Both of these methods return an array of String.CharacterView values representing slices of the original String.CharacterView.
The first method, split(_:maxSplit:allowEmptySlices:), takes three parameters. The first is the Character value that marks the points at which the String.CharacterView will be split. This Character is omitted from the resulting slices. The second is an integer value (maxSplit) that specifies the maximum number of times the view will be split, so at most maxSplit + 1 slices are returned (this has a default value of Int.max, so you normally don’t have to specify it). The last parameter (allowEmptySlices) is a boolean parameter that specifies whether empty slices are allowed in the returned array (this also has a default value of false, so again you don’t normally need to specify it):
let title = "Star Wars"
let charactersInTitle = title.characters
let components = charactersInTitle.split(" ")
for component in components {
print(String(component))
}
// Star
// Wars
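To see what maxSplit and allowEmptySlices actually do, here is a sketch using their current Swift (5.x) equivalents, `split(separator:maxSplits:omittingEmptySubsequences:)`. Note that the newer flag is inverted: omitting empty pieces is the default. The input string is illustrative, and each result is mapped from Substring to String for printing:

```swift
let csv = "a,b,,c"

// Default: split at every separator and drop empty pieces.
let parts = csv.split(separator: ",").map(String.init)
print(parts)       // ["a", "b", "c"]

// Keep the empty piece between the two consecutive commas.
let withEmpty = csv.split(separator: ",", omittingEmptySubsequences: false).map(String.init)
print(withEmpty)   // ["a", "b", "", "c"]

// maxSplits limits the number of splits, so at most maxSplits + 1 pieces come back.
let firstOnly = csv.split(separator: ",", maxSplits: 1).map(String.init)
print(firstOnly)   // ["a", "b,,c"]
```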
The second of our methods, the split(maxSplit:allowEmptySlices:isSeparator:) method, also takes three parameters.
The first parameter is the maxSplit parameter we just saw, and again it has a default value of Int.max. The second parameter is the same allowEmptySlices parameter we just saw, and again this defaults to false. The last parameter, however, is different.
Instead of taking a Character value to use to split the characters in the String.CharacterView, the split(maxSplit:allowEmptySlices:isSeparator:) method takes a closure. The closure accepts a single Character parameter and returns a boolean value indicating whether or not that particular Character should be used to split the String.CharacterView. As it is the last parameter, we can pass it using trailing closure syntax:
let components2 = charactersInTitle.split{ $0 == " " }
for component in components2 {
print(String(component))
}
// Star
// Wars
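The closure form is most useful when more than one kind of character should act as a separator. Below is a small sketch in current Swift (5.x), where the method is spelled `split(whereSeparator:)` and Character has `isWhitespace` and `isPunctuation` properties. The title string is illustrative:

```swift
let title = "Star Wars: A New Hope"

// Treat any whitespace or punctuation Character as a separator.
// The empty piece between ":" and " " is omitted by default.
let words = title.split(whereSeparator: { $0.isWhitespace || $0.isPunctuation })
                 .map(String.init)
print(words)   // ["Star", "Wars", "A", "New", "Hope"]
```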
Inserting a Character Into a String
To insert a character into a String value, we can use the insert(_:atIndex:) method. The method takes two parameters: the Character value you want to insert and the index at which you want to insert it:
var welcome2 = "Hello World"
welcome2.insert("!", atIndex:welcome2.endIndex)
print(welcome2)
// prints "Hello World!"
Notice that we have to declare welcome2 as a variable, because insert(_:atIndex:) is a mutating method: it modifies the value stored in the welcome2 variable rather than returning a new string.
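In current Swift (5.x) the method is spelled `insert(_:at:)`, and indices are moved with `index(_:offsetBy:)` instead of `advancedBy(_:)`. A minimal sketch with illustrative names, inserting at the end and then in the middle:

```swift
var greeting = "Hello World"

// endIndex is the position just past the last Character, so this appends.
greeting.insert("!", at: greeting.endIndex)

// String indices are opaque; offsetBy counts Characters from startIndex.
greeting.insert(",", at: greeting.index(greeting.startIndex, offsetBy: 5))

print(greeting)   // Hello, World!
```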
Inserting a String Into a String
We can also insert a String
value into an existing String
value using the insertContentsOf(_:at:)
method.
The insertContentsOf(_:at:) method is almost identical to the insert(_:atIndex:) method, except that instead of a single Character it accepts a collection of Character values (here, the characters view of a String), along with the index at which to insert them:
welcome2.insertContentsOf(" to the entire".characters, at: welcome2.startIndex.advancedBy(5))
print(welcome2)
// prints "Hello to the entire World!"
Again, the same caveats apply about this being a mutating method.
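In current Swift (5.x) the equivalent call is `insert(contentsOf:at:)`, and it accepts a String directly. Rather than hard-coding an offset, this sketch looks up the insertion point with `firstIndex(of:)`. The names are illustrative:

```swift
var greeting = "Hello World!"

// Insert the phrase in front of the first space, if there is one.
if let space = greeting.firstIndex(of: " ") {
    greeting.insert(contentsOf: " to the entire", at: space)
}

print(greeting)   // Hello to the entire World!
```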
Replacing a Range of a Characters in a String
When it comes to replacing a range of characters in a string we use the replaceRange(_:with:)
method.
As with most of the other string modification methods we’ve looked at, the method accepts two parameters. The first is the range of characters we want to replace (again, a Range of String.Index values) and the second is the String value we want to replace those characters with:
var anotherMessage = "A message"
let range = anotherMessage.startIndex...anotherMessage.endIndex.advancedBy(-4)
anotherMessage.replaceRange(range, with:"Bagg")
print(anotherMessage)
// prints "Baggage"
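In current Swift (5.x) the method is called `replaceSubrange(_:with:)`. The sketch below produces the same result using a half-open range (`..<`), which stops just before the last three characters ("age") and avoids the `-4` off-by-one that a closed range needs. The variable names are illustrative:

```swift
var message = "A message"

// Everything except the last three Characters: "A mess".
let end = message.index(message.endIndex, offsetBy: -3)
message.replaceSubrange(message.startIndex..<end, with: "Bagg")

print(message)   // Baggage
```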
Removing a Character from a String
Removing characters or ranges of characters from a string is just as simple, and we have a number of options at our disposal because the String.CharacterView type conforms to the RangeReplaceableCollectionType protocol (which in turn conforms to the CollectionType protocol). However, there is a small complication with using these methods: the characters property of a String is a read-only property.
In order to use them, then, we must first make a mutable copy of the String.CharacterView:
let myString = "abcdefghijklmnop"
var characters = myString.characters
With that set up, let’s look at the first pair of methods, the popFirst() and popLast() methods.
If the String.CharacterView
is not empty (i.e. the isEmpty
property returns false
) these methods remove (and return) either the first, or the last, character of a String.CharacterView
. If the String.CharacterView
is empty, these functions return nil
instead (making the return type of these methods a Character
optional):
characters.popFirst() // "a"
print(String(characters)) // "bcdefghijklmnop"
characters.popLast() // "p"
print(String(characters)) // "bcdefghijklmno"
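In current Swift (5.x) there is no separate character view to copy. Instead, `popFirst()` is defined for collections that are their own SubSequence, which is why the sketch below pops from a Substring (a slice of a String, created here with `[...]`). The names are illustrative:

```swift
// A Substring covering the whole string.
var letters = "abcdefghijklmnop"[...]

let first = letters.popFirst()   // Optional("a")
let last = letters.popLast()     // Optional("p")
print(String(letters))           // bcdefghijklmno

// Popping from an empty Substring returns nil instead of crashing.
var nothing = ""[...]
print(nothing.popFirst() == nil) // true
```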
We also have the removeFirst()
and removeLast()
methods. These methods work in a similar way, removing and returning either the first or the last character in the String.CharacterView. They differ from popFirst() and popLast(), though, in that they trigger a runtime error (crashing the program) if the String.CharacterView is empty:
characters.removeFirst() // "b"
print(String(characters)) // cdefghijklmno
characters.removeLast() // "o"
print(String(characters)) // cdefghijklmn
var emptyCharacters = "".characters
emptyCharacters.removeFirst() // runtime error: the collection is empty
Now, where things get more interesting is that there are also variants of the removeFirst() and removeLast() methods that accept an integer parameter. The integer parameter represents the number of characters that should be removed from the String.CharacterView. These methods don’t, however, return the Character values that were removed:
characters.removeFirst(2)
print(String(characters)) // efghijklmn
characters.removeLast(2)
print(String(characters)) // efghijkl
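In current Swift (5.x) these counted variants, `removeFirst(_:)` and `removeLast(_:)`, are available on String directly. Like the zero-argument versions, they trap rather than throw if asked to remove more characters than exist, so the sketch below clamps the count with `min(_:_:)` before calling them. The names and the requested count are illustrative:

```swift
var characters = "cdefghijklmn"
characters.removeFirst(2)
characters.removeLast(2)
let afterTrimming = characters
print(afterTrimming)         // efghijkl

// removeFirst(_:) traps if the count exceeds the number of Characters,
// so clamp a possibly-too-large request first.
let requested = 20
characters.removeFirst(min(requested, characters.count))
print(characters.isEmpty)    // true
```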
The next set of methods we have are the dropFirst(_:)
and dropLast(_:)
methods. These methods also take an integer parameter to indicate the number of characters to remove. Unlike the other methods, instead of modifying the original String.CharacterView
and (potentially) returning the characters they’ve removed, they instead return a new String.CharacterView
containing the characters that are remaining:
print(String(characters.dropFirst(2))) // ghijkl
print(String(characters.dropLast(2))) // efghij
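The same non-mutating behaviour is available in current Swift (5.x), where `dropFirst(_:)` and `dropLast(_:)` return a Substring. This sketch, with an illustrative name, also shows one difference from `removeFirst(_:)`: dropping more characters than exist just gives an empty result rather than a runtime error:

```swift
let remaining = "efghijkl"

print(remaining.dropFirst(2))   // ghijkl
print(remaining.dropLast(2))    // efghij
print(remaining)                // efghijkl (the original is untouched)

// Asking to drop too many simply yields an empty Substring.
print(remaining.dropFirst(100).isEmpty)   // true
```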
So that’s pretty much it for the methods on the String.CharacterView
but there are a couple of methods that we can call on String
values directly.
The first is the removeAtIndex(_:)
method. The removeAtIndex(_:)
method takes a single parameter, the index of a character that you want to remove and modifies the string by removing the character at the given index:
var alphabetSlice = String(characters) // "efghijkl"
alphabetSlice.removeAtIndex(alphabetSlice.startIndex.successor())
// alphabetSlice now contains "eghijkl"
Here I’ve used it to remove the character f
from the alphabetSlice
variable.
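In current Swift (5.x) this method is `remove(at:)`, and `successor()` has been replaced by `index(after:)`. The removed Character is returned, which this sketch (with illustrative names) captures:

```swift
var alphabetSlice = "efghijkl"

// Remove the second Character and keep hold of it.
let removed = alphabetSlice.remove(at: alphabetSlice.index(after: alphabetSlice.startIndex))

print(removed)         // f
print(alphabetSlice)   // eghijkl
```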
Removing a Range of Characters from a String
In addition to removing a single character from a String
value using the removeAtIndex(_:)
method, we can also remove a range of characters in a given String
value using the removeRange(_:)
method.
The removeRange(_:)
method accepts a single parameter, a Range
of indices of type String.Index
:
var anotherWelcome = "Hello World!"
let multiCharacterRange = anotherWelcome.startIndex..<anotherWelcome.startIndex.advancedBy(6)
anotherWelcome.removeRange(multiCharacterRange)
// anotherWelcome contains "World!"
And if we’re a little sneaky, we can also use this to remove a single character:
let singleCharacterRange = anotherWelcome.endIndex.predecessor()...anotherWelcome.endIndex.predecessor()
anotherWelcome.removeRange(singleCharacterRange)
// anotherWelcome contains "World"
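In current Swift (5.x) the range-based method is `removeSubrange(_:)`. The sketch below repeats the example with the newer names. It removes the trailing character with `removeLast()` rather than a one-element range, which is the more direct choice. The names are illustrative:

```swift
var welcome = "Hello World!"

// Remove "Hello " (six Characters, including the space).
welcome.removeSubrange(welcome.startIndex..<welcome.index(welcome.startIndex, offsetBy: 6))
let afterRange = welcome
print(afterRange)   // World!

// Removing the final Character directly.
welcome.removeLast()
print(welcome)      // World
```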
Removing All Characters in a String
To remove all the characters from a string we can use the removeAll(keepCapacity:)
method:
var message = "A message"
message.removeAll(keepCapacity:false)
if message.isEmpty {
print("The string is empty")
}
// prints "The string is empty"
So that about covers the basics of the String and Character types in Swift, but there are a couple of other topics that I wanted to mention before we wrap up today, the first of which is Unicode string representations.
Unicode String Representations
So far in this article we’ve talked about the String
and Character
types and 99% of the time, these are the data types you’ll be working with when working with text in Swift. Sometimes though it is necessary to drop down to a lower level of detail and work directly with the Unicode code points that make up those strings.
There are a number of reasons why you may want to do this.
Maybe you’re rendering your text into a UTF-8 encoded webpage. Maybe you want to work with some of the NSString APIs (which use UTF-16 code units under the hood). Maybe performance is your thing and you want the speed of working with code units directly, or the index-based access to UTF-16 code units that NSString offers. Whatever the reason, the good news is that the String type in Swift has you covered.
In addition to the characters
property we looked at earlier, the String
type in Swift also supports three other properties, each of which provides a different view onto the underlying Unicode scalar values of a Swift string. These properties are the utf8
, utf16
and unicodeScalars
properties.
UTF-8 Representation
The first of the properties, the utf8 property, provides a way to access the Unicode scalar values of a Swift String as a collection of UTF-8 encoded code units. The property returns a value of type String.UTF8View. The String.UTF8View is a collection of unsigned 8-bit (UInt8) values, one for each byte of the String value’s UTF-8 representation:
let myString = "Hello \u{2615}" // Hello ☕
for codeUnit in myString.utf8 {
print("\(codeUnit) ", terminator:"")
}
print()
// 72 101 108 108 111 32 226 152 149
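The UTF-8 view is also a convenient way to get at the raw bytes, for example before writing them out. A sketch in current Swift (5.x) that collects the bytes into an array and decodes them back with `String(decoding:as:)`. The names are illustrative:

```swift
let coffee = "Hello \u{2615}"   // Hello ☕

// The UTF-8 code units as an array of bytes.
let bytes = Array(coffee.utf8)
print(bytes)   // [72, 101, 108, 108, 111, 32, 226, 152, 149]

// Decoding the same bytes as UTF-8 gives back an equal String.
let decoded = String(decoding: bytes, as: UTF8.self)
print(decoded == coffee)   // true
```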
UTF-16 Representation
Next we have the utf16
property. The utf16
property returns a value of type String.UTF16View
which is a collection of 16-bit (UInt16
) values, one for each of the 16-bit code units of a String value’s UTF-16 representation:
for codeUnit in myString.utf16 {
print("\(codeUnit) ", terminator:"")
}
print()
// 72 101 108 108 111 32 9749
Note: As I’ve mentioned, conceptually NSString objects are UTF-16 encoded strings with platform endianness under the hood, so becoming familiar with the UTF-16 encoding and how it works can be pretty useful.
NSString indexes directly into its UTF-16 code units, which can yield performance benefits if you’re looking for pure blazing speed. Be careful, though: UTF-16 is itself a variable-width encoding. Characters outside the Basic Multilingual Plane (many emoji, for example) take two code units (a surrogate pair), and a single perceived character such as the decomposed é we saw earlier spans more than one code unit, so a position in the UTF-16 view does not necessarily line up with a Character.
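Both of those cases can be seen by comparing a string's Character count with the length of its UTF-16 view. A sketch in current Swift (5.x), using U+1F600 (😀) as an example of a character outside the Basic Multilingual Plane. The names are illustrative:

```swift
let grin = "\u{1F600}"        // 😀
print(grin.count)             // 1
print(Array(grin.utf16))      // [55357, 56832], the surrogate pair 0xD83D 0xDE00

let accented = "e\u{0301}"    // é as e + COMBINING ACUTE ACCENT
print(accented.count, accented.utf16.count)   // 1 2
```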
Unicode Scalar Representation
The final view property on the Swift String
type is the unicodeScalars
property. The unicodeScalars
property provides access to the string as a collection of 21-bit Unicode scalar values, equivalent to the string’s UTF-32 encoding form.
As such, the unicodeScalars property returns a value of type String.UnicodeScalarView, which in turn represents a collection of UnicodeScalar values. The UnicodeScalar type differs slightly from the raw numeric types we’ve seen with the other views in that it is actually a struct and can be used directly to create new string values:
for scalar in myString.unicodeScalars {
print(scalar, terminator: "")
}
print()
// Hello ☕
And if it is the numeric scalar value you’re after, we instead have to use the value property of the UnicodeScalar type, which provides access to the underlying 21-bit scalar value as a UInt32 value:
for scalar in myString.unicodeScalars {
print("\(scalar.value) ", terminator: "")
}
print()
// 72 101 108 108 111 32 9749
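Going the other way, a string can be rebuilt from numeric scalar values. In current Swift (5.x) the type is spelled `Unicode.Scalar`, and its `UInt32` initializer is failable because not every number is a valid scalar (surrogate values such as 0xD800, for example, are rejected). A minimal sketch with illustrative names:

```swift
let values: [UInt32] = [72, 101, 108, 108, 111, 32, 9749]

// Turn each number into a Unicode.Scalar, skipping any invalid values.
let scalars = values.compactMap { Unicode.Scalar($0) }
let rebuilt = String(String.UnicodeScalarView(scalars))
print(rebuilt)   // Hello ☕

// A lone surrogate value is not a valid Unicode scalar.
print(Unicode.Scalar(UInt32(0xD800)) == nil)   // true
```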
Summary
So I’m going to leave it there for today. As we have seen, there is a whole load of functionality built into the String
and Character
types in Swift. In this article, we’ve seen how to create Character
and String
values, we’ve seen how to count the characters that make up a String, and we’ve seen how to manipulate a string’s contents by adding, inserting and removing Character values and substrings. There is, however, a whole load that I haven’t covered.
Through Swift’s automatic bridging between the String
type and the NSString
class in the Foundation Framework, we also have access to a whole range of additional functionality above and beyond that covered in this article. This additional functionality includes the ability to search inside a string for a given set of characters, the ability to draw a string at a point on the screen and the ability to perform advanced natural language processing through classes such as NSLinguisticTagger
to name but a few. With that said, hopefully this article, along with the previous article, will have given you a much better foundation from which to explore that functionality and, in doing so, has opened your eyes to just how powerful the String and Character types are in Swift. I do like the Swift language.