Strings

Text shows up in almost every program you write. Names, messages, scores, labels. In Python, any piece of text is called a string: any value you wrap in quote marks. Single or double, both work the same way.

Strings are Python's primary text type. They carry everything from a username to a URL path to formatted output. Single and double quotes produce identical results; the choice is stylistic.

str is Python's immutable Unicode sequence type. It sits at every system boundary: terminal I/O, file contents, network responses, serialised data. Both quote styles produce the same object; the tokeniser treats them identically.

python
greeting = "Hello, world"
username = 'alice'

The only time the choice of quotes matters is when your text contains quote marks. Use the opposite style so you don't have to escape them:

The community convention is double quotes. The practical reason to switch styles is to avoid escaping when the content contains that character:

Convention is double quotes. The only reason to switch is to avoid a backslash escape when the content contains that delimiter:

python
note    = "It's a great day"      # apostrophe inside, use double quotes
message = 'She said "hello"'      # double quotes inside, use single quotes
escaped = "She said \"hello\""    # or escape with a backslash

Immutability

Strings are immutable: once you create one, you cannot change it. Think of a string as permanently fixed the moment it is made. Any operation that looks like it is modifying a string is actually producing a brand new one. The original stays exactly as it was.

Strings are immutable: no method modifies a string in place. Every operation that transforms text returns a new string and leaves the original untouched. The practical consequence is that a method call you do not assign anywhere has no effect on anything.

str objects are immutable: the internal buffer is fixed at construction and cannot be written to. This gives strings three useful properties: they are hashable (valid as dict keys and set members), safe to share across references without copying, and eligible for CPython's interning optimisation on short literals.

python
name = "alice"
name = name.upper()   # "ALICE" is a new string; "alice" is unchanged
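Two consequences mentioned above, the no-op unassigned method call and hashability, can be seen in a minimal sketch. The ages dictionary and variable names are illustrative only.

```python
# A method call whose result is discarded leaves the variable unchanged.
name = "alice"
name.upper()                 # result is thrown away
unchanged = name             # still "alice"

# Strings are hashable, so they work as dictionary keys.
ages = {"alice": 28, "bob": 31}   # hypothetical data
alice_age = ages[name]            # 28

# A list is mutable and therefore cannot be used as a key.
try:
    {["alice"]: 28}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```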

The direct consequence: you cannot change a character at a specific position. Python will raise an error if you try.

python
name = "alice"
name[0] = "A"   # TypeError: 'str' object does not support item assignment

To get a modified string, build a new one using slicing or a method. Both are covered below.

Attempting character assignment shows the constraint directly:

python
name = "alice"
name[0] = "A"   # TypeError: 'str' object does not support item assignment

When you need a modified version, the standard tools are slicing with concatenation for positional edits, and replace() for substitutions. Both produce a new string and leave the original untouched.

str does not implement __setitem__, so item assignment always raises TypeError. For positional modification, use slicing: name[:1].upper() + name[1:]. For substitution, use replace(). For assembling many pieces, use "".join(parts): repeated s += chunk inside a loop is O(n²) in the worst case because each + allocates a new buffer of the combined length and copies both operands into it, whereas join() allocates once.
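The three ways of getting a modified string can be sketched side by side. The values are illustrative.

```python
name = "alice"

# Positional edit: slice out the pieces and concatenate a new string.
capitalised = name[:1].upper() + name[1:]    # "Alice"

# Substitution: replace() returns a new string.
swapped = name.replace("a", "A")             # "Alice"

# Assembling many pieces: collect them in a list, then join once.
parts = []
for n in range(5):
    parts.append(str(n))
digits = "".join(parts)                      # "01234"
```

In every case name itself still holds "alice" afterwards.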

Indexing and slicing

Every character in a string has a numbered position, starting at zero. You can read individual characters by putting that position number in square brackets. Negative numbers count backward from the end.

Strings are sequences with zero-based indexing. Negative indices count from the end. Slicing extracts any contiguous range in a single expression, and it never raises an error on out-of-range values.

str implements the full sequence protocol. Subscript access (s[i]) goes through __getitem__ with an integer and raises IndexError on out-of-range input. Slicing (s[start:stop:step]) passes a slice object; indices are silently clamped to valid range, so no IndexError is possible from a slice.

python
word = "Python"
#       012345

print(word[0])    # "P"
print(word[2])    # "t"
print(word[5])    # "n"
print(word[-1])   # "n"  (last character)
print(word[-2])   # "o"  (second to last)

-1 is always the last character, -2 the second to last, and so on. They are useful when you want the end of a string without knowing its exact length.

Negative indices wrap: -1 is len(s) - 1, -2 is len(s) - 2. Most useful for end-anchored access when you do not want to compute the length manually. A negative index that goes out of range still raises IndexError, same as a positive one.

Negative indices are normalised to len(s) + i before the bounds check. There is no special-casing in the interpreter; it is arithmetic. Out-of-range raises IndexError regardless of sign.
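A short check of the index arithmetic and the out-of-range behaviour, as a sketch:

```python
word = "Python"

# A negative index i refers to the same character as len(word) + i.
same_char = word[-2] == word[len(word) - 2]   # True, both are "o"

# Out-of-range indices raise IndexError whatever their sign.
errors = []
for i in (6, -7):
    try:
        word[i]
    except IndexError:
        errors.append(i)
# errors is now [6, -7]
```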

Slicing extracts a chunk. [start:stop] includes start and excludes stop:

python
word = "Python"

print(word[0:2])   # "Py"     (positions 0 and 1)
print(word[2:])    # "thon"   (position 2 to end)
print(word[:3])    # "Pyt"    (start to position 2)
print(word[:])     # "Python" (the whole string)
print(word[::2])   # "Pto"    (every second character)
print(word[::-1])  # "nohtyP" (reversed)

Three patterns to reach for most: word[:n] for the first n characters, word[n:] for everything from position n onward, word[-n:] for the last n characters. word[::-1] reverses a string. It looks odd the first time, but it is idiomatic Python and you will see it often.

Unlike direct indexing, slicing never raises IndexError. Python clamps out-of-range indices silently, so word[100:] on a short string returns "" rather than crashing. The step argument controls stride: word[::2] takes every other character, word[::-1] traverses in reverse.

s[start:stop:step] passes slice(start, stop, step) to __getitem__. All three arguments default to None, not 0 and len(). With a negative step, the defaults invert: start defaults to len - 1, stop to -(len + 1). That is what makes [::-1] traverse the full string in reverse without any explicit bounds.
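The clamping behaviour and the default bounds can be inspected directly. This sketch uses slice.indices(), which reports the concrete start, stop, and step a slice resolves to for a given length.

```python
word = "Python"

# Slices clamp out-of-range bounds instead of raising.
tail = word[100:]        # ""
head = word[:100]        # "Python"
last_three = word[-3:]   # "hon"

# Resolve the defaults for a sequence of length 6.
forward = slice(None, None, None).indices(len(word))   # (0, 6, 1)
backward = slice(None, None, -1).indices(len(word))    # (5, -1, -1)
# In the backward result, the stop of -1 means "stop before index 0",
# not "the last character".
```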

Essential string methods

Strings come with a set of built-in methods: operations you call directly on any string value. You write the string (or the variable holding it), then a dot, then the method name. Each method returns a new string. The original is never changed.

String methods are functions attached to the str type. Because strings are immutable, every method returns a new string rather than modifying the original. A method call you do not assign or pass somewhere has no lasting effect.

str's methods are defined on the type object and implemented in C. All transforming methods follow the immutability contract: they return new str objects. CPython's implementation is Unicode-aware throughout; methods operate on code points, not bytes.

Case

python
text = "Hello, World"

text.lower()       # "hello, world"
text.upper()       # "HELLO, WORLD"
text.title()       # "Hello, World"  (each word capitalised)
text.capitalize()  # "Hello, world"  (first character upper, rest lower)

lower() and upper() are the two you will use most. lower() is particularly useful when comparing text: "Alice" and "alice" become the same thing once you call .lower() on both sides.

lower() is the standard normalisation step before comparison or storage. title() capitalises the first letter of each word using a simple rule that misfires on contractions: "it's" becomes "It'S". Treat it as display-only formatting.

lower() applies Unicode full case conversion. For case-insensitive comparison, casefold() is more correct: it applies additional transformations (e.g. German ß becomes ss) that lower() skips. title() starts a new word after any non-letter character, including apostrophes and digits, so it mishandles contractions ("it's" becomes "It'S") and mixed strings ("hello2world" becomes "Hello2World"). For correct title casing, implement the logic manually.
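The difference between lower() and casefold(), and the title() pitfall, in a minimal sketch. string.capwords() is shown as one standard-library alternative that splits on whitespace only; whether it suits a given use case is an assumption to verify.

```python
import string

# lower() leaves the German sharp s alone; casefold() expands it to "ss".
a, b = "Straße", "STRASSE"
lower_equal = a.lower() == b.lower()             # False
casefold_equal = a.casefold() == b.casefold()    # True

# title() restarts capitalisation after every non-letter.
contraction = "it's here".title()                # "It'S Here"

# string.capwords() capitalises whitespace-separated words instead.
capwords = string.capwords("it's here")          # "It's Here"
```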

Whitespace

python
text = "  hello  "

text.strip()    # "hello"    (both sides)
text.lstrip()   # "hello  "  (left only)
text.rstrip()   # "  hello"  (right only)

strip() removes whitespace (spaces, tabs, and newlines) from both ends of a string. You will use it almost any time you handle user input or text from a file, because stray spaces cause silent failures: "alice" != "alice ".

strip() removes all leading and trailing whitespace: spaces, tabs, and newlines. The directional variants let you clean only one side, useful for stripping a trailing newline without touching indentation. All three accept an optional characters argument to strip specific characters instead.

strip() without arguments removes characters for which str.isspace() returns True, a Unicode-aware set that includes non-ASCII whitespace. With a character argument, it strips any character in that set from both ends (a character membership check, not a prefix match). "xxhelloxx".strip("x") returns "hello". Multi-character arguments strip any of those characters individually, which is a common source of subtle bugs.
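The character-set behaviour is easiest to see next to the bug it causes. The hostnames and filenames below are illustrative; removeprefix() and removesuffix() require Python 3.9 or later.

```python
# strip(chars) treats its argument as a set of characters, not a prefix.
trimmed = "xxhelloxx".strip("x")                # "hello"

# Pitfall: trying to drop "www." this way also eats the "w" of "web".
over_stripped = "www.web.dev".lstrip("w.")      # "eb.dev"

# removeprefix() / removesuffix() match the whole substring.
host = "www.web.dev".removeprefix("www.")       # "web.dev"
stem = "report.csv".removesuffix(".csv")        # "report"
```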

Finding

python
text = "Hello, world"

text.find("world")         # 7
text.find("Python")        # -1  (not found)
text.count("l")            # 3
text.startswith("Hello")   # True
text.endswith("world")     # True

find() returns the position where a piece of text starts inside your string. If it is not there, it returns -1. Use startswith() and endswith() when you only care whether the string begins or ends with something specific.

find() returns the start index of the first match, or -1. Because -1 is also a valid index (the last character), always check for it before using the result in slicing or arithmetic; an unchecked -1 silently produces the wrong substring. startswith() and endswith() each accept a tuple of strings, making it easy to test multiple prefixes or suffixes in one call.

find() is a left-to-right linear scan, O(n*m) in the worst case. index() is identical but raises ValueError on no match: use index() when absence is a programming error, find() when it is expected input. startswith() and endswith() short-circuit on the first mismatch and are faster than a find() or in check for prefix/suffix tests.
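A sketch of how find() and index() differ when the substring is missing, and of the tuple form of startswith():

```python
text = "Hello, world"

# find() signals absence with -1, which is also a valid index.
pos = text.find("Python")           # -1
accidental = text[:pos]             # "Hello, worl" (silently wrong)

# Check for -1 before slicing.
before = text[:pos] if pos != -1 else text    # "Hello, world"

# index() raises ValueError instead of returning -1.
try:
    text.index("Python")
    raised = False
except ValueError:
    raised = True

# startswith() accepts a tuple of candidate prefixes.
is_greeting = text.startswith(("Hello", "Hi"))    # True

# When only presence matters, the in operator is the simplest test.
has_world = "world" in text                       # True
```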

Replacing

python
text = "Hello, world"

text.replace("world", "Python")   # "Hello, Python"
text.replace("l", "L")            # "HeLLo, worLd"  (all occurrences)
text.replace("l", "L", 1)         # "HeLlo, world"  (first only)

replace() swaps every occurrence of one piece of text for another and gives you back a new string. The original is not changed. Pass a number as a third argument to limit how many occurrences are replaced; 1 replaces only the first.

replace() replaces all non-overlapping occurrences by default. The count argument caps how many get replaced. Since it returns a new string, calls can be chained: text.replace("a", "A").replace("e", "E") applies both substitutions in sequence.

replace() matches literal substrings, not patterns; with a count it stops after that many replacements. For pattern-based substitution, Python's re module is the right tool. That is covered in the Modules chapter.

Splitting and joining

split() cuts a string into pieces at a separator and returns them as a list. You tell it what to cut on:

split() partitions at a separator and returns the segments as a list. Called with no argument, it splits on any whitespace run and discards empty strings from multiple consecutive spaces:

split(sep) scans left-to-right, splitting at every non-overlapping occurrence of sep. With no argument it uses a different algorithm: it splits on any run of whitespace and strips leading and trailing whitespace from the result. rsplit(sep, n) splits from the right, useful for isolating the last segment of a dotted path or namespaced identifier:

python
csv_row = "Alice,28,London"
parts = csv_row.split(",")     # ["Alice", "28", "London"]

"  hello   world  ".split()   # ["hello", "world"]

What's a list?

A list is an ordered collection of values. ["Alice", "28", "London"] above is one. Lists get their own chapter; for now, treat them as a sequence of items that split() produces and join() consumes.
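A few more splitting variants, sketched with illustrative values. The dotted name is hypothetical.

```python
module_path = "package.subpackage.module.ClassName"   # hypothetical dotted name

# rsplit() with a limit of 1 isolates the last segment.
prefix, last = module_path.rsplit(".", 1)
# prefix is "package.subpackage.module", last is "ClassName"

# With an explicit separator, empty fields are kept.
fields = "a,,b".split(",")       # ["a", "", "b"]

# With no argument, runs of whitespace collapse.
words = "a  b".split()           # ["a", "b"]

# An explicit single space does not collapse runs.
spaced = "a  b".split(" ")       # ["a", "", "b"]
```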

join() does the reverse: it combines a list of strings into one. The string before .join() is placed between each item:

python
words = ["Hello", "world"]

" ".join(words)    # "Hello world"
", ".join(words)   # "Hello, world"
"".join(words)     # "Helloworld"

The pattern to remember: separator.join(list_of_strings). The separator goes on the left, the list on the right. " ".join(words) puts a space between each word. "".join(words) glues them with nothing between.

join() is the right tool whenever you are assembling a single string from multiple pieces. It performs a single allocation rather than creating a new string at each step. For two or three strings, + is perfectly fine. Once you have a list of any significant size, reach for join().

join() is O(n): it calls __iter__ once, computes the total length needed in one pass, performs a single allocation, then writes each piece and the separator directly into the buffer. Repeated + is O(n²): each operation allocates a new buffer of the combined length and copies both operands in. CPython has a limited optimisation for repeated += on a single local variable, but it is fragile across refactors and not guaranteed. join() is always correct and always fast.
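One practical detail: join() accepts only strings, so other values need converting first. A minimal sketch with made-up scores:

```python
scores = [94.5, 88, 71.25]   # hypothetical data

# Convert each item to str before joining.
line = ", ".join(str(s) for s in scores)     # "94.5, 88, 71.25"

# Passing non-strings raises TypeError.
try:
    ", ".join(scores)
    join_failed = False
except TypeError:
    join_failed = True

# Building output in a loop: append to a list, join once at the end.
rows = []
for i, s in enumerate(scores, start=1):
    rows.append(f"{i}. {s}")
report = "\n".join(rows)
```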

f-strings

f-strings embed values directly inside text. Put f before the opening quote, then wrap any variable or expression in curly braces. Python fills it in when the code runs. You can also add a colon after the value to control how it is displayed.

f-strings evaluate any expression inside {} at runtime and convert the result to a string. A colon inside the braces introduces a format spec: a compact syntax for controlling decimal places, alignment, and number formatting.

f-strings (PEP 498) compile each {} expression to bytecode that calls format(value, spec), which delegates to value.__format__(spec). Any class that implements __format__ controls its own display inside an f-string. Conversion flags !r, !s, !a apply repr(), str(), or ascii() before the format call.

python
name  = "Alice"
score = 94.5

print(f"Hello, {name}!")           # "Hello, Alice!"
print(f"Score: {score:.1f}%")      # "Score: 94.5%"
print(f"2 + 2 = {2 + 2}")          # "2 + 2 = 4"
print(f"Name: {name.upper()}")     # "Name: ALICE"

The format spec after : controls how the value is displayed:

| Spec | Meaning | Example |
| --- | --- | --- |
| .2f | 2 decimal places | f"{3.14159:.2f}" → "3.14" |
| .0% | percentage, no decimals | f"{0.94:.0%}" → "94%" |
| , | thousands separator | f"{1000000:,}" → "1,000,000" |
| >10 | right-align in 10 chars | f"{'hi':>10}" → "        hi" |

You will use .2f most: any time you display a decimal and want a tidy number rather than a long run of digits. Everything else in the table is there when you need it. You can put any variable, arithmetic, or method call inside the {}.

.2f and .0% cover most display formatting. The alignment specifiers (>, <, ^) produce tabular output when combined with a width. A simplified form of the pattern is {value:[align][width][,][.precision][type]}. Once you recognise the pieces, any spec is readable without memorising all combinations.

The spec is passed verbatim to __format__; built-in types handle it in C. !r is the most useful conversion flag: it calls repr() before formatting, which adds quotes around strings and makes invisible characters (tabs, trailing spaces, newlines) visible as escape sequences. Custom classes can implement __format__ to accept arbitrary spec strings and produce any output.
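A sketch of both mechanisms: a custom __format__ and the !r conversion. The Temperature class and its "F" spec suffix are invented for illustration and are not part of any library.

```python
class Temperature:
    """Hypothetical class used only to illustrate __format__."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # A trailing "F" requests Fahrenheit; the rest of the spec
        # is handed on to float formatting.
        if spec.endswith("F"):
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, spec[:-1] + "f") + "°F"
        return format(self.celsius, spec or ".1f") + "°C"


t = Temperature(21.5)
default = f"{t}"           # "21.5°C"
fahrenheit = f"{t:.1F}"    # "70.7°F"

# !r makes trailing whitespace and tabs visible.
value = "alice \t"
debug = f"[{value!r}]"     # ['alice \t'] with the tab shown as backslash-t
```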

Multiline strings

To write a string that spans more than one line, use triple quote marks: three " at the start and three at the end. Python preserves all the line breaks and spacing exactly as you wrote them.

Triple-quoted strings preserve all whitespace and line breaks literally. They are standard for long text blocks such as email templates and SQL queries, and for docstrings: the inline documentation placed at the start of a function or class body.

Triple-quoted literals preserve all characters verbatim, including leading whitespace on each line. When used as the first statement in a function, class, or module body, Python stores the string as the __doc__ attribute of that object. Tools such as help() display it after cleaning up the indentation with inspect.cleandoc(); for ordinary multiline strings, textwrap.dedent() does a similar job. ''' and """ are equivalent; """ is the convention.
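The indentation and docstring behaviour can be sketched as follows. build_message is a hypothetical function; the backslash after the opening quotes suppresses the leading newline.

```python
import inspect
import textwrap


def build_message(name):
    """Return a short order confirmation.

    The name parameter is illustrative.
    """
    body = """\
        Dear {name},

        Thank you for your order.
        """
    # dedent() removes the indentation common to every line.
    return textwrap.dedent(body).format(name=name)


message = build_message("Alice")
# "Dear Alice,\n\nThank you for your order.\n"

# getdoc() returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(build_message)
```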

python
message = """
Dear Alice,

Thank you for your order.

Best regards,
The Team
"""

Escape sequences

Some characters are hard to type directly inside a string. Python uses escape sequences: a backslash followed by a letter that stands for something. The two you will use constantly are \n for a new line and \t for a tab.

Escape sequences let you embed characters that would otherwise break the syntax or cannot be typed directly. The ones you will reach for: \n (newline), \t (tab), \\ (a literal backslash), \" and \' (quotes inside a matching-delimiter string). Windows paths contain backslashes, which collide with escape processing. Prefix the literal with r to disable escape processing.

Python supports the C-style escape set plus Unicode escapes: \xNN (the character with hex value NN), \uXXXX (16-bit code point), \UXXXXXXXX (32-bit code point), \N{name} (named Unicode character). Raw string literals (r"...") suppress escape processing, passing every backslash through to the string verbatim; the one restriction is that a raw literal cannot end in an odd number of backslashes. Raw strings are the usual choice for Windows paths and regular expressions, where backslashes carry meaning to the consumer rather than to Python's tokeniser.

| Sequence | Character |
| --- | --- |
| \n | Newline |
| \t | Tab |
| \\ | Literal backslash |
| \" | Double quote |
| \' | Single quote |

python
print("Line one\nLine two")        # two lines of output
print("Name:\tAlice")              # Name:   Alice
path = r"C:\Users\Alice\Documents" # raw string, no escape processing
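The Unicode escapes and the effect of the r prefix, as a small sketch. The sample text passed to re.findall is made up.

```python
import re

# Three spellings of the same character, é (U+00E9).
by_hex = "\xe9"
by_code_point = "\u00e9"
by_name = "\N{LATIN SMALL LETTER E WITH ACUTE}"
all_equal = by_hex == by_code_point == by_name    # True

# In a normal literal \n is one character; in a raw literal it is two.
normal_len = len("\n")     # 1
raw_len = len(r"\n")       # 2

# Raw strings keep regular-expression backslashes intact.
numbers = re.findall(r"\d+", "order 42, item 7")   # ["42", "7"]

# A raw literal cannot end in a single backslash; append it separately.
folder = r"C:\Users\Alice" + "\\"
```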

Checking string contents

Python has methods that answer yes/no questions about what a string contains. They return True or False. The most useful early on: isdigit() lets you check whether a string is all numbers before converting it, so you can avoid a crash on unexpected input.

The is* methods each test a specific property of the entire string. The character-class checks (isdigit(), isalpha(), isalnum(), isspace()) return True only if the string is non-empty and every character satisfies the condition. Their main use is input validation: check before converting to avoid a crash on unexpected input. isdigit() before int() is the classic pattern.

The is* methods use Unicode category checks, not ASCII ranges. isdigit() returns True for superscript digits and other numeric Unicode code points beyond 0-9. For strict ASCII digit checking, combine s.isascii() and s.isdigit(). isnumeric() is broader still, covering fractions and numeric-valued Unicode characters. Know which one you actually need before reaching for it.
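A sketch of the strict ASCII check described above. parse_int is a hypothetical helper, not a standard function.

```python
def parse_int(text):
    """Hypothetical helper: int value, or None unless text is plain ASCII digits."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        return int(text)
    return None


plain = parse_int(" 42 ")     # 42
superscript = parse_int("²")  # None; "²".isdigit() is True but int("²") fails
signed = parse_int("-5")      # None; the minus sign is not a digit
empty = parse_int("")         # None; isdigit() is False for an empty string
```

If negative numbers or surrounding signs should be accepted, wrapping int() in try/except ValueError is the more common approach.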

python
"42".isdigit()       # True
"hello".isalpha()    # True
"hello42".isalnum()  # True
"   ".isspace()      # True
"Hello".islower()    # False
"HELLO".isupper()    # True

In practice

Strip whitespace, normalise case, then pull out what you need. This sequence handles almost any user-provided text:

python
raw_input = "  Alice@Example.com  "
email     = raw_input.strip().lower()   # "alice@example.com"

at_pos   = email.find("@")
username = email[:at_pos]
domain   = email[at_pos + 1:]

print(f"User:   {username}")    # "alice"
print(f"Domain: {domain}")      # "example.com"
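If the input might not contain "@" at all, find() returns -1 and the slices above quietly give wrong results. One way to guard against that, sketched with a hypothetical helper, is partition():

```python
def split_email(raw):
    """Hypothetical helper: (username, domain), or None if raw is not user@domain."""
    email = raw.strip().lower()
    # partition() returns (before, separator, after);
    # separator is "" when "@" is absent.
    username, sep, domain = email.partition("@")
    if not (sep and username and domain):
        return None
    return username, domain


ok = split_email("  Alice@Example.com  ")   # ("alice", "example.com")
missing = split_email("not-an-email")       # None
```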

Building a URL from parts and immediately validating and parsing it:

python
BASE_URL = "https://api.example.com"
version  = "v2"
resource = "users"
user_id  = 42

url      = f"{BASE_URL}/{version}/{resource}/{user_id}"
# "https://api.example.com/v2/users/42"

protocol = url.split("://")[0]                    # "https"
secured  = url.startswith("https://")
domain   = url.split("://")[1].split("/")[0]      # "api.example.com"

print(f"Protocol : {protocol}")
print(f"Secure   : {secured}")
print(f"Domain   : {domain}")
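The split-based parsing above is fine for a URL the program built itself. For URLs from outside, which may carry ports, credentials, or query strings, the standard library's urllib.parse is likely the safer choice. A minimal sketch:

```python
from urllib.parse import urlsplit

parts = urlsplit("https://api.example.com/v2/users/42")

protocol = parts.scheme                      # "https"
domain = parts.netloc                        # "api.example.com"
segments = parts.path.strip("/").split("/")  # ["v2", "users", "42"]
secured = parts.scheme == "https"            # True
```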

Parsing a structured log line using find(), slicing, and f-string alignment:

python
log_entry = "[2024-01-15 09:42:11] ERROR: File not found: report.csv"

timestamp = log_entry[1:20]
rest      = log_entry[22:]                # "ERROR: File not found: report.csv"
colon_pos = rest.find(":")
level     = rest[:colon_pos]              # "ERROR"
message   = rest[colon_pos + 2:]          # "File not found: report.csv"

print(f"[{timestamp}] {level:>8}: {message}")
# [2024-01-15 09:42:11]    ERROR: File not found: report.csv

find() locates the boundary, slicing extracts the parts, and the >8 format spec right-aligns the severity label so columns stay consistent when level names differ in length.

Method reference

| Method | What it does |
| --- | --- |
| .lower() / .upper() | Convert to all lowercase / all uppercase |
| .title() / .capitalize() | Capitalise each word / only the first |
| .strip() / .lstrip() / .rstrip() | Remove surrounding whitespace |
| .find(sub) | Index of first match, or -1 |
| .count(sub) | How many times sub appears |
| .startswith(s) / .endswith(s) | Prefix / suffix check |
| .replace(old, new) | Replace occurrences |
| .split(sep) | Split into a list |
| sep.join(iterable) | Join items into a string |
| .isdigit() / .isalpha() / .isalnum() | Character type checks |