An Interviewer’s Favorite Question: “How Are Python Strings Stored in Internal Memory”

Understand the internal implementation of Python strings

Shubh Patni
Better Programming

Photo by Ryan Wallace on Unsplash

This article was co-authored with Muhammad Abutahir. You can find him on LinkedIn and Instagram.

Strings! They’re one of programming interviewers’ favorite topics, and everyone who starts programming loves them, no matter which language they choose. Playing with strings is extremely interesting, but do you know how Python stores strings internally?

What if I asked you, “Are duplicates allowed in strings?” Most of you would say yes, and give an example like “Mommy,” where the character ‘m’ repeats. But what does that repetition actually look like in memory?

In this article, I will give you a clear picture of how strings are stored in memory, and I promise it will change how you think about them.

One important piece of advice for readers: understanding a programming language from a memory perspective is one of the most effective ways to learn it! I bet you’ll hardly forget the core concepts once you try this out.

With that said, let’s move on to the actual topic.

A string in Python is an immutable sequence of Unicode characters (strictly speaking, code points). A string literal can be written with single quotes, double quotes, or triple single or triple double quotes; all four forms produce the same kind of str object.
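A quick check, runnable in any Python 3 interpreter, that the four quoting styles give equal str values. The variable names are only for illustration.

```python
# Four ways to write the same string literal
single = 'Hello world'
double = "Hello world"
triple_single = '''Hello world'''
triple_double = """Hello world"""

# All four are str objects with the same value
for value in (single, double, triple_single, triple_double):
    print(type(value).__name__, value == single)  # str True
```

Triple quotes only matter when the text spans several lines or contains both kinds of quote characters.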

In CPython, the standard Python implementation, strings are stored compactly, and some string objects are shared instead of being duplicated. So let’s understand the reason!

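If you dig deeper, you find that CPython keeps an internal table of interned strings, often called the interned dictionary. Interning means keeping one canonical object per string value: the keys are whole strings, not individual characters, and looking up an equal string returns the object already stored there. CPython interns some strings automatically, such as names used in your code and string literals that look like identifiers, and you can intern any string yourself with sys.intern(). Not every string is interned, though. Let’s start with a simple example: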

s = "Hello world"

In the line above, I created the string ‘Hello world’ and bound it to a variable called s. Abstractly, we can picture how it is represented in memory as shown below.

Image created by author ‘Muhammad Abutahir’: Representation of the object in memory

Now let’s see what actually happens internally. Let me give you an example by creating a single-character string s1 and assigning it to a new variable s2.

s1 = 'A'  # single-character string
s2 = s1
print(s1)  # A
print(s2)  # A
print(id(s1))  # e.g. 12345 (the real number varies)
print(id(s2))  # the same number as id(s1)
Image created by author ‘Muhammad Abutahir’: Showing string interning

Okay! Let’s break down the image above. When the line s1 = 'A' runs, s1 is bound to a str object that holds the character ‘A’. CPython does not build a fresh object for every ‘A’ in a program, though: it interns string literals that look like identifiers when the code is compiled, and it caches one-character strings for code points 0 to 255, so every ‘A’ is normally the same object. The number printed by id() is that object’s identity; in CPython it is the object’s memory address.

Note: the ids in the figures and in the code comments are made-up values; the real numbers differ from run to run.

In the next step, s2 = s1 does not copy the string: it binds the name s2 to the same object that s1 refers to, so both names report the same id. This is how assignment works for every Python object, not only strings; no new object is created and no interning is involved.

Image created by author ‘Muhammad Abutahir’: the reference type assignment and string interning
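Here is a small sketch of that name-binding behavior. The list example is an addition for contrast: with a mutable object, the sharing becomes visible when one name is used to change the object.

```python
# Assignment binds a name to an object; it never copies it
s1 = "A"
s2 = s1
print(s2 is s1)  # True: one object, two names

# The same holds for mutable objects, where the sharing is visible
items = [1, 2]
alias = items
alias.append(3)
print(items)  # [1, 2, 3]

# Rebinding s1 leaves s2 pointing at the original object
s1 = "B"
print(s2)  # A
```

Because strings are immutable, sharing one object between several names is always safe: no name can change the characters another name sees.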

Okay, now that’s clear! But why did I say that strings are memory efficient? Here’s why.

Why Strings Are Memory Efficient

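CPython’s interned dictionary is a dictionary like any other, and dictionaries don’t allow duplicate keys. Because the keys are whole strings, each distinct interned value has exactly one entry, and every lookup of an equal string returns that one shared object. Only interned strings get this treatment; strings built at runtime, for example by concatenation or by reading input, are generally separate objects even when their values are equal.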
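To make the dictionary idea concrete, here is a toy model in plain Python. The names intern_table and toy_intern are hypothetical, and CPython’s real table is implemented in C, but the idea is the same: whole strings are the keys, and the first object stored for a value is the one handed back afterwards.

```python
# Hypothetical toy model of an intern table; not CPython's actual code
intern_table: dict[str, str] = {}

def toy_intern(s: str) -> str:
    """Return the canonical object for the value of s."""
    # setdefault stores s only if no equal key exists yet,
    # then returns whatever object is stored for that value
    return intern_table.setdefault(s, s)

first = "".join(["Mo", "mmy"])
second = "".join(["Mom", "my"])
print(first is second)  # False in CPython: built separately
print(toy_intern(first) is toy_intern(second))  # True: one canonical object
print(len(intern_table))  # 1: one key for the whole word, not one per character
```

The strings are built with str.join() at runtime so that the compiler cannot merge them into a single constant before the toy table ever sees them.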

Let’s see how this plays out with multi-character strings:

s1 = 'Hello'  # multi-character string
s2 = 'World'
print(s1)  # Hello
print(s2)  # World
print(id(s1))  # e.g. 123123
print(id(s2))  # e.g. 454545 (a different object)

So far, so good: two different strings, two different ids. But what happens if I print the ids of a character that both strings have in common?

# Printing the ids of the character 'o'
print(id(s1[4]))  # e.g. 1004
print(id(s2[1]))  # the same number, 1004

Wow! That’s unusual, right? How can characters inside two different objects have the same id? Let’s look at the figure below:

Image created by author ‘Muhammad Abutahir’: The complete string interning process

The figure shows each character as a separate object with its own id, but that is not how CPython stores a string. A str object such as s1 is one object, and its characters sit in a single compact internal array of code points; they are not separate objects and have no ids of their own. Since Python 3.3 (PEP 393), each code point in that array takes 1, 2, or 4 bytes, depending on the largest code point in the string.
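One rough way to observe this compact layout is sys.getsizeof(), which reports an object’s size in bytes. Absolute sizes differ between Python versions and platforms, so this sketch only measures how much one extra character adds; the helper name bytes_per_char is hypothetical, and the 1, 2, and 4 byte steps reflect CPython’s PEP 393 representation.

```python
import sys

def bytes_per_char(ch: str) -> int:
    """Size added by one more copy of ch in a string made only of ch."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

print(bytes_per_char("a"))           # 1: code point below 256
print(bytes_per_char("\u20ac"))      # 2: euro sign, needs 2 bytes
print(bytes_per_char("\U0001F600"))  # 4: emoji outside the 16-bit range
```

A single wide character widens the whole array: one emoji in an otherwise ASCII string makes every code point in that string take 4 bytes.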

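A separate one-character object appears only when you ask for one, for example with s1[4]. Indexing returns a str of length one, and for code points 0 to 255 CPython returns a cached object instead of creating a new one. Both s1[4] and s2[1] are ‘o’ (code point 111), so both expressions give back the same cached object, which is why the ids match. Be careful with id() checks like the ones above, though: each s1[4] result is temporary and can be freed before the next line runs, so matching ids for short-lived objects may also just mean that memory was reused. Keeping both results in variables and comparing them with is gives a reliable answer.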
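A small check of the one-character cache that keeps both indexing results alive before comparing them. It relies on a CPython implementation detail, so other Python implementations may print different identity results.

```python
s1 = "Hello"
s2 = "World"

# Keep both indexing results alive, then compare identities with `is`
o_from_hello = s1[4]
o_from_world = s2[1]
print(o_from_hello == o_from_world)  # True: same value
print(o_from_hello is o_from_world)  # True in CPython: cached one-character string

# chr() uses the same cache for code points 0-255
print(chr(111) is o_from_hello)  # True in CPython
```

In real code, compare strings with ==; identity checks like these are only useful for exploring how the interpreter shares objects.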

In the figure, the string is built letter by letter: Python checks the dictionary for H, then e, then l, and reuses the first l for the second one. CPython does not work this way. To create ‘Hello’, it allocates one object and copies the code points for H, e, l, l, and o into its array, so the second ‘l’ is stored again right next to the first. A repeated character takes up as much space as any other character.
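A short sketch showing that repeated letters are really stored in the string itself. ord() lists the code points in order, and comparing sizes with sys.getsizeof() (a CPython-specific measurement) shows that repetition saves no space.

```python
import sys

word = "Mommy"

# Every position holds its own code point, repeats included
print([ord(ch) for ch in word])  # [77, 111, 109, 109, 121]
print(word.count("m"))           # 2

# Four repeated letters or four distinct ones: the size is the same
print(sys.getsizeof("llll") == sys.getsizeof("abcd"))  # True
```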

What is shared across the whole program is shared at the level of whole objects. There is one interned dictionary per interpreter, so an interned string, such as a variable or attribute name, exists as one object no matter how many places use it, and the cached one-character strings are shared by all code that indexes strings. So, to answer the opening question: duplicates are allowed. ‘Mommy’ stores the code point for ‘m’ twice; only when you take characters out of the string one at a time do you get the same shared object for each ‘m’.
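The standard library gives you direct access to the interned dictionary through sys.intern(). In this sketch, two equal strings are built at runtime with str.join() so that they start out as separate objects; after interning, both names refer to one shared object. The False result for the un-interned pair is what CPython does, not something the language guarantees.

```python
import sys

# Build two equal strings at runtime so they are separate objects
a = "".join(["hello", "_world"])
b = "".join(["hello", "_world"])

print(a == b)  # True: same value
print(a is b)  # False in CPython: two distinct objects

# sys.intern() returns the canonical object for this value,
# adding it to the interned dictionary the first time
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True: both names now refer to one shared object
```

Interning by hand is mainly worth it when a program keeps many copies of the same strings, such as dictionary keys read from a file, and compares or hashes them often.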

Summary

In this article, I discussed how CPython stores strings internally and how string interning works. As I mentioned before, understanding a programming language from a memory perspective is a great way to master its fundamental concepts.

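String interning is the process of keeping a single object for each distinct interned string value, so that later requests for an equal string return the already stored object instead of a new one. Separately, CPython caches one-character strings for code points 0 to 255, which is why characters taken out of different strings can share an id. Inside a string, however, every character, repeated or not, is stored in that string’s own array.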
