Of the major data types built into Python, the set is one of the least discussed, but also one of the most powerful. A Python set lets you create collections of objects where each object is unique to the collection, and it works with the speed and efficiency of Python’s dictionaries.
However, because Python’s sets are not as widely discussed as its lists or dictionaries, it’s easy to miss out on how sets can make your Python apps smarter and more elegant. Let’s fix that!
TABLE OF CONTENTS
- Python set basics
- Uses for Python sets
- Adding and removing members of Python sets
- Unions and intersections with Python sets
- Differences with Python sets
- Supersets and subsets in Python
- Set updates in Python
- Frozen sets in Python
Python set basics
Sets are defined with a syntax that is reminiscent of Python’s dictionary type:
my_set = {1,2,3,4}
The fact that this looks a little like a dictionary is no accident. You can think of a set as a dictionary that stores only keys, no values. In fact, many of the mechanisms under Python’s hood for sets are built with the same code as for dictionaries.
You can also create a set with the set() built-in, which takes any iterable:
my_set = set([1,2,3,4])
Set members can be of any hashable type: basically, any object whose hash value is guaranteed not to change over its lifetime. Numbers and strings are all OK, as are instances of user-defined classes. (Even if their properties change over time, the instances themselves don’t change.) Again, this is exactly the same as how the keys work in Python’s dictionaries.
If you try to define a set with redundant members, the redundancies will be removed automatically, with previously defined members taking priority. For instance, if we defined my_set as {1,2,3,2,4,5}, the result would be {1,2,3,4,5}.
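As a minimal sketch of the points above (the variable names are illustrative only), the following shows duplicate removal, the hashability requirement, and one detail worth knowing: empty braces create a dictionary, so an empty set has to be written as set().

```python
# Duplicates are dropped when the set is built.
my_set = {1, 2, 3, 2, 4, 5}
print(my_set == {1, 2, 3, 4, 5})  # True

# Empty braces create a dictionary, not a set; use set() for an empty set.
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__, type(empty_set).__name__)  # dict set

# Unhashable objects, such as lists, cannot be set members.
raised_type_error = False
try:
    bad_set = {[1, 2], 3}
except TypeError:
    raised_type_error = True
print(raised_type_error)  # True

# A tuple of hashable items is itself hashable, so it is accepted.
ok_set = {(1, 2), 3}
print((1, 2) in ok_set)  # True
```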
Uses for Python sets
One powerful and common use for sets is deduplicating the members of a collection or the output generated by an iterable. For instance, if you have a list, you can quickly deduplicate the list by making a set from its contents:
list_1 = [1,2,3,4,3,4,2,4,5,3]
set_1 = set(list_1)
# yields {1,2,3,4,5}
(Note that the original list is preserved.)
This is far faster than iterating through the list and testing for duplicates manually. You can also do this for any iterable, not just a list, although lists are a common source. If you do this with a string, for instance, you’ll get a set that contains all the unique characters in the string:
s1 = "Hello there"
set(s1)
# yields {' ', 'r', 'l', 't', 'e', 'h', 'o', 'H'} (element order may vary)
Note that this technique will work only if the objects in the list are all hashable. You’ll get a TypeError if you try to add an unhashable object. Also, there is no parameter you can pass that will ignore unhashable objects, so if you’re in doubt about what’s hashable or not, you’ll have to iterate through the collection and .add() each element manually, testing as you go.
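One way to do that manual pass might look like the sketch below. The helper name unique_hashable and the sample data are hypothetical, chosen only for illustration; the approach is simply to try .add() on each element and set aside anything that raises TypeError.

```python
def unique_hashable(items):
    """Return (set of hashable items, list of skipped unhashable items)."""
    seen = set()
    skipped = []
    for item in items:
        try:
            seen.add(item)
        except TypeError:
            # Lists, dicts, and other unhashable objects end up here.
            skipped.append(item)
    return seen, skipped


mixed = [1, "a", [2, 3], 1, {"k": "v"}, "a"]
unique, skipped = unique_hashable(mixed)
print(unique == {1, "a"})  # True
print(skipped)             # [[2, 3], {'k': 'v'}]
```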
Another common use for sets is to quickly test for the presence of a small collection of objects within a larger collection, or vice versa, by way of the superset/subset methods described below. Note that this works best when the larger of the two collections is something you can convert to a set once and then test against many times, because the overhead of converting a list to a set (especially a long list) might outstrip the performance gains from using sets in the first place. But on the whole, set membership testing is generally faster than iterating through objects and testing membership manually.
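A rough sketch of the "convert once, test many times" pattern follows. The ID values and variable names are made up for illustration.

```python
# Hypothetical data: a list of permitted IDs and a stream of incoming IDs.
allowed_ids = [101, 205, 307, 412]
incoming = [205, 999, 101, 555]

# Pay the conversion cost once...
allowed_lookup = set(allowed_ids)

# ...then reuse the set for every membership test.
accepted = [i for i in incoming if i in allowed_lookup]
rejected = [i for i in incoming if i not in allowed_lookup]
print(accepted)  # [205, 101]
print(rejected)  # [999, 555]

# Checking a small collection against the larger one in a single step.
required = {101, 307}
print(required <= allowed_lookup)  # True
```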
Adding and removing members of Python sets
If you want to add and remove members from sets, use the .add() and .remove() methods. For example, my_set.add(5) would update my_set to include 5, and my_set.remove(5) would remove 5 if it were present.
If you try to .remove() something from a set that isn’t there, you’ll get a KeyError, the same as if you try to reference a key in a dictionary that doesn’t exist. To remove something without the risk of raising an error if it isn’t there, use .discard() instead of .remove().
To drop all elements from a set, you can use .clear(), or reassign the variable to an empty set:
my_set = set()
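The sketch below walks through these methods. It also illustrates one difference between the two ways of emptying a set that is easy to overlook: .clear() empties the existing object, so any other name bound to it sees the change, while reassignment only rebinds one name. The variable names are illustrative.

```python
my_set = {1, 2, 3, 4}
my_set.add(5)
my_set.add(5)      # adding an existing member has no effect
my_set.remove(5)

# Removing a missing member with .remove() raises KeyError.
got_key_error = False
try:
    my_set.remove(5)
except KeyError:
    got_key_error = True
print(got_key_error)  # True

my_set.discard(5)  # no error, even though 5 is absent
print(my_set == {1, 2, 3, 4})  # True

# .clear() empties the object itself, so an alias sees the change.
first = {1, 2, 3}
first_alias = first
first.clear()
print(len(first_alias))  # 0

# Reassignment rebinds only the name; the old set survives under the alias.
second = {1, 2, 3}
second_alias = second
second = set()
print(second_alias == {1, 2, 3})  # True
```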
Unions and intersections with Python sets
Sets support a number of operations where you take two or more sets and generate new ones from them. A union of two sets combines the two into a single set, removing any duplicates:
set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.union(set_2)
# yields {1,2,3,4,5,6}
You can also use the pipe operator to perform a union:
set_3 = set_1 | set_2
Again, this is a handy way to perform deduplication across multiple collections of items.
An intersection generates a new set from only the elements common to multiple sets:
set_1 = {1,2,3}
set_2 = {2,3,4}
set_3 = set_1.intersection(set_2)
# yields {2,3}
The & operator can also be used to perform an intersection:
set_3 = set_1 & set_2
Many set operations can be expressed with operators, which we’ll illustrate below.
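As a short sketch, both the methods and the operators extend to more than two sets. One difference between them: the methods accept any iterable as an argument, while the operators require sets on both sides. The values here are illustrative.

```python
set_1 = {1, 2, 3}
set_2 = {2, 3, 4}
set_3 = {3, 4, 5}

# Methods take any number of arguments.
all_members = set_1.union(set_2, set_3)
shared = set_1.intersection(set_2, set_3)
print(all_members == {1, 2, 3, 4, 5})  # True
print(shared == {3})                   # True

# Operators can be chained to the same effect.
print((set_1 | set_2 | set_3) == all_members)  # True
print((set_1 & set_2 & set_3) == shared)       # True

# Methods accept non-set iterables such as lists...
with_list = set_1.union([7, 8])
print(with_list == {1, 2, 3, 7, 8})  # True

# ...but operators raise TypeError for them.
operator_failed = False
try:
    set_1 | [7, 8]
except TypeError:
    operator_failed = True
print(operator_failed)  # True
```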
Differences with Python sets
If you want to find out which members of one set are not in another set, you can use the difference() method:
set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.difference(set_2)
# yields {1,2,3}
set_3 = set_1 - set_2
# different way to express same operation
One way to express this in English might be, “Create a new set that has everything in set 1 that isn’t in set 2.”
By contrast, if we used set_3 = set_2.difference(set_1), the results would be {4,5,6}.
Python sets also support symmetric difference operations. The symmetric difference returns elements that are in one set or the other, but not both.
set_1 = {1,2,3,4}
set_2 = {4,5,6,7}
set_3 = set_1.symmetric_difference(set_2)
# yields {1, 2, 3, 5, 6, 7}
set_3 = set_1 ^ set_2
# operator version
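To see how the two kinds of difference relate, here is a small sketch using overlapping sets (the values are chosen only for illustration). The symmetric difference is the same as combining the two one-way differences, or equivalently the union minus the intersection.

```python
set_1 = {1, 2, 3, 4}
set_2 = {3, 4, 5, 6}

# Difference depends on the order of the operands.
only_in_1 = set_1 - set_2
only_in_2 = set_2 - set_1
print(only_in_1 == {1, 2})  # True
print(only_in_2 == {5, 6})  # True

# Symmetric difference does not depend on the order.
either_not_both = set_1 ^ set_2
print(either_not_both == {1, 2, 5, 6})                      # True
print(either_not_both == (set_2 ^ set_1))                   # True
print(either_not_both == (only_in_1 | only_in_2))           # True
print(either_not_both == (set_1 | set_2) - (set_1 & set_2))  # True
```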
Supersets and subsets in Python
You’re probably familiar by now with Python’s in operator, which you can use to search for the presence of a character in a string or an object in a list. Sets support in as well:
set_1 = {1,2,3,4}
1 in set_1 # this is True
5 in set_1 # this is False
What if you wanted to test for the presence of all the elements of one set inside another set? You can’t use in for that — Python will think you’re testing for the presence of the entire set object, not its individual elements. Fortunately, Python does provide ways to check such things with other set methods:
set_1 = {1,2,3,4}
set_2 = {1,2}
# Tests if members of set_2 are in set_1:
set_2.issubset(set_1)
# Operator version:
set_2 <= set_1
# Tests if set_1 contains all members of set_2:
set_1.issuperset(set_2)
# Operator version:
set_1 >= set_2
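The sketch below prints the results of these checks, along with a few related ones not covered above: the strict < operator for proper subsets, the fact that the method forms accept any iterable, and .isdisjoint() for testing that two sets share nothing.

```python
set_1 = {1, 2, 3, 4}
set_2 = {1, 2}

print(set_2.issubset(set_1))    # True
print(set_2 <= set_1)           # True
print(set_1.issuperset(set_2))  # True
print(set_1 >= set_2)           # True

# "in" looks for the set itself as a single member, so this is False.
print(set_2 in set_1)           # False

# Every set is a subset of itself; < tests for a proper (strict) subset.
print(set_1 <= set_1)           # True
print(set_1 < set_1)            # False
print(set_2 < set_1)            # True

# The method forms accept any iterable, not only sets.
print(set_2.issubset([1, 2, 3]))  # True

# isdisjoint() is True when the two sets have no members in common.
print(set_1.isdisjoint({5, 6}))   # True
```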
Set updates in Python
Up until now we’ve only explored how to generate new sets from unions, intersections, or differences of existing sets. Python also lets you update a set in-place with these operations:
# In-place union of set_1 with set_2:
set_1 |= set_2
# In-place intersection of set_1 with set_2:
set_1 &= set_2
# In-place difference of set_1 with set_2:
set_1 -= set_2
# In-place symmetric difference of set_1 with set_2:
set_1 ^= set_2
In-place updates are handy when you’re dealing with a very large set, and you don’t want to create an entirely new instance of the set (with all the overhead that goes with such an operation). Instead, you can make the changes directly to the existing set, which is more efficient.
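A brief sketch of what "in-place" means in practice: the augmented operators modify the existing object, so another name bound to the same set sees the change, whereas the plain operators build a new set. Each in-place operator also has a method equivalent. The names below are illustrative.

```python
set_1 = {1, 2, 3, 4}
alias = set_1

# In-place union: the same object is modified.
set_1 |= {5}
print(alias is set_1)              # True
print(alias == {1, 2, 3, 4, 5})    # True

# Plain union: a new set is created and the name is rebound.
set_1 = set_1 | {6}
print(alias is set_1)              # False
print(alias == {1, 2, 3, 4, 5})    # True

# Method equivalents of the in-place operators.
work = {1, 2, 3, 4}
work.update({5, 6})                       # same as |=
work.intersection_update({2, 3, 5, 9})    # same as &=
work.difference_update({5})               # same as -=
work.symmetric_difference_update({3, 7})  # same as ^=
print(work == {2, 7})                     # True
```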
Frozen sets in Python
I mentioned before how sets can only be made of things that are hashable. Since sets are mutable, they can’t themselves be used as set elements or dictionary keys. But there is a variety of set called the frozen set that isn’t mutable, and so can be used as a set element, as a dictionary key, or in any other context where you need a hashable type.
To create a frozen set, just use frozenset() to generate one from an existing set or iterable:
set_1 = {1,2,3,4}
f_set = frozenset(set_1)
set_2 = {f_set,2,3,4}
Note that once you create a frozen set, it can’t be altered. The .add() and .remove() methods won’t work on a frozen set, and neither will in-place methods such as .update(). You can still use a frozen set to generate unions, intersections, or differences, because those operations return a new set and leave the original untouched.
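Here is a minimal sketch of those behaviors, with illustrative names and values. It uses a frozen set as a set member and as a dictionary key, shows that the mutating methods are absent, and shows that ordinary set operations still work by returning a new object.

```python
f_set = frozenset({1, 2, 3, 4})

# A frozen set is hashable, so it can be a set member or a dictionary key.
nested = {f_set, 2, 3, 4}
labels = {frozenset({"a", "b"}): "pair"}
print(f_set in nested)                   # True
print(labels[frozenset({"b", "a"})])     # pair

# Mutating methods do not exist on frozenset.
add_missing = False
try:
    f_set.add(5)
except AttributeError:
    add_missing = True
print(add_missing)  # True

# Non-mutating operations return a new object; the original is unchanged.
combined = f_set | {5}
print(combined == {1, 2, 3, 4, 5})   # True
print(type(combined) is frozenset)   # True
print(f_set == {1, 2, 3, 4})         # True
```

When a set and a frozen set are mixed in a binary operation, the result takes the type of the first operand, which is why combined is a frozen set here.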