Yarri Bryn
January 1, 2023
Plus some bonus Python coding

Photo by Ameen Fahmy on Unsplash
The main reason I’m starting this blog is that I’ve been hesitant to do so. I prefer not to share too much information, so this is a sort of personal challenge. Also, Jeremy and Rachel over at FastAI are adamant about blogging as a means of learning deep learning (or any topic, for that matter). I agree with this perspective, despite maintaining that the idea of blogging is better than actually doing it. I’ll keep tabs on this perspective over time.
I’m going to do things a little differently, in my own style, and will probably break a bunch of blogging guidelines. I’m going to be mixing thoughts and ideas with data science code snippets and projects. While I have some ideas for upcoming posts, I’m also likely to interject random tutorials on things I find useful.
The goal of this isn’t to make money or anything like that. It is simply to get comfortable being uncomfortable in posting content and sharing thoughts, ideas and projects. The personal gain I am seeking will be the product of committing to the habit and practicing it.
So with that, let’s go through a little Python code that I find indispensable: comprehensions. I find myself using list and dictionary comprehensions in all manner of Python code, for instance when I need to filter some data based on a condition, manipulate dictionary keys and values, or update pandas DataFrame column names. I’ll show those examples below, in contrast with some other methods.
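For reference, here are the general shapes of the two kinds of comprehension used below. This is a minimal sketch, and the role names and scores are made-up sample data.

```python
# Hypothetical sample data for illustration
roles = ['wizard', 'ranger', 'elf', 'hutt']
scores = {'wizard': 7, 'ranger': -2, 'elf': 5, 'hutt': -9}

# List comprehension: [expression for item in iterable if condition]
long_roles = [r.title() for r in roles if len(r) > 3]
print(long_roles)  # ['Wizard', 'Ranger', 'Hutt']

# Dict comprehension: {key_expr: value_expr for key, value in mapping.items() if condition}
# Here it both filters (v > 0) and transforms the keys (upper-casing them)
positive = {k.upper(): v for k, v in scores.items() if v > 0}
print(positive)  # {'WIZARD': 7, 'ELF': 5}
```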
Perform calculations on a list
First we need a list to work with, which is easy enough to create in Python:
list_1 = list(range(25))
print(list_1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
Ok, so we have a list. Let’s find the odd numbers using a loop:
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
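A loop along these lines produces that output. This is a minimal sketch, and the name odds_loop is hypothetical.

```python
list_1 = list(range(25))

# Collect odd numbers with an explicit loop
odds_loop = []
for n in list_1:
    if n % 2 != 0:  # a remainder of 1 means the number is odd
        odds_loop.append(n)

print(odds_loop)
# [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
```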
Well, that works, but it’s kind of long-winded. A list comprehension can shorten it up for us:
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
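A comprehension equivalent to the loop might look like the following. This is a sketch and the variable names are hypothetical. Since the section is about calculations, it also applies an expression to each kept item (squaring the odd numbers). The filter moves to the trailing `if`, and the calculation moves to the front.

```python
list_1 = list(range(25))

# Same filter as the loop, written as a single expression
odds_comp = [n for n in list_1 if n % 2 != 0]
print(odds_comp)
# [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]

# Filter and calculate in one pass: square each odd number
odd_squares = [n ** 2 for n in list_1 if n % 2 != 0]
print(odd_squares[:4])  # [1, 9, 25, 49]
```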
Similar methods also work for nested lists. Note, however, that comprehensions can be harder to read, so heavily nested statements might be better split up or replaced with a loop or a helper function.
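Here is a small sketch of comprehensions over a nested list. The grid and the keep() helper are hypothetical examples. Multiple `for` clauses read left to right like nested loops, with the outer loop first. Once the condition gets fiddly, a named helper keeps the comprehension readable.

```python
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten: "for row in grid" is the outer loop, "for x in row" the inner one
flat = [x for row in grid for x in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Comprehension inside a comprehension: keeps the nested shape
doubled = [[x * 2 for x in row] for row in grid]
print(doubled)  # [[2, 4, 6], [8, 10, 12], [14, 16, 18]]

def keep(x):
    """Hypothetical filter: odd values greater than 2."""
    return x % 2 != 0 and x > 2

# Pushing the condition into a helper keeps the comprehension short
filtered = [[x for x in row if keep(x)] for row in grid]
print(filtered)  # [[3], [5], [7, 9]]
```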
Getting a corpus (a body of text) into a dictionary of word counts
Well, we need a dict, so let’s take some text and do a word count. We will use the first two paragraphs of the Wikipedia entry for Gandalf:
gandalf = """
Gandalf is a protagonist in J. R. R. Tolkien's novels The Hobbit and The Lord of the Rings.
He is a wizard, one of the Istari order, and the leader of the Fellowship of the Ring. Tolkien took the
name "Gandalf" from the Old Norse "Catalogue of Dwarves" (Dvergatal) in the Völuspá.
As a wizard and the bearer of one of the Three Rings, Gandalf has great power, but works mostly by
encouraging and persuading. He sets out as Gandalf the Grey, possessing great knowledge and
travelling continually. Gandalf is focused on the mission to counter the Dark Lord Sauron by
destroying the One Ring. He is associated with fire; his ring of power is Narya, the Ring of
Fire. As such, he delights in fireworks to entertain the hobbits of the Shire, while in great
need he uses fire as a weapon. As one of the Maiar, he is an immortal spirit from Valinor,
but his physical body can be killed.
"""
gandalf
'\nGandalf is a protagonist in J. R. R. Tolkien\'s novels The Hobbit and The Lord of the Rings. \nHe is a wizard, one of the Istari order, and the leader of the Fellowship of the Ring. Tolkien took the \nname "Gandalf" from the Old Norse "Catalogue of Dwarves" (Dvergatal) in the Völuspá.\n\nAs a wizard and the bearer of one of the Three Rings, Gandalf has great power, but works mostly by \nencouraging and persuading. He sets out as Gandalf the Grey, possessing great knowledge and \ntravelling continually. Gandalf is focused on the mission to counter the Dark Lord Sauron by \ndestroying the One Ring. He is associated with fire; his ring of power is Narya, the Ring of \nFire. As such, he delights in fireworks to entertain the hobbits of the Shire, while in great \nneed he uses fire as a weapon. As one of the Maiar, he is an immortal spirit from Valinor, \nbut his physical body can be killed.\n'
As is, the text needs some cleaning up, as is often required. We need to strip special characters, quotes, and a few other things.
Steps:
- use a regular expression (regex) to strip out punctuation and newlines
- convert everything to lowercase
- split on spaces to get a list of words
import re
# Remove punctuation and newlines, lowercase everything, then split on single spaces
gandalf_filtered = re.sub(r'[^\w\s]|\n', '', gandalf).lower().split(' ')
print(gandalf_filtered)
['gandalf', 'is', 'a', 'protagonist', 'in', 'j', 'r', 'r', 'tolkiens', 'novels', 'the', 'hobbit', 'and', 'the', 'lord', 'of', 'the', 'rings', 'he', 'is', 'a', 'wizard', 'one', 'of', 'the', 'istari', 'order', 'and', 'the', 'leader', 'of', 'the', 'fellowship', 'of', 'the', 'ring', 'tolkien', 'took', 'the', 'name', 'gandalf', 'from', 'the', 'old', 'norse', 'catalogue', 'of', 'dwarves', 'dvergatal', 'in', 'the', 'völuspáas', 'a', 'wizard', 'and', 'the', 'bearer', 'of', 'one', 'of', 'the', 'three', 'rings', 'gandalf', 'has', 'great', 'power', 'but', 'works', 'mostly', 'by', 'encouraging', 'and', 'persuading', 'he', 'sets', 'out', 'as', 'gandalf', 'the', 'grey', 'possessing', 'great', 'knowledge', 'and', 'travelling', 'continually', 'gandalf', 'is', 'focused', 'on', 'the', 'mission', 'to', 'counter', 'the', 'dark', 'lord', 'sauron', 'by', 'destroying', 'the', 'one', 'ring', 'he', 'is', 'associated', 'with', 'fire', 'his', 'ring', 'of', 'power', 'is', 'narya', 'the', 'ring', 'of', 'fire', 'as', 'such', 'he', 'delights', 'in', 'fireworks', 'to', 'entertain', 'the', 'hobbits', 'of', 'the', 'shire', 'while', 'in', 'great', 'need', 'he', 'uses', 'fire', 'as', 'a', 'weapon', 'as', 'one', 'of', 'the', 'maiar', 'he', 'is', 'an', 'immortal', 'spirit', 'from', 'valinor', 'but', 'his', 'physical', 'body', 'can', 'be', 'killed']
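One quirk is worth noticing in that output: 'völuspáas'. The regex deletes newlines instead of turning them into spaces, so "Völuspá." and "As" on either side of the blank line get glued together. Below is a minimal sketch of one alternative, run on a short hypothetical snippet of the text. It strips only punctuation and then calls split() with no arguments. That splits on any run of whitespace, newlines included, and never yields empty strings the way split(' ') can when there are consecutive spaces.

```python
import re

# Hypothetical snippet containing the same blank-line boundary as the Gandalf text
text = "in the Völuspá.\n\nAs a wizard"

# Approach used above: newlines are deleted, gluing neighbouring words together
glued = re.sub(r'[^\w\s]|\n', '', text).lower().split(' ')
print(glued)  # ['in', 'the', 'völuspáas', 'a', 'wizard']

# Alternative: remove punctuation only, then split on any whitespace
words = re.sub(r'[^\w\s]', '', text).lower().split()
print(words)  # ['in', 'the', 'völuspá', 'as', 'a', 'wizard']
```

Applied to gandalf, this variant would count 'völuspá' and 'as' separately, so the counts below would shift slightly ('as' would appear 5 times instead of 4).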
Ok, so we have this list of words. Now what? Well, to get the word counts we have a few options:
- Cheat and use collections.Counter
- Use a for loop
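In either case we can get our word counts into a dict. A dictionary comprehension is less of a fit here, though: it can’t read the counts it is in the middle of building, so each count has to come from something like gandalf_filtered.count(w), which rescans the whole list for every unique word. Another alternative is collections.defaultdict.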
# method 1: collections.Counter
from collections import Counter
gandalf_wc_v1 = Counter(gandalf_filtered)
print(gandalf_wc_v1)
Counter({'the': 20, 'of': 11, 'is': 6, 'he': 6, 'gandalf': 5, 'and': 5, 'a': 4, 'in': 4, 'one': 4, 'ring': 4, 'as': 4, 'great': 3, 'fire': 3, 'r': 2, 'lord': 2, 'rings': 2, 'wizard': 2, 'from': 2, 'power': 2, 'but': 2, 'by': 2, 'to': 2, 'his': 2, 'protagonist': 1, 'j': 1, 'tolkiens': 1, 'novels': 1, 'hobbit': 1, 'istari': 1, 'order': 1, 'leader': 1, 'fellowship': 1, 'tolkien': 1, 'took': 1, 'name': 1, 'old': 1, 'norse': 1, 'catalogue': 1, 'dwarves': 1, 'dvergatal': 1, 'völuspáas': 1, 'bearer': 1, 'three': 1, 'has': 1, 'works': 1, 'mostly': 1, 'encouraging': 1, 'persuading': 1, 'sets': 1, 'out': 1, 'grey': 1, 'possessing': 1, 'knowledge': 1, 'travelling': 1, 'continually': 1, 'focused': 1, 'on': 1, 'mission': 1, 'counter': 1, 'dark': 1, 'sauron': 1, 'destroying': 1, 'associated': 1, 'with': 1, 'narya': 1, 'such': 1, 'delights': 1, 'fireworks': 1, 'entertain': 1, 'hobbits': 1, 'shire': 1, 'while': 1, 'need': 1, 'uses': 1, 'weapon': 1, 'maiar': 1, 'an': 1, 'immortal': 1, 'spirit': 1, 'valinor': 1, 'physical': 1, 'body': 1, 'can': 1, 'be': 1, 'killed': 1})
# method 2: for loop
gandalf_wc_v2 = {}
# Count the number of times each word appears in the list of words (stored in a dictionary)
for w in gandalf_filtered:
    if w not in gandalf_wc_v2:
        gandalf_wc_v2[w] = 1  # first time we've seen this word
    else:
        gandalf_wc_v2[w] += 1
print(gandalf_wc_v2)
{'gandalf': 5, 'is': 6, 'a': 4, 'protagonist': 1, 'in': 4, 'j': 1, 'r': 2, 'tolkiens': 1, 'novels': 1, 'the': 20, 'hobbit': 1, 'and': 5, 'lord': 2, 'of': 11, 'rings': 2, 'he': 6, 'wizard': 2, 'one': 4, 'istari': 1, 'order': 1, 'leader': 1, 'fellowship': 1, 'ring': 4, 'tolkien': 1, 'took': 1, 'name': 1, 'from': 2, 'old': 1, 'norse': 1, 'catalogue': 1, 'dwarves': 1, 'dvergatal': 1, 'völuspáas': 1, 'bearer': 1, 'three': 1, 'has': 1, 'great': 3, 'power': 2, 'but': 2, 'works': 1, 'mostly': 1, 'by': 2, 'encouraging': 1, 'persuading': 1, 'sets': 1, 'out': 1, 'as': 4, 'grey': 1, 'possessing': 1, 'knowledge': 1, 'travelling': 1, 'continually': 1, 'focused': 1, 'on': 1, 'mission': 1, 'to': 2, 'counter': 1, 'dark': 1, 'sauron': 1, 'destroying': 1, 'associated': 1, 'with': 1, 'fire': 3, 'his': 2, 'narya': 1, 'such': 1, 'delights': 1, 'fireworks': 1, 'entertain': 1, 'hobbits': 1, 'shire': 1, 'while': 1, 'need': 1, 'uses': 1, 'weapon': 1, 'maiar': 1, 'an': 1, 'immortal': 1, 'spirit': 1, 'valinor': 1, 'physical': 1, 'body': 1, 'can': 1, 'be': 1, 'killed': 1}
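For comparison with the two methods above, here is a sketch of the dictionary comprehension and collections.defaultdict alternatives mentioned earlier. It uses a tiny hypothetical word list. The comprehension works, but it calls list.count once per unique word, rescanning the list each time. The defaultdict version makes a single pass and doesn't need the membership check from method 2.

```python
from collections import defaultdict

# Hypothetical sample of already-cleaned words
words = ['the', 'ring', 'of', 'the', 'ring', 'the']

# Dict comprehension: dict.fromkeys keeps first-seen order and drops duplicates,
# but words.count(w) scans the whole list again for every unique word
wc_comp = {w: words.count(w) for w in dict.fromkeys(words)}
print(wc_comp)  # {'the': 3, 'ring': 2, 'of': 1}

# defaultdict(int): a missing key starts at int() == 0, so we can just add 1
wc_dd = defaultdict(int)
for w in words:
    wc_dd[w] += 1
print(dict(wc_dd))  # {'the': 3, 'ring': 2, 'of': 1}
```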
I will use either method, but I tend to prefer writing less code that I have to maintain. If a helper class or function exists in a stable release of a library, using it makes life easier.
Because it doesn’t matter much which one we use for our example, we’ll just grab gandalf_wc_v2 and get the top N counts among words that meet a minimum length. There are a ton of ways to do this; we will just use plain Python.
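We could have done this with one more line on our collections.Counter object, like this: gandalf_wc_v1.most_common(n=5). On its own, though, that call returns the five most common words of any length, so it skips the word-length filter.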
But that is too easy, so let’s do it longhand:
# key-value pairs where len(key) >= 3
gandalf_wl_geq_3 = {k: v for k, v in gandalf_wc_v2.items() if len(k) >= 3}
# top N counts
n = 5
top_n = sorted(gandalf_wl_geq_3.values(), reverse=True)[:n]
# [20, 5, 5, 4, 4]
# finally, a fun use of a dict comp:
{k: v for k, v in gandalf_wl_geq_3.items() if v in top_n}
{'gandalf': 5, 'the': 20, 'and': 5, 'one': 4, 'ring': 4}
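The one-liner gandalf_wc_v1.most_common(n=5) mentioned earlier doesn't apply the length filter. Below is a sketch of one way to combine the two, using a small hypothetical word list. It also shows a property of the longhand "v in top_n" approach: when counts tie at the cutoff, it can return more than n entries. Counter.most_common always returns at most n, and equal counts come back in the order they were first encountered.

```python
from collections import Counter

# Hypothetical sample of cleaned words
words = ['the', 'ring', 'of', 'the', 'one', 'ring', 'the', 'a', 'a', 'a', 'one']
wc = Counter(words)
n = 2

# Filter by word length first, then let most_common sort and slice
filtered = {k: v for k, v in wc.items() if len(k) >= 3}
top = Counter(filtered).most_common(n)
print(top)  # [('the', 3), ('ring', 2)]

# Longhand version: 'ring' and 'one' tie at 2, so three entries come back
top_vals = sorted(filtered.values(), reverse=True)[:n]  # [3, 2]
longhand = {k: v for k, v in filtered.items() if v in top_vals}
print(longhand)  # {'the': 3, 'ring': 2, 'one': 2}
```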
Last, but perhaps one of the best: comprehensions on pandas DataFrame columns
Let’s create a DataFrame with some sample data. To make it interesting, I’ll use some comprehensions and other native Python capabilities to create a dataset for this example instead of using Iris or Housing.
import pandas as pd
import numpy as np
yrs = range(2018, 2022, 1)  # 2018 through 2021
cities = 'Bozeman, MT', 'Spokane, WA', 'Bangor, ME', 'White Plains, NY', 'Sedona, AZ'
cat1 = ['wizard', 'ranger', 'elf', 'hutt', 'orc', 'nazgul', 'numenorean', 'deciever']
colnames = ['year', 'city', 'role', 'points']
# One row per role/city/year combination, with a random integer score in [-10, 10)
ds = [[y, c, k, np.random.randint(-10, 10)] for k in cat1 for c in cities for y in yrs]
df = pd.DataFrame(ds, columns=colnames)
display(df.head(3), df.shape)  # display() is available in Jupyter/IPython

|   | year | city | role | points |
|---|---|---|---|---|
| 0 | 2018 | Bozeman, MT | wizard | 3 |
| 1 | 2019 | Bozeman, MT | wizard | -9 |
| 2 | 2020 | Bozeman, MT | wizard | -8 |

(160, 4)
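The comprehension on the ds line has three for clauses. As with nested loops, the leftmost clause is the outermost loop, which is why the year changes fastest in the rows shown. Here is a scaled-down sketch with hypothetical smaller inputs. The random points column is left out so the output is deterministic.

```python
roles = ['wizard', 'ranger']
cities = ['Bozeman, MT', 'Spokane, WA']
yrs = range(2018, 2020)

# Outer loop: roles; middle: cities; inner (fastest-changing): years
rows = [[y, c, k] for k in roles for c in cities for y in yrs]
for r in rows[:3]:
    print(r)
# [2018, 'Bozeman, MT', 'wizard']
# [2019, 'Bozeman, MT', 'wizard']
# [2018, 'Spokane, WA', 'wizard']

# One row per combination: 2 roles * 2 cities * 2 years
print(len(rows))  # 8
```

With the full inputs, the same rule gives 8 roles × 5 cities × 4 years = 160 rows, matching df.shape.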
So we have some meaningless data. Now let’s group by city and role to get a MultiIndex on the columns, which is what we want to manipulate (the whole point of this little exercise).
dfg = df.groupby(by=['city', 'role']).agg({'points': ['mean', 'sum', 'std']}).reset_index()
dfg.head(3)

|   | city | role | points |   |   |
|---|---|---|---|---|---|
|   |   |   | mean | sum | std |
| 0 | Bangor, ME | deciever | 4.50 | 18 | 3.872983 |
| 1 | Bangor, ME | elf | 0.75 | 3 | 7.889867 |
| 2 | Bangor, ME | hutt | -1.25 | -5 | 5.560276 |
So this isn’t super useful for anything. Normally, a dataset for machine learning or whatnot will have a much broader set of columns (a.k.a. features), but the concept is pretty much the same. Just remember to work in manageable chunks, and don’t get intimidated by a long chain of transformations.
Initial column list: MultiIndex([( 'city', ''),
( 'role', ''),
('points', 'mean'),
('points', 'sum'),
('points', 'std')],
)
So first I’ll demonstrate updating this with a list comprehension. This one is a bit more complex, in that it incorporates a conditional '_'.join in the output. Basically, we look at each tuple in the MultiIndex: if the last item is an empty string, we underscore-join all but the last element; otherwise we join the whole thing.
dfg_copy1 = dfg.copy()
# Join all levels with '_', unless the last level is empty, in which case join all but the last
dfg_copy1.columns = ['_'.join(list(x) if len(x[-1]) > 0 else x[:-1]) for x in dfg_copy1.columns]
dfg_copy1.head(2)

|   | city | role | points_mean | points_sum | points_std |
|---|---|---|---|---|---|
| 0 | Bangor, ME | deciever | 4.50 | 18 | 3.872983 |
| 1 | Bangor, ME | elf | 0.75 | 3 | 7.889867 |
There is another way, however, that is pretty clever. It is more of a functional style, and I’m fairly certain I’ve seen it used in the Fast AI course or notebooks, as well as in numerous tutorials online. We will use the map method on our columns index.
dfg_copy2 = dfg.copy()
# Same rule as above, written as a lambda applied to each column tuple via Index.map
dfg_copy2.columns = dfg_copy2.columns.map(lambda x: '_'.join(x) if x[-1] != '' else x[0])
dfg_copy2.head(3)

|   | city | role | points_mean | points_sum | points_std |
|---|---|---|---|---|---|
| 0 | Bangor, ME | deciever | 4.50 | 18 | 3.872983 |
| 1 | Bangor, ME | elf | 0.75 | 3 | 7.889867 |
| 2 | Bangor, ME | hutt | -1.25 | -5 | 5.560276 |
This is really quick and easy if there aren’t any special cases (e.g., you can just use {dataframe_name}.columns.map('_'.join)). Above, though, only some columns have a second level, so handling the city and role columns differently keeps the names readable (a plain join would turn them into city_ and role_). So there you have it: a bonus method, map + lambda, that achieves the same goal as a list comprehension.
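Iterating over a MultiIndex yields plain tuples, so the naming rules can be tried out in pure Python on the tuples printed in the initial column list. The sketch below compares a plain '_'.join with the conditional version. It also shows a third variant that joins only the non-empty parts. That variant is an alternative of mine, not part of the original walkthrough, and it also copes with more than two levels.

```python
# The column tuples from dfg.columns above, as a plain list
cols = [('city', ''), ('role', ''), ('points', 'mean'), ('points', 'sum'), ('points', 'std')]

# Plain join: single-level columns end up with a trailing underscore
plain = list(map('_'.join, cols))
print(plain)  # ['city_', 'role_', 'points_mean', 'points_sum', 'points_std']

# Conditional join, same rule as the lambda: drop an empty last level
conditional = list(map(lambda x: '_'.join(x) if x[-1] != '' else x[0], cols))
print(conditional)  # ['city', 'role', 'points_mean', 'points_sum', 'points_std']

# Alternative: join only the non-empty parts of each tuple
nonempty = ['_'.join(p for p in x if p) for x in cols]
print(nonempty == conditional)  # True
```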
Well, for now that is plenty of information. Hopefully this helps with a future task of yours, regardless of your proficiency in technical terminology.