How to Perform Element-Wise Operations Between Multiple DataFrames
How do you perform element-wise operations between a DataFrame and another data structure, such as a Series?
Take the following dataframe:

import pandas as pd

data = {
    'hot_dogs': [5, 11, 3, 14, 15],
    'pizza': [9, 2, 11, 12, 6],
    'burgers': [8, 9, 7, 5, 6]
}
years = pd.Series([2015, 2016, 2017, 2018, 2019], name='year')
df = pd.DataFrame(data=data, index=years)
df
| year | hot_dogs | pizza | burgers |
|------|----------|-------|---------|
| 2015 | 5        | 9     | 8       |
| 2016 | 11       | 2     | 9       |
| 2017 | 3        | 11    | 7       |
| 2018 | 14       | 12    | 5       |
| 2019 | 15       | 6     | 6       |
Let’s say that, for each year, you want to compute each item as a proportion of the total items sold. In 2015, for instance, we’d expect a proportion of .23 for hot dogs (5 / (5 + 9 + 8)) and .41 for pizza (9 / (5 + 9 + 8)).
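As a quick sanity check on those numbers, here is the 2015 arithmetic restated by hand:

```python
# Restating the 2015 arithmetic by hand
total_2015 = 5 + 9 + 8  # 22 items sold in 2015

print(round(5 / total_2015, 2))  # 0.23 (hot dogs)
print(round(9 / total_2015, 2))  # 0.41 (pizza)
```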
It’s simple enough to find the sum of each row:
annual_sum = df.sum(axis=1)
annual_sum
year
2015 22
2016 22
2017 21
2018 31
2019 27
dtype: int64
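Here, `axis=1` tells `sum()` to add up across the columns, producing one total per row. If you find the string alias more readable, this is equivalent:

```python
# axis=1 sums across the columns, one total per row;
# the string alias 'columns' is equivalent
annual_sum = df.sum(axis='columns')
```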
So it’s just a matter of dividing our dataframe by the sum of each of its rows, right?
df / annual_sum
| year | 2015 | 2016 | 2017 | 2018 | 2019 | burgers | hot_dogs | pizza |
|------|------|------|------|------|------|---------|----------|-------|
| 2015 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2016 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2017 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2018 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2019 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
Uh oh, what the heck happened?
Well, pandas is attempting to align the columns of `df` (`hot_dogs`, `pizza`, `burgers`) with the index of `annual_sum` (2015, 2016, 2017, 2018, 2019) and then perform the division, but the two do not align. For this operation to work, each `pizza` value in `df` would need to be divided by a value sharing that same `pizza` index.
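You can see the mismatch directly by comparing the two sets of labels pandas is trying to line up (an inspection step, not part of the computation):

```python
# The two sets of labels pandas tries to line up:
print(df.columns)        # hot_dogs, pizza, burgers
print(annual_sum.index)  # 2015 through 2019 (named 'year')

# They share no labels, so every division pairs a value with a
# missing partner, and the result is all NaN
```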
Although it doesn’t serve our purposes here, the following illustrates this behaviour:
foods = pd.Series([5, 10, 20], index=['pizza', 'burgers', 'pie'])
foods
pizza 5
burgers 10
pie 20
dtype: int64
df / foods
| year | burgers | hot_dogs | pie | pizza |
|------|---------|----------|-----|-------|
| 2015 | 0.8     | NaN      | NaN | 1.8   |
| 2016 | 0.9     | NaN      | NaN | 0.4   |
| 2017 | 0.7     | NaN      | NaN | 2.2   |
| 2018 | 0.5     | NaN      | NaN | 2.4   |
| 2019 | 0.6     | NaN      | NaN | 1.2   |
Now, each value in `df['burgers']` is divided by 10 (the `burgers`-indexed value in `foods`), and each value in `df['pizza']` is divided by 5 (the `pizza`-indexed value in `foods`). But notice the NaNs that result: NaNs for the `df['hot_dogs']` values, which in this case have no matching divisor in `foods`, and NaNs for a new column, `pie`, since there was no matching dividend in `df`.
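One way to see what pandas does behind the scenes is to reindex the divisor by the dividend’s columns, which mirrors part of the alignment step (shown here purely for illustration; the union of labels is what also brings in `pie`):

```python
# Labels absent from foods become NaN after alignment,
# and dividing by NaN yields NaN
foods.reindex(df.columns)
# hot_dogs     NaN
# pizza        5.0
# burgers     10.0
# dtype: float64
```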
This is not what we want. To perform this division operation element-wise for each row, we’d instead have to use an object method: `pd.DataFrame.div`.
On its own, this method produces behavior identical to that which we observed before:
df.div(annual_sum)
| year | 2015 | 2016 | 2017 | 2018 | 2019 | burgers | hot_dogs | pizza |
|------|------|------|------|------|------|---------|----------|-------|
| 2015 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2016 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2017 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2018 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2019 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
Why? Because by default the axis is 1, or `columns`, which means that pandas is once again attempting to match the columns of `df` (`hot_dogs`, `pizza`, `burgers`) with the index of `annual_sum` (2015, 2016, 2017, 2018, 2019), which we already saw doesn’t work.
All we have to do is specify `axis=0` so that we’re matching the index of `df` to the index of `annual_sum`. In this case there is a match: we’re dividing each 2015 value of `df` by the matching 2015 value of `annual_sum`, and so on:
df.div(annual_sum, axis=0)
| year | hot_dogs | pizza    | burgers  |
|------|----------|----------|----------|
| 2015 | 0.227273 | 0.409091 | 0.363636 |
| 2016 | 0.500000 | 0.090909 | 0.409091 |
| 2017 | 0.142857 | 0.523810 | 0.333333 |
| 2018 | 0.451613 | 0.387097 | 0.161290 |
| 2019 | 0.555556 | 0.222222 | 0.222222 |
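Each row should now sum to 1, which is easy to verify. Note also that `axis='index'` is an equivalent spelling of `axis=0` (a small sketch; `props` is just a hypothetical name for the result):

```python
# 'index' is the string alias for axis=0
props = df.div(annual_sum, axis='index')

# Sanity check: each year's proportions should sum to 1
props.sum(axis=1)
# year
# 2015    1.0
# 2016    1.0
# 2017    1.0
# 2018    1.0
# 2019    1.0
# dtype: float64
```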
Now we know that pizza apparently accounted for almost 41% of all sales in 2015 but only 9% the following year. Weird!