How to Perform Element-Wise Operations Between Multiple DataFrames
How do you perform element-wise operations between a DataFrame and another data structure, such as a Series?
Take the following dataframe:

import pandas as pd

data = {
    'hot_dogs': [5, 11, 3, 14, 15],
    'pizza': [9, 2, 11, 12, 6],
    'burgers': [8, 9, 7, 5, 6]
}
years = pd.Series([2015, 2016, 2017, 2018, 2019], name='year')
df = pd.DataFrame(data=data, index=years)
df
| year | hot_dogs | pizza | burgers |
|------|----------|-------|---------|
| 2015 | 5        | 9     | 8       |
| 2016 | 11       | 2     | 9       |
| 2017 | 3        | 11    | 7       |
| 2018 | 14       | 12    | 5       |
| 2019 | 15       | 6     | 6       |
Let’s say that, for each year, you want to compute each item as a proportion of the total items sold. In 2015, for instance, we’d expect a proportion of .23 for hot dogs (5 / (5 + 9 + 8)) and .41 for pizza (9 / (5 + 9 + 8)).
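As a quick sanity check on those numbers, here is the 2015 arithmetic restated by hand:

```python
# Restating the 2015 arithmetic by hand
total_2015 = 5 + 9 + 8  # 22 items sold in 2015

print(round(5 / total_2015, 2))  # 0.23 (hot dogs)
print(round(9 / total_2015, 2))  # 0.41 (pizza)
```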
It’s simple enough to find the sum of each row:
annual_sum = df.sum(axis=1)
annual_sum
year
2015 22
2016 22
2017 21
2018 31
2019 27
dtype: int64
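Here, `axis=1` tells `sum()` to add up across the columns, producing one total per row. If you find the string alias more readable, this is equivalent:

```python
# axis=1 sums across the columns, one total per row;
# the string alias 'columns' is equivalent
annual_sum = df.sum(axis='columns')
```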
So it’s just a matter of dividing our dataframe by the sum of each of its rows, right?
df / annual_sum
| year | 2015 | 2016 | 2017 | 2018 | 2019 | burgers | hot_dogs | pizza |
|------|------|------|------|------|------|---------|----------|-------|
| 2015 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2016 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2017 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2018 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2019 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
Uh oh, what the heck happened?
Well, pandas is attempting to align the columns of `df` (`hot_dogs`, `pizza`, `burgers`) with the index of `annual_sum` (2015, 2016, 2017, 2018, 2019) and then perform the division, but the two do not align. For this operation to work, each `pizza` value in `df` would need to be divided by a value sharing that same `pizza` index.
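You can see the mismatch directly by comparing the two sets of labels pandas is trying to line up (an inspection step, not part of the computation):

```python
# The two sets of labels pandas tries to line up:
print(df.columns)        # hot_dogs, pizza, burgers
print(annual_sum.index)  # 2015 through 2019 (named 'year')

# They share no labels, so every division pairs a value with a
# missing partner, and the result is all NaN
```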
Although it doesn’t serve our purposes here, the following illustrates this behaviour:
foods = pd.Series([5, 10, 20], index=['pizza', 'burgers', 'pie'])
foods
pizza 5
burgers 10
pie 20
dtype: int64
df / foods
| year | burgers | hot_dogs | pie | pizza |
|------|---------|----------|-----|-------|
| 2015 | 0.8     | NaN      | NaN | 1.8   |
| 2016 | 0.9     | NaN      | NaN | 0.4   |
| 2017 | 0.7     | NaN      | NaN | 2.2   |
| 2018 | 0.5     | NaN      | NaN | 2.4   |
| 2019 | 0.6     | NaN      | NaN | 1.2   |
Now, each value in `df['burgers']` is divided by 10 (the `burgers`-indexed value in `foods`), and each value in `df['pizza']` is divided by 5 (the `pizza`-indexed value in `foods`). But notice the NaNs that result: NaNs for the `df['hot_dogs']` values, which in this case have no matching divisor in `foods`, and NaNs for a new column, `pie`, since there was no matching dividend in `df`.
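One way to see what pandas does behind the scenes is to reindex the divisor by the dividend’s columns, which mirrors part of the alignment step (shown here purely for illustration; the union of labels is what also brings in `pie`):

```python
# Labels absent from foods become NaN after alignment,
# and dividing by NaN yields NaN
foods.reindex(df.columns)
# hot_dogs     NaN
# pizza        5.0
# burgers     10.0
# dtype: float64
```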
This is not what we want. To perform this division operation element-wise for each row, we’d instead have to use an object method: `pd.DataFrame.div`.
On its own, this method produces behavior identical to that which we observed before:
df.div(annual_sum)
| year | 2015 | 2016 | 2017 | 2018 | 2019 | burgers | hot_dogs | pizza |
|------|------|------|------|------|------|---------|----------|-------|
| 2015 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2016 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2017 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2018 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
| 2019 | NaN  | NaN  | NaN  | NaN  | NaN  | NaN     | NaN      | NaN   |
Why? Because by default the axis is 1, or `columns`, which means that pandas is once again attempting to match the columns of `df` (`hot_dogs`, `pizza`, `burgers`) with the index of `annual_sum` (2015, 2016, 2017, 2018, 2019), which we already saw doesn’t work.
All we have to do is specify `axis=0` so that we’re matching the index of `df` to the index of `annual_sum`. In this case there is a match: we’re dividing each 2015 value of `df` by the matching 2015 value of `annual_sum`, and so on:
df.div(annual_sum, axis=0)
| year | hot_dogs | pizza    | burgers  |
|------|----------|----------|----------|
| 2015 | 0.227273 | 0.409091 | 0.363636 |
| 2016 | 0.500000 | 0.090909 | 0.409091 |
| 2017 | 0.142857 | 0.523810 | 0.333333 |
| 2018 | 0.451613 | 0.387097 | 0.161290 |
| 2019 | 0.555556 | 0.222222 | 0.222222 |
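Each row should now sum to 1, which is easy to verify. Note also that `axis='index'` is an equivalent spelling of `axis=0` (a small sketch; `props` is just a hypothetical name for the result):

```python
# 'index' is the string alias for axis=0
props = df.div(annual_sum, axis='index')

# Sanity check: each year's proportions should sum to 1
props.sum(axis=1)
# year
# 2015    1.0
# 2016    1.0
# 2017    1.0
# 2018    1.0
# 2019    1.0
# dtype: float64
```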
Now we know that pizza apparently accounted for almost 41% of all sales in 2015 but only 9% the following year. Weird!