4 minute read

I’m working on a machine learning project that involves modeling art auction prices. The data that I’ve collected includes auction results from nearly 40 years. In general, I want to leave realized prices in nominal terms, since the model will predict sales in nominal terms: A major component of how much a work fetches on the secondary market is how long ago the auction occurred. However, as part of my preliminary exploratory data analysis work, it would be useful to have an adjusted currency feature in constant dollars so that I can get a better sense for how certain features correlate with realized price.

For instance, let’s say we’re interested in seeing whether there’s a relationship between size (i.e., painting area) and price realized. Without constant dollars, this would be complicated, since the realized price is a function of when the auction was, as well. But if we’re dealing with constant dollars, then we can compare apples with apples.

To accomplish this currency adjustment, I discovered the cpi module. Let’s take a look at how easy it is to use.

CPI basics

The first time you load cpi, you may get a warning that the CPI data needs to be updated. That’s easily solved by calling cpi.update().

To adjust nominal currency amounts to a constant currency, the fundamental method we need is cpi.inflate().

# Inflate $150 in 1950 to present (i.e., 2022)
cpi.inflate(150, 1950)
1821.504149377593

There you have it: \$150 in 1950 is approximately identical to \$1,821.50 today. It’s as easy as that.

You can also inflate currencies to other years besides the present.

# Inflate $150 in 1950 to 2000
cpi.inflate(150, 1950, to=2000)
1071.7842323651453

CPI and Pandas

For my purposes, I needed to take auction_year and price_realized features from a DataFrame and inflate those values to the present. This is also pretty darn easy.

# Load data
df = pd.read_csv('../../art_auction/data/auction_data_cleaned.csv', index_col=0)

# Select columns of interest
cols = ['artist_name', 'title', 'date', 'auction_year', 'price_realized_USD']

# Preview
df[cols].sample(10)
artist_name title date auction_year price_realized_USD
22892 George Condo NUDE WITH PURPLE HAIR NaN 2017 535772.0
38131 Sam Francis Untitled 1990 1997 16335.0
21527 Chu Teh-Chun NaN NaN 2016 474497.0
28172 Richard Prince Song 2120 South Michigan Avenue 1989 2005 140000.0
52689 Leonora Carrington TUESDAY 1946 2020 1100000.0
8103 Gerhard Richter Porträt Schniewind 1964 2010 1746520.0
38665 Zhang Xiaogang Hills And Spirits NaN 2018 863251.0
2396 Andy Warhol KEY SERVICE (NEGATIVE) 1985 2019 73149.0
4884 Andy Warhol Children Paintings, Parrot 1983 2004 NaN
27498 Zhou Chunya Red mountain rock series - nude 1992 2011 158882.0

As you can see, we have auctions from a variety of years. For each of these works, we’ll need to call cpi.inflate() for the auction_year and price_realized_USD. The best way to do this is with a lambda function. But because the cpi library only supports dates through 2022, we’ll have to make sure we only apply this function to dates prior to then, otherwise it’ll throw an error.

# Create mask for pre-2023 auctions
pre2023 = (df['auction_year'] < 2023)

# Inflate pre-2023 prices
df['adjusted_price'] = (
    df.loc[pre2023].apply(lambda x: cpi.inflate(x['price_realized_USD'], x['auction_year']), axis=1)
)

# Assign 2023 prices directly
df.loc[~pre2023, 'adjusted_price'] = df.loc[~pre2023, 'price_realized_USD']

# Select columns
cols = ['artist_name', 'title', 'date', 'auction_year', 'price_realized_USD', 'adjusted_price']

# Preview
df[cols].sample(25)
artist_name title date auction_year price_realized_USD adjusted_price
40696 Kazuo Shiraga Composition, 1970 2012 NaN NaN
15276 Jean Dubuffet Riant ete 1954 2002 NaN NaN
14711 Yoshitomo Nara Guitar girl 2003 2005 1884.0 2.823154e+03
7215 Zao Wou-Ki Untitled 1958 1998 90771.0 1.629729e+05
9661 Yayoi Kusama Fallen Blossoms NaN 2018 225079.0 2.623204e+05
16962 Joan Mitchell After April, Bernie 1925-1992 2006 688000.0 9.987433e+05
6621 Zao Wou-Ki 16.9.91 1991 2015 2609047.0 3.221502e+06
22988 George Condo Blue Sky Portrait 1997 2013 365000.0 4.585356e+05
43109 Jean-Paul Riopelle Sans titre 1923-2002 2017 1491281.0 1.780478e+06
29026 Rudolf Stingel UNTITLED NaN 2009 15677.0 2.138537e+04
34294 Bernard Buffet Pont dans un village 1928-1999 2018 79099.0 9.218667e+04
40137 Pierre Bonnard Femme au chapeau 1867-1947 2007 360000.0 5.081257e+05
34808 Bernard Buffet Roses 1982 2013 97432.0 1.224001e+05
15081 Jean Dubuffet Réchaud-Four à gaz V 1966 2011 636000.0 8.274625e+05
7798 Gerhard Richter Souvenir (no 48) 1995 2014 80000.0 9.889666e+04
46506 Anselm Kiefer LILITH NaN 2011 494500.0 6.433651e+05
12174 Marc Chagall EVOCATION DU VILLAGE NATAL OU ETUDE POUR LA FE... NaN 2018 259384.0 3.023015e+05
7161 Zao Wou-Ki Sans titre 1983 2003 55513.0 8.829433e+04
9871 Yayoi Kusama Infinity Nets OQWWS 2006 2016 909052.0 1.108462e+06
17386 Henri Matisse Odalisque, brasero et coupe de fruits 1929 1991 NaN NaN
44705 Tom Wesselmann Study for Helen nude 1981 2006 230000.0 3.338822e+05
50151 Robert Ryman Whitney revision painting 2 - diptych 1969 1998 120000.0 2.154515e+05
38974 Donald Judd Untitled (1968-76) NaN 2012 86500.0 1.102584e+05
35428 Bernard Buffet Vase de fleurs 1928-1999 2006 84000.0 1.219396e+05
45905 Georges Braque Les balances NaN 1987 NaN NaN

That’s it! We can see that the adjusted_price feature now has inflated each price_realized_USD feature based on the auction_year.