Questions home owners often ask themselves:
If I suddenly get $30,000, should I put it towards my mortgage payments, or should I invest it elsewhere?
After paying all my expenses, should I put all my extra money into the house or should I invest it?
People have strong beliefs about this. I decided to do a quantitative analysis, using the S&P 500 as the investment vehicle.
import pandas as pd
from scipy.stats.mstats import gmean
from numpy import roll
import numpy as np
import plotly.plotly as ply
import pylab as plt
import plotly.tools as tls
from scipy.optimize import fsolve
import seaborn as sn
%matplotlib inline
Ultimately, these are the Shiller data. http://www.econ.yale.edu/~shiller/data.htm
raw = pd.read_csv("data.csv", index_col=0)
raw.head()
The SP500 column has the value of the S&P 500.
snp500 = raw["SP500"]
I want to calculate the annualized rate of return, so I divide the following year's entry by today's entry.
aror_snp500 = (snp500.shift(-12)/snp500 - 1)*100
aror_snp500 = aror_snp500.dropna()
aror_snp500.tail(20)
The number for 2013-01-01 is the rate of return for the year beginning in 2013-01-01! Likewise for 2013-10-01, etc. This is an annualized return, but rolling for each month.
So for 2013, starting Jan 1st, the annual return was 23.0992%
# We only want the January periods.
ror_snp500s = aror_snp500[0::12]
ror_snp500s.tail()
A common question is: Over a 10 year period, what was the average annual rate of return?
Many people make the mistake of taking each year's rate of return, and computing the mean. This is plain wrong. If you gain 10% one year and lose 10% the next year, the mean tells you your equivalent rate of return is 0, which means you have the same amount of money as when you started. If you do the math, you'll find you actually lost 1%.
The correct formula is:
$$r=\left(\prod_{i=1}^{n}\left(1+r_{i}\right)\right)^{1/n}-1$$Over here we assume that $r_{i}$ is the annual rate of return for year $i$, and is expressed as a fraction. So 10% would be 0.1.
In words, this is just the geometric mean.
def avgrate(arr):
"""
Given an array of annual returns, compute the equivalent annual
rate of return across all those years. The value it returns is
expressed as a percentage, not a decimal.
"""
return (gmean(1 + arr/100) - 1) * 100
I want to add new columns in the data frame that have a rolling annualized average rate of return for a 5, 10, 20 and 30 year period.
def nwindow(df, years, src_label, new_label):
"""
Compute the rolling rate of return over the last so many years.
"""
ror = pd.rolling_apply(df[src_label], years, avgrate)
# Uncomment the line below to make it show the rate at the START
# of the investing period.
df[new_label] = roll(ror, -(years-1))
# Uncomment the line below to make it show the rate at the END
# of the investing period.
# df[new_label] = roll(ror, 1)
# Convert the series into a dataframe.
ror_snp500 = pd.DataFrame(ror_snp500s)
nwindow(ror_snp500, 5, "SP500", "05yrs")
nwindow(ror_snp500, 10, "SP500", "10yrs")
nwindow(ror_snp500, 20, "SP500", "20yrs")
nwindow(ror_snp500, 30, "SP500", "30yrs")
ror_snp500.tail(30)
So for 5 years beginning in 2009, the effective annual rate of return was 16.1%. For the 10 years beginning in 2004, it was only 4.87% per year.
ror_snp500.index = pd.to_datetime(ror_snp500.index)
SCALE = 3
def make_plot(df, title, ylabel, filename, new_fig=True, save_fig=True):
# sn.axes_style("darkgrid")
if new_fig:
f = plt.figure()
FONTSIZE = 15
ax = df.plot(fontsize=FONTSIZE,
sort_columns=True, title=title, figsize=(4*SCALE,3*SCALE))
plt.ylabel(ylabel)
ax.set_title(title, fontsize=FONTSIZE)
plt.legend(loc=0, prop={'size':FONTSIZE})
ax.yaxis.label.set_size(FONTSIZE)
plt.grid(True)
if save_fig:
fig = ax.get_figure()
fig.savefig("plots/%s.png" % filename)
make_plot(ror_snp500[[x for x in ror_snp500.columns if x!="SP500"]],
"Historic annualized rate of return (rolling window)",
"Annualized rate of return (%)", "ror_ls")
for num_years in "30 20 10 05".split():
make_plot(ror_snp500["%syrs" % num_years],
"%syr window" % num_years,
"Annualized rate of return (%)",
"ror_ls_%s" % num_years)
These charts look purely at the S&P 500 returns, and assume the dividends are not reinvested.
How should we account for the reinvestment of dividends? I used the approach in the link above where the dividends for each month are divided by 12, invested at the current S&P 500 "price", and then at the end of the year all these "shares" are added up and "sold" at the new price (Jan 1).
I'm assuming the Dividend column is the amount in Dollars one would receive if they owned 1 "share" of the S&P 500 fund?
nraw = raw[["SP500", "Dividend"]]
nraw["Dividend"] = nraw["Dividend"]/12.
nraw["Div Shares"] = nraw["Dividend"]/nraw["SP500"]
nraw["Div Sum"] = pd.rolling_sum(nraw["Div Shares"], 12).shift(1)
rord = (nraw.shift(-12)["SP500"]*(1+nraw.shift(-12)["Div Sum"])/nraw["SP500"]-1)*100
rord[1480:1500]
This means that in 1995, we got an effective rate of return of 35.4% if we include dividends. Look at the 1995-01-01 number.
# Take every 12 entries.
arord = rord[::12]
arord = pd.DataFrame(arord)
arord.columns = ["SP500D"]
nwindow(arord, 5, "SP500D", "05yrs")
nwindow(arord, 10, "SP500D", "10yrs")
nwindow(arord, 20, "SP500D", "20yrs")
nwindow(arord, 30, "SP500D", "30yrs")
arord[110:125]
arord.index = pd.to_datetime(arord.index)
make_plot(arord[[x for x in arord.columns if x!="SP500D"]],
"Historic annualized rate of return (rolling window) with reinvestment of dividends",
"Annualized rate of return (%)", "ror_ls_div")
for num_years in "30 20 10 05".split():
make_plot(arord["%syrs" % num_years],
"%syr window with dividends reinvested" % num_years,
"Annualized rate of return (%)", "ror_ls_div_%s" % num_years)
Let's say I don't reinvest dividends and I keep them as cash. This avoids the loss in downtimes, but prevents large gains in the uptimes.
nnraw = raw[["SP500", "Dividend"]]
nnraw["Dividend"] = nnraw["Dividend"]/12.
def nwindow_c(df, years, new_label):
"""
Compute the rolling rate of return over the last so many years.
"""
# Add the dividend entries
tmp = pd.rolling_sum(df["Dividend"], 12*years).shift(1)
tmpdf = df.copy()
tmpdf["Sum"] = tmpdf["SP500"]+tmp
ror = tmpdf["Sum"]/tmpdf["SP500"].shift(12*years)
ror = (np.power(ror, 1./years)-1) * 100
df[new_label] = roll(ror, -(12*years))
nwindow_c(nnraw, 5, "05yrs")
nwindow_c(nnraw, 10, "10yrs")
nwindow_c(nnraw, 20, "20yrs")
nwindow_c(nnraw, 30, "30yrs")
nnraw.index = pd.to_datetime(nnraw.index)
make_plot(nnraw[[x for x in nnraw.columns if x not in ("SP500", "Dividend")]],
"Historic annualized rate of return (rolling window): Dividends kept as cash",
"Annualized rate of return (%)", "ror_ls_cash")
for num_years in "30 20 10 05".split():
make_plot(nnraw["%syrs" % num_years],
"%syr window: Includes dividends kept as cash" % num_years,
"Annualized rate of return (%)", "ror_ls_cash_%s" % num_years)
Was it ever better to keep the money instead of reinvesting it? Below I plot the ratio of the rate of returns: Keeping vs Investing. If the number is below 1, then reinvesting was better.
columns = [x for x in arord.columns if x!="SP500D"]
ratio = (nnraw[columns]/arord[columns]).dropna()
for num_years in "30 20 10 05".split():
tmpdf = nnraw[["%syrs" % num_years]][::12]
tmpdf.columns = ["Cash: %syrs" % num_years]
tmpdf["Dividend: %syrs" % num_years] = arord["%syrs" % num_years]
make_plot(tmpdf,
"%syr window" % num_years,
"Hold vs Reinvest", "ror_ls_cash_div_%s" % num_years)
For a 20 or 30 year period, it seems one should always reinvest. Even in the other cases, there were very few times in recent history where keeping the cash was a better idea. And don't be fooled by the extremes. That's often merely because the denominator was near zero.
If you're suddenly endowed with a lot of money, these plots show the effective annual rate of return if you invest in an S&P 500 index fund (ignoring any fund fees, of course).
It is noteworthy that over a 30 year period, the effective annual rate of return never dips below 5%. In recent times it rarely dipped below 10%. If you have that many years to live, and if your mortgage interest rate is below 5%, then never once in the history of the S&P 500 was it a better idea to put money into the house. The catch is that you should keep the money in the fund and not withdraw when the economy goes south.
At the other extreme is the 5 year window. If you plan to put in money and withdraw after only 5 years, it often does dip below 5%.
The 10 year window had a negative rate of return only twice: During the Great Depression and during the recent recession. It occasionally drops below 5%, though.
Let's bring inflation into the picture.
inflation = pd.read_csv('inflation.csv', parse_dates=True, index_col=0)
inflation.head()
def avgrate_inflation(arr):
"""
Given an array of annual returns, compute the equivalent annual
rate of return across all those years. The value it returns is
expressed as a percentage, not a decimal.
"""
return (gmean(arr) - 1) * 100
def nwindow_inflation(df, inflation, years, src_label, new_label):
"""
Compute the rolling rate of return over the last so many years,
adjusted for inflation.
"""
tmpdf = (1+df[src_label]/100)#/(1+inflation/100)
tmpinf = (1+inflation["Inflation"]/100)
tmpdf = tmpdf/tmpinf
tmpdf = tmpdf.dropna()
ror = pd.rolling_apply(tmpdf, years, avgrate_inflation)
df[new_label] = pd.Series(roll(ror, -(years-1)), index=tmpdf.index)
arordi = rord[::12]
arordi = pd.DataFrame(arordi)
arordi.columns = ["SP500DI"]
arordi.index = pd.to_datetime(arordi.index)
nwindow_inflation(arordi, inflation, 5, "SP500DI", "05yrs")
nwindow_inflation(arordi, inflation, 10, "SP500DI", "10yrs")
nwindow_inflation(arordi, inflation, 20, "SP500DI", "20yrs")
nwindow_inflation(arordi, inflation, 30, "SP500DI", "30yrs")
arordi.tail(31)
inflation.plot()
The first 43 years are just NaN as I don't have inflation data for it.
arordi=arordi[43:]
make_plot(arordi[[x for x in arordi.columns if x not in ("SP500DI", "Dividend")]],
"Rate of Return with Dividends & Inflation Adjusted",
"Annualized rate of return with dividends and inflation (%)",
"ror_ls_div_inflation")
for num_years in "30 20 10 05".split():
make_plot(arordi["%syrs" % num_years],
"%syr window" % num_years,
"Annualized rate of return with dividends and inflation adjusted (%)",
"ror_ls_div_inflation_%s" % num_years)
Note that as I only have annual inflation data (as opposed to per month), the accuracy may be off a little. However, I suspect the true result will not be very far off.
Not many of us suddenly receive a large sum of money. More common is dollar cost averaging where we put in, say, an extra $100 towards our mortgage or into investments every month. Let's see what this method of investing gives us.
Given a set of annual rate of returns $r_{i}$, if I put in $1 each year, how much money will I have after all those years?
$$\sum_{k=1}^{k=n}\prod_{i=k}^{n}\left(1+r_{i}\right)$$The index $k$ is the year.
def periodicsum(arr):
"""
Given an array of annual interest rates, assume you put in $1
each year. How much money will you have after all those years?
"""
result = 0
# Note: I start indexing from 0 as the way I defined my
# rates, Jan 1, 1990 has the rate for 1990!
for x in xrange(0, len(arr/100)):
result += np.prod(1+arr[x:]/100)
return result
How do we define the effective annual rate of return in this case?
Basically, I want to know the value of $r$ that satisfies:
$\sum_{k=1}^{k=n}\left(1+r\right)^{k}=\sum_{k=1}^{k=n}\prod_{i=1}^{k}\left(1+r\right)=\sum_{k=1}^{k=n}\prod_{i=k}^{n}\left(1+r_{i}\right)$
In words: Assume the annual rate of return is $r$ every year. What value of $r$ will give me the same result?
def periodicrate(arr):
"""
With dollar cost averaging, what is the effective annual rate of
return?
"""
total = periodicsum(arr)
def objective(r):
tmp = 0
for k in xrange(1, len(arr)+1):
tmp += np.power(1+r, k)
return tmp - total
return fsolve(objective, 0.1) * 100
prord = rord[::12]
prord = pd.DataFrame(prord)
prord.columns = ["SP500PD"]
prord.index = pd.to_datetime(prord.index)
def nwindow_periodic(df, years, src_label, new_label):
ror = pd.rolling_apply(df[src_label], years, periodicrate)
df[new_label] = roll(ror, -(years-1))
nwindow_periodic(prord, 5, "SP500PD", "05yrs")
nwindow_periodic(prord, 10, "SP500PD", "10yrs")
nwindow_periodic(prord, 20, "SP500PD", "20yrs")
nwindow_periodic(prord, 30, "SP500PD", "30yrs")
make_plot(prord[[x for x in prord.columns if x!="SP500PD"]],
"Annualized rate of return (rolling window) (averaging)",
"Annualized rate of return (%)", "ror_dca")
for num_years in "30 20 10 05".split():
make_plot(prord["%syrs" % num_years],
"%syr window: Dollar Cost Averaging with Reinvestment of Dividends" % num_years,
"Annualized rate of return - Averaging (%)", "ror_dca_%s" % num_years)
Let's overlay this with lumped sum
for num_years in "30 20 10 05".split():
make_plot(prord["%syrs" % num_years],
"%syr window: Dollar cost averaging vs lumped sum" % num_years,
"Annualized rate of return - Averaging vs Lumped Sum(%)",
"ror_dca_vs_ls_%s" % num_years, save_fig=False)
make_plot(arord["%syrs" % num_years],
"%syr window" % num_years,
"Annualized rate of return - Averaging vs Lumped Sum (%)",
"ror_dca_vs_ls_%s" % num_years, False, True)
def nwindow_periodic_inflation(df, inflation, years, src_label, new_label):
"""
Compute the rolling rate of return over the last so many years,
adjusted for inflation.
"""
tmpdf = (1+df[src_label]/100)#/(1+inflation/100)
tmpinf = (1+inflation["Inflation"]/100)
tmpdf = tmpdf/tmpinf
tmpdf = (tmpdf - 1)*100 # Need this to convert 1.03 to 3%
tmpdf = tmpdf.dropna()
ror = pd.rolling_apply(tmpdf, years, periodicrate)
df[new_label] = pd.Series(roll(ror, -(years-1)), index=tmpdf.index)
prordi = rord[::12]
prordi = pd.DataFrame(prordi)
prordi.columns = ["SP500PDI"]
prordi.index = pd.to_datetime(prordi.index)
nwindow_periodic_inflation(prordi, inflation, 5, "SP500PDI", "05yrs")
nwindow_periodic_inflation(prordi, inflation, 10, "SP500PDI", "10yrs")
nwindow_periodic_inflation(prordi, inflation, 20, "SP500PDI", "20yrs")
nwindow_periodic_inflation(prordi, inflation, 30, "SP500PDI", "30yrs")
prordi = prordi[43:]
make_plot(prordi[[x for x in prordi.columns if x!="SP500PDI"]],
"Annualized rate of return (rolling window) (averaging) (with inflation)",
"Annualized rate of return (%)", "ror_dca_inf")
for num_years in "30 20 10 05".split():
make_plot(prordi["%syrs" % num_years],
"%syr window: Dollar Cost Averaging with Reinvestment of Dividends and Inflation Adjusted" % num_years,
"Annualized rate of return - Averaging (%)", "ror_dca_inf_%s" % num_years)
Arguably, the annual rates of return from year to year have poor correlation. So instead of looking at consecutive years, just treat the annual rates as a pool to draw from, and pick any 30 at random to calculate a 30 year effective rate of return.
mcrordi = rord[::12]
mcrordi = pd.DataFrame(mcrordi)
mcrordi.columns = ["SP500MCI"]
mcrordi.index = pd.to_datetime(mcrordi.index)
mcrordi = mcrordi.dropna()
def adjust_for_inflation(df, inflation, src_label):
"""
Take a dataframe with annual return values (not in decimal).
Take a series of inflation rates for each year (not in decimal).
Return the rates adjusted for inflation (not in decimal).
"""
tmpdf = (1+df[src_label]/100)
tmpinf = (1+inflation["Inflation"]/100)
tmpdf = tmpdf/tmpinf
tmpdf = (tmpdf - 1)*100 # Need this to convert 1.03 to 3%
tmpdf = tmpdf.dropna()
return tmpdf
mcrordi["SP500MCI"] = adjust_for_inflation(mcrordi, inflation, "SP500MCI")
mcrordi = mcrordi.dropna()
def mc_returns(df, num_years, num_trials, src_label):
rates = []
for x in xrange(num_trials):
rates.append(avgrate(df[src_label].sample(num_years, replace=True)))
return pd.Series(rates)
NUM_TRIALS = 20000
mcrordi_dist = pd.DataFrame(mc_returns(mcrordi, 5, NUM_TRIALS, "SP500MCI"))
mcrordi_dist.columns = ["05yrs"]
mcrordi_dist["10yrs"] = mc_returns(mcrordi, 10, NUM_TRIALS, "SP500MCI")
mcrordi_dist["20yrs"] = mc_returns(mcrordi, 20, NUM_TRIALS, "SP500MCI")
mcrordi_dist["30yrs"] = mc_returns(mcrordi, 30, NUM_TRIALS, "SP500MCI")
SCALE = 3
def make_hist(df, title, xlabel, filename, new_fig=True, xlim=None, ylim=None):
if new_fig:
f = plt.figure()
FONTSIZE = 15
ax = df.plot(kind='hist', bins=50, fontsize=FONTSIZE,
sort_columns=True, title=title, figsize=(4*SCALE,3*SCALE),
xlim=xlim, ylim=ylim)
plt.xlabel(xlabel)
ax.set_title(title, fontsize=FONTSIZE)
plt.legend(loc=0, prop={'size':FONTSIZE})
ax.xaxis.label.set_size(FONTSIZE)
ax.yaxis.label.set_size(FONTSIZE)
plt.grid(True)
fig = ax.get_figure()
fig.savefig("plots/%s.png" % filename)
make_hist(mcrordi_dist,
"Distribution of effective annual rate of returns (Dividends Invested) (Inflation Adjusted)",
"Annualized rate of return (%)", "ror_mc_ls_inf")
for num_years in "30 20 10 05".split():
make_hist(mcrordi_dist["%syrs" % num_years],
"%syr window: Dividends Invested and Inflation Adjusted" % num_years,
"Annualized rate of return (%)", "ror_mc_inf_%s" % num_years, xlim=(-30,50))
mcrordi_dist.describe()
These plots, more than anything, show the volatility of short term investing. Although they all have the same expected rate of return, the spread is more the shorter the investing interval.
mcrordidca = rord[::12]
mcrordidca = pd.DataFrame(mcrordidca)
mcrordidca.columns = ["SP500MCIDCA"]
mcrordidca.index = pd.to_datetime(mcrordidca.index)
mcrordidca = mcrordidca.dropna()
mcrordidca["SP500MCIDCA"] = adjust_for_inflation(mcrordidca, inflation, "SP500MCIDCA")
mcrordidca = mcrordidca.dropna()
def mc_returns_dca(df, num_years, num_trials, src_label):
rates = []
for x in xrange(num_trials):
rates.append(periodicrate(df[src_label].sample(num_years, replace=True))[0])
return pd.Series(rates)
NUM_TRIALS = 20000
mcrordidca_dist = pd.DataFrame(mc_returns_dca(mcrordidca, 5, NUM_TRIALS, "SP500MCIDCA"))
mcrordidca_dist.columns = ["05yrs"]
mcrordidca_dist["10yrs"] = mc_returns_dca(mcrordidca, 10, NUM_TRIALS, "SP500MCIDCA")
mcrordidca_dist["20yrs"] = mc_returns_dca(mcrordidca, 20, NUM_TRIALS, "SP500MCIDCA")
mcrordidca_dist["30yrs"] = mc_returns_dca(mcrordidca, 30, NUM_TRIALS, "SP500MCIDCA")
make_hist(mcrordidca_dist,
"Distribution of effective annual rate of returns (Dividends Invested) (Inflation Adjusted) (Dollar Cost Avg)",
"Annualized rate of return (%)", "ror_mc_dca_inf")
for num_years in "30 20 10 05".split():
make_hist(mcrordidca_dist["%syrs" % num_years],
"%syr window: Dividends Invested, Inflation Adjusted, and Dollar Cost Avg" % num_years,
"Annualized rate of return (%)", "ror_mc_dca_inf_%s" % num_years, xlim=(-30,50))
mcrordidca_dist.describe()