Questions home owners often ask themselves:
If I suddenly get $30,000, should I put it towards my mortgage payments, or should I invest it elsewhere?
After paying all my expenses, should I put all my extra money into the house or should I invest it?
People have strong beliefs about this. I decided to do a quantitative analysis, using the S&P 500 as the investment vehicle.
import pandas as pd
from scipy.stats.mstats import gmean
from numpy import roll
import numpy as np
import plotly.plotly as ply
import pylab as plt
import plotly.tools as tls
from scipy.optimize import fsolve
import seaborn as sn
%matplotlib inline
Ultimately, these are the Shiller data. http://www.econ.yale.edu/~shiller/data.htm
raw = pd.read_csv("data.csv", index_col=0)
raw.head()
The SP500 column has the value of the S&P 500.
snp500 = raw["SP500"]
I want to calculate the annualized rate of return, so I divide the following year's entry by today's entry.
aror_snp500 = (snp500.shift(-12)/snp500 - 1)*100
aror_snp500 = aror_snp500.dropna()
aror_snp500.tail(20)
The number for 2013-01-01 is the rate of return for the year beginning in 2013-01-01! Likewise for 2013-10-01, etc. This is an annualized return, but rolling for each month.
So for 2013, starting Jan 1st, the annual return was 23.0992%
# We only want the January periods.
ror_snp500s = aror_snp500[0::12]
ror_snp500s.tail()
A common question is: Over a 10 year period, what was the average annual rate of return?
Many people make the mistake of taking each year's rate of return, and computing the mean. This is plain wrong. If you gain 10% one year and lose 10% the next year, the mean tells you your equivalent rate of return is 0, which means you have the same amount of money as when you started. If you do the math, you'll find you actually lost 1%.
The correct formula is:
$$r=\left(\prod_{i=1}^{n}\left(1+r_{i}\right)\right)^{1/n}-1$$Over here we assume that $r_{i}$ is the annual rate of return for year $i$, and is expressed as a fraction. So 10% would be 0.1.
In words, this is just the geometric mean.
def avgrate(arr):
"""
Given an array of annual returns, compute the equivalent annual
rate of return across all those years. The value it returns is
expressed as a percentage, not a decimal.
"""
return (gmean(1 + arr/100) - 1) * 100
I want to add new columns in the data frame that have a rolling annualized average rate of return for a 5, 10, 20 and 30 year period.
def nwindow(df, years, src_label, new_label):
"""
Compute the rolling rate of return over the last so many years.
"""
ror = pd.rolling_apply(df[src_label], years, avgrate)
# Uncomment the line below to make it show the rate at the START
# of the investing period.
df[new_label] = roll(ror, -(years-1))
# Uncomment the line below to make it show the rate at the END
# of the investing period.
# df[new_label] = roll(ror, 1)
# Convert the series into a dataframe.
ror_snp500 = pd.DataFrame(ror_snp500s)
nwindow(ror_snp500, 5, "SP500", "05yrs")
nwindow(ror_snp500, 10, "SP500", "10yrs")
nwindow(ror_snp500, 20, "SP500", "20yrs")
nwindow(ror_snp500, 30, "SP500", "30yrs")
ror_snp500.tail(30)
So for 5 years beginning in 2009, the effective annual rate of return was 16.1%. For the 10 years beginning in 2004, it was only 4.87% per year.
ror_snp500.index = pd.to_datetime(ror_snp500.index)
SCALE = 3
def make_plot(df, title, ylabel, filename, new_fig=True, save_fig=True):
# sn.axes_style("darkgrid")
if new_fig:
f = plt.figure()
FONTSIZE = 15
ax = df.plot(fontsize=FONTSIZE,
sort_columns=True, title=title, figsize=(4*SCALE,3*SCALE))
plt.ylabel(ylabel)
ax.set_title(title, fontsize=FONTSIZE)
plt.legend(loc=0, prop={'size':FONTSIZE})
ax.yaxis.label.set_size(FONTSIZE)
plt.grid(True)
if save_fig:
fig = ax.get_figure()
fig.savefig("plots/%s.png" % filename)
make_plot(ror_snp500[[x for x in ror_snp500.columns if x!="SP500"]],
"Historic annualized rate of return (rolling window)",
"Annualized rate of return (%)", "ror_ls")
for num_years in "30 20 10 05".split():
make_plot(ror_snp500["%syrs" % num_years],
"%syr window" % num_years,
"Annualized rate of return (%)",
"ror_ls_%s" % num_years)
These charts look purely at the S&P 500 returns, and assume the dividends are not reinvested.
How should we account for the reinvestment of dividends? I used the approach in the link above where the dividends for each month are divided by 12, invested at the current S&P 500 "price", and then at the end of the year all these "shares" are added up and "sold" at the new price (Jan 1).
I'm assuming the Dividend column is the amount in Dollars one would receive if they owned 1 "share" of the S&P 500 fund?
nraw = raw[["SP500", "Dividend"]]
nraw["Dividend"] = nraw["Dividend"]/12.
nraw["Div Shares"] = nraw["Dividend"]/nraw["SP500"]
nraw["Div Sum"] = pd.rolling_sum(nraw["Div Shares"], 12).shift(1)
rord = (nraw.shift(-12)["SP500"]*(1+nraw.shift(-12)["Div Sum"])/nraw["SP500"]-1)*100
rord[1480:1500]
This means that in 1995, we got an effective rate of return of 35.4% if we include dividends. Look at the 1995-01-01 number.
# Take every 12 entries.
arord = rord[::12]
arord = pd.DataFrame(arord)
arord.columns = ["SP500D"]
nwindow(arord, 5, "SP500D", "05yrs")
nwindow(arord, 10, "SP500D", "10yrs")
nwindow(arord, 20, "SP500D", "20yrs")
nwindow(arord, 30, "SP500D", "30yrs")
arord[110:125]
arord.index = pd.to_datetime(arord.index)
make_plot(arord[[x for x in arord.columns if x!="SP500D"]],
"Historic annualized rate of return (rolling window) with reinvestment of dividends",
"Annualized rate of return (%)", "ror_ls_div")
for num_years in "30 20 10 05".split():
make_plot(arord["%syrs" % num_years],
"%syr window with dividends reinvested" % num_years,
"Annualized rate of return (%)", "ror_ls_div_%s" % num_years)
Let's say I don't reinvest dividends and I keep them as cash. This avoids the loss in downtimes, but prevents large gains in the uptimes.
nnraw = raw[["SP50