Pay Down Mortgage or Invest?

Questions home owners often ask themselves:

  1. If I suddenly get $30,000, should I put it towards my mortgage payments, or should I invest it elsewhere?

  2. After paying all my expenses, should I put all my extra money into the house or should I invest it?

People have strong beliefs about this. I decided to do a quantitative analysis, using the S&P 500 as the investment vehicle.

In [1]:
import pandas as pd
from scipy.stats.mstats import gmean
from numpy import roll
import numpy as np
import plotly.plotly as ply
import pylab as plt
import plotly.tools as tls
from scipy.optimize import fsolve
import seaborn as sn
In [2]:
%matplotlib inline

Data Sources

S & P 500

In [3]:
raw = pd.read_csv("data.csv", index_col=0)
In [4]:
raw.head()
Out[4]:
SP500 Dividend Earnings Consumer Price Index Long Interest Rate Real Price Real Dividend Real Earnings P/E10
Date
1871-01-01 4.44 0.26 0.4 12.46 5.32 83.78 4.91 7.55 NaN
1871-02-01 4.50 0.26 0.4 12.84 5.32 82.40 4.76 7.32 NaN
1871-03-01 4.61 0.26 0.4 13.03 5.33 83.18 4.69 7.22 NaN
1871-04-01 4.74 0.26 0.4 12.56 5.33 88.76 4.87 7.49 NaN
1871-05-01 4.86 0.26 0.4 12.27 5.33 93.13 4.98 7.66 NaN

The SP500 column has the value of the S&P 500.

In [5]:
snp500 = raw["SP500"]

I want to calculate the annualized rate of return, so I divide the following year's entry by today's entry.

In [6]:
aror_snp500 = (snp500.shift(-12)/snp500 - 1)*100
aror_snp500 = aror_snp500.dropna()
In [7]:
aror_snp500.tail(20)
Out[7]:
Date
2012-05-01    22.260246
2012-06-01    22.311633
2012-07-01    22.716910
2012-08-01    18.998896
2012-09-01    16.886977
2012-10-01    19.627631
2012-11-01    27.897254
2012-12-01    27.103474
2013-01-01    23.099162
2013-02-01    20.149969
2013-03-01    20.162752
2013-04-01    18.689756
2013-05-01    15.241121
2013-06-01    20.282066
2013-07-01    18.243162
2013-08-01    17.450557
2013-09-01    18.140436
2013-10-01    12.630012
2013-11-01    14.635500
2013-12-01    13.549768
Name: SP500, dtype: float64

The number for 2013-01-01 is the rate of return for the year beginning in 2013-01-01! Likewise for 2013-10-01, etc. This is an annualized return, but rolling for each month.

So for 2013, starting Jan 1st, the annual return was 23.0992%

In [8]:
# We only want the January periods.
ror_snp500s = aror_snp500[0::12]
In [9]:
ror_snp500s.tail()
Out[9]:
Date
2009-01-01    29.806604
2010-01-01    14.154755
2011-01-01     1.400259
2012-01-01    13.826139
2013-01-01    23.099162
Name: SP500, dtype: float64

Computing the Effective Annual Rate of Return Over Several Years

A common question is: Over a 10 year period, what was the average annual rate of return?

Many people make the mistake of taking each year's rate of return, and computing the mean. This is plain wrong. If you gain 10% one year and lose 10% the next year, the mean tells you your equivalent rate of return is 0, which means you have the same amount of money as when you started. If you do the math, you'll find you actually lost 1%.

The correct formula is:

$$r=\left(\prod_{i=1}^{n}\left(1+r_{i}\right)\right)^{1/n}-1$$

Over here we assume that $r_{i}$ is the annual rate of return for year $i$, and is expressed as a fraction. So 10% would be 0.1.

In words, this is just the geometric mean.

In [10]:
def avgrate(arr):
    """
    Given an array of annual returns, compute the equivalent annual 
    rate of return across all those years. The value it returns is 
    expressed as a percentage, not a decimal.
    """
    return (gmean(1 + arr/100) - 1) * 100

Computing the Rolling Annualized Rates of Returns

I want to add new columns in the data frame that have a rolling annualized average rate of return for a 5, 10, 20 and 30 year period.

In [11]:
def nwindow(df, years, src_label, new_label):
    """
    Compute the rolling rate of return over the last so many years.
    """
    ror = pd.rolling_apply(df[src_label], years, avgrate)
    # Uncomment the line below to make it show the rate at the START
    # of the investing period. 
    df[new_label] = roll(ror, -(years-1))
    # Uncomment the line below to make it show the rate at the END
    # of the investing period. 
    # df[new_label] = roll(ror, 1)
In [12]:
# Convert the series into a dataframe.
ror_snp500 = pd.DataFrame(ror_snp500s)
In [13]:
nwindow(ror_snp500, 5, "SP500", "05yrs")
nwindow(ror_snp500, 10, "SP500", "10yrs")
nwindow(ror_snp500, 20, "SP500", "20yrs")
nwindow(ror_snp500, 30, "SP500", "30yrs")
In [14]:
ror_snp500.tail(30)
Out[14]:
SP500 05yrs 10yrs 20yrs 30yrs
Date
1984-01-01 3.125000 11.393572 11.011984 10.063832 8.305213
1985-01-01 21.328671 14.652799 10.488457 10.127091 NaN
1986-01-01 27.041306 9.348136 11.429059 9.500198 NaN
1987-01-01 -5.293006 9.483893 11.222533 8.781904 NaN
1988-01-01 13.932136 11.681752 14.418984 8.901547 NaN
1989-01-01 19.120533 10.631703 15.905181 5.704302 NaN
1990-01-01 -4.259199 6.475370 15.413179 6.159330 NaN
1991-01-01 27.831884 13.549582 15.163499 7.097176 NaN
1992-01-01 4.602480 12.988782 10.606538 5.863930 NaN
1993-01-01 8.675873 17.223304 7.485817 6.312177 NaN
1994-01-01 -1.636398 21.430029 9.123778 6.976679 NaN
1995-01-01 32.062332 25.101251 9.766907 NaN NaN
1996-01-01 24.706227 16.800355 7.604727 NaN NaN
1997-01-01 25.728903 8.274521 6.394831 NaN NaN
1998-01-01 29.626516 -1.442797 3.650167 NaN NaN
1999-01-01 14.159533 -1.935304 -3.598792 NaN NaN
2000-01-01 -6.310370 -3.687823 -2.352543 NaN NaN
2001-01-01 -14.631298 -0.866934 -0.404163 NaN NaN
2002-01-01 -21.432017 4.547773 1.324676 NaN NaN
2003-01-01 26.419896 9.006312 5.151352 NaN NaN
2004-01-01 4.316922 -5.234062 4.871826 NaN NaN
2005-01-01 8.237614 -0.998752 NaN NaN NaN
2006-01-01 11.373003 0.060768 NaN NaN NaN
2007-01-01 -3.187844 -1.799057 NaN NaN NaN
2008-01-01 -37.220401 1.432722 NaN NaN NaN
2009-01-01 29.806604 16.055411 NaN NaN NaN
2010-01-01 14.154755 NaN NaN NaN NaN
2011-01-01 1.400259 NaN NaN NaN NaN
2012-01-01 13.826139 NaN NaN NaN NaN
2013-01-01 23.099162 NaN NaN NaN NaN

So for 5 years beginning in 2009, the effective annual rate of return was 16.1%. For the 10 years beginning in 2004, it was only 4.87% per year.

In [15]:
ror_snp500.index = pd.to_datetime(ror_snp500.index)
In [116]:
SCALE = 3
def make_plot(df, title, ylabel, filename, new_fig=True, save_fig=True):
    # sn.axes_style("darkgrid")
    if new_fig:
        f = plt.figure()
    FONTSIZE = 15
    ax = df.plot(fontsize=FONTSIZE,
      sort_columns=True, title=title, figsize=(4*SCALE,3*SCALE))
    plt.ylabel(ylabel)  
    ax.set_title(title, fontsize=FONTSIZE)
    plt.legend(loc=0, prop={'size':FONTSIZE})
    ax.yaxis.label.set_size(FONTSIZE)
    plt.grid(True)
    if save_fig:
        fig = ax.get_figure()
        fig.savefig("plots/%s.png" % filename)
In [79]:
make_plot(ror_snp500[[x for x in ror_snp500.columns if x!="SP500"]], 
          "Historic annualized rate of return (rolling window)",
          "Annualized rate of return (%)", "ror_ls")
<matplotlib.figure.Figure at 0x7f2beefc4fd0>
In [80]:
for num_years in "30 20 10 05".split():
    make_plot(ror_snp500["%syrs" % num_years], 
              "%syr window" % num_years,
              "Annualized rate of return (%)",
             "ror_ls_%s" % num_years)

These charts look purely at the S&P 500 returns, and assume the dividends are not reinvested.

Reinvest Dividends

How should we account for the reinvestment of dividends? I used the approach in the link above where the dividends for each month are divided by 12, invested at the current S&P 500 "price", and then at the end of the year all these "shares" are added up and "sold" at the new price (Jan 1).

I'm assuming the Dividend column is the amount in Dollars one would receive if they owned 1 "share" of the S&P 500 fund?

In [19]:
nraw = raw[["SP500", "Dividend"]]
nraw["Dividend"] = nraw["Dividend"]/12.
nraw["Div Shares"] = nraw["Dividend"]/nraw["SP500"]
nraw["Div Sum"] = pd.rolling_sum(nraw["Div Shares"], 12).shift(1)
rord = (nraw.shift(-12)["SP500"]*(1+nraw.shift(-12)["Div Sum"])/nraw["SP500"]-1)*100
/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:3: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:4: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [20]:
rord[1480:1500]
Out[20]:
Date
1994-05-01    19.408109
1994-06-01    21.858827
1994-07-01    26.851684
1994-08-01    23.684119
1994-09-01    27.250015
1994-10-01    28.988608
1994-11-01    32.529444
1994-12-01    38.454788
1995-01-01    35.356233
1995-02-01    38.079163
1995-03-01    34.358346
1995-04-01    30.421610
1995-05-01    29.166120
1995-06-01    26.783260
1995-07-01    18.167716
1995-08-01    21.186385
1995-09-01    19.203382
1995-10-01    22.998648
1995-11-01    26.241075
1995-12-01    23.561270
dtype: float64

This means that in 1995, we got an effective rate of return of 35.4% if we include dividends. Look at the 1995-01-01 number.

In [21]:
# Take every 12 entries.
arord = rord[::12]
arord = pd.DataFrame(arord)
arord.columns = ["SP500D"]
In [22]:
nwindow(arord, 5, "SP500D", "05yrs")
nwindow(arord, 10, "SP500D", "10yrs")
nwindow(arord, 20, "SP500D", "20yrs")
nwindow(arord, 30, "SP500D", "30yrs")
In [23]:
arord[110:125]
Out[23]:
SP500D 05yrs 10yrs 20yrs 30yrs
Date
1981-01-01 -7.374496 14.583149 13.783636 15.725125 10.760281
1982-01-01 30.045597 22.886735 17.884676 15.330400 11.166137
1983-01-01 20.334368 16.019957 15.683760 12.550949 10.749651
1984-01-01 7.829149 15.547594 14.824409 12.925126 10.908009
1985-01-01 26.372237 18.623213 14.088416 12.830541 NaN
1986-01-01 31.417088 12.989702 14.874652 12.056537 NaN
1987-01-01 -2.448462 13.086223 14.517339 11.232317 NaN
1988-01-01 17.904591 15.348538 17.664846 11.288423 NaN
1989-01-01 22.964587 14.105750 18.959270 7.964258 NaN
1990-01-01 -0.917357 9.726979 18.223921 8.402917 NaN
1991-01-01 31.979363 16.791048 17.699742 9.278864 NaN
1992-01-01 7.707534 15.966566 12.831469 7.951851 NaN
1993-01-01 11.688355 20.027669 9.502978 8.362084 NaN
1994-01-01 1.113588 24.019236 11.057259 9.000190 NaN
1995-01-01 35.356233 27.378842 11.586535 NaN NaN
In [24]:
arord.index = pd.to_datetime(arord.index)
In [81]:
make_plot(arord[[x for x in arord.columns if x!="SP500D"]], 
          "Historic annualized rate of return (rolling window) with reinvestment of dividends",
          "Annualized rate of return (%)", "ror_ls_div")
<matplotlib.figure.Figure at 0x7f2bee0f1210>
In [82]:
for num_years in "30 20 10 05".split():
    make_plot(arord["%syrs" % num_years], 
              "%syr window with dividends reinvested" % num_years,
              "Annualized rate of return (%)", "ror_ls_div_%s" % num_years)

Keep Dividends as Cash

Let's say I don't reinvest dividends and I keep them as cash. This avoids the loss in downtimes, but prevents large gains in the uptimes.

In [27]:
nnraw = raw[["SP50