Pay Down Mortgage or Invest?

Questions home owners often ask themselves:

  1. If I suddenly get $30,000, should I put it towards my mortgage payments, or should I invest it elsewhere?

  2. After paying all my expenses, should I put all my extra money into the house or should I invest it?

People have strong beliefs about this. I decided to do a quantitative analysis, using the S&P 500 as the investment vehicle.

In [1]:
import pandas as pd
from scipy.stats.mstats import gmean
from numpy import roll
import numpy as np
import plotly.plotly as ply
import pylab as plt
import plotly.tools as tls
from scipy.optimize import fsolve
import seaborn as sn
In [2]:
%matplotlib inline

Data Sources

S & P 500

In [3]:
raw = pd.read_csv("data.csv", index_col=0)
In [4]:
raw.head()
Out[4]:
SP500 Dividend Earnings Consumer Price Index Long Interest Rate Real Price Real Dividend Real Earnings P/E10
Date
1871-01-01 4.44 0.26 0.4 12.46 5.32 83.78 4.91 7.55 NaN
1871-02-01 4.50 0.26 0.4 12.84 5.32 82.40 4.76 7.32 NaN
1871-03-01 4.61 0.26 0.4 13.03 5.33 83.18 4.69 7.22 NaN
1871-04-01 4.74 0.26 0.4 12.56 5.33 88.76 4.87 7.49 NaN
1871-05-01 4.86 0.26 0.4 12.27 5.33 93.13 4.98 7.66 NaN

The SP500 column has the value of the S&P 500.

In [5]:
snp500 = raw["SP500"]

I want to calculate the annualized rate of return, so I divide the following year's entry by today's entry.

In [6]:
aror_snp500 = (snp500.shift(-12)/snp500 - 1)*100
aror_snp500 = aror_snp500.dropna()
In [7]:
aror_snp500.tail(20)
Out[7]:
Date
2012-05-01    22.260246
2012-06-01    22.311633
2012-07-01    22.716910
2012-08-01    18.998896
2012-09-01    16.886977
2012-10-01    19.627631
2012-11-01    27.897254
2012-12-01    27.103474
2013-01-01    23.099162
2013-02-01    20.149969
2013-03-01    20.162752
2013-04-01    18.689756
2013-05-01    15.241121
2013-06-01    20.282066
2013-07-01    18.243162
2013-08-01    17.450557
2013-09-01    18.140436
2013-10-01    12.630012
2013-11-01    14.635500
2013-12-01    13.549768
Name: SP500, dtype: float64

The number for 2013-01-01 is the rate of return for the year beginning in 2013-01-01! Likewise for 2013-10-01, etc. This is an annualized return, but rolling for each month.

So for 2013, starting Jan 1st, the annual return was 23.0992%

In [8]:
# We only want the January periods.
ror_snp500s = aror_snp500[0::12]
In [9]:
ror_snp500s.tail()
Out[9]:
Date
2009-01-01    29.806604
2010-01-01    14.154755
2011-01-01     1.400259
2012-01-01    13.826139
2013-01-01    23.099162
Name: SP500, dtype: float64

Computing the Effective Annual Rate of Return Over Several Years

A common question is: Over a 10 year period, what was the average annual rate of return?

Many people make the mistake of taking each year's rate of return, and computing the mean. This is plain wrong. If you gain 10% one year and lose 10% the next year, the mean tells you your equivalent rate of return is 0, which means you have the same amount of money as when you started. If you do the math, you'll find you actually lost 1%.

The correct formula is:

$$r=\left(\prod_{i=1}^{n}\left(1+r_{i}\right)\right)^{1/n}-1$$

Over here we assume that $r_{i}$ is the annual rate of return for year $i$, and is expressed as a fraction. So 10% would be 0.1.

In words, this is just the geometric mean.

In [10]:
def avgrate(arr):
    """
    Given an array of annual returns, compute the equivalent annual 
    rate of return across all those years. The value it returns is 
    expressed as a percentage, not a decimal.
    """
    return (gmean(1 + arr/100) - 1) * 100

Computing the Rolling Annualized Rates of Returns

I want to add new columns in the data frame that have a rolling annualized average rate of return for a 5, 10, 20 and 30 year period.

In [11]:
def nwindow(df, years, src_label, new_label):
    """
    Compute the rolling rate of return over the last so many years.
    """
    ror = pd.rolling_apply(df[src_label], years, avgrate)
    # Uncomment the line below to make it show the rate at the START
    # of the investing period. 
    df[new_label] = roll(ror, -(years-1))
    # Uncomment the line below to make it show the rate at the END
    # of the investing period. 
    # df[new_label] = roll(ror, 1)
In [12]:
# Convert the series into a dataframe.
ror_snp500 = pd.DataFrame(ror_snp500s)
In [13]:
nwindow(ror_snp500, 5, "SP500", "05yrs")
nwindow(ror_snp500, 10, "SP500", "10yrs")
nwindow(ror_snp500, 20, "SP500", "20yrs")
nwindow(ror_snp500, 30, "SP500", "30yrs")
In [14]:
ror_snp500.tail(30)
Out[14]:
SP500 05yrs 10yrs 20yrs 30yrs
Date
1984-01-01 3.125000 11.393572 11.011984 10.063832 8.305213
1985-01-01 21.328671 14.652799 10.488457 10.127091 NaN
1986-01-01 27.041306 9.348136 11.429059 9.500198 NaN
1987-01-01 -5.293006 9.483893 11.222533 8.781904 NaN
1988-01-01 13.932136 11.681752 14.418984 8.901547 NaN
1989-01-01 19.120533 10.631703 15.905181 5.704302 NaN
1990-01-01 -4.259199 6.475370 15.413179 6.159330 NaN
1991-01-01 27.831884 13.549582 15.163499 7.097176 NaN
1992-01-01 4.602480 12.988782 10.606538 5.863930 NaN
1993-01-01 8.675873 17.223304 7.485817 6.312177 NaN
1994-01-01 -1.636398 21.430029 9.123778 6.976679 NaN
1995-01-01 32.062332 25.101251 9.766907 NaN NaN
1996-01-01 24.706227 16.800355 7.604727 NaN NaN
1997-01-01 25.728903 8.274521 6.394831 NaN NaN
1998-01-01 29.626516 -1.442797 3.650167 NaN NaN
1999-01-01 14.159533 -1.935304 -3.598792 NaN NaN
2000-01-01 -6.310370 -3.687823 -2.352543 NaN NaN
2001-01-01 -14.631298 -0.866934 -0.404163 NaN NaN
2002-01-01 -21.432017 4.547773 1.324676 NaN NaN
2003-01-01 26.419896 9.006312 5.151352 NaN NaN
2004-01-01 4.316922 -5.234062 4.871826 NaN NaN
2005-01-01 8.237614 -0.998752 NaN NaN NaN
2006-01-01 11.373003 0.060768 NaN NaN NaN
2007-01-01 -3.187844 -1.799057 NaN NaN NaN
2008-01-01 -37.220401 1.432722 NaN NaN NaN
2009-01-01 29.806604 16.055411 NaN NaN NaN
2010-01-01 14.154755 NaN NaN NaN NaN
2011-01-01 1.400259 NaN NaN NaN NaN
2012-01-01 13.826139 NaN NaN NaN NaN
2013-01-01 23.099162 NaN NaN NaN NaN

So for 5 years beginning in 2009, the effective annual rate of return was 16.1%. For the 10 years beginning in 2004, it was only 4.87% per year.

In [15]:
ror_snp500.index = pd.to_datetime(ror_snp500.index)
In [116]:
SCALE = 3
def make_plot(df, title, ylabel, filename, new_fig=True, save_fig=True):
    # sn.axes_style("darkgrid")
    if new_fig:
        f = plt.figure()
    FONTSIZE = 15
    ax = df.plot(fontsize=FONTSIZE,
      sort_columns=True, title=title, figsize=(4*SCALE,3*SCALE))
    plt.ylabel(ylabel)  
    ax.set_title(title, fontsize=FONTSIZE)
    plt.legend(loc=0, prop={'size':FONTSIZE})
    ax.yaxis.label.set_size(FONTSIZE)
    plt.grid(True)
    if save_fig:
        fig = ax.get_figure()
        fig.savefig("plots/%s.png" % filename)
In [79]:
make_plot(ror_snp500[[x for x in ror_snp500.columns if x!="SP500"]], 
          "Historic annualized rate of return (rolling window)",
          "Annualized rate of return (%)", "ror_ls")
<matplotlib.figure.Figure at 0x7f2beefc4fd0>
In [80]:
for num_years in "30 20 10 05".split():
    make_plot(ror_snp500["%syrs" % num_years], 
              "%syr window" % num_years,
              "Annualized rate of return (%)",
             "ror_ls_%s" % num_years)

These charts look purely at the S&P 500 returns, and assume the dividends are not reinvested.

Reinvest Dividends

How should we account for the reinvestment of dividends? I used the approach in the link above where the dividends for each month are divided by 12, invested at the current S&P 500 "price", and then at the end of the year all these "shares" are added up and "sold" at the new price (Jan 1).

I'm assuming the Dividend column is the amount in Dollars one would receive if they owned 1 "share" of the S&P 500 fund?

In [19]:
nraw = raw[["SP500", "Dividend"]]
nraw["Dividend"] = nraw["Dividend"]/12.
nraw["Div Shares"] = nraw["Dividend"]/nraw["SP500"]
nraw["Div Sum"] = pd.rolling_sum(nraw["Div Shares"], 12).shift(1)
rord = (nraw.shift(-12)["SP500"]*(1+nraw.shift(-12)["Div Sum"])/nraw["SP500"]-1)*100
/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:3: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:4: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [20]:
rord[1480:1500]
Out[20]:
Date
1994-05-01    19.408109
1994-06-01    21.858827
1994-07-01    26.851684
1994-08-01    23.684119
1994-09-01    27.250015
1994-10-01    28.988608
1994-11-01    32.529444
1994-12-01    38.454788
1995-01-01    35.356233
1995-02-01    38.079163
1995-03-01    34.358346
1995-04-01    30.421610
1995-05-01    29.166120
1995-06-01    26.783260
1995-07-01    18.167716
1995-08-01    21.186385
1995-09-01    19.203382
1995-10-01    22.998648
1995-11-01    26.241075
1995-12-01    23.561270
dtype: float64

This means that in 1995, we got an effective rate of return of 35.4% if we include dividends. Look at the 1995-01-01 number.

In [21]:
# Take every 12 entries.
arord = rord[::12]
arord = pd.DataFrame(arord)
arord.columns = ["SP500D"]
In [22]:
nwindow(arord, 5, "SP500D", "05yrs")
nwindow(arord, 10, "SP500D", "10yrs")
nwindow(arord, 20, "SP500D", "20yrs")
nwindow(arord, 30, "SP500D", "30yrs")
In [23]:
arord[110:125]
Out[23]:
SP500D 05yrs 10yrs 20yrs 30yrs
Date
1981-01-01 -7.374496 14.583149 13.783636 15.725125 10.760281
1982-01-01 30.045597 22.886735 17.884676 15.330400 11.166137
1983-01-01 20.334368 16.019957 15.683760 12.550949 10.749651
1984-01-01 7.829149 15.547594 14.824409 12.925126 10.908009
1985-01-01 26.372237 18.623213 14.088416 12.830541 NaN
1986-01-01 31.417088 12.989702 14.874652 12.056537 NaN
1987-01-01 -2.448462 13.086223 14.517339 11.232317 NaN
1988-01-01 17.904591 15.348538 17.664846 11.288423 NaN
1989-01-01 22.964587 14.105750 18.959270 7.964258 NaN
1990-01-01 -0.917357 9.726979 18.223921 8.402917 NaN
1991-01-01 31.979363 16.791048 17.699742 9.278864 NaN
1992-01-01 7.707534 15.966566 12.831469 7.951851 NaN
1993-01-01 11.688355 20.027669 9.502978 8.362084 NaN
1994-01-01 1.113588 24.019236 11.057259 9.000190 NaN
1995-01-01 35.356233 27.378842 11.586535 NaN NaN
In [24]:
arord.index = pd.to_datetime(arord.index)
In [81]:
make_plot(arord[[x for x in arord.columns if x!="SP500D"]], 
          "Historic annualized rate of return (rolling window) with reinvestment of dividends",
          "Annualized rate of return (%)", "ror_ls_div")
<matplotlib.figure.Figure at 0x7f2bee0f1210>
In [82]:
for num_years in "30 20 10 05".split():
    make_plot(arord["%syrs" % num_years], 
              "%syr window with dividends reinvested" % num_years,
              "Annualized rate of return (%)", "ror_ls_div_%s" % num_years)

Keep Dividends as Cash

Let's say I don't reinvest dividends and I keep them as cash. This avoids the loss in downtimes, but prevents large gains in the uptimes.

In [27]:
nnraw = raw[["SP500", "Dividend"]]
nnraw["Dividend"] = nnraw["Dividend"]/12.
/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [28]:
def nwindow_c(df, years, new_label):
    """
    Compute the rolling rate of return over the last so many years.
    """
    # Add the dividend entries
    tmp = pd.rolling_sum(df["Dividend"], 12*years).shift(1)
    tmpdf = df.copy()
    tmpdf["Sum"] = tmpdf["SP500"]+tmp
    ror = tmpdf["Sum"]/tmpdf["SP500"].shift(12*years)
    ror = (np.power(ror, 1./years)-1) * 100
    df[new_label] = roll(ror, -(12*years))

nwindow_c(nnraw, 5, "05yrs")
nwindow_c(nnraw, 10,  "10yrs")
nwindow_c(nnraw, 20,  "20yrs")
nwindow_c(nnraw, 30,  "30yrs")
/usr/lib64/python2.7/site-packages/IPython/kernel/__main__.py:11: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [29]:
nnraw.index = pd.to_datetime(nnraw.index)
In [83]:
make_plot(nnraw[[x for x in nnraw.columns if x not in ("SP500", "Dividend")]], 
          "Historic annualized rate of return (rolling window): Dividends kept as cash",
          "Annualized rate of return (%)", "ror_ls_cash")
<matplotlib.figure.Figure at 0x7f2bef22ba50>
In [84]:
for num_years in "30 20 10 05".split():
    make_plot(nnraw["%syrs" % num_years], 
              "%syr window: Includes dividends kept as cash" % num_years,
              "Annualized rate of return (%)", "ror_ls_cash_%s" % num_years)

Was it ever better to keep the money instead of reinvesting it? Below I plot the ratio of the rate of returns: Keeping vs Investing. If the number is below 1, then reinvesting was better.

In [98]:
columns = [x for x in arord.columns if x!="SP500D"]
ratio = (nnraw[columns]/arord[columns]).dropna()
for num_years in "30 20 10 05".split():
    tmpdf = nnraw[["%syrs" % num_years]][::12]
    tmpdf.columns = ["Cash: %syrs" % num_years]
    tmpdf["Dividend: %syrs" % num_years] = arord["%syrs" % num_years]
    make_plot(tmpdf, 
              "%syr window" % num_years,
              "Hold vs Reinvest", "ror_ls_cash_div_%s" % num_years)
<matplotlib.figure.Figure at 0x7f2be77afb50>
<matplotlib.figure.Figure at 0x7f2be7a55310>
<matplotlib.figure.Figure at 0x7f2be7452f50>
<matplotlib.figure.Figure at 0x7f2be736f110>

For a 20 or 30 year period, it seems one should always reinvest. Even in the other cases, there were very few times in recent history where keeping the cash was a better idea. And don't be fooled by the extremes. That's often merely because the denominator was near zero.

What Does This All Mean?

If you're suddenly endowed with a lot of money, these plots show the effective annual rate of return if you invest in an S&P 500 index fund (ignoring any fund fees, of course).

It is noteworthy that over a 30 year period, the effective annual rate of return never dips below 5%. In recent times it rarely dipped below 10%. If you have that many years to live, and if your mortgage interest rate is below 5%, then never once in the history of the S&P 500 was it a better idea to put money into the house. The catch is that you should keep the money in the fund and not withdraw when the economy goes south.

At the other extreme is the 5 year window. If you plan to put in money and withdraw after only 5 years, it often does dip below 5%.

The 10 year window had a negative rate of return only twice: During the Great Depression and during the recent recession. It occasionally drops below 5%, though.

Inflation

Let's bring inflation into the picture.

In [99]:
inflation = pd.read_csv('inflation.csv', parse_dates=True, index_col=0)
In [100]:
inflation.head()
Out[100]:
Inflation
Year
1914-01-01 1.35
1915-01-01 0.92
1916-01-01 7.64
1917-01-01 17.80
1918-01-01 17.26
In [101]:
def avgrate_inflation(arr):
    """
    Given an array of annual returns, compute the equivalent annual 
    rate of return across all those years. The value it returns is 
    expressed as a percentage, not a decimal.
    """
    return (gmean(arr) - 1) * 100
In [102]:
def nwindow_inflation(df, inflation, years, src_label, new_label):
    """
    Compute the rolling rate of return over the last so many years, 
    adjusted for inflation.
    """
    tmpdf = (1+df[src_label]/100)#/(1+inflation/100)
    tmpinf = (1+inflation["Inflation"]/100)
    tmpdf = tmpdf/tmpinf
    tmpdf = tmpdf.dropna()
    ror = pd.rolling_apply(tmpdf, years, avgrate_inflation)
    df[new_label] = pd.Series(roll(ror, -(years-1)), index=tmpdf.index)
In [103]:
arordi = rord[::12]
arordi = pd.DataFrame(arordi)
arordi.columns = ["SP500DI"]
arordi.index = pd.to_datetime(arordi.index)
In [104]:
nwindow_inflation(arordi, inflation, 5, "SP500DI", "05yrs")
nwindow_inflation(arordi, inflation, 10, "SP500DI", "10yrs")
nwindow_inflation(arordi, inflation, 20, "SP500DI", "20yrs")
nwindow_inflation(arordi, inflation, 30, "SP500DI", "30yrs")
In [105]:
arordi.tail(31)
Out[105]:
SP500DI 05yrs 10yrs 20yrs 30yrs
1984-01-01 7.829149 11.643891 10.629747 9.510716 7.807358
1985-01-01 26.372237 14.499472 10.100354 9.504667 NaN
1986-01-01 31.417088 8.678293 10.938643 8.761887 NaN
1987-01-01 -2.448462 8.278391 10.483486 7.891930 NaN
1988-01-01 17.904591 10.579262 13.665700 7.988699 NaN
1989-01-01 22.964587 9.624814 15.199269 4.774685 NaN
1990-01-01 -0.917357 5.870251 14.779549 5.466750 NaN
1991-01-01 31.979363 13.246004 14.490895 6.511746 NaN
1992-01-01 7.707534 12.733488 9.905976 5.273646 NaN
1993-01-01 11.688355 16.838286 6.814022 5.723175 NaN
1994-01-01 1.113588 21.057187 8.403004 6.423281 NaN
1995-01-01 35.356233 24.438590 8.912202 NaN NaN
1996-01-01 27.386153 15.749471 6.627842 NaN NaN
1997-01-01 27.934878 7.149382 5.361162 NaN NaN
1998-01-01 31.536502 -2.350200 2.595233 NaN NaN
1999-01-01 15.571500 -2.928429 -4.706561 NaN NaN
2000-01-01 -5.223845 -4.676935 -3.090442 NaN NaN
2001-01-01 -13.492580 -1.774958 -0.911317 NaN NaN
2002-01-01 -20.163230 3.602786 0.836559 NaN NaN
2003-01-01 28.593233 7.791126 4.643469 NaN NaN
2004-01-01 6.037115 -6.452121 4.479713 NaN NaN
2005-01-01 10.118992 -1.477545 NaN NaN NaN
2006-01-01 13.377683 -0.040083 NaN NaN NaN
2007-01-01 -1.459636 -1.855808 NaN NaN NaN
2008-01-01 -35.711835 1.587727 NaN NaN NaN
2009-01-01 33.351939 16.689022 NaN NaN NaN
2010-01-01 16.386244 NaN NaN NaN NaN
2011-01-01 3.367617 NaN NaN NaN NaN
2012-01-01 16.196003 NaN NaN NaN NaN
2013-01-01 25.604721 NaN NaN NaN NaN
2014-01-01 NaN NaN NaN NaN NaN
In [106]:
inflation.plot()
Out[106]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2bec2a2590>

The first 43 years are just NaN as I don't have inflation data for it.

In [107]:
arordi=arordi[43:]
In [108]:
make_plot(arordi[[x for x in arordi.columns if x not in ("SP500DI", "Dividend")]], 
          "Rate of Return with Dividends & Inflation Adjusted",
          "Annualized rate of return with dividends and inflation (%)",
         "ror_ls_div_inflation")
for num_years in "30 20 10 05".split():
    make_plot(arordi["%syrs" % num_years], 
              "%syr window" % num_years,
              "Annualized rate of return with dividends and inflation adjusted (%)",
             "ror_ls_div_inflation_%s" % num_years)
<matplotlib.figure.Figure at 0x7f2be70c2690>

Note that as I only have annual inflation data (as opposed to per month), the accuracy may be off a little. However, I suspect the true result will not be very far off.

Dollar Cost Averaging

Not many of us suddenly receive a large sum of money. More common is dollar cost averaging where we put in, say, an extra $100 towards our mortgage or into investments every month. Let's see what this method of investing gives us.

Given a set of annual rate of returns $r_{i}$, if I put in $1 each year, how much money will I have after all those years?

$$\sum_{k=1}^{k=n}\prod_{i=k}^{n}\left(1+r_{i}\right)$$

The index $k$ is the year.

Without Inflation

In [109]:
def periodicsum(arr):
    """
    Given an array of annual interest rates, assume you put in $1 
    each year. How much money will you have after all those years?
    """
    result = 0
    # Note: I start indexing from 0 as the way I defined my 
    # rates, Jan 1, 1990 has the rate for 1990!
    for x in xrange(0, len(arr/100)):
        result += np.prod(1+arr[x:]/100)
    return result

How do we define the effective annual rate of return in this case?

Basically, I want to know the value of $r$ that satisfies:

$\sum_{k=1}^{k=n}\left(1+r\right)^{k}=\sum_{k=1}^{k=n}\prod_{i=1}^{k}\left(1+r\right)=\sum_{k=1}^{k=n}\prod_{i=k}^{n}\left(1+r_{i}\right)$

In words: Assume the annual rate of return is $r$ every year. What value of $r$ will give me the same result?

In [110]:
def periodicrate(arr):
    """
    With dollar cost averaging, what is the effective annual rate of 
    return?
    """
    total = periodicsum(arr)
    def objective(r):
        tmp = 0
        for k in xrange(1, len(arr)+1):
            tmp += np.power(1+r, k)
        return tmp - total
    return fsolve(objective, 0.1) * 100
In [111]:
prord = rord[::12]
prord = pd.DataFrame(prord)
prord.columns = ["SP500PD"]
prord.index = pd.to_datetime(prord.index)
In [112]:
def nwindow_periodic(df, years, src_label, new_label):
    ror = pd.rolling_apply(df[src_label], years, periodicrate)
    df[new_label] = roll(ror, -(years-1))
nwindow_periodic(prord, 5, "SP500PD", "05yrs")
nwindow_periodic(prord, 10, "SP500PD", "10yrs")
nwindow_periodic(prord, 20, "SP500PD", "20yrs")
nwindow_periodic(prord, 30, "SP500PD", "30yrs")
In [113]:
make_plot(prord[[x for x in prord.columns if x!="SP500PD"]], 
          "Annualized rate of return (rolling window) (averaging)",
          "Annualized rate of return (%)", "ror_dca")
<matplotlib.figure.Figure at 0x7f2bed4e4e10>
In [114]:
for num_years in "30 20 10 05".split():
    make_plot(prord["%syrs" % num_years], 
              "%syr window: Dollar Cost Averaging with Reinvestment of Dividends" % num_years,
              "Annualized rate of return - Averaging (%)", "ror_dca_%s" % num_years)

Let's overlay this with lumped sum

In [119]:
for num_years in "30 20 10 05".split():
    make_plot(prord["%syrs" % num_years], 
              "%syr window: Dollar cost averaging vs lumped sum" % num_years,
              "Annualized rate of return - Averaging vs Lumped Sum(%)",
             "ror_dca_vs_ls_%s" % num_years, save_fig=False)
    make_plot(arord["%syrs" % num_years], 
              "%syr window" % num_years,
              "Annualized rate of return - Averaging vs Lumped Sum (%)", 
             "ror_dca_vs_ls_%s" % num_years, False, True)

With Inflation

In [120]:
def nwindow_periodic_inflation(df, inflation, years, src_label, new_label):
    """
    Compute the rolling rate of return over the last so many years, 
    adjusted for inflation.
    """
    tmpdf = (1+df[src_label]/100)#/(1+inflation/100)
    tmpinf = (1+inflation["Inflation"]/100)
    tmpdf = tmpdf/tmpinf
    tmpdf = (tmpdf - 1)*100 # Need this to convert 1.03 to 3%
    tmpdf = tmpdf.dropna()
    ror = pd.rolling_apply(tmpdf, years, periodicrate)
    df[new_label] = pd.Series(roll(ror, -(years-1)), index=tmpdf.index)
In [121]:
prordi = rord[::12]
prordi = pd.DataFrame(prordi)
prordi.columns = ["SP500PDI"]
prordi.index = pd.to_datetime(prordi.index)
In [122]:
nwindow_periodic_inflation(prordi, inflation, 5, "SP500PDI", "05yrs")
nwindow_periodic_inflation(prordi, inflation, 10, "SP500PDI", "10yrs")
nwindow_periodic_inflation(prordi, inflation, 20, "SP500PDI", "20yrs")
nwindow_periodic_inflation(prordi, inflation, 30, "SP500PDI", "30yrs")
In [123]:
prordi = prordi[43:]
In [124]:
make_plot(prordi[[x for x in prordi.columns if x!="SP500PDI"]], 
          "Annualized rate of return (rolling window) (averaging) (with inflation)",
          "Annualized rate of return (%)", "ror_dca_inf")
<matplotlib.figure.Figure at 0x7f2be63df150>
In [125]:
for num_years in "30 20 10 05".split():
    make_plot(prordi["%syrs" % num_years], 
              "%syr window: Dollar Cost Averaging with Reinvestment of Dividends and Inflation Adjusted" % num_years,
              "Annualized rate of return - Averaging (%)", "ror_dca_inf_%s" % num_years)

Monte Carlo Simulations

Arguably, the annual rates of return from year to year have poor correlation. So instead of looking at consecutive years, just treat the annual rates as a pool to draw from, and pick any 30 at random to calculate a 30 year effective rate of return.

Lump Sum with Inflation Adjustments

In [57]:
mcrordi = rord[::12]
mcrordi = pd.DataFrame(mcrordi)
mcrordi.columns = ["SP500MCI"]
mcrordi.index = pd.to_datetime(mcrordi.index)
mcrordi = mcrordi.dropna()
In [58]:
def adjust_for_inflation(df, inflation, src_label):
    """
    Take a dataframe with annual return values (not in decimal).
    Take a series of inflation rates for each year (not in decimal).
    Return the rates adjusted for inflation (not in decimal).
    """
    tmpdf = (1+df[src_label]/100)
    tmpinf = (1+inflation["Inflation"]/100)
    tmpdf = tmpdf/tmpinf
    tmpdf = (tmpdf - 1)*100 # Need this to convert 1.03 to 3%
    tmpdf = tmpdf.dropna()
    return tmpdf
mcrordi["SP500MCI"] = adjust_for_inflation(mcrordi, inflation, "SP500MCI")
In [59]:
mcrordi = mcrordi.dropna()
In [60]:
def mc_returns(df, num_years, num_trials, src_label):
    rates = []
    for x in xrange(num_trials):
        rates.append(avgrate(df[src_label].sample(num_years, replace=True)))
    return pd.Series(rates)
In [61]:
NUM_TRIALS = 20000
mcrordi_dist = pd.DataFrame(mc_returns(mcrordi, 5, NUM_TRIALS, "SP500MCI"))
mcrordi_dist.columns = ["05yrs"]
mcrordi_dist["10yrs"] = mc_returns(mcrordi, 10, NUM_TRIALS, "SP500MCI")
mcrordi_dist["20yrs"] = mc_returns(mcrordi, 20, NUM_TRIALS, "SP500MCI")
mcrordi_dist["30yrs"] = mc_returns(mcrordi, 30, NUM_TRIALS, "SP500MCI")
In [126]:
SCALE = 3
def make_hist(df, title, xlabel, filename, new_fig=True, xlim=None, ylim=None):
    if new_fig:
        f = plt.figure()
    FONTSIZE = 15
    ax = df.plot(kind='hist', bins=50, fontsize=FONTSIZE,
      sort_columns=True, title=title, figsize=(4*SCALE,3*SCALE),
                xlim=xlim, ylim=ylim)
    plt.xlabel(xlabel)  
    ax.set_title(title, fontsize=FONTSIZE)
    plt.legend(loc=0, prop={'size':FONTSIZE})
    ax.xaxis.label.set_size(FONTSIZE)
    ax.yaxis.label.set_size(FONTSIZE)
    plt.grid(True)
    fig = ax.get_figure()
    fig.savefig("plots/%s.png" % filename)
In [128]:
make_hist(mcrordi_dist,
         "Distribution of effective annual rate of returns (Dividends Invested) (Inflation Adjusted)",
         "Annualized rate of return (%)", "ror_mc_ls_inf")
<matplotlib.figure.Figure at 0x7f2bed46cb50>
In [129]:
for num_years in "30 20 10 05".split():
    make_hist(mcrordi_dist["%syrs" % num_years], 
              "%syr window: Dividends Invested and Inflation Adjusted" % num_years,
              "Annualized rate of return (%)", "ror_mc_inf_%s" % num_years, xlim=(-30,50))
In [65]:
mcrordi_dist.describe()
Out[65]:
05yrs 10yrs 20yrs 30yrs
count 20000.000000 20000.000000 20000.000000 20000.000000
mean 6.853995 6.660871 6.589037 6.544952
std 9.175585 6.467511 4.586195 3.728357
min -24.743745 -19.642585 -9.839428 -8.223338
25% 0.617879 2.274127 3.488647 4.000163
50% 6.987468 6.693657 6.656587 6.584169
75% 13.112201 11.083349 9.717398 9.049604
max 50.751993 29.903719 26.922536 21.116198

These plots, more than anything, show the volatility of short term investing. Although they all have the same expected rate of return, the spread is more the shorter the investing interval.

Dollar Cost Averaging, Inflation Adjusted

In [66]:
mcrordidca = rord[::12]
mcrordidca = pd.DataFrame(mcrordidca)
mcrordidca.columns = ["SP500MCIDCA"]
mcrordidca.index = pd.to_datetime(mcrordidca.index)
mcrordidca = mcrordidca.dropna()
In [67]:
mcrordidca["SP500MCIDCA"] = adjust_for_inflation(mcrordidca, inflation, "SP500MCIDCA")
In [68]:
mcrordidca = mcrordidca.dropna()
In [69]:
def mc_returns_dca(df, num_years, num_trials, src_label):
    rates = []
    for x in xrange(num_trials):
        rates.append(periodicrate(df[src_label].sample(num_years, replace=True))[0])
    return pd.Series(rates)
In [70]:
NUM_TRIALS = 20000
mcrordidca_dist = pd.DataFrame(mc_returns_dca(mcrordidca, 5, NUM_TRIALS, "SP500MCIDCA"))
mcrordidca_dist.columns = ["05yrs"]
mcrordidca_dist["10yrs"] = mc_returns_dca(mcrordidca, 10, NUM_TRIALS, "SP500MCIDCA")
mcrordidca_dist["20yrs"] = mc_returns_dca(mcrordidca, 20, NUM_TRIALS, "SP500MCIDCA")
mcrordidca_dist["30yrs"] = mc_returns_dca(mcrordidca, 30, NUM_TRIALS, "SP500MCIDCA")
In [130]:
make_hist(mcrordidca_dist,
         "Distribution of effective annual rate of returns (Dividends Invested) (Inflation Adjusted) (Dollar Cost Avg)",
         "Annualized rate of return (%)", "ror_mc_dca_inf")
<matplotlib.figure.Figure at 0x7f2be4a3ef90>
In [131]:
for num_years in "30 20 10 05".split():
    make_hist(mcrordidca_dist["%syrs" % num_years], 
              "%syr window: Dividends Invested, Inflation Adjusted, and Dollar Cost Avg" % num_years,
              "Annualized rate of return (%)", "ror_mc_dca_inf_%s" % num_years, xlim=(-30,50))
In [73]:
mcrordidca_dist.describe()
Out[73]:
05yrs 10yrs 20yrs 30yrs
count 20000.000000 20000.000000 20000.000000 20000.000000
mean 7.232822 6.891985 6.818483 6.824882
std 9.910812 7.197388 5.022567 4.048870
min -31.020148 -24.041342 -15.301114 -11.703804
25% 0.666543 2.212749 3.548100 4.161366
50% 7.500898 7.092738 6.909313 6.883050
75% 14.004130 11.751754 10.256557 9.631896
max 45.169947 31.498749 24.925648 21.015151