By Eugene F. Fama and Kenneth R. French
Our paper, "Luck versus Skill in the Cross Section of Mutual Fund Returns," examines the performance during 1984-2006 of actively managed US mutual funds that invest primarily in US equities. It is an academic paper with lots of technical detail. The purpose of this white paper is to provide a summary of the results that are relevant for investors. We begin by examining the overall α for aggregate wealth invested in actively managed mutual funds. We then turn to the performance of individual funds.
Aggregate Returns for US Equity Mutual Funds
Suppose we form a portfolio of actively managed US equity mutual funds, where each fund is weighted every month by assets under management. This is the aggregate portfolio of wealth invested in active mutual funds, and its performance is the aggregate experience of fund investors. For 1984-2006, the α (abnormal return) of the portfolio, measured with our three-factor model, is ‑0.81% per year. This α estimate is 2.50 standard errors below zero, which is rather strong evidence that active mutual funds as a whole provide returns to investors below those of an equivalent portfolio of the three passive benchmarks of the FF model.
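As a rough illustration of the weighting described above, the sketch below computes one month's aggregate return as an AUM-weighted average. The function name and the three-fund data are made up for the example, and using AUM as of the start of the month is an assumption.

```python
def aggregate_return(fund_returns, fund_aum):
    """AUM-weighted average of one month's fund returns (lists in matching order)."""
    total_aum = sum(fund_aum)
    return sum(r * a for r, a in zip(fund_returns, fund_aum)) / total_aum

# Made-up month: returns in percent, AUM in $ millions (assumed start-of-month values).
returns = [1.0, 2.0, -1.0]
aum = [100.0, 300.0, 600.0]

# (1.0*100 + 2.0*300 - 1.0*600) / 1000 = 0.1 percent
month_return = aggregate_return(returns, aum)
```

Large funds dominate the result: the equal-weighted average of the same three returns would be about 0.67%, not 0.1%.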
The high management fees and expenses of active funds lower their returns. If we measure fund returns before fees and expenses - in other words, if we add back each fund's expense ratio - the α estimate for the aggregate fund portfolio rises to 0.13% per year, which is only 0.40 standard errors from zero. Thus, even before expenses, the overall portfolio of active mutual funds shows no evidence that active managers can enhance returns. After costs, fund investors in aggregate simply lose the fees and expenses imposed on them.
Adding insult to injury, the aggregate portfolio of active mutual funds looks a lot like the cap-weighted stock market portfolio. When we use the three-factor model to explain the monthly percent returns of the aggregate fund portfolio for 1984-2006, we get
RPt - Rft = -0.07 + 0.96(RMt - Rft) + 0.07SMBt - 0.03HMLt + et,
where RPt is the return (net of costs) on the aggregate mutual fund portfolio for month t, Rft is the riskfree rate of interest (the one-month T-bill return for month t), RMt is the cap-weighted NYSE-Amex-Nasdaq market return, and SMBt and HMLt are the size and value/growth returns of the three-factor model.
The regression says that the aggregate mutual fund portfolio has almost full exposure to the market portfolio (a 0.96 dose, which is close to 1.0), but almost no exposure to the size and value/growth returns (0.07 and -0.03, which are close to zero). Moreover, the market alone captures 99% of the variance of month-by-month aggregate fund returns.
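For readers who want to see how such a regression is estimated, here is a minimal sketch using ordinary least squares in NumPy. The factor and portfolio returns are synthetic (random numbers with the quoted coefficients planted in them), so the output only shows the mechanics and does not reproduce our data or results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 273  # January 1984 through September 2006

# Synthetic factor returns in percent per month (illustrative, not real data).
mkt = rng.normal(0.6, 4.5, n_months)  # stands in for RM - Rf
smb = rng.normal(0.1, 3.0, n_months)
hml = rng.normal(0.3, 3.0, n_months)

# Plant the coefficients quoted in the text, then add a small residual.
true_coefs = np.array([-0.07, 0.96, 0.07, -0.03])
X = np.column_stack([np.ones(n_months), mkt, smb, hml])
excess = X @ true_coefs + rng.normal(0.0, 0.4, n_months)  # stands in for RP - Rf

# OLS: the intercept is the monthly alpha, the slopes are the factor exposures.
coefs, *_ = np.linalg.lstsq(X, excess, rcond=None)
alpha_monthly = coefs[0]
alpha_annual = 12 * alpha_monthly

resid = excess - X @ coefs
r_squared = 1 - resid.var() / excess.var()
```

Multiplying the monthly intercept of -0.07 by 12 gives -0.84% per year, which is consistent with the -0.81% annual α reported earlier once rounding of the monthly intercept is allowed for.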
In short, the combined portfolio of all active mutual funds is close to the cap-weighted market portfolio, but with a return weighed down by the high fees and expenses of actively managed funds.
Are Winners Skilled or Lucky?
The fact that the aggregate portfolio of wealth invested in active mutual funds shows no evidence of manager skill does not mean no fund managers have skill. It simply means that if there are good managers who produce positive α, they must be balanced by bad managers who produce negative α. Can we find evidence of good and bad managers?
The challenge in answering this question is distinguishing skill from luck. Even if no fund manager is good or bad, many funds will do well and many will do poorly purely by chance. When we examine fund-by-fund performance, we need some way to determine whether the range of outcomes is wider than we expect if every manager's true α is zero. If the observed range of outcomes is wider (the distribution of outcomes is more dispersed) than we expect by chance, we can infer that there are bad fund managers overpopulating the left tail of outcomes and good managers overpopulating the right tail.
Our mutual fund data are from CRSP. The data cover all funds, including funds that die. Dead funds are typically missing from other databases, but they are important for an unbiased picture of fund performance. The data are most reliable after 1983, and we focus on 1984-2006 (actually January 1984 through September 2006). Our sample includes US equity funds that existed during 1984-2006 and came on line prior to September 2001. We ignore funds that appear after September 2001 to avoid having lots of funds with short return histories. To avoid having lots of tiny funds, we only include funds in our tests after they pass $5 million in assets under management, but we do not exclude them if they later fall below $5 million. Since we estimate benchmark regressions for each fund, we limit the tests to funds that have at least eight months of returns after they pass an AUM bound, so there is a bit of survival bias. With the exception of this eight-month requirement, our exclusion rules are forward-looking (an investor could use them in real time), so they do not bias our results.
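The AUM and minimum-history rules can be sketched as a small filter. This is only an illustration: the function name is hypothetical, "pass" is assumed to mean reaching or exceeding the bound, the month in which the bound is first passed is assumed to count toward the eight months, and the September 2001 start-date cutoff is left out.

```python
def eligible_months(aum_by_month, bound=5.0, min_months=8):
    """Indices of the months a fund enters the tests, or [] if its history is too short.

    aum_by_month: AUM in $ millions for each month of the fund's life.
    """
    for start, aum in enumerate(aum_by_month):
        if aum >= bound:  # assumption: "pass" means reach or exceed the bound
            # Keep every later month, even if AUM falls back below the bound.
            months = list(range(start, len(aum_by_month)))
            return months if len(months) >= min_months else []
    return []

# Made-up AUM path: crosses $5 million in month 2, dips below it in month 4.
aum_path = [2.0, 4.0, 6.0, 7.0, 4.5, 5.5, 8.0, 9.0, 9.5, 10.0]
kept = eligible_months(aum_path)        # months 2 through 9 (eight months)
too_short = eligible_months([2.0, 6.0, 7.0])  # only two months after the bound
```

Only the minimum-history check looks at what happens after a fund enters the sample, which is why it is the one rule that introduces some survival bias.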
It is easy to obtain the distribution of mutual fund performance. We simply estimate the three-factor model on reported (that is, net of cost) monthly returns for each fund in the sample. This gives us a three-factor α estimate for each fund. But we do not sort funds on α estimates. The problem with α estimates is that some are more reliable than others because (i) time in the sample varies across funds and (ii) the diversification of a fund affects the reliability of its α estimate. (Longer time in the sample and more diversification make an α estimate more precise.) As a result, the same α estimate for two different funds does not imply the same amount of information about performance because the statistical precision of the two estimates may differ.
It is more sensible to rank funds on the t-statistics for their α estimates. The t-statistic, t(α), is the ratio of an α estimate to its standard error, which is a measure of the precision or reliability of the α estimate. When precision is low, the standard error is high, and vice versa. Dividing each α estimate by its standard error gives us precision-adjusted α estimates that allow meaningful comparisons across funds.
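The following sketch shows one way to compute t(α) for a single fund. It uses textbook OLS standard errors, which is an assumption made for the example, and the two funds are synthetic. Both are generated with the same true α and the same residual volatility; only the length of the return history differs.

```python
import numpy as np

def alpha_tstat(excess, factors):
    """Return (alpha, standard error, t-statistic) from an OLS factor regression."""
    n = len(excess)
    X = np.column_stack([np.ones(n), factors])
    coefs, *_ = np.linalg.lstsq(X, excess, rcond=None)
    resid = excess - X @ coefs
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classic OLS covariance matrix
    se_alpha = np.sqrt(cov[0, 0])
    return coefs[0], se_alpha, coefs[0] / se_alpha

rng = np.random.default_rng(1)

def synthetic_fund(n_months, noise_sd):
    """Made-up fund with a true alpha of 0.2% per month."""
    factors = rng.normal(0.0, 3.0, (n_months, 3))
    slopes = np.array([1.0, 0.2, 0.1])
    excess = 0.2 + factors @ slopes + rng.normal(0.0, noise_sd, n_months)
    return excess, factors

short_fund = alpha_tstat(*synthetic_fund(36, 2.0))    # three years of returns
long_fund = alpha_tstat(*synthetic_fund(240, 2.0))    # twenty years of returns
```

The longer history produces a smaller standard error, so the same α estimate would carry a larger t(α) for the long-lived fund than for the short-lived one.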
Ranked from the lowest to the highest, the percentiles of the distribution of three-factor t(α) estimates for the funds in our sample are in the column labeled Actual in Table 1. For example, the first percentile of t(α) for actual returns is ‑3.87, which means that 1% of the 3156 funds in our sample (32 funds) have t(α) estimates at or below -3.87. The tenth percentile of t(α) for actual returns is -2.34, so 10% (316) of our funds have t(α) estimates at or below -2.34. At the other end of the distribution, the 90th percentile of t(α) for actual returns is 1.01. Equivalently, t(α) is 1.01 or above for 10% (316) of our funds. Finally, the 99th percentile of t(α) is 2.47, so t(α) is 2.47 or above for 32 funds.
If one ignores the effects of chance, the performance of the funds with high t(α) estimates seems spectacular. Remember that t(α) is the ratio of an α estimate to its standard error. The average standard error of the α estimates is 0.28% per month. A value of t(α) of 1.00 thus typically translates to an α estimate of about 0.28% per month or about 3.36% per year, sustained over the whole life of the fund. A manager who produces α of 3.36% or more per year will be anointed by the press and fund rating agencies as a brilliant stock picker. In our sample 10% of the managers do this well or better. Typically, we look for values of t(α) above 2.00 to infer statistical reliability. A bit more than 2% of our funds are above this hurdle, and their annualized α estimates are typically in excess of 6.5% per year, again over the whole lives of their funds. These managers are sure to get multiple gold stars from the press and fund rating agencies, and they are sure to be inundated with inflows from investors.
There is, however, a nasty fly in the ointment: chance. With 3156 funds, many would produce extreme values of t(α) just by chance even if true α were zero for every fund. The question we address is whether more funds produce extreme values of t(α) than we would expect by chance.
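The conversion from t(α) to an annual α used in this paragraph is simple arithmetic, sketched below. The function name is made up, and the calculation assumes a fund whose standard error equals the sample average and annualizes by multiplying by 12, as the text does.

```python
AVG_SE_MONTHLY = 0.28  # average standard error of alpha, percent per month (from the text)

def implied_annual_alpha(t_alpha, se_monthly=AVG_SE_MONTHLY):
    """Annualized alpha (percent) implied by a t-statistic at a given standard error."""
    return t_alpha * se_monthly * 12

alpha_at_t1 = implied_annual_alpha(1.00)  # about 3.36% per year
alpha_at_t2 = implied_annual_alpha(2.00)  # about 6.72% per year
```

A fund with a larger or smaller standard error than the average would map the same t(α) to a proportionally larger or smaller α.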
There are many problems in answering this question. For example, stock returns are fat-tailed relative to the normal distribution (there are too many extreme returns), and this spills over into fund returns and estimates of α. Returns are also correlated across funds, and this causes α estimates to be correlated across funds. The academic contribution of our longer paper, "Luck versus Skill in the Cross Section of Mutual Fund Returns," centers on solutions to these and other problems, and the interested reader can go to the paper for details. Here we give a bare bones outline of the approach. (The reader who is not interested in even these details can skip the next paragraph.)
The idea is to construct the chance distribution of t(α) estimates for a cloned population of funds that have the return characteristics (fat tails, correlation, etc.) of the actual population of funds, except that in the cloned population true α and thus true t(α) are zero for every fund. Setting up the cloned population of fund returns is easy. We estimate the three-factor model on each fund's actual returns and then subtract the resulting α estimate from the fund's returns. This gives us returns for each fund that have the properties of the fund's actual returns, except that true α and t(α) for the cloned returns are zero. To generate a chance distribution of α and t(α) estimates, we draw a random sample (with replacement) of months from the cloned population of fund returns. (Drawing the same random sample of months for every fund maintains the cross correlation of fund returns.) For every fund we then estimate the three-factor model on the random sample of returns. This gives us one chance distribution of α and t(α) estimates. To have many such samples on which to base inferences, we repeat this "bootstrap simulation" 10,000 times.
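The steps in this outline can be sketched in a few lines of NumPy. This is a simplified illustration and not the code behind our results: the returns are synthetic, every fund is assumed to have a complete return history, the standard errors are textbook OLS, and the numbers of funds and runs are kept tiny so the example runs quickly.

```python
import numpy as np

def alpha_tstats(X, Y):
    """OLS of each column of Y on X (first column of X is the intercept).

    Returns (alphas, t_alphas), one entry per fund.
    """
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coefs
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    se_alpha = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return coefs[0], coefs[0] / se_alpha

rng = np.random.default_rng(42)
n_months, n_funds, n_runs = 120, 100, 200  # tiny compared with the paper
pcts = [1, 50, 99]

# Synthetic factors and fund excess returns (percent per month), fat-tailed residuals.
X = np.column_stack([np.ones(n_months), rng.normal(0.5, 4.0, (n_months, 3))])
betas = rng.normal([1.0, 0.2, 0.0], 0.2, (n_funds, 3)).T
true_alpha = rng.normal(-0.05, 0.1, n_funds)
Y = true_alpha + X[:, 1:] @ betas + 1.5 * rng.standard_t(5, (n_months, n_funds))

# Step 1: estimate each fund's alpha and subtract it to get zero-alpha clones.
alpha_hat, t_actual = alpha_tstats(X, Y)
Y_zero = Y - alpha_hat

# Steps 2-3: draw months with replacement (same draw for every fund),
# re-estimate the model, and store the percentiles of the simulated t(alpha).
sim_pcts = np.empty((n_runs, len(pcts)))
for run in range(n_runs):
    months = rng.integers(0, n_months, n_months)
    _, t_sim = alpha_tstats(X[months], Y_zero[months])
    sim_pcts[run] = np.percentile(t_sim, pcts)

avg_sim_pcts = sim_pcts.mean(axis=0)          # average percentiles across runs
actual_pcts = np.percentile(t_actual, pcts)   # percentiles for the "actual" funds
```

Indexing the factor matrix and the fund returns with the same `months` array is what keeps the cross-correlation of fund returns intact within each run.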
Each simulation run gives us a chance distribution of t(α) estimates from a world in which true α is zero for every fund. The Simulation column of Table 1 shows the averages of the percentile values of t(α) obtained from the 10,000 simulation runs. For example, the average of the first percentile values of t(α) from the 10,000 simulation runs is -2.50, and the average of the 99th percentile values of t(α) is 2.45.
The simulations say we expect lots of chance dispersion in t(α) estimates when true α and t(α) are equal to zero. Not surprisingly, high percentiles of t(α) estimates in the simulations are associated with stellar returns. In the simulations, the average t(α) estimate at the 90th percentile is 1.30. Recall that t(α) is the ratio of an α estimate to its standard error, and the average standard error is around 0.28. A t(α) of 1.30 thus translates to an α estimate of about 0.36% per month, or about 4.37% per year. And performance is more extreme for funds further into the right tail of t(α) estimates. These funds would be buried in praise from the investment media, and they would be buried in money from investors hoping for strong future returns. Conversely, low percentiles of t(α) in the simulations imply equally depressing investment outcomes. These funds would not have a happy future in the investment management business.
We in the cloning business know, however, that all dispersion in the α and t(α) estimates from the simulations is due to chance, since true α is zero for every fund. The great performance in the right tail of the simulation distribution is just good luck, and the poor performance in the left tail is bad luck. Our goal is to use the chance distribution of t(α) from the simulations to draw inferences about performance in the distribution of t(α) estimates for actual fund returns.
What are we looking for? There are several ways to say it. If there are funds that have positive true α, we should find more high values of t(α) in actual fund returns than we observe in the simulations. Conversely, if there are funds that have negative true α, we should find more low (extreme negative) values of t(α) in actual fund returns than in the simulations. Put differently, the worst performing funds should perform worse than we expect just by chance if every fund has a true α of zero, and the best performing funds should perform better than we expect by chance. Concretely, if there are funds with negative and positive true α, the negative values of t(α) at low percentiles should be more extreme for actual fund returns than for the simulations, and the positive values of t(α) at high percentiles should also be more extreme for actual fund returns than for the simulations.
Table 1 confirms that poorly performing funds indeed do worse than we expect if true α is zero for all funds. For every percentile below the 50th (the median), the t(α) estimate for actual fund returns is far below the average value from the simulations. For example, the first percentile of t(α) is ‑3.87 for actual fund returns, so 1% of actual funds have t(α) of -3.87 or lower. The simulated distribution has fewer funds whose performance is this bad; to include 1% of simulated funds, we have to raise the boundary to ‑2.50. The 10th percentile of t(α) for actual fund returns, -2.34, is far more extreme than the 10th percentile for simulated fund returns, -1.32. The 50th percentile of t(α) in Table 1 says that half the actual funds have t(α) of ‑0.62 or less, while the median t(α) from the simulations is almost exactly zero, ‑0.01. All this suggests that among poorly performing funds, there are many with negative true α. In other words, their poor performance is not entirely due to chance.
Unfortunately, the percentiles of t(α) for actual fund returns are also below the average percentiles from the simulations throughout most of the right tail of the t(α) estimates, which contains the strong performers. For example, Table 1 says that the 90th percentile of t(α) for actual fund returns is 1.01, versus 1.30 from the simulations. In other words, the seemingly impressive t(α) estimates of most of the best performers are actually low relative to what one would get in a world where true α is zero.
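To make the comparison concrete, the short sketch below lines up the actual and simulated percentile values quoted in the text so far and computes the gap at each one. Only these five percentiles are used because they are the ones stated in the discussion; the function name is made up.

```python
# (percentile, actual t(alpha), average simulated t(alpha)), as quoted in the text.
quoted = [
    (1, -3.87, -2.50),
    (10, -2.34, -1.32),
    (50, -0.62, -0.01),
    (90, 1.01, 1.30),
    (99, 2.47, 2.45),
]

def gaps_vs_chance(rows):
    """Actual minus simulated t(alpha) per percentile; negative means worse than chance."""
    return {pct: round(actual - sim, 2) for pct, actual, sim in rows}

gaps = gaps_vs_chance(quoted)
below_chance = [pct for pct, gap in gaps.items() if gap < 0]  # [1, 10, 50, 90]
```

The gap is negative at the 1st, 10th, 50th, and 90th percentiles and only marginally positive (0.02) at the 99th.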
So far we have compared the distribution of t(α) produced by funds with the average distribution from the simulations. Since we do 10,000 simulation runs, you may want to know how often the t(α) estimates for actual fund returns beat those from the simulations. The relevant information is in the column labeled %<Actual in Table 1, which shows the percent of the simulation runs in which the value of t(α) at a given percentile is less than the t(α) estimate at that percentile for actual fund returns. For example, the 10th percentile of t(α) from actual fund returns is -2.34, the average value of the 10th percentile of t(α) estimates from the 10,000 simulation runs is -1.32, and only 0.05% of the simulation runs (5 of 10,000) produce 10th percentile values of t(α) below the -2.34 observed for actual fund returns.
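The %<Actual calculation is a simple count, sketched here. The list of simulated values is made up so that exactly 5 of 10,000 runs fall below the actual value, mirroring the 10th percentile example; it is not output from our simulations.

```python
def pct_runs_below(sim_values, actual_value):
    """Percent of simulation runs whose percentile value is below the actual one."""
    below = sum(1 for v in sim_values if v < actual_value)
    return 100.0 * below / len(sim_values)

# Made-up simulation output: 10,000 tenth-percentile values, 5 of them below -2.34.
actual_10th = -2.34
sim_10th = [-2.40] * 5 + [-1.32] * 9995

share = pct_runs_below(sim_10th, actual_10th)  # 0.05 (percent of runs)
```

A value near 0% means almost every zero-α run beats the actual funds at that percentile, while a value near 100% means the actual funds beat almost every zero-α run.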
At every percentile below the 95th, less than 10% of the simulation runs produce a lower t(α) estimate than actual fund returns, which means more than 90% of the simulation runs beat the t(α) estimate for actual fund returns. In short, the simulations tell us that for the vast majority of actively managed funds, true α is probably negative; that is, the fund managers do not have enough skill to produce risk adjusted expected returns that cover their costs.
There is, however, a glimmer of hope, at least for a small fraction of active managers. Specifically, the 97th, 98th, and 99th percentiles of the cross section of three-factor t(α) estimates from actual net fund returns in Table 1 are close to the average values at the same percentiles from the simulations, and 49.35% to 58.70% of the t(α) estimates from the 10,000 simulation runs are below those from actual net returns. In other words, the historical performance of the top funds is about as we would expect from the extremely lucky funds in a world where true α is zero for all funds. But this just means that our estimate of true α for the top three percentiles of historical performers is near zero, which is not great but is better than the negative true α estimates for the vast majority of actively managed funds.
Figure 1 provides a picture of the results in Table 1. One curve in the figure shows the percent of actual funds with t(α) estimates below each value from -4.0 to 4.0. The other curve shows the average percent of simulated t(α) estimates below each value. The figure shows that the distribution of t(α) from actual fund returns is to the left of the average from the simulations, except in the extreme right tail, where the two distributions coincide. In other words, even in the extreme right tail where performance is strong, funds look only about as good as would be expected in a world where true α is zero.
Figure 1: Simulated and Actual Cumulative Density Function of Three-Factor t(α) for Net Returns, 1984-2006
What mix of funds might generate the results in Table 1? Suppose there are two groups of active funds. Managers of good funds have just enough skill to produce zero expected α; bad funds have negative expected α. Together, the two groups are likely to produce a cross section of t(α) estimates entirely to the left of the average of the cross sections from the simulation runs (in which all managers have zero true α). Even the extreme right tail of the t(α) estimates for actual fund returns will be weighed down by bad managers who are extremely lucky but have smaller t(α) estimates than if they were extremely lucky good managers.
In our tests, most of the cross section of t(α) estimates for actual net fund returns is way left of what we expect if all managers have zero expected α. Thus most funds are probably in the negative expected α group: their managers do not have skill sufficient to generate risk adjusted expected returns that cover costs. The 97th, 98th, and 99th percentiles of the three-factor t(α) estimates for actual net fund returns are similar to the simulation averages. This suggests that buried in the results are fund managers with more than sufficient skill to cover costs (they have positive true α), and the lucky among them pull up the extreme right tail of the net return t(α) estimates. Unfortunately, these good funds are indistinguishable from the lucky bad funds that land in the top percentiles of the t(α) estimates but have negative true α. As a result, if we buy a portfolio of the top three percentiles of t(α) estimates, the expected three-factor net return α is zero; the positive true α of the lucky (but hidden) good funds is offset by the negative true α of the lucky bad funds.
Active managers are sure to chime in at this point that fund returns are net of costs, but the passive benchmark returns of the three-factor model are before all costs. It is thus possible that passive funds also perform poorly because of costs. Wrong. Our tests exclude index funds, but we can report that the three-factor α for 1984-2006 for the aggregate portfolio of index funds is quite close to zero, -0.16% (-16 basis points) per year. In other words, at least in aggregate, the three-factor benchmark-adjusted returns on passive funds come close to covering their costs. This is probably due to lower turnover and lower management fees for passive funds, and more opportunities to offset costs with revenues from securities lending.
Since large low cost index funds are not subject to the vagaries of active management, it seems reasonable to infer that true α for a portfolio of these funds is close to zero. In other words, going forward we expect that a portfolio of low cost index funds will perform about as well as a portfolio of the top three percentiles of past active winners, and better than the rest of the active fund universe.
If one is only interested in evaluating mutual fund performance from the perspective of investors, the discussion so far (which uses the net returns realized by investors) is the whole story, and there is no need to read further. But we are also interested in testing whether fund managers show evidence of skill if we ignore fund expenses. We conclude with a brief review of our evidence on this question.
Is There Evidence of Skill If We Ignore Expenses?
Recall that when we add back the costs in expense ratios, the three-factor α for the aggregate portfolio of wealth invested in active funds is 0.13% per year (13 basis points), which is quite close to zero. After costs, that is, in terms of net returns to investors, α drops to -0.81% per year. These results suggest that the poor performance of funds in Table 1 may be due to costs. If we add back costs, perhaps we will find that there are good managers with skill that leads to positive true α in gross (pre-cost) returns, and perhaps there are bad managers who have negative true α. The evidence is in Table 2, which is constructed in the same way as Table 1 except that Table 2 uses gross returns rather than net returns to investors. Gross returns are net returns plus the costs included in expense ratios.
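As a small illustration of the gross-return adjustment, the sketch below adds back costs to a monthly net return. Spreading the annual expense ratio evenly as one-twelfth per month is an assumption made for the example, and the fund figures are made up; the aggregate α values are the ones quoted in the text.

```python
def gross_monthly_return(net_return, annual_expense_ratio):
    """Monthly gross return (percent): net return plus 1/12 of the annual expense ratio."""
    return net_return + annual_expense_ratio / 12.0

# Made-up fund month: 0.50% net return, 1.20% annual expense ratio.
gross = gross_monthly_return(0.50, 1.20)  # about 0.60% for the month

# Aggregate alphas from the text: the gap between gross and net alpha is the cost drag.
gross_alpha, net_alpha = 0.13, -0.81      # percent per year
cost_drag = gross_alpha - net_alpha       # about 0.94% per year
```

The implied cost drag of about 0.94% per year is the amount by which expenses lower the aggregate α.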
The average simulation distributions of t(α) are quite similar in Table 1 (net returns) and Table 2 (gross returns). This is because true α is set to zero for every fund in the simulations for both net and gross returns. Setting true α to zero in the simulations for net returns is relevant for judging whether managers have sufficient skill to generate expected returns that cover costs, whereas setting true α to zero for gross returns is better for judging whether managers have any skill (bad or good) that results in non-zero true α in expected returns before costs. Because true α is always zero in the simulations, the distributions of t(α) in the simulations are quite similar for gross and net returns.
Adding back the costs in expense ratios pushes up t(α) for actual fund returns. Table 2 shows, however, that the left tail of three-factor t(α) estimates for actual gross returns is still to the left of the average from the simulations. For example, the simulations say that in the absence of skill, on average the fifth percentile of t(α) for gross returns is ‑1.71. The actual fifth percentile is lower, ‑2.19. Thus, the left tail of three-factor t(α) estimates again suggests that there are bad fund managers whose stock picks result in negative true α relative to passive benchmarks - even if we give them back their expenses.
Conversely, the right tail of three-factor t(α) estimates in Table 2 points to the existence of superior managers who, ignoring expenses, enhance expected returns relative to passive benchmarks. The t(α) estimates for actual gross fund returns move to the right of the average values from the simulations at about the 60th percentile. For example, the 95th percentile of t(α) averages 1.68 in the simulations, but the actual 95th percentile is higher, 2.04.
Figure 2 provides a picture. The figure shows that the left tail of the t(α) estimates for actual gross fund returns is to the left of the average from simulations. In other words, poorly performing funds do worse than we expect if true α is zero for all fund managers. And the right tail of the t(α) estimates for gross fund returns is to the right of the average from simulations, which means that funds with the best performance do better than we expect if all managers have zero true α, before costs.
Figure 2: Simulated and Actual Cumulative Density Function of Three-Factor t(α) for Gross Returns, 1984-2006
Table 2 also says that the evidence of pre-expense performance in the tests on gross returns is statistically reliable, at least in the extreme tails of the cross section of t(α) estimates. For the fifth and lower percentiles of t(α), the estimates from the simulations beat those from actual fund returns in more than 95% of the 10,000 simulation runs. This is rather strong evidence that some of the poor performance in the extreme left tail of the t(α) estimates is worse than would be expected if poor performance is just due to chance. In other words, there are some true losers in the population of fund managers. Conversely, for the 95th and higher percentiles of t(α), the estimates for actual gross fund returns beat those from the simulations in more than 90% of the simulation runs. This is rather strong evidence that some of the strong performance in the extreme right tail of the t(α) estimates is better than would be expected if performance was just due to chance. In other words, when returns are measured before the fund expenses borne by investors, we find evidence that there are some true winners in the population of fund managers.
Keep in mind, however, that after costs, that is, in terms of returns to investors, we are back in the realm of Table 1 and Figure 1, where the evidence says that for the vast majority of funds true α is negative. And even for the top percentiles of historical t(α), strong past performance is probably due to chance. Going forward, the estimate of true α for the top performers is close to zero - about the same as for an efficiently managed portfolio of passive funds.
Appendix - New Versus Old Results
The simulation results in this version of the paper are different from those of the previous version. There is now slightly more evidence of performance, which leads to more shaded conclusions. For example, in the earlier version, the simulation tests on net returns produced no evidence of managers with sufficient skill to cover their costs. The new results say that a small fraction of managers do have sufficient skill to cover costs. The new tests on gross returns also produce more evidence of skill, good and bad.
The stronger evidence of skill in the new results is not due to the t(α) estimates for actual fund returns. The cross sections of t(α) for actual fund returns in the new tests are similar to those in the earlier version of the paper. (The changes are due to dropping index funds in the current tests, to focus better on actively managed funds.) Rather the stronger evidence of skill in the new results is due to tighter simulation distributions of t(α).
What happened? In the tests on actual fund returns (new and old), we include a fund only for the months when its investment style is identified as primarily U.S. equity. This month-by-month screen is also used to determine the months a fund is eligible for inclusion in the new simulations. In the old simulations, however, once a fund passed an AUM bound, its entire subsequent return history was (mistakenly) used as long as the fund's investment style was ever primarily U.S. equity.
When we set a fund's α to zero in the simulation population of returns, we use full-period regression slopes. But a fund's true regression slopes change if it changes its investment style, for example, from stocks to bonds. Using its full-period slopes to set its full-period α to zero then in effect injects positive α into its returns when it is a lower risk fund (as measured by its true time varying regression slopes), and it injects negative α into its returns when it is a higher risk fund. The result is too much dispersion in the distribution of t(α) estimates in the simulations, which are likely to oversample either the high risk or the low risk months of a fund's return history. This leads us to understate the performance in the cross section of t(α) estimates, both the bad performance in the left tail and the good performance in the right tail.
Table 1 - Percentiles of t(α) Estimates for Actual and Simulated Net Fund Returns: January 1984 to September 2006
The table shows values of three-factor t(α) estimates at selected Percentiles of the distribution of t(α) estimates for Actual net fund returns. Net fund returns are the returns reported to investors. The table also shows the percent of the 10,000 simulation runs that produce lower values of t(α) at the selected percentiles than those observed for actual fund returns (%<Actual). Simulation is the average value of t(α) at the selected percentiles from the 10,000 simulation runs in which true α is zero for net returns.
Table 2 - Percentiles of t(α) Estimates for Actual and Simulated Gross Fund Returns: January 1984 to September 2006
The table shows values of three-factor t(α) estimates at selected Percentiles of the distribution of t(α) estimates for Actual gross fund returns. Gross fund returns are the net returns reported to investors plus the costs in expense ratios. The table also shows the percent of the 10,000 simulation runs that produce lower values of t(α) at the selected percentiles than those observed for actual fund returns (%<Actual). Simulation is the average value of t(α) at the selected percentiles from the 10,000 simulation runs in which true α is zero for gross returns.
Eugene Fama and Ken French are members of the Board of Directors for and provide consulting services to Dimensional Fund Advisors LP.