Week 2: What a Stock's Return Is Actually Made Of
Applying Machine Learning to Financial Data, Part 2 of 52
Last week we asked linear regression to predict tomorrow, and it declined, politely and instructively. The daily move of the market turned out to be very nearly unforecastable from its own recent past, which is exactly what a nearly efficient market should look like when you write it down in numbers. The lesson was not about regression at all. It was about the discipline that keeps you from deceiving yourself: a leakage-proof target, a split that respects the arrow of time, baselines set down before the model, and an evaluation that ends with costs rather than with a flattering R-squared.
This week we keep the same tool and change the question entirely. We are not going to forecast anything. We are going to take one stock and ask what its returns are made of. The right-hand side of the equation finally holds variables with a real economic claim on the left, and the regression stops being a failed oracle and becomes something it is genuinely good at: an instrument of explanation.
Two different jobs for the same arithmetic
It is worth being precise about the shift, because it runs through the whole of machine learning and is the source of a great deal of muddle. Ordinary least squares can answer two quite separate questions, and the same fitted line serves both.
The first question is prediction: given what I know today, what is my best guess for an outcome I have not yet seen. That was Week 1, and we judged it the only way prediction can honestly be judged, on data the model had never met. The second question is inference: what is the relationship between these quantities, in the data I have, and how confident may I be that it is real rather than an accident of the sample. That is this week. Prediction lives or dies out-of-sample. Inference is a statement about the sample in hand, and it is judged by the size of an effect and the tightness of the estimate, not by a forecast.
In code the division is clean. For prediction we lean on scikit-learn, with its fit, predict, and score. For inference we reach for statsmodels, which gives us coefficients, standard errors, and p-values. They share the same underlying least-squares engine, and they answer different questions with it. Keeping the two jobs distinct is the spine of this week.
The model with a real economic claim
The idea we are putting to work is one of the most durable in empirical finance. A stock’s return is not a lone thing. It moves, for the most part, because broad and identifiable forces move, and the stock is exposed to them in its own particular measure. Eugene Fama and Kenneth French gave the most influential accounting of those forces. Strip out the return you would have earned simply by lending at the risk-free rate, and what remains, the excess return, can be explained to a remarkable degree by three common factors.
The first is the market itself, the excess return of the whole market over the risk-free rate, written Mkt-RF. The second is size, the historical tendency of small companies to behave differently from large ones, captured by a portfolio that is long small firms and short large ones, written SMB for small minus big. The third is value, the tendency of cheap-looking firms, those with a high book value relative to their price, to diverge from expensive-looking ones, written HML for high minus low. The regression we run is the plainest possible statement of the idea:
excess return of the stock = alpha + (beta_market × Mkt-RF) + (beta_size × SMB) + (beta_value × HML) + noise.
Every term on the right is a number you can obtain for free. Kenneth French publishes the daily factor returns on his data library, updated and maintained, and the script fetches them directly. The three betas tell us how strongly this particular stock is wired into each force. The intercept, alpha, is the part of the average return that none of the three factors can account for. It is the residual that has made and unmade careers, and we will treat it with the suspicion it deserves.
The stock, and an honest word about data
I have chosen Apple. It is a useful subject precisely because its story is one most readers already hold an intuition about, and a factor regression is a fine way to see whether the intuition survives contact with the arithmetic. A large, much-loved technology company should look like a high-beta growth stock: more volatile than the market, the opposite of small, and the opposite of value. We shall see whether the numbers agree.
A word on provenance, in the same spirit of disclosure as last week. The factor data comes from the Ken French library, which is a different source from the price data of Week 1, and the script builds a dedicated loader for it. The live library is not reachable from every environment, so the loader falls back to a freely mirrored copy of the daily three-factor file on GitHub, and that mirror ends in the middle of 2019. For the stock’s own return the canonical figures come from a local run against real prices. Where no free price source is reachable, the script generates a stand-in stock from the real factors with known exposures, so that the whole apparatus runs anywhere and can be checked against an answer one already knows. The numbers quoted in this article that carry a true alpha come from the live run; any produced in a restricted environment are labelled as such in the results file and are there to prove the machinery works, not to describe Apple. The distinction is laboured on purpose. A figure you cannot trace to its source is a figure you cannot defend.
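The fallback idea can be sketched in a few lines. This is not the series' actual loader: the function names, the labels, and the stub loaders below are hypothetical, and the single data row is made up. The one design point it illustrates is that the source label travels with the data, so a figure can always be traced to the tier that produced it.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(sources: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (label, loader) pair in order; return the first that succeeds."""
    errors = []
    for label, loader in sources:
        try:
            return label, loader()
        except Exception as exc:  # broad on purpose: any failure means "try the next tier"
            errors.append(f"{label}: {exc!r}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical stub loaders standing in for the real ones.
def fetch_live_library():
    raise ConnectionError("live library unreachable")


def fetch_github_mirror():
    # Made-up row (date, Mkt-RF, SMB, HML); not real factor data.
    return [("2019-01-02", 0.0010, -0.0020, 0.0030)]


source_label, rows = load_with_fallback([
    ("live", fetch_live_library),
    ("mirror", fetch_github_mirror),
])
print(source_label, len(rows))
```

Writing the label into the results file alongside every number is what makes it possible to say, later, which figures describe Apple and which merely prove the machinery runs.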
Why there is no leakage rule to break this week, and why we keep the time discipline anyway
Last week half the battle was preventing the future from leaking into the past. A factor regression of this kind does not face the same danger in its main task, and it is worth understanding why. The model explains today’s excess return using today’s factor returns. Both sides of the equation are measured over the same day. There is no shifting, no lagging, and no target hidden one step into the future, because we are not predicting a later outcome from earlier information. We are decomposing a contemporaneous quantity into contemporaneous parts. It is an act of accounting, not of forecasting.
That contemporaneity is the very thing that makes the high R-squared we are about to see honest as a description and useless as a forecast, and the distinction matters so much that the closing sections of this article are devoted to it.
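The difference between accounting and forecasting comes down to how the two series are aligned. The sketch below uses one made-up, serially independent factor and a stock with a same-day exposure to it (both are assumptions, not real data), and fits the same regression twice: once with no shift, once with the stock's return moved a day ahead.

```python
import numpy as np


def r_squared(y, x):
    """R-squared of a one-variable least-squares fit with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(1)
n = 5000
factor = rng.normal(0.0, 0.01, n)                 # made-up factor returns
stock = 1.2 * factor + rng.normal(0.0, 0.008, n)  # same-day exposure plus noise

# Accounting: today's stock return against today's factor return. No shift.
r2_same_day = r_squared(stock, factor)

# Forecasting: tomorrow's stock return against today's factor return.
r2_next_day = r_squared(stock[1:], factor[:-1])

print(f"same-day R-squared: {r2_same_day:.3f}")
print(f"next-day R-squared: {r2_next_day:.4f}")
```

The same-day fit explains most of the variation; the shifted fit explains next to none of it. Nothing about the data changed between the two lines, only the alignment.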
What the regression says
Run over the available daily history, the three-factor model accounts for a great deal of Apple’s day-to-day movement. The R-squared comes in at roughly 0.58. Set that beside last week’s figure, where the in-sample R-squared on the index was below 0.02, and the contrast is the entire point. There, lagged returns explained almost nothing of tomorrow. Here, three contemporaneous factors explain well over half of today. The difference is not that the tool grew cleverer. It is that we asked it a question it can actually answer.
The loadings tell the story one would hope to read. The market beta lands a little above 1.25, comfortably greater than one, with a t-statistic so large it leaves no room for doubt that the exposure is real: Apple amplifies the market’s moves, as a volatile large-cap should. The size loading is negative, around minus 0.28, which is simply the regression confirming in its own language that Apple is about as far from a small company as a company can be. The value loading is negative too, near minus 0.45, and significantly so, marking Apple firmly as a growth stock rather than a value one: the opposite of the cheap-and-unloved firms that sit on the long side of the HML factor. None of this will surprise anyone who has followed the company. The merit of the exercise is that the intuition is no longer a story. It is a measured quantity, with a standard error attached, that you could have disagreed with had the data refused to cooperate.
There is one technical care worth naming, because it is part of the discipline. Daily financial residuals are not the tidy, independent, constant-variance errors that the textbook standard-error formula assumes. They cluster: volatile days arrive in runs, and the residuals inherit that clustering. Reading the ordinary standard errors off such data tends to make every estimate look more certain than it is. So we compute Newey-West standard errors, known as HAC for heteroskedasticity and autocorrelation consistent, which widen the error bars to account for exactly this. It is the same instinct as last week’s paranoia about leakage, pointed at a different failure mode: do not let the machinery report a confidence the data has not earned.
The intercept, and the discipline of not celebrating it
Now to alpha, the intercept, the part of Apple’s average excess return that the three factors leave unexplained. This is the number an incautious reader most wants to seize upon, because a positive, significant alpha looks like the holy grail: a return that is not merely compensation for bearing common risk, but something extra, something the factors cannot account for.
Here the honest answer depends on the window, and saying so is part of the lesson. Over a long modern history that includes Apple’s extraordinary decade, the regression does report a positive alpha, and the company has indeed delivered a great deal that simple factor exposure cannot explain. But the same intercept estimated over a shorter or noisier window can fall well within the range one would expect from chance, with a t-statistic too small to take to the bank. In the restricted run that accompanies this article, for instance, the intercept sits a touch above five per cent annualised but with a t-statistic of only about 1.1, which is to say: not distinguishable from zero. A positive point estimate is not a positive finding. The width of the error bar is doing essential work, and a person who reads only the central number and skips the standard error has learned to lie to themselves with a calculator.
This is the inference-week echo of last week’s most important warning. There we saw that an in-sample relationship can be statistically real and economically worthless. Here we see that an alpha can look handsome as a point estimate and dissolve the moment you ask how sure you are of it. In both cases the antidote is the same: never read the headline without the uncertainty attached to it.
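The arithmetic behind "not distinguishable from zero" is short enough to do by hand. The sketch below takes the restricted run's figures as round numbers (five per cent and a t-statistic of 1.1) and assumes a normal approximation for the interval; a HAC-based interval from the fitted model would be the more careful source.

```python
# Round numbers from the restricted run, used for illustration.
alpha_annual = 0.05  # about five per cent a year
t_stat = 1.1         # t-statistic on the intercept

# t = estimate / standard error, so the standard error follows directly.
# Annualising scales the estimate and its standard error alike,
# which is why the t-statistic is the same at daily and annual scale.
se_annual = alpha_annual / t_stat

# Approximate 95% interval under a normal approximation (an assumption).
z = 1.96
low = alpha_annual - z * se_annual
high = alpha_annual + z * se_annual

print(f"standard error: {se_annual:.3%}")
print(f"95% interval:   {low:.1%} to {high:.1%}")
```

The interval runs from roughly minus 3.9 per cent to plus 13.9 per cent a year. It contains zero with room to spare, which is the whole content of the phrase.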
A robustness aside
Three factors are not the only accounting available. Fama and French later extended their model to five, adding profitability, the tendency of robustly profitable firms to outperform, written RMW for robust minus weak, and investment, the tendency of conservatively investing firms to behave differently from aggressive ones, written CMA for conservative minus aggressive. A separate and well-evidenced factor, momentum, captures the tendency of recent winners to keep winning over the medium term. The script refits the regression with all of these where the live library can be reached, and the exercise is a useful check on whether the three-factor story was hiding something. For a name like Apple one typically finds the market, size, and value loadings barely move, which is reassuring: it means the original three were not silently standing in for something else. The profitability and momentum loadings often pick up real and interpretable exposure of their own, and any alpha tends to shrink as the richer model claims a little more of the return as compensation for known risks. That shrinkage is the honest direction of travel. The more sources of common risk you allow the model to see, the less of the return is left over to call skill.
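The shrinkage can be reproduced on synthetic data. In the sketch below (every series is made up, and the "extra" factor is a generic stand-in rather than real RMW or momentum data) the stock has a true alpha of exactly zero but is exposed to a fourth factor that earns a positive average return. Leave that factor out and the regression has nowhere to put the return except the intercept.

```python
import numpy as np


def fit_alpha(y, X):
    """Intercept from a least-squares fit of y on X plus a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]


rng = np.random.default_rng(3)
n = 20_000
three = rng.normal(0.0, 0.006, size=(n, 3))  # stand-ins for Mkt-RF, SMB, HML
extra = rng.normal(0.0005, 0.004, size=n)    # stand-in for a rewarded fourth factor

# True alpha is zero: all of the average return is compensation for factor exposure.
y = three @ np.array([1.2, -0.3, -0.4]) + 0.8 * extra + rng.normal(0.0, 0.005, n)

alpha_three = fit_alpha(y, three)
alpha_rich = fit_alpha(y, np.column_stack([three, extra]))

print(f"alpha, three factors: {alpha_three:.5f} per day")
print(f"alpha, four factors:  {alpha_rich:.5f} per day")
```

The three-factor fit reports a daily alpha close to the omitted exposure times the omitted factor's mean, about 0.8 × 0.0005. The richer fit hands that return back to the factor it belongs to and leaves an intercept near zero.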
The part that ties back to Week 1
A factor model that explains 58 per cent of a stock’s daily variation is a powerful thing, and it would be the most natural error in the world to conclude that we have found something to trade. We have not, and seeing why is the bridge from this week to the last.
The R-squared is high because the model uses today’s factor returns to explain today’s stock return. But on the morning you would actually have to act, you do not know today’s factor returns. They are realised over the very same day as the thing you are trying to anticipate. The model’s explanatory power is entirely contemporaneous, and contemporaneous knowledge is precisely the knowledge a trader does not possess in advance.
To make this concrete rather than merely assert it, the script runs a short coda that does the obvious naive thing. It takes the fitted factor loadings and uses them to form a position for the next day, proxying tomorrow’s unknown factors by today’s, then goes long whenever the resulting prediction is positive and sits flat otherwise, charging a basis point on every change of position. This is exactly the Week 1 apparatus, baselines and walk-forward validation and a costed long-flat backtest, pointed at a factor model instead of a return model. The result is the one the theory demands. The out-of-sample R-squared turns negative, around minus 0.008, worse than predicting nothing. The directional hit rate sits within a hair of one half, beaten by the trivial rule that always guesses up. And the traded strategy earns a Sharpe ratio well below simply buying Apple and holding it, roughly 0.87 against 1.66 in the restricted run, because it spends long stretches in cash, missing the upward drift that is the stock’s most dependable feature, and pays a toll for the privilege.
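The long-flat rule and its toll can be sketched compactly. This is not the mlfin implementation: the function names are hypothetical, the book is assumed to start flat, the Sharpe calculation ignores the risk-free rate, and the four-day example is made up so the arithmetic can be followed by eye.

```python
import numpy as np


def long_flat_backtest(prediction, next_return, cost_bps=1.0):
    """Net daily returns of a long-flat rule with a cost per change of position.

    prediction[t] uses only information available at the close of day t;
    next_return[t] is the return earned over day t + 1.
    """
    prediction = np.asarray(prediction, dtype=float)
    next_return = np.asarray(next_return, dtype=float)
    position = (prediction > 0).astype(float)  # 1 = long, 0 = flat
    # Each change of position is charged cost_bps; the book starts flat.
    switches = np.abs(np.diff(position, prepend=0.0))
    return position * next_return - switches * cost_bps / 10_000.0


def annualised_sharpe(daily_returns, periods=252):
    """Mean over standard deviation, scaled by sqrt(periods)."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods) * daily_returns.mean() / daily_returns.std(ddof=1)


# Tiny made-up example: the rule goes long, stays long, goes flat, goes long again.
pred = [0.002, 0.001, -0.003, 0.004]
nxt = [0.010, -0.005, 0.020, 0.003]
net = long_flat_backtest(pred, nxt)
print(net)  # three switches are charged; the flat day misses a 2% gain
```

Even in four days the two costs of the rule are visible: a basis point each time the position changes, and the return forgone while sitting in cash.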
So we end with two numbers that look like a contradiction and are not. An R-squared of 0.58 in-sample and contemporaneous. An R-squared below zero out-of-sample and forecast-shifted. The first is a true and useful description of what Apple’s return is made of. The second is the correct verdict on whether that description is a trading signal. Both are right at once, and holding them together without confusion is the whole skill.
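An R-squared below zero can look like a bug to anyone who has only met the in-sample version, so a short sketch of one common out-of-sample definition may help. The benchmark here is a forecast of zero, which is an assumption; a training-sample mean is another usual choice, and the article's script may use either. The numbers are made up.

```python
import numpy as np


def out_of_sample_r2(actual, predicted, benchmark):
    """1 - SSE(model) / SSE(benchmark), both measured on held-out data."""
    actual = np.asarray(actual, dtype=float)
    sse_model = np.sum((actual - np.asarray(predicted, dtype=float)) ** 2)
    sse_bench = np.sum((actual - np.asarray(benchmark, dtype=float)) ** 2)
    return 1.0 - sse_model / sse_bench


actual = np.array([0.010, -0.020, 0.015, -0.005])      # made-up held-out returns
predict_nothing = np.zeros(4)                          # benchmark: always forecast zero
noisy_model = np.array([-0.004, 0.006, -0.003, 0.005])  # made-up forecasts

r2 = out_of_sample_r2(actual, noisy_model, predict_nothing)
print(f"out-of-sample R-squared: {r2:.3f}")
```

In-sample, least squares with an intercept can never do worse than the sample mean, so R-squared is bounded below by zero. Out-of-sample there is no such guarantee: a model whose forecasts add more error than they remove scores below the benchmark, and the statistic goes negative.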
What carries forward
The technique was again disposable; the habits are not. This week adds three to the collection. Be explicit about whether you are doing prediction or inference, because the same regression serves both and the standards for each are different. When you are doing inference, never quote a coefficient without the uncertainty that surrounds it, and prefer standard errors that respect the messiness of financial data over the textbook ones that flatter it. And never let a high contemporaneous R-squared masquerade as foresight, because a model that explains the present perfectly may know nothing whatever about the future.
Next week we stay with regression but give it more to handle than it can comfortably manage, fifty or more macro indicators competing to explain Treasury yields, and meet ridge and lasso, the two disciplines that decide which variables deserve to survive.
The complete, runnable script accompanies this article. It fetches the Ken French factors live where it can, falls back to a freely mirrored daily file, and constructs a factor-driven stand-in stock where no free price source is reachable, so that it runs anywhere. Figures quoted with a genuine alpha come from a live run on real prices; the exact decimals will shift with the window you choose, but the shape of the result, a high contemporaneous fit and no out-of-sample edge, will not. Code available here.
A note on how this was made
The prose here is mine. The code that produced every number in it was built in collaboration with Claude, an AI assistant from Anthropic, and that collaboration is worth describing plainly because it is part of the method rather than a footnote to it.
The shared mlfin package, the leakage-safe scaffolding established in Week 1, was reused without modification: the time-respecting split, the walk-forward validation, the naive baselines, and the costed long-flat backtest all carried straight over. What Week 2 added was a dedicated loader for the Ken French factor library, with the same four-tier fallback philosophy as the rest of the series, and a single-stock excess-return loader that degrades to a factor-driven synthetic when no free price source is reachable. The working pattern was conversational: I set the constraints and the disciplines, Claude drafted the loader and the regression script against them, and I read every line back against the rule that no figure should outrun its source. The synthetic stock carries deliberately known factor loadings precisely so that the inference pipeline can be checked against an answer one already has, and a small test asserts the regression recovers them.
The judgement calls stayed with me: which stock, which factor set, where the live data ends and the fallback begins, and how loudly to label a number that came from a restricted run rather than from real prices. The article’s canonical figures come from my own run against the live library. The code is open under the MIT licence in the series repository; the writing is mine.


