Thursday, January 17, 2013

Hurst Exponent and Financial Analysis - Intro

I've combined my interests in the predictability of financial variables (the time series of stocks and indices), mathematical programming, complexity and self-organizing systems, and the graphical display of information into a week-long exercise examining the behavior of a time-series statistical measure called the Hurst exponent. But, first, a little background.

Benoit Mandelbrot was one of the most interesting thinkers of the 20th century. He passed away in 2010, but his work spanned over fifty years and a broad range of topics. He is best known as the inventor of the term 'fractal' and for his description of the two-dimensional object called the Mandelbrot set. There is much to be said about Mandelbrot, but for my purposes we need to focus on his financial analysis and his book, "The Misbehavior of Markets".

Mandelbrot was a quantitative financial analyst in the 1960s, and he tried a simple experiment: he applied well-known statistical techniques to historical price data of a traded commodity (cotton). His conclusions are at once controversial and widely accepted. You have to read the book to see the rich interplay of ideas and the fertile tension between Mandelbrot's mathematical approach to stock price statistics and the various 'accepted' theories.

I focus here on one idea Mandelbrot had begun to develop in the final decade of his life. Among the many statistical quirks of market prices -- be they the cotton prices Mandelbrot wrote about in a controversial 1963 paper or prices of IBM stock -- is their long-range dependence. That is, when you measure the autocorrelation of the stock price time series, you find that the price today is heavily dependent on the price over the past several days. Mandelbrot was beginning to explore a measure of this long-range dependency (actually, over months and years instead of days) called the Hurst exponent. I won't reproduce the math here, but the Hurst exponent is a parameter in a 'data generating function' (DGF). The data generating function is a mathematical relationship that a theoretician proposes as a surrogate for the processes that generate a time series. It is often the basis of quantitative analysis of the open market in traded stocks (or currencies, commodities, government bonds, tulips, etc.).
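To make 'long-range dependence' concrete, here is a minimal sketch of the sample autocorrelation at a given lag (the function name acf and its normalization are my own choices, not from any particular library). A long-memory series keeps this value well above zero even at large lags, while white noise drops to roughly zero almost immediately.

    # A sketch of lag-k sample autocorrelation; 'acf' is my own name.
    import numpy as np

    def acf(x, lag):
        """Sample autocorrelation of x at the given lag (lag >= 1)."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        # Covariance of the series with its lagged copy, normalized
        # by the lag-zero variance.
        return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

    # White noise has essentially no memory at any lag.
    noise = np.random.randn(1000)
    print([round(acf(noise, k), 3) for k in (1, 5, 20)])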

The ultimate goal of analyzing any time series is to determine the underlying data generating function. In practice we may never know the actual function, but a proposed function can imitate it closely enough to support predictions. These predictions might be about a range of values for future prices, or they might be about risk: the probability that a stock will fall below a particular value.

Long-term dependency (LTD) is, itself, an indicator of the propensity of a system to go through large, unexpected 'phase-change' type shifts. That is, LTD helps us define risk. The concept was first observed by civil engineers who study the flow of rivers and must determine the proper capacity for dams and other flood-control measures. Hurst himself was an engineer designing such systems for the Nile in the 1950s.

One does not calculate the actual Hurst exponent of a time series. This is because, as I stated before, we don't know the real data generating function. But one can estimate the Hurst exponent, and there are several formulas for doing so. Hurst used an estimator called the "rescaled range", or R/S. More recently, researchers working with synthetic data sets (in which we know the DGF) have published a more accurate estimator, the rescaled variance, or V/S.
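For one window of data, the R/S statistic follows a simple recipe: take the range of the mean-adjusted cumulative sums and divide by the standard deviation. Here is a sketch (the function name is mine); V/S follows the same pattern but replaces the range of the cumulative sums with their variance.

    # A sketch of the rescaled-range (R/S) statistic for one window.
    import numpy as np

    def rescaled_range(x):
        x = np.asarray(x, dtype=float)
        z = np.cumsum(x - x.mean())   # mean-adjusted cumulative sums
        r = z.max() - z.min()         # the "range"
        s = x.std()                   # the "scale"
        return r / s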

One would think that the wide array of quantitative tools provided in the various computer languages would mean that somebody would have coded up the R/S and V/S estimators in Java, R, or Python. One would be wrong. There is an R/S estimator as part of an importable module in R, but there doesn't seem to be much discussion of its behavior or even validation of the algorithm. And nobody seems to have extended it to the V/S estimator.

So I wrote one. I created, in Python, a module that calculates both an R/S and a V/S estimator given a time series. This includes a rescaling process that segments the time series, makes a rolling H calculation at several scales, and reports H as the slope of a log-log regression. I'll spare you the math, but you'll have to take my word that this is how the literature describes the proper way to estimate H.
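To give a feel for the procedure, here is a condensed sketch under my own simplifications (function names, scales, and windowing choices are mine, not the module's): split the series into non-overlapping windows at several scales, average the R/S statistic at each scale, then regress log(R/S) against log(window size). The slope is the H estimate.

    # A condensed sketch of R/S-based H estimation.
    import numpy as np

    def rescaled_range(x):
        # As in the earlier sketch: range of the mean-adjusted
        # cumulative sums, divided by the standard deviation.
        z = np.cumsum(x - x.mean())
        return (z.max() - z.min()) / x.std()

    def hurst_rs(x, scales=(8, 16, 32, 64, 128, 256)):
        x = np.asarray(x, dtype=float)
        log_n, log_rs = [], []
        for n in scales:
            # Average R/S over all full, non-overlapping windows of length n.
            windows = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
            rs = np.mean([rescaled_range(w) for w in windows])
            log_n.append(np.log(n))
            log_rs.append(np.log(rs))
        # H is the slope of the log-log regression line.
        slope, _ = np.polyfit(log_n, log_rs, 1)
        return slope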

The Hurst exponent measures long-range dependency and varies between 0 and 1. A random, white-noise time series has no long-term dependency, so its H will be around 0.5. Highly dependent streams of data will have an H approaching 1.0. Some odd time series are anti-correlated -- very sharp and volatile -- and will have an H approaching 0.
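As a quick sanity check of the sketch above (assuming hurst_rs is in scope; the seed and sample size are arbitrary), white noise should land near 0.5 and a deliberately anti-correlated series well below it:

    import numpy as np

    np.random.seed(0)
    noise = np.random.randn(4096)
    print(hurst_rs(noise))           # white noise: expect H near 0.5
                                     # (small-sample bias can push it a bit higher)
    print(hurst_rs(np.diff(noise)))  # anti-correlated: expect H well below 0.5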

That's what I did in December 2012. Over the past week I downloaded the past three years of data for all 30 DJIA stocks and explored the values of H. But, before I describe my results, I need to get the non-financial analyst on the same page as financial analysts.

Actual closing prices of a stock like Alcoa or IBM are highly correlated. There's no real surprise here: there is an underlying value in a company and future profits fall within a pretty well-defined and stable range. The Hurst exponent of the prices is well above 0.9, and for many of the DJIA 30 stocks is near 0.99. Most of that is uninteresting.

Stock market quantitative analysts actually focus on 'returns', which can be calculated in a variety of ways. Assume that we have a price at time t, called P(t), and a time difference d. The simple return can be defined as R(d) = (P(t + d) - P(t))/P(t). Because prices compound multiplicatively, this measure grows exponentially as d gets larger. For this and other reasons, stock market analysts work with logarithms instead: since 1 + R(d) = P(t + d)/P(t), the log return is log(1 + R(d)) = log(P(t + d)/P(t)) = log P(t + d) - log P(t).
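In code, the conversion from a price series to log returns is one line (the prices array here is purely illustrative):

    import numpy as np

    prices = np.array([100.0, 101.5, 100.8, 102.2])   # illustrative values
    log_returns = np.diff(np.log(prices))             # log P(t+1) - log P(t)
    print(log_returns)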

That sets it up. In my next post, I'll show the H estimates from R/S and V/S for the daily and weekly DJIA averages and the 30 industrials.
