Thursday, January 17, 2013

Hurst Exponent and Financial Analysis - Intro

I've combined my interest in the predictability of financial variables (the time series of stocks and indices), mathematical programming, complexity and self-organizing systems, and the graphical display of information into an exercise over the past week examining the behavior of a time-series statistical measure called the Hurst exponent. But, first, a little background.

Benoit Mandelbrot was one of the most interesting thinkers of the 20th century. He passed away in 2010, but his work spanned more than fifty years and a broad range of topics. He is best known for coining the term 'fractal' and for describing a two-dimensional object called the Mandelbrot set. There is much to be said about Mandelbrot, but for my purposes we need to focus on his financial analysis and his book, "The Misbehavior of Markets".

Mandelbrot was a quantitative financial analyst in the 1960s, and he tried a simple experiment: he applied well-known statistical techniques to the historical price data of a traded commodity (cotton). His findings are at once controversial and widely accepted. You have to read the book to see the rich interplay of ideas and the fertile tension between Mandelbrot's mathematical approach to stock price statistics and the various 'accepted' theories.

I focus here on one idea Mandelbrot had begun to develop in the final decade of his life. Among the many statistical quirks of market prices -- be they the cotton prices Mandelbrot wrote about in a controversial 1963 paper or the prices of IBM stock -- is their long-range dependence. That is, when you measure the autocorrelation of a stock price time series, you find that today's price depends heavily on the prices of the past several days. Mandelbrot was beginning to explore a measure of this long-range dependence (actually over months and years instead of days) called the Hurst exponent. I won't reproduce the math here, but the Hurst exponent is a parameter in a 'data generating function'. A data generating function is a mathematical relationship that a theoretician proposes as a surrogate for the processes that generate a time series, and it is often the basis of quantitative analysis of the open market in traded stocks (or currencies, commodities, government bonds, tulips, etc.).

The ultimate goal of analyzing any time series is to determine the underlying data generating function. In practice we rarely know the actual function, but a proposed function can imitate it closely enough to support predictions. These predictions might be about a range of values for future prices, or they might be about risk: the probability that a stock will fall below a particular value.

Long-term dependency is, itself, an indicator of the propensity of a system to go through large, unexpected 'phase-change' type shifts. That is, LTD helps us define risk. The concept was first observed by civil engineers who monitor the flow of rivers and must determine the proper capacity for dams and other flood-control measures. Hurst himself was an engineer designing such systems for the Nile in the 1950s.

One does not calculate the actual Hurst exponent of a time series. This is because, as I stated before, we don't know the real data generating function. But, one can estimate the Hurst exponent. There are several formulas for this. Hurst used an estimator called the "rescaled range" or R/S. More recently, researchers working with a synthetic data set (in which we know the DGF) have published a more accurate estimator, the rescaled variance or V/S.

One would think that the wide array of quantitative tools provided in the various computer languages would mean that somebody would have coded up the R/S and V/S estimators in Java, R, or Python. One would be wrong. There is an R/S estimator in an importable R package, but there doesn't seem to be much discussion of its behavior, or even validation of the algorithm. And nobody seems to have extended it to the V/S estimator.

So I wrote one. I created, in Python, a module that calculates both an R/S and a V/S estimate for a given time series. It includes a rescaling process that segments the time series, makes a rolling calculation at several scales, and reports H as the slope of a log-log regression. I'll spare you the math, but you'll have to take my word that this is how the literature describes the proper way to estimate H.
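To give a flavor of the approach, here is a simplified sketch of the classic R/S procedure (not my full module; it omits the V/S variant and the small-sample corrections discussed in the literature):

```python
import numpy as np

def rescaled_range(x):
    """R/S statistic for one window of data."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    cumdev = np.cumsum(deviations)          # cumulative deviations from the window mean
    r = cumdev.max() - cumdev.min()         # range of the cumulative deviations
    s = x.std(ddof=1)                       # sample standard deviation of the window
    return r / s if s > 0 else np.nan

def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(R/S) versus log(window size)."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        if n > len(series):
            continue
        # non-overlapping windows of length n
        windows = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [rescaled_range(w) for w in windows]
        values = [v for v in values if np.isfinite(v)]
        if values:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(values)))
    slope, _intercept = np.polyfit(log_n, log_rs, 1)   # slope of the log-log fit is H
    return slope
```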

The Hurst exponent measures long-range dependency and varies between 0 and 1. A random, white-noise time series has no long-term dependency, so its H will be around 0.5. Highly dependent streams of data have H approaching 1.0. Some odd time series that are anti-correlated -- very sharp and volatile -- have an H approaching 0.
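As a quick sanity check of those interpretations, feeding Gaussian white noise to the hurst_rs sketch above should produce an estimate near 0.5:

```python
# White noise has no long-range dependence, so H should come out near 0.5.
import numpy as np

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(4096)
print(hurst_rs(white_noise))    # typically prints a value close to 0.5
```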

That's what I did in December 2012. Over the past week I downloaded the past three years of data from all 30 DJIA stocks, and explored the value of H. But, before I describe my results, I need to get the non-financial analyst on the same page as financial analysts.

Actual closing prices of a stock like Alcoa or IBM are highly correlated. There's no real surprise here: there is an underlying value in a company and future profits fall within a pretty well-defined and stable range. The Hurst exponent of the prices is well above 0.9, and for many of the DJIA 30 stocks is near 0.99. Most of that is uninteresting.

Stock market quantitative analysts actually focus on 'returns'. These can be calculated in a variety of ways. Assume that we have a price at time t, called P(t), and a time difference d. The simple return is R(d) = (P(t+d) - P(t))/P(t). Because prices compound, this quantity grows multiplicatively as d gets larger. For this and other reasons, stock market analysts work with the logarithmic return instead. Since 1 + R(d) = P(t+d)/P(t), the log return is log(1 + R(d)) = log(P(t+d)/P(t)) = log P(t+d) - log P(t).
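In code, a log return is just a difference of log prices. A tiny illustration with made-up prices:

```python
import numpy as np

prices = np.array([100.0, 101.5, 100.8, 102.3, 103.0])   # hypothetical P(t)
log_returns = np.diff(np.log(prices))                     # log P(t+1) - log P(t)
# It is this return series, rather than the raw prices, that makes the
# more interesting input for the H estimators.
```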

That sets it up. In my next post, I'll show the results for H, using both R/S and V/S, for the daily and weekly DJIA averages and for the 30 industrials.

Wednesday, January 02, 2013

An Agent-Based Model of Insurgency

Back in 2007, I created a model in NetLogo that reproduces the research presented in "Modeling civil violence: An agent-based computational approach" by Santa Fe researcher Joshua M. Epstein. The full citation is:


Epstein, J. M. "Modeling Civil Violence: An Agent-Based Computational Approach." Proceedings of the National Academy of Sciences 99, suppl. 3 (May 7, 2002): 7243-7250. doi:10.1073/pnas.092080199

Below is a screen shot of my model. It postulates a toroidal 'world' in which there are civilians and cops. The civilians can become active revolutionaries (red) or remain passive (blue). The cops patrol the landscape and arrest active revolutionaries, one per turn. The plots on the screen show the number of citizens that are active and the number that are in jail.

In this model I can vary many of the parameters discussed in the 2002 article. But, for the purposes of my research, I have also added a selector to change the activation pattern. I have been exploring two different activation methods. Epstein used random activation: at each step, agents are chosen at random to execute their methods -- move and act (cops arrest, citizens choose whether to become active). Another typical activation scheme is "uniform", in which every agent gets one turn per tick, but the sequence is shuffled each tick. These two methods are analogous to sampling with and without replacement, respectively. A sketch of the two schemes appears below.
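Here is a rough Python sketch of the two schemes, stripped of the NetLogo specifics (the agents list and the step method are hypothetical stand-ins):

```python
import random

def random_activation(agents, ticks):
    """Each tick, draw agents at random with replacement (Epstein-style)."""
    for _ in range(ticks):
        for _ in range(len(agents)):
            random.choice(agents).step()

def uniform_activation(agents, ticks):
    """Each tick, every agent acts exactly once, in a freshly shuffled order."""
    for _ in range(ticks):
        order = list(agents)          # copy so the caller's list is untouched
        random.shuffle(order)
        for agent in order:
            agent.step()
```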

My next problem is: how do I characterize the output? Epstein used the 'pulses' of revolt that appear in the model. So, in order to quantify these pulses I needed to build a 'pulse detector'. This is simply a counting algorithm (written in Python) that creates a sequence of data points for each pulse: the time since the last pulse and the height of the pulse.

This, in turn, requires a definition of a pulse. Epstein arbitrarily chose a value of 50. That is, a 'revolt' event occurs in his data when over fifty agents have converted to 'active' status. The revolt (or pulse) ends when that number drops below 50.

I needed something a little less arbitrary that could be applied to a variety of 'revolt' time series across a broad parameter space. So I chose a threshold of one standard deviation. That is, once the number of 'active' agents rises above the average number of actives plus one standard deviation of that number, a 'pulse' begins. Because I used the standard deviation of the whole sample, a pulse can only be defined after the run is complete; it cannot be computed in real time.
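In outline, the detector is just this (a simplified sketch of the idea, not the exact script I used):

```python
import numpy as np

def detect_pulses(active_counts):
    """Return inter-arrival times and peak heights, using mean + 1 std as the threshold."""
    x = np.asarray(active_counts, dtype=float)
    threshold = x.mean() + x.std()        # whole-run statistics, hence post-hoc only
    starts, peaks = [], []
    in_pulse = False
    for t, value in enumerate(x):
        if value > threshold:
            if not in_pulse:              # a new pulse begins
                in_pulse = True
                starts.append(t)
                peaks.append(value)
            else:                         # still inside the current pulse
                peaks[-1] = max(peaks[-1], value)
        else:
            in_pulse = False
    inter_arrivals = np.diff(starts)      # ticks since the previous pulse began
    return inter_arrivals, np.array(peaks)
```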

A distribution of these inter-arrival times is shown in the first histogram. This is a model run using random activation, executed for 25,208 'ticks'. The majority of peaks happen after a wait of less than 25 ticks, but in one instance a gap of over 200 ticks occurred between peaks. I found that this shape of distribution held for all activation methods and for all the parameter changes I made.

Additionally, a similar distribution can be found for the maxima of the peaks. The second histogram shows the distribution of peak heights -- the largest number of active citizens in each peak -- for the same long run. Most revolts involved fewer than 10 individuals, while a very few exceeded 60; the largest was over 80.

These two output values -- inter-arrival average and average peak height -- represent the 'model behavior parameters' that can be further examined. Thus, with the output quantified, I can then proceed to evaluate the impact of activation schemes on these values. 

I also examined which input parameters could be changed to alter the averages of the inter-arrival times and the revolt peaks. The candidates were: citizen vision, cop vision, threshold (for activation), maximum jail term, and a constant, k, that is supposed to produce a plausible arrest probability. Citizen vision appeared to be one input parameter that makes a difference, so I began with it. Citizens make several decisions based on what they see: they choose to become active, in part, based on whether there are 'cops' within their vision and on how many other citizens within their vision are already active. Epstein reported results for a citizen vision of 7.0. I found that interesting results occur as vision is increased. (Note that, in my explorations, 'cop' vision remains fixed at 7.0. Varying 'cop' vision did not change the output very much; people just got swept into jail more efficiently.) A paraphrase of the citizen decision rule, showing where these parameters enter, is sketched below.
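For orientation, here is a paraphrase in Python of the citizen decision rule from the 2002 paper, mainly to show where vision, k, and the threshold enter. The variable names are mine and the default values are illustrative; consult the paper for the exact formulation.

```python
import math

def citizen_goes_active(hardship, legitimacy, risk_aversion,
                        cops_in_vision, actives_in_vision,
                        k=2.3, threshold=0.1):
    """A citizen turns active when grievance minus perceived risk exceeds the threshold."""
    grievance = hardship * (1.0 - legitimacy)
    actives = actives_in_vision + 1               # the citizen counts itself as active
    arrest_prob = 1.0 - math.exp(-k * (cops_in_vision / actives))
    net_risk = risk_aversion * arrest_prob
    return grievance - net_risk > threshold
```

Widening citizen vision changes both counts that feed the estimated arrest probability, which is why it has such leverage over the pulse statistics.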

The results -- the change in inter-arrival time and peak height as a function of different citizen visions -- are shown in the next two scatter plots.

There is, to be sure, quite a lot of variance in the outcomes. But it can be seen that peak height is highly dependent on activation at all levels of citizen vision, and the inter-arrival times of revolts seem to be dependent on activation at higher levels of citizen vision. For random activation, the inter-arrival times seem to become chaotic at a citizen vision of 8.6. Runs at this setting (citizen vision = 8.6, random activation) produce an average gap between peaks that can be as low as 17.9 or as high as 70.9.

I'll discuss all this later, but I thought it would be interesting to post the results. I'm also conducting more runs to help better characterize variation in output. (I'll say more about the behavior of the model as the sample size gets larger in a subsequent post.)

One thing appears to be clear from this data, however. The choice of activation makes a significant difference in quantitative model outcome. 

Blogging to Resume

This has always been intended to be a blog in which I discuss my research into the world of complex adaptive systems. I've been spending the past several months reinvigorating my modeling skills, as well as developing my quantitative analysis capabilities (both the skills and the software).

In the coming year, I expect there will be much of interest here. In my next post, I plan to present some results in one of the three modeling areas I am investigating. The next few entries will document:

- The basic precepts of my dissertation research.
- Preliminary results for an agent-based model of civil revolt.
- Progress and results of a model of the labor market.
- Progress and results of a model of stock market trading.
- Underlying questions that ABM research can help with.
- The companion statistical techniques that should accompany ABM research and help to understand its output.
- The question of how to build 'valid' agent-based models. (I think there needs to be an article written on standardizing the validation process.)

Monday, November 26, 2012

Reinvigorating My Blog

I last posted on my complexity blog just as I began a challenging new assignment. For the past five years, I've been far too busy to blog or conduct research. I retired in August, and I've been getting back into the swing of reading, researching, and analyzing complex adaptive systems.

As you might tell from my posts from 2007, I believe that agent-based models are the tool of analysis for complex adaptive systems. At some point, I need to write an extended discussion of the role ABMs play in this analysis, and of how we can validate that we are using appropriate model designs -- those that match the actual systems we are trying to analyze. But, for right now, I need to get the discussion started.

The focus of my research is the implementation of agent-based models. These models, in which agents are set up to imitate the self-organizing elements of a system, are a natural byproduct of modern object-oriented computing. Specifically, I'm investigating the choices a model designer must make in the early stages. These choices are sometimes made almost without conscious thought, but they prove to be quite important as the model matures.

Saturday, September 22, 2007

Complexity Demonstrated

My first step in developing a research topic is to recreate one of the classic works of the field. In this case, I need to reproduce the results of an agent-based model of agents playing the prisoner's dilemma with one another on a two-dimensional grid. The article by Nowak and May was published in 1992:

Nowak, Martin A., and Robert M. May. "Evolutionary Games and Spatial Chaos." Nature 359 (29 October 1992): 826-829.

I have successfully re-done their model in NetLogo. Compare my image of interlaced networks of defectors with their figure 1a.

This is the result of setting b = 1.799. The red regions are those agents who chose to defect. Blue represents those agents choosing to cooperate (with other agents), and the green and yellow patches are those agents that are switching sides every turn.

In 1992 this was an interesting game-theory agent-based model. But today we only need to change "defect" to "support coalition" and "cooperate" to "join insurgency" and we have an interesting game-theoretic model of insurgency. If we change the payoff for 'defecting', we get very different outcomes. This illustration was taken just before one of those 'tipping points': increasing the reward to b = 1.800 sends the model into fully complex behavior. It won't be chaos -- there will still be patches and structures -- but it certainly won't be stable, either.
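For readers who want the mechanics without opening NetLogo, here is a compact sketch of the Nowak and May update rule as I read it from the paper -- synchronous play with the eight neighbors (and oneself), then imitation of the best-scoring site in the neighborhood. It is an illustration, not my NetLogo code:

```python
import numpy as np

def spatial_pd_step(grid, b=1.799):
    """One synchronous update of a toroidal grid where 1 = cooperate, 0 = defect."""
    shifts = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    payoff = np.zeros(grid.shape, dtype=float)
    # Accumulate each cell's payoff: C meeting C earns 1, D exploiting C earns b,
    # and every other pairing earns 0.
    for di, dj in shifts:
        neighbor = np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        payoff += np.where(grid == 1, neighbor * 1.0, neighbor * b)
    # Each cell then copies the strategy of the best-scoring site in its neighborhood.
    best_payoff = payoff.copy()
    best_strategy = grid.copy()
    for di, dj in shifts:
        n_pay = np.roll(np.roll(payoff, di, axis=0), dj, axis=1)
        n_strat = np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        better = n_pay > best_payoff
        best_payoff = np.where(better, n_pay, best_payoff)
        best_strategy = np.where(better, n_strat, best_strategy)
    return best_strategy
```

Iterating this from a mostly cooperative grid at b = 1.799 produces the kind of interlaced defector networks shown above; nudging b up to 1.800 is what tips the system over.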

Monday, September 03, 2007

My Plans (and Conference List for 2007)

My Way Ahead: Research and Exploration

I'm back from completing preparations for my dissertation. My exams are complete, the coursework is complete, and I'm beginning work on a dissertation proposal.

My topic is validation of agent-based models. In particular, I'm going to examine the impact of different agent-activation schemes on the outcomes (and the subsequent decision recommendations) for agent-based models. I'm surprised that there is not much written on this subject, as ABMs are becoming more and more prominent throughout the world of M&S and complexity theory.

At the beginning, I need to include a survey of the literature. Much of the literature occurs in conjunction with conferences. What I've found is that there are a gigantic number and variety of conferences on the general topic of complex adaptive systems and agent-based models. Many of them have no connections with one another. So, the first step (or, I guess I'm back to step 0.5), is to actually list these conferences. Here is a partial list, just for this fall:

I will attend the European Social Simulation Association's conference in Toulouse next week (Sep 10 - 14, 2007).

There is also the Central and Eastern European Conference on Multi-agent Systems later in Sept 07.

From October 2 - 5 in Dresden, there's the European Conference on Complex Systems.

On October 22-24 there will be the 2007 Engineering Societies in the Agent World (ESAW) conference.

October 28 - Nov 2, in Quincy, MA (USA), there will be the International Conference on Complex Systems (ICCS2007).

From 1-3 November the European Association for Evolutionary Political Economy will meet in Portugal.

On November 12-16, 2007, Agents 2007 will convene at Northwestern University in Evanston (near Chicago), IL. I'll be there.

On Nov 21-23 the Pacific Rim International Workshop on Multi-Agents will be held in Bangkok. (Maybe next year...)

I'll also be at the December 9-12 Winter Simulation Conference, mostly because it's here in Washington, DC.

From Dec 17-19 in Pune, India, the 3rd Indian International Conference on Artificial Intelligence will host a multi-agent systems workshop.

I plan to add many, many more to this list. Watch this space.

Thursday, August 24, 2006

The Third Way

My academic background is in the arcane field of operations research. Often, this is combined with systems engineering. I sometimes find myself perusing the college curricula for this field. (I know, I'm trying to get out more!)

There are normally two "basic" courses in OR. One centers on deterministic methods (linear programming and optimization of nonlinear systems); it teaches methods that really made their appearance in the 1950s and 60s. The second focuses on stochastic methods (Monte Carlo simulation, Markov processes, and dynamic programming). This field had to wait for the improved computer capabilities of the 1960s and 70s before stochastic decision theory could be widely adopted. Except for improvements in software, the methods have not seen much revision since then.

Let's compare this with the typical physics curriculum. Physics begins with mechanics. Here we meet Sir Isaac Newton and learn the concepts of force and acceleration. Date of origin: the 60s. The 1660s, that is. This is followed by electricity and magnetism, much of which was derived and developed in the time of Michael Faraday. E&M's emergence can be dated to around 1800. But physics has a third "basic" course. It's often called 'modern physics' or 'quantum physics'. Here we learn about the weird and wonderful world of tunneling electrons, wave functions, and Heisenbergian uncertainty. The period of development can probably be placed within a few decades of 1930.

Operations Research will, someday, also have a third course. It will encompass chaos and complexity. It will focus on principles of adaptability, control of non-equilibrium systems, scale-free distributions, and Bayesian analysis. The primary tools -- the equivalents of LINDO and discrete-event simulation -- will be agent-based models and genetic algorithms. When the historians look back, they will probably place the "birthday" within a few years of, well, 2006.

O brave new world
That has such people in't!

Wednesday, August 23, 2006

Robust Design

I've done a lot of thinking lately about the concept of robustness. Many decision-makers, when describing the key requirements of a new system, insist that it must be robust. The DoD's Office of Force Transformation treats robustness in the military force as the opposite of optimality. That is, the US should strive to build a military (in the next generation) that can achieve national security goals despite the wide range of unforeseen challenges it, and we, will face.

Now, that's a tall order. For one thing, military affairs are, by their very nature, a complex adaptive system. Your adversary will seek to identify your strengths and weaknesses. He will adapt to them, as you will adapt to his. Robustness in this context means a never-ending evaluation of your emerging systems and your adversary's strategy and capabilities. Like all complex adaptive systems, national security never reaches equilibrium. Thus, the "Force" in the Office of Force Transformation's lexicon will never really fit within modern "systems engineering" principles and processes.

If you google "robust design", you will become immersed in Taguchi quality methods. These have been further absorbed into the modern organizational "borg" called "Six Sigma". (That's "borg" as in "resistance is futile, you will be six sigmilated".) Taguchi, in the 1950s, defined robustness in a design as resistance to random changes in the environment. Any reading of modern Six Sigma doctrine shows that this theme is constant and pervasive. In fact, the very name 'six sigma' assumes that your primary design challenge is random variation.

In case anybody hasn't noticed, terrorists do not strike randomly. In fact, none of our adversaries have attacked us at random. Would that military designs needed only to respond to environmental variation! Thus, when the mavens of defense programmatics call for robustness, they are asking for something completely different.

Measuring the robustness of complex adaptive systems is a challenge. If you look at the Santa Fe Institute research area on "robustness" (yes, they have one!), you'll find it devoted mostly to biological robustness. This fits Santa Fe's own predilection for academic and observational research.

So, my search continues. To be specific: how do we measure -- to any degree of confidence -- the robustness of a complex adaptive system? And are there ways to manipulate an existing system to make it more robust? Are there pitfalls in the intervention process that would rob robustness from a well-functioning system?