Data Scientist

# Discrete Time Markov Chains

## Markov chains

Markov chains, named after Andrey Markov, are mathematical systems that hop from one “state” (a situation or set of values) to another. See here for an excellent introduction.

The second time I used a Markov chain method resulted in a publication (the first was when I simulated Brownian motion with a coin for GCSE coursework). At the time I was completing it as part of the excellent and highly recommended Sysmic course so that it could contribute to a chapter in my thesis and as a journal article.

As I was new to programming, I naively wrote the function myself, to simulate the transition between states of injected insect embryos to model the development of transgenic insects, wherby each injected egg had a change of either living or dying and then, if alive, a chance of incorporating the injected DNA into its genome (simplified explanation).

When using R it is always wiser to find a package that has already been developed and documented, as it is likely to have experienced more testing than anything you would produce as an individual (it’s also quicker). We investigate the markovchain package.

## Discrete Time Markov Chain

We define a discrete time markov chain that matches our statistics derived from my review published this year, we consider the diamond back moth summary statistics, Plutella xylostella (survival and transformation efficiency).

First we create our three different states of our Discrete Time Markov Chain (DTMC). Embryo, injection survivor and transgenic adult (egg, g0, dead). We keep things simple to learn how the process works, if injection survivors are non-transgenic then we discard them (their state changes to dead).

## Transition matrix

We use a “transition matrix” to tally the transition probabilities. Every state in the state space is included once as a row and again as a column, and each cell in the matrix tells you the probability of transitioning from its row’s state to its column’s state. The rows of the transition matrix must total to 1. There also has to be the same number of rows as columns.

Based on our summary statistics in my paper we describe our transition probability between states for Plutella xylostella as:

Compare this matrix to the matrix dtmcA assigned below to elucidate the meaning of the proportions in the matrix.

Eggs operate as the input to the model, we put egg in, we then do injections, 0.87058 of the egg die, 0.12942 survive to produce injection survivors (g0), of these g0, only 1% give rise to transgenics, the non-transgenics are considered dead, or waste insects.

We can use the igraph package to plot the Markov Chain object.

Or there are diagram ways to plot.

## Probabilistic analysis

• It is possible to access transition probabilities and to perform basic operations.
• Similarly, it is possible to access the conditional distribution of states.

We can use a variety of methods to subset S4 objects. Further examples can be found in Hadley Wickham’s Advanced R.

Excitingly it is possible to simulate states’ distributions after n steps, where n is the number of eggs to be injected, our input to the Markov Chain. If we run one step or one discrete time period then our egg are injected and it is determined using matrix multiplication (tmA by initialState)

For the full process of insect transgenesis (injection and setting up wild-type crosses to assess if DNA has been integrated) we need 2 or more steps. What would happen if we specified 10 steps?

If we specified 2 or greater steps the finalState would be identical, as transgenic feeds into itself with 100% chance. This is emphasised using the steadyStates() function and revealed by the summary().

## Estimating a transition matrix from data

The package permits to fit a DTMC estimating the transition matrix from a sequence of data. createSequenceMatrix() returns a function showing previous vs actual states from the pairs in a given sequence.
The markovchainFit() function allows us to obtain the estimated transition matrix and the confidence levels (using elliptic Maximum Likelihood Estimate hyphothesis).

So we can call our estimated transition probability matrix using myFit\$estimate. We could then use it to make predicitons for hypothetical set-ups.

If we took the time to understand what the rain data was about then this might provide additional insight. The key point is we could develop a discrete time Markov Method based on estimates of a transition matrix from the data.

## Waste water pipe deterioration

Here we propose a hypothetical situation where we are responsible for the condition of our sewer waste water pipes in a region. We use a survey to describe the current condition with a four-point scale, inspired by Baik et al., (2006). The ranking goes from Brand New to Bad, A to D.

Correct estimation of transition probabilities in a Markov chain based deterioration model is a key ingredient for successful and cost-effective proactive management of wastewater systems. Imagine our company’s estate (all of its pipes) were allowed to deteriorate for one year with an element transition matrix of:

provided by expert domain knowledge from a consultancy or preferably from the data.

We could then use the transition matrix to model what would happen to our pipes through the next 50 years without any investment. We can set up a data frame to contain labels for each timestep, and a count of how many pipes are in each state at each timestep. Then, we fill that data frame with the results after each timestep i, calculated by initialState * dtmcC ^ i:

A plot may be preferred to visualise the condition of our pipes without investment.

We can calculate the timestep when all pipes break, or a steady state is reached at the absorbing state d by

As all but state d are transitional, we are interested in when all are equal to zero and “d” is equal to the sum of the frequency of initial states, 70.

## Extension

We can use this code to change the various transition probabilities to see what the effects are on the outputs (sensitivity analysis) through visual inspection and consider what transition probabilities would strike the right balance (repair and maintenance allows state reversion, from states c and d back to a). Also, there are methods we could use to perform uncertainty analysis, e.g. putting confidence intervals around the transition probabilities. We won’t do either of these here. This type of decision modelling scenario is perfect for Shiny app implementation!

## References

• Gregory, M., Alphey, L., Morrison, N. I., & Shimeld, S. M. (2016). Insect transformation with piggyBac : getting the number of injections just right. Insect Molecular Biology, http://doi.org/10.1111/imb.12220
• Baik, H., Seok, H., Jeong, D., & Abraham, D. M. (2006). Deterioration Models for Management of Wastewater Systems. Journal of Water Resources Planning and Management, 132(February), 15-24. http://doi.org/10.1061/(ASCE)0733-9496(2006)132:1(15)
• http://www.r-bloggers.com/a-discrete-time-markov-chain-dtmc-sir-model-in-r/