Random Sequence of Heads and Tails: For R Users

By | October 10, 2013

Rick Wicklin on the SAS blog made a post today on how to tell if a sequence of coin flips were random.  I figured it was only fair to port the SAS IML code over to R.  Just like Rick Wicklin did in his example this is the Wald-Wolfowitz test for randomness.  I tried to match his code line-for-line.

flips = matrix(c('H','T','T','H','H','H','T','T','T','T','T','T','T','H','H','H','T','H','T','H','H','H','T','H','H','H','T','H','T','H'))

RunsTest = function(flip.seq){
  u = unique(flip.seq) # unique value (should be two)
  
  d = rep(-1, nrow(flip.seq)*ncol(flip.seq)) # recode as vector of -1, +1
  d[flip.seq==u[1]] = 1
  
  n = sum(d > 0) # count +1's
  m = sum(d < 0) # count -1's
  
  dif = c(ifelse(d[1] < 0, 2, -2), diff( sign(d) )) # take the lag and find differences
  
  R = sum(dif==2 | dif==-2) # count up the number of runs
  
  ww.mu = 2*n*m / (n+m) + 1 # get the mean
  ww.var = (ww.mu-1)*(ww.mu-2)/(n+m-1) # get the variance
  sigma = sqrt(ww.var) # standard deviation
  
  # compute test statistics
  if((n+m) > 50){
    Z  = (R-ww.mu) / sigma
  } else if ((R-ww.mu) < 0){
    Z = (R-ww.mu+0.5) / sigma
  } else {
    Z = (R-ww.mu-0.5)/sigma
  }
  
  
  pval = 2*(1-pnorm(abs(Z))) # compute a two-sided p-value 
  
  ret.val = list(Z=Z, p.value=pval)
return(ret.val)
}

runs.test = RunsTest(flips)
runs.test


> runs.test
$Z
[1] -0.1617764

$p.value
[1] 0.8714819

Category: Uncategorized

6 thoughts on “Random Sequence of Heads and Tails: For R Users

  1. Rick Wicklin

    Nice. I often convert programs from R to SAS/IML. The languages are similar enough that most computations port with minimal effort, as you’ve shown here.

    By typing
    runs Test in R
    into a search engine, I found at least two R packages that support the runs test. Searching for “Wald-Wolfowitz test” might give more. Best wishes!

    Reply
  2. Carl Witthoft

    Well… Define “random.” You guys are checking that it fits a binomial, but you failed to use the critical word “fair” to define your coin. Ask any mathematician — he’ll tell you that an unfair coin can do anything; similarly tests with actual coins show a bias because they are not perfectly symmetric (center of mass and all that). So “true random” might require fitting to the expected outcome of a Not_Fair coin.
    (BTW, my captcha was (?)*9 = 72, which based on HHGTTG should be more like 10.3 than 8 🙂 )

    Reply
  3. Owe

    I run into a little problem: After changing ret.val = pval and replicating i get a nearly uniform distribution of p-values. What went wrong (assuming that the sampler is random)

    loops = 1000
    mcRes = rep(0, loops)
    flips= matrix(sample(c(“H”,”T”), 1000*loops, T), ncol=1000)
    for(iter in 1:loops)
    {
    mcRes[iter] = RunsTest(matrix(flips[iter,]))
    }

    summary(mcRes)

    Reply
  4. Owe

    ps: summary(mcRes)
    Min. 1st Qu. Median Mean 3rd Qu. Max.
    0.0007974 0.2688000 0.5321000 0.5145000 0.7565000 0.9999000

    Reply
  5. Dale

    This is screaming out for an application of the “rle” (run length encoding) function. Especially so considering the ‘number of runs’ is explicitly mentioned. From a quick search, it doesn’t look like SAS IML has an “rle” function, which might explain why it wasn’t used in the original code.

    Reply
    1. Carl Witthoft

      Yes! My favorite R-function! So much so that (shameless plug) I published a version which finds linear sequences as well as runs. See cgwtools::seqle at CRAN

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *