Monthly Archives: April 2013

A Brief Tour of the Trees and Forests

Tree methods such as CART (classification and regression trees) can be used as alternatives to logistic regression. It is a way that can be used to show the probability of being in any hierarchical group. The following is a compilation of many of the key R packages that cover trees and forests.  The goal here… Read More »

Category: Uncategorized

Amazon AWS Summit 2013

I was fortunate enough to have been able to attend the Amazon AWS Summit in NYC and to listen to Werner Vogels give the keynote.  I will share a few of my thoughts on the AWS 2013 Summit and some of my take-aways.  I attended sessions that focused on two products in particular: Redshift and… Read More »

Category: Uncategorized

Simulating the Gambler’s Ruin

The gambler’s ruin problem is one where a player has a probability p of winning  and probability q of losing. For example let’s take a skill game where the player x can beat player y with probability 0.6 by getting closer to a target. The game play begins with player x being allotted 5 points and player y… Read More »

Category: Uncategorized

Finding the Distribution Parameters

This is a brief description on one way to determine the distribution of given data. There are several ways to accomplish this in R especially if one is trying to determine if the data comes from a normal distribution. Rather than focusing on hypothesis testing and determining if a distribution is actually the said distribution… Read More »

Category: Uncategorized

Dirichlet Process, Infinite Mixture Models, and Clustering

The Dirichlet process provides a very interesting approach to understand group assignments and models for clustering effects.   Often time we encounter the k-means approach.  However, it is necessary to have a fixed number of clusters.  Often we encounter situations where we don’t know how many fixed clusters we need.  Suppose we’re trying to identify… Read More »

Category: Uncategorized