#
# PRACTICAL 1-5: Likelihood
# ---------------------------------
#
# We have seen in lectures that the MLEs for the parameters of several common distributions
# can be found in closed form. However, in general, there is no guarantee of such a simple
# closed-form result, and in these situations we will have to find and maximise the
# likelihood numerically.
#
# For this practical, we will focus on the Poisson distribution, which is particularly appropriate
# for counts of events occurring randomly in time or space. Suppose we have a simple problem with
# one piece of data, $X$, and we know that $X \sim \text{Po}(\lambda)$ for some unknown parameter
# value lambda.
#
# You will need the following skills from previous practicals:
# * Basic R skills with arithmetic, functions, etc.
# * Manipulating and creating vectors: `c`, `seq`
# * Calculating data summaries: `mean`, `sum`, `length`
# * Plotting with `plot` and `hist`, and adding lines to plots with `lines` and `abline`
# * Repeatedly applying a function to a vector using `sapply`
#
# New R techniques:
# * Optimising a function with `optim`
# * Extracting named elements of a list using the `$` operator


# 1. The Poisson Likelihood
# ==================================================================================
#
# Our likelihood calculations begin with writing down the likelihood function of our data.
# For a single observation X=x from a Poisson distribution, the likelihood is the probability
# of observing X=x, written as a function of the parameter lambda:
#
#                                    e^{-lambda} lambda^x
#     l(lambda) = P[X=x | lambda] = ----------------------
#                                             x!
#
# To begin with, suppose we observe a count of X=5.

# Exercise:
# ~~~~~~~~~~~~~
# * Write a simple function called `poisson` that takes a single argument `lambda` and evaluates
#   the Poisson probability P[X=5 | lambda].
#   _Hint_: You will need the `exp` and `factorial` functions.
# * Evaluate the Poisson probability for X=5 when lambda=7 -- you should get a value of `0.1277167`.

# What we have just created is the likelihood function, l(lambda), for the Poisson parameter
# lambda given a single observation of X=5.
#
# Before we consider a larger data problem, let's briefly explore this simplest case of a Poisson
# likelihood.

# Exercise:
# ~~~~~~~~~~~~~
# * Create a `seq`uence of `length` 200 from `0.1` to `20`. Save this as `lambdavals`.
# * Use `sapply` to evaluate the Poisson likelihood function for every value of lambda in
#   `lambdavals`. Save this as `like5`.
# * Draw a `plot` of the Poisson likelihood as a function of lambda. Use `type='l'` to draw a
#   line graph.
# * By eye, what is the value lambdaHat of lambda which maximises the likelihood?

# Typically, we work with the log-likelihood L(lambda) = log l(lambda) rather than the
# likelihood itself. This often makes the mathematical maximisation simpler and the numerical
# computations more stable, and it is also intrinsically connected to the idea of *information*
# (which we will consider later).

# Exercise:
# ~~~~~~~~~~~~~
# * Modify your `poisson` function to compute the logarithm of the Poisson probability, and
#   call this `logPoisson`. You will need to use the `log` function.
# * Evaluate the log-likelihood function for all lambda in `lambdavals` and plot the
#   log-likelihood against lambda. Do you find the same maximum value lambdaHat?
#   (One possible solution to this section's exercises is sketched below.)
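# If you get stuck, here is a minimal sketch of one possible solution to the exercises above.
# The names `poisson`, `logPoisson`, `lambdavals` and `like5` are those suggested in the
# exercises; the details are just one way of doing it.

poisson <- function(lambda) {
  # Poisson probability P[X=5 | lambda], viewed as a function of lambda
  exp(-lambda) * lambda^5 / factorial(5)
}
poisson(7)   # should give 0.1277167

lambdavals <- seq(0.1, 20, length=200)
like5 <- sapply(lambdavals, poisson)
plot(lambdavals, like5, type='l', xlab='lambda', ylab='likelihood')

logPoisson <- function(lambda) {
  # logarithm of the Poisson probability, written out directly
  -lambda + 5 * log(lambda) - log(factorial(5))
}
plot(lambdavals, sapply(lambdavals, logPoisson), type='l',
     xlab='lambda', ylab='log-likelihood')
# Both plots peak at the same lambdaHat, since log is an increasing function.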
# 2. Likelihood methods for real data
# ==================================================================================
#
# The Poisson distribution has been used by traffic engineers as a model for light traffic.
# This assumption is based on the rationale that, if the rate is approximately constant and
# the traffic is light (so that individual cars move independently of each other), then the
# distribution of counts of cars in a given time interval or space area should be nearly
# Poisson (Gerlough and Schuhl 1955).
#
# The following table shows the number of right turns during 300 3-minute intervals at a
# specific junction.
#
#     N          0   1   2   3   4   5   6   7   8   9  10  11  12
#     Frequency 14  30  36  68  43  43  30  14  10   6   4   1   1

traffic <- rep(0:12, c(14,30,36,68,43,43,30,14,10,6,4,1,1))

# Exercise:
# ~~~~~~~~~~~~~
# * Evaluate the piece of code above to create the vector `traffic` corresponding to the above
#   data set.
# * Plot a `hist`ogram of the data. Does an assumption of a Poisson distribution appear reasonable?

# If we suppose a Poisson distribution might be a good choice of distribution for this
# dataset, we still need to find out which particular Poisson distribution is best by
# estimating the parameter lambda.
#
# Let's begin by constructing the likelihood and log-likelihood functions for this data set
# of 300 observations. Assuming that the observations $x_1, \dots, x_{300}$ can be treated as
# 300 i.i.d. values from $\text{Po}(\lambda)$, the log-likelihood function for the entire
# sample is:
#
#     L(lambda) = -n lambda + (sum_{i=1}^n x_i) log(lambda),
#
# up to an additive constant from the factorial terms (which we can ignore).

# Exercise:
# ~~~~~~~~~~~~~
# * Write a new function `logLike` with a single parameter `lambda`, which evaluates the
#   log-likelihood expression above for the `traffic` data and the supplied value of `lambda`.
#   _Hint_: Use `length` and `sum` to compute the summary statistics.
# * Evaluate the data log-likelihood for all lambda in `lambdavals` and plot the log-likelihood
#   against lambda. Without further calculations, use the plot to have a guess at the maximum
#   likelihood estimate lambdaHat -- we'll check this later! (One possible `logLike` is
#   sketched below.)
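# A minimal sketch of one possible `logLike`, assuming the `traffic` vector above. Precomputing
# the summary statistics `n` and `sumX` (names introduced here for illustration) keeps the
# function cheap to evaluate repeatedly.

n <- length(traffic)   # n = 300 observations
sumX <- sum(traffic)   # sum of the counts

logLike <- function(lambda) {
  # data log-likelihood, up to an additive constant
  -n * lambda + sumX * log(lambda)
}

plot(lambdavals, sapply(lambdavals, logLike), type='l',
     xlab='lambda', ylab='log-likelihood')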
# 2.1 Maximising the likelihood
# ----------------------------------------------------------------------------------
#
# The maximum likelihood estimate (MLE), lambdaHat, for the lambda parameter of the Poisson
# distribution is the value of lambda which maximises the likelihood:
#
#     lambdaHat = argmax_{lambda in Omega} l(lambda) = argmax_{lambda in Omega} L(lambda).
#
# Therefore, to find the MLE, we must maximise l or L. We know we can do this analytically for
# certain common distributions, but in general we'll have to optimise the function numerically
# instead. For such problems, `R` provides various functions to perform a numerical optimisation
# of a given function (even one we define ourselves). The function we will focus on is `optim`:

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^ TECHNIQUE:
# ^
# ^ `R` offers several optimisation functions; the one we shall be using today is `optim`,
# ^ which is one of the most general. This generality comes at the price of having a lot of
# ^ optional parameters that need to be specified.
# ^
# ^ For our purposes, we're doing a 1-dimensional optimisation of a given function named
# ^ `logLike`, so the syntax we will use to maximise this function is as follows:
# ^
# ^   `optim(start, logLike, lower=lwr, upper=upr, method='Brent', control=list(fnscale=-1))`
# ^
# ^ The arguments are:
# ^
# ^ * `start` : an initial value for lambda at which to start the optimisation.
# ^   *Replace this with a sensible value.*
# ^ * `logLike` : this is the function we're optimising. In general, we can replace this with
# ^   any other function.
# ^ * `lower=lwr, upper=upr` : lower and upper bounds on our search for lambdaHat.
# ^   *`lwr` and `upr` should be replaced with appropriate numerical values*; an obvious
# ^   choice for this problem is the min and max of our data.
# ^ * `method='Brent'` : this selects an appropriate optimisation algorithm for a 1-D
# ^   optimisation.
# ^ * `control=list(fnscale=-1)` : this looks rather strange, but is simply telling `optim` to
# ^   *max*imise the given function, rather than *min*imise it (which is the default).
# ^
# ^ `optim` returns a number of values in the form of a list. The relevant components are:
# ^
# ^ * `par` : the optimal value of the parameter
# ^ * `value` : the optimum of the function being optimised.
# ^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# Exercise:
# ~~~~~~~~~~~~~
# * Use `optim` to maximise the `logLike` function. You should choose an appropriate initial
#   value for lambda, as well as sensible `upper` and `lower` bounds.
# * What value of lambdaHat do you obtain? How does this compare to your guess at the MLE?
# * We know from lectures that lambdaHat = Xbar -- evaluate the sample mean of the `traffic`
#   data. Does this agree with your results from directly maximising the log-likelihood?

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^ TECHNIQUE:
# ^
# ^ The output of `optim` is in the form of a `list`. Unlike vectors, `list`s can combine many
# ^ different variables together (numbers, strings, matrices, etc.), and each element of a list
# ^ can be given a name to make it easier to access.
# ^
# ^ For example, we can create our own lists using the `list` function:
# ^
# ^   `test <- list(Name='Donald', IQ=-10, Vegetable='Potato')`
# ^
# ^ Once created, we can access the named elements of the list using the `$` operator. For
# ^ example,
# ^
# ^   `test$IQ`
# ^
# ^ will return the value saved to `IQ` in the list, i.e. -10. RStudio will show auto-complete
# ^ suggestions for the elements of the list after you press `$`, which can help make life
# ^ easier.
# ^
# ^ We can use this to extract the MLE value from `optim` by first saving the results to a
# ^ variable, called `results` say, and then extracting the list element named `par`.
# ^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# Exercise:
# ~~~~~~~~~~~~~
# * Extract the optimal value of lambda from the output of `optim` and save it to a new variable
#   called `mle`.
# * Re-draw your plot of the log-likelihood function, and use `abline` to:
#   * Add a red solid vertical line at lambda=lambdaHat
#   * Add a blue dashed vertical line at lambda=xbar.
# * Do the results agree with each other, and with your guess at lambdaHat?

# Let's return to the original distribution of the data. The results of our log-likelihood
# analysis suggest that a Poisson distribution with parameter lambdaHat is the best choice of
# Poisson distribution for this data set.

# Exercise:
# ~~~~~~~~~~~~~
# * Re-draw the `hist`ogram of the `traffic` data.
# * For x = 0, 1, ..., 16, evaluate the Poisson probability P[X=x] when X ~ Po(lambdaHat).
#   Save these probabilities as `probs`.
# * Multiply `probs` by the sample size to obtain the expected Poisson counts under this
#   Po(lambdaHat) distribution.
# * Use the `lines` function to overlay the expected Po(lambdaHat) counts as a curve on top
#   of your histogram. Does this look like a good fit to the data? (A sketch combining the
#   steps of this subsection is given below.)
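# A minimal end-to-end sketch of the steps above. The starting value and bounds passed to
# `optim` are example choices (the lower bound is taken just above zero, since logLike(0)
# would evaluate log(0) = -Inf); `results`, `mle` and `probs` are the names suggested in the
# exercises, and `expected` is introduced here for illustration. For the probabilities we use
# R's built-in `dpois` for brevity -- you could equally reuse the formula from Section 1.

results <- optim(3, logLike, lower=0.1, upper=max(traffic),
                 method='Brent', control=list(fnscale=-1))
mle <- results$par   # extract the MLE using the `$` operator
mle                  # should agree with mean(traffic), about 3.89

plot(lambdavals, sapply(lambdavals, logLike), type='l',
     xlab='lambda', ylab='log-likelihood')
abline(v=mle, col='red')                    # solid red line at lambdaHat
abline(v=mean(traffic), col='blue', lty=2)  # dashed blue line at xbar

hist(traffic)
probs <- dpois(0:16, mle)              # P[X=x] for x = 0, 1, ..., 16 under Po(lambdaHat)
expected <- length(traffic) * probs    # expected counts out of n = 300
lines(0:16, expected)                  # overlay the fitted curve on the histogram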
# We will return to the question of fitting distributions to data and the "goodness of fit"
# next term.

# 2.2 Information and inference
# ----------------------------------------------------------------------------------
#
# For an i.i.d. sample X1, ..., Xn from a sufficiently nicely-behaved density function
# f(x_i | theta) with unknown parameter theta, the sampling distribution of the MLE thetaHat
# is such that
#
#     thetaHat ~~> N(theta, 1/I_n(theta)),
#
# as n -> infinity. Whilst we may be used to finding Normal distributions in unusual places,
# this result is noteworthy as we never assumed any particular distribution for our data here.
# This result therefore provides us with a general method for making inferences about the
# parameters of distributions and their MLEs for problems involving large samples!
#
# In particular, we can apply this to our Poisson data problem above to construct a
# large-sample approximate 100(1-alpha)% confidence interval for lambda in the form:
#
#     lambdaHat +/- z*_{alpha/2} / sqrt(-L''(lambdaHat)),
#
# where z*_{alpha/2} is the alpha/2 critical value of a standard Normal distribution, and we
# have estimated the expected Fisher information I_n(theta) by the observed information
# I(lambdaHat) = -L''(lambdaHat).

# Exercise:
# ~~~~~~~~~~~~~
# * We can request that `R` computes -1 times the second derivative at the maximum,
#   -L''(lambdaHat), as part of the optimisation process. To do this, re-run your optimisation,
#   but add the additional argument `hessian=TRUE`.
# * The output from `optim` will now contain an additional element named `hessian`. Extract
#   this element, and save it as `obsInfo`.
# * Irritatingly, `obsInfo` will be in the form of a (1x1) matrix rather than a scalar. Run the
#   following code to flatten it into a scalar so we can use it in our confidence interval:

obsInfo <- c(obsInfo)

# Exercise:
# ~~~~~~~~~~~~~
# * Use `qnorm` or your standard Normal tables to find the critical z*_{alpha/2} value for
#   a 95% confidence interval. Save this as `zcrit`.
# * Combine `mle`, `zcrit`, and `obsInfo` to evaluate an approximate 95% confidence interval
#   for lambda.
# * Return to (and possibly re-draw) the plot of your log-likelihood function, with the MLE
#   indicated as a vertical line. Add the limits of the confidence interval to your plot as
#   vertical green lines.
# * Comment on the reliability of your estimate. (A sketch of these steps follows below.)
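# A minimal sketch of the confidence-interval steps, following the exercises above (the
# variable names are those suggested there):

results <- optim(3, logLike, lower=0.1, upper=max(traffic),
                 method='Brent', control=list(fnscale=-1), hessian=TRUE)
mle <- results$par

obsInfo <- results$hessian   # (1x1) matrix containing -L''(lambdaHat)
obsInfo <- c(obsInfo)        # flatten to a scalar

zcrit <- qnorm(0.975)        # critical value z*_{alpha/2} for a 95% interval, about 1.96
ci <- mle + c(-1, 1) * zcrit / sqrt(obsInfo)
ci                           # approximate 95% confidence interval for lambda

plot(lambdavals, sapply(lambdavals, logLike), type='l',
     xlab='lambda', ylab='log-likelihood')
abline(v=mle, col='red')     # the MLE
abline(v=ci, col='green')    # the confidence limits either side of the MLE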