programming, research resources

Understanding Maximum Likelihood Estimation (MLE)

MLE: given a set of random samples and a model with unknown parameters, find the parameter values under which the observed data are most likely, i.e., the parameters that fit the data best.

Discrete distribution, discrete finite parameter space

Consider a coin-tossing example. Suppose the coin is weighted, so heads and tails are not equally likely. We toss the coin 80 times (that is, we obtain a sample x_1=\mbox{H}, x_2=\mbox{T}, \ldots, x_{80}=\mbox{T} and record the number of heads, writing H for heads and T for tails). Let p denote the probability of heads and 1-p the probability of tails (so p plays the role of \theta above). Suppose we observe 49 heads and 31 tails, i.e., 49 H's and 31 T's. Suppose further that the coin was drawn from a box containing three coins whose probabilities of heads are p=1/3, p=1/2, and p=2/3, respectively. The coins are unlabeled, so we cannot tell which is which. Using maximum likelihood estimation on the experimental data (the sample), we can work out which coin is most likely. The likelihood function takes one of the following three values:

\begin{matrix}
\mathbb{P}(\mbox{H=49, T=31 }\mid p=1/3) & = & \binom{80}{49}(1/3)^{49}(1-1/3)^{31} \approx 0.000 \\
\mathbb{P}(\mbox{H=49, T=31 }\mid p=1/2) & = & \binom{80}{49}(1/2)^{49}(1-1/2)^{31} \approx 0.012 \\
\mathbb{P}(\mbox{H=49, T=31 }\mid p=2/3) & = & \binom{80}{49}(2/3)^{49}(1-2/3)^{31} \approx 0.054 \\
\end{matrix}

We see that the likelihood is largest at \widehat{p}=2/3, so that is the maximum likelihood estimate of p.
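As a quick check, here is a small, purely illustrative R sketch that evaluates the three candidate likelihoods with dbinom; the counts (49 heads in 80 tosses) and coin probabilities are the ones from the example above.

# likelihood of 49 heads in 80 tosses under each of the three candidate coins
p.cand = c(1/3, 1/2, 2/3)
lik = dbinom(49, size=80, prob=p.cand)    # choose(80,49) * p^49 * (1-p)^31
round(lik, 3)                             # approximately 0.000 0.012 0.054
p.cand[which.max(lik)]                    # the most likely coin: p = 2/3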

Discrete distribution, continuous parameter space

Now suppose the box in the previous example contains infinitely many coins: for every p with 0\leq p \leq 1 there is a coin whose probability of heads is p. We maximize the likelihood function:

\mbox{lik}(\theta) = f_D(\mbox{H=49, T=80-49}\mid p) = \binom{80}{49} p^{49}(1-p)^{31}

where 0\leq p \leq 1. We can find the maximum by calculus: differentiate with respect to p and set the derivative to zero.

\begin{matrix}
0 & = & \frac{d}{dp} \left( \binom{80}{49} p^{49}(1-p)^{31} \right) \\
  & \propto & 49p^{48}(1-p)^{31} - 31p^{49}(1-p)^{30} \\
  & = & p^{48}(1-p)^{30}\left[ 49(1-p) - 31p \right] \\
\end{matrix}

[Figure: likelihood curve of a binomial process for different values of the proportion parameter (t = 3, n = 10); the maximum likelihood estimate occurs at the mode, i.e., at the maximum of the curve.]

The solutions are p=0, p=1, and p=49/80. The solution that maximizes the likelihood is clearly p=49/80 (since p=0 and p=1 make the likelihood zero), so the maximum likelihood estimate is \widehat{p}=49/80.

This result generalizes easily: just replace 49 with a letter t for the observed number of "successes" in the Bernoulli trials and let n denote the number of trials. Exactly the same calculation gives the maximum likelihood estimate

\widehat{p}=\frac{t}{n}

for any sequence of n Bernoulli trials with t successes.
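A minimal numerical check in R (illustrative only): maximize the binomial log-likelihood over p with optimize and compare against the closed-form answer t/n.

# numerical MLE for t = 49 successes in n = 80 Bernoulli trials
t.succ = 49; n = 80
loglik = function(p) dbinom(t.succ, size=n, prob=p, log=TRUE)
optimize(loglik, interval=c(0,1), maximum=TRUE)$maximum   # about 0.6125
t.succ/n                                                  # 49/80 = 0.6125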

Continuous distribution, continuous parameter space

The most common continuous probability distribution is the normal distribution, whose probability density function is:

f(x\mid \mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

Now suppose we have n sample points from normal random variables, and we want the normal distribution under which these points are most likely (that is, the product of the densities is largest, with the points lying as close as possible to the center). Assuming the samples are independent and identically distributed, the joint density of the n observations is:

f(x_1,\ldots,x_n \mid \mu,\sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} e^{-\frac{ \sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}

or:

f(x_1,\ldots,x_n \mid \mu,\sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left(-\frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2}\right),

This distribution has two parameters, \mu and \sigma^2. One might worry that this differs from the examples above, where the likelihood was maximized over a single parameter. In fact, maximizing over two parameters works much the same way: simply maximize the likelihood \mbox{lik}(\mu,\sigma) = f(x_1,\ldots,x_n \mid \mu, \sigma^2) over both parameters. It is a bit more work than the one-parameter case, but no more complicated. Using the same notation as above, \theta=(\mu,\sigma^2).

Maximizing a likelihood function is equivalent to maximizing its natural logarithm, because the natural logarithm is a continuous, strictly increasing, concave function on the range of the likelihood. [Note: the log-likelihood is closely related to information entropy and Fisher information.] Taking logarithms usually simplifies the algebra, as this example shows:

\begin{matrix}
0 & = & \frac{\partial}{\partial \mu} \log \left( \left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} e^{-\frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2}} \right) \\
  & = & \frac{\partial}{\partial \mu} \left( \frac{n}{2}\log\left( \frac{1}{2\pi\sigma^2} \right) - \frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2} \right) \\
  & = & 0 - \frac{-2n(\bar{x}-\mu)}{2\sigma^2} \\
\end{matrix}

The solution of this equation is \widehat{\mu} = \bar{x} = \sum^{n}_{i=1}x_i/n. This is indeed the maximum, since it is the only point in \mu at which the first derivative vanishes, and the second derivative there is strictly negative.

Similarly, we differentiate with respect to \sigma and set the derivative to zero:

\begin{matrix}
0 & = & \frac{\partial}{\partial \sigma} \log \left( \left( \frac{1}{2\pi\sigma^2} \right)^\frac{n}{2} e^{-\frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2}} \right) \\
  & = & \frac{\partial}{\partial \sigma} \left( \frac{n}{2}\log\left( \frac{1}{2\pi\sigma^2} \right) - \frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{2\sigma^2} \right) \\
  & = & -\frac{n}{\sigma} + \frac{ \sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2}{\sigma^3} \\
\end{matrix}

The solution of this equation is \widehat{\sigma}^2 = \sum_{i=1}^n(x_i-\widehat{\mu})^2/n.

Therefore, the maximum likelihood estimate of \theta=(\mu,\sigma^2) is:

\widehat{\theta}=(\widehat{\mu},\widehat{\sigma}^2) = (\bar{x},\sum_{i=1}^n(x_i-\bar{x})^2/n).
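The same estimates fall out of a short R experiment (simulated data, so the exact numbers will vary): the MLE of \mu is the sample mean, and the MLE of \sigma^2 is the average squared deviation with divisor n rather than n-1.

# MLE of (mu, sigma^2) from simulated normal data
set.seed(1)
x = rnorm(200, mean=5, sd=2)       # 200 draws from N(5, 4)
mu.hat = mean(x)                   # MLE of mu
sig2.hat = mean((x - mu.hat)^2)    # MLE of sigma^2 (note: var(x) divides by n-1 instead)
c(mu.hat, sig2.hat)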
programming, research resources

Understanding the Likelihood Function

This example illustrates the likelihood function of a binomial coin-toss experiment.

[Figure: the likelihood function when two tosses both come up heads]

Consider the experiment of tossing a coin. Ordinarily, if we already know that the coin comes up heads or tails each with probability p_H = 0.5, we can compute the probability of any sequence of outcomes over several tosses. For example, the probability that two tosses both come up heads is 0.25. In conditional-probability notation:

P(\mbox{HH} \mid p_H = 0.5) = 0.5^2 = 0.25

where H denotes heads.

In statistics, we care about the reverse question: given the observed results of a series of tosses, what can we learn about the probability that the coin comes up heads? We can build a statistical model: assume that a toss comes up heads with probability p_H and tails with probability 1 - p_H. The conditional probability can then be rewritten as a likelihood function:

L(p_H =  0.5 \mid \mbox{HH}) = P(\mbox{HH}\mid p_H = 0.5) =0.25

That is, for this likelihood function, when we observe two heads in a row, the likelihood of p_H = 0.5 is 0.25 (this does not mean that the probability that p_H = 0.5, given two observed heads, is 0.25).

If we instead consider p_H = 0.6, the value of the likelihood function changes:

L(p_H = 0.6 \mid \mbox{HH}) = P(\mbox{HH}\mid p_H = 0.6) =0.36

[Figure: the likelihood function when the first two of three tosses come up heads and the third comes up tails]

Notice that the value of the likelihood function has increased. This means that if the parameter p_H were 0.6, the probability of observing two heads in a row would be larger than under p_H = 0.5. In other words, p_H = 0.6 is more convincing, more "reasonable," than p_H = 0.5. What matters about a likelihood function is not its specific value but whether it grows or shrinks as the parameter varies. For a given likelihood function, if there is a parameter value at which it attains its maximum, that value is the most "reasonable" parameter value.

In this example, the likelihood function is in fact:

L(p_H = \theta \mid \mbox{HH}) = P(\mbox{HH} \mid p_H = \theta) = \theta^2, where 0 \le p_H \le 1

If we take p_H = 1, the likelihood attains its maximum value of 1. In other words, when two heads in a row are observed, the most "reasonable" assumption is that the coin always comes up heads.

Similarly, if we observe three tosses in which the first two come up heads and the third comes up tails, the likelihood function becomes:

L(p_H = \theta \mid \mbox{HHT}) = P(\mbox{HHT} \mid p_H = \theta) = \theta^2(1 - \theta), where T denotes tails and 0 \le p_H \le 1

In this case, the likelihood function attains its maximum at p_H = \frac{2}{3}. That is, when we observe two heads followed by one tail in three tosses, the most reasonable estimate of the probability of heads is p_H = \frac{2}{3}.
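A small R sketch of this last case (purely illustrative): plot L(\theta) = \theta^2(1 - \theta) over [0,1] and locate its maximizer, which lands at 2/3.

# likelihood of the observation HHT as a function of theta = P(heads)
lik = function(theta) theta^2 * (1 - theta)
curve(lik, from=0, to=1, xlab="theta", ylab="likelihood")
optimize(lik, interval=c(0,1), maximum=TRUE)$maximum   # approximately 2/3
abline(v=2/3, lty="dashed")                            # mark the maximizer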

programming, research resources

An R Time Series Tutorial

http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm


Here are some examples that may help you become familiar with analyzing time series using R. You can copy-and-paste the R commands (multiple lines are ok) from this page into R. Printed output is blue. I suggest that you have R up and running before you start this tutorial.

Please note that this is not a lesson in time series analysis. Also, the analyses performed on this page are simply demonstrations; they are not meant to be optimal or complete in any way. This is done intentionally so as not to spoil the fun you’ll have working on the problems in the text.

If you’re new to R/Splus, I suggest reading R for Beginners (a pdf file) first. Another good read for exploring time series is Econometrics in R (a pdf file). You may also want to poke around the QuickR website.

◊ Baby steps… your first R session. Get comfortable, then start her up and try some simple addition:

2+2
 [1] 4

Ok, now you’re an expert useR. It’s time to move on to time series. What you’ll see in the following examples should be enough to get you through the first four chapters of the text.

Let’s play with the Johnson & Johnson data set. Download jj.dat to a directory called mydata (or wherever you choose … the examples below and in the text assume the data are in that directory).

jj = scan("/mydata/jj.dat")        # read the data
jj <- scan("/mydata/jj.dat")       # read the data another way
scan("/mydata/jj.dat") -> jj       # and another

The R people (yes, they exist) prefer that you use the second [<-] or third [->] assignment operator, but your wrists and health care professionals prefer that you use the simpler first [=] method if you can.

Next, print jj (to the screen)

jj
  [1]  0.71  0.63  0.85  0.44
  [5]  0.61  0.69  0.92  0.55
  . . .
 [77] 14.04 12.96 14.85  9.99
 [81] 16.20 14.67 16.02 11.61

and you see that jj is a collection of 84 numbers called an object. You can see all of your objects by typing

objects()

If you’re a Matlab (or similar) user, you may think jj is an 84 × 1 vector, but it’s not. It has order and length, but no dimensions (no rows, no columns). R calls these objects vectors, so you have to be careful. In R, matrices have dimensions but vectors do not. To wit:

jj[1]       # the first element
 [1] 0.71
jj[84]      # the last element
 [1] 11.61
jj[1:4]     # the first 4 elements
 [1] 0.71 0.63 0.85 0.44
jj[-(1:80)] # everything EXCEPT the first 80 elements
 [1] 16.20 14.67 16.02 11.61
length(jj)  # the number of elements 
 [1] 84
dim(jj)     # but no dimensions ...
 NULL
nrow(jj)    # ... no rows
 NULL
ncol(jj)    # ... and no columns
 NULL
#-- if you want it to be a column vector (in R, a matrix), an easy way to go is:
jj = as.matrix(jj)
dim(jj)   
 [1] 84 1 

Now, let’s make jj a time series object.

jj = ts(jj, start=1960, frequency=4)

Note that the data are quarterly earnings, hence the frequency=4 statement. One nice thing about R is you can do a bunch of stuff (technical term) in one line. For example, you can read the data into jj and make it a time series object at the same time:

jj = ts(scan("/mydata/jj.dat"), start=1960, frequency=4)

In the lines above, you can replace scan by read.table. Inputting data using read.table is an easy way to read a data file that is laid out as a matrix and may have headers (column descriptions). At this point, you might want to find out about read.table, data frames, and time series objects:

jj = ts(read.table("/mydata/jj.dat"), start=1960, frequency=4)  
help(read.table)
help(ts)
help(data.frame)

There is a difference between scan and read.table. The former produces a vector (no dimensions) while the latter produces a data frame (and has dimensions).

One final note on reading the data. If the data started on the third quarter of 1960, say, then you would have something like ts(x, start=c(1960,3), frequency=4) and so on. If you had monthly data that started from June, 1984, then you would have ts(x, start=c(1984,6), frequency=12).
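For instance (the start dates below are made up purely for illustration):

jj3 = ts(scan("/mydata/jj.dat"), start=c(1960,3), frequency=4)  # as if the series began in 1960 Q3
zm  = ts(rnorm(36), start=c(1984,6), frequency=12)              # a fake monthly series starting June 1984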

Let’s view the data again as a time series object:

jj
      Qtr1  Qtr2  Qtr3  Qtr4
1960  0.71  0.63  0.85  0.44
1961  0.61  0.69  0.92  0.55
. . .
1979 14.04 12.96 14.85  9.99
1980 16.20 14.67 16.02 11.61

Notice the difference? You also get some nice things with the ts object, for example, the corresponding time values:

time(jj)
        Qtr1    Qtr2    Qtr3    Qtr4
1960 1960.00 1960.25 1960.50 1960.75
1961 1961.00 1961.25 1961.50 1961.75
. . .
1979 1979.00 1979.25 1979.50 1979.75
1980 1980.00 1980.25 1980.50 1980.75

By the way, you could have put the data into jj and printed it at the same time by enclosing the command:

(jj = ts(scan("/mydata/jj.dat"), start=1960, frequency=4))

Now try a plot of the data:

plot(jj, ylab="Earnings per Share", main="J & J")

with the result being:

[figure: time plot of jj, Earnings per Share, J & J]

Try these and see what happens:

plot(jj, type="o", col="blue", lty="dashed")
plot(diff(log(jj)), main="logged and diffed")

and while you’re here, check out plot.ts and ts.plot:

x = -5:5                  # sequence of integers from -5 to 5
y = 5*cos(x)              # guess
par(mfrow=c(3,2))         # multifigure setup: 3 rows, 2 cols
#---  plot:
plot(x, main="plot(x)")
plot(x, y, main="plot(x,y)")
#---  plot.ts:
plot.ts(x, main="plot.ts(x)")
plot.ts(x, y, main="plot.ts(x,y)")
#---  ts.plot:
ts.plot(x, main="ts.plot(x)")
ts.plot(ts(x), ts(y), col=1:2, main="ts.plot(x,y)")  # note- x and y are ts objects 
#---  the help files [? and help() are the same]:
?plot.ts
help(ts.plot)
?par        # might as well skim the graphical parameters help file while you're here
[figure: the six plots produced by the plot, plot.ts, and ts.plot commands above]

Note that if your data are a time series object, plot() will do the trick (for a simple time plot, that is). Otherwise, plot.ts() will coerce the graphic into a time plot.

How about filtering/smoothing the Johnson & Johnson series using a two-sided moving average? Let’s try this:
fjj(t) = ⅛ jj(t-2) + ¼ jj(t-1) + ¼ jj(t) + ¼ jj(t+1) + ⅛ jj(t+2)
and we’ll add a lowess fit for fun.

k = c(.5,1,1,1,.5)            # k is the vector of weights
(k = k/sum(k))       
  [1] 0.125 0.250 0.250 0.250 0.125
fjj = filter(jj, sides=2, k)  # ?filter for help [but you knew that already]
plot(jj)
lines(fjj, col="red")         # adds a line to the existing plot
lines(lowess(jj), col="blue", lty="dashed")

… and the result:

[figure: jj with the two-sided moving average (red) and lowess fit (blue, dashed)]

Let’s difference the logged data and call it dljj. Then we’ll play with dljj:

dljj = diff(log(jj))        # difference the logged data
plot(dljj)                  # plot it if you haven't already
shapiro.test(dljj)          # test for normality 
        Shapiro-Wilk normality test

 data:  dljj
 W = 0.9725, p-value = 0.07211

Now a histogram and a Q-Q plot, one on top of the other:

par(mfrow=c(2,1))        # set up the graphics 
hist(dljj, prob=TRUE, 12)   # histogram 
lines(density(dljj))     # smooth it - ?density for details 
qqnorm(dljj)             # normal Q-Q plot 
qqline(dljj)             # add a line 

and the results:

[figure: histogram with density estimate and normal Q-Q plot of dljj]

Let’s check out the correlation structure of dljj using various techniques. First, we’ll look at a grid of scatterplots of dljj(t-lag) vs dljj(t) for lag=1,2,...,9.

lag.plot(dljj, 9, do.lines=FALSE)  # why the do.lines=FALSE? ... try leaving it out 

Notice the large positive correlation at lags 4 and 8 and the negative correlations at a few other lags:

[figure: lag plot grid of dljj, lags 1 through 9]

Now let’s take a look at the ACF and PACF of dljj:

par(mfrow=c(2,1)) # The power of accurate observation is commonly called cynicism 
                  # by those who have not got it. - George Bernard Shaw
acf(dljj, 20)     # ACF to lag 20 - no graph shown... keep reading
pacf(dljj, 20)    # PACF to lag 20 - no graph shown... keep reading
# !!NOTE!! acf2 on the line below is NOT available in R... details follow the graph below
acf2(dljj)        # this is what you'll see below
[figure: ACF and PACF of dljj]

Note that the LAG axis is in terms of frequency, so 1,2,3,4,5 correspond to lags 4,8,12,16,20 because frequency=4 here. If you don’t like this type of labeling, you can replace dljj in any of the above by ts(dljj, freq=1); e.g., acf(ts(dljj, freq=1), 20).

◊ Ok- here’s the story on acf2. I like my ACF and PACF a certain way, which is not the default R way. So, I wrote a little script called acf2.R that you can read about and obtain here: Examples (there are other goodies there).
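If you just want something acf2-flavored right away, here is a rough stand-in (this is not the author’s acf2.R, just a sketch of the same idea): the sample ACF and PACF of a series, stacked in one figure.

# a rough stand-in for acf2: ACF on top, PACF below
my.acf2 = function(series, max.lag=20){
  par(mfrow=c(2,1))
  acf(series, max.lag)
  pacf(series, max.lag)
}
my.acf2(dljj)   # assumes dljj from above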

Moving on, let’s try a structural decomposition of log(jj) = trend + season + error using lowess. Note, this example works only if jj is dimensionless (i.e., you didn’t read it in using read.table … thanks to Jon Moore of the University of Reading, U.K., for pointing this out.)

plot(dog <- stl(log(jj), "per"))

Here’s what you get:

[figure: stl decomposition of log(jj) into seasonal, trend, and remainder]

If you want to inspect the residuals, for example, they’re in dog$time.series[,3], the third column of the resulting series (the seasonal and trend components are in columns 1 and 2). Check out the ACF of the residuals, acf(dog$time.series[,3]); the residuals aren’t white- not even close. You can do a little (very little) better using a local seasonal window, plot(dog <- stl(log(jj), s.win=4)), as opposed to the global one used by specifying "per". Type ?stl for details. There’s also something called StructTS that will fit parametric structural models. We don’t use these functions in the text when we present structural modeling in Chapter 6 because we prefer to use our own programs.

◊ This is a good time to explain $. In the above, dog is an object containing a bunch of things (technical term). If you type dog, you’ll see the components, and if you type summary(dog) you’ll get a little summary of the results. One of the components of dog is time.series, which contains the resulting series (seasonal, trend, remainder). To see this component of the object dog, you type dog$time.series (and you’ll see 3 series, the last of which contains the residuals). And that’s the story of $ … you’ll see more examples as we move along.
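Turning that paragraph into commands (a sketch, assuming jj and the stl fit from above):

dog = stl(log(jj), "per")               # the decomposition again, stored in dog
summary(dog)                            # a little summary of the fit
head(dog$time.series)                   # columns: seasonal, trend, remainder
acf(dog$time.series[,3], 20)            # ACF of the remainder (residuals) - not white
plot(dog2 <- stl(log(jj), s.window=4))  # local seasonal window instead of "per"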

And now, we’ll do some of Problem 2.1. We’re going to fit the regression
 log(jj)= β*time + α1*Q1 + α2*Q2 + α3*Q3 + α4*Q4 + ε
where Qi is an indicator of the quarter i = 1,2,3,4. Then we’ll inspect the residuals.

Q = factor(rep(1:4,21))     # make (Q)uarter factors [that's repeat 1,2,3,4, 21 times]
trend = time(jj)-1970   # not necessary to "center" time, but the results look nicer
reg = lm(log(jj)~0+trend+Q, na.action=NULL)  # run the regression without an intercept
#-- the na.action statement is to retain time series attributes
summary(reg)
 Call:
 lm(formula = log(jj) ~ 0 + trend + Q, na.action = NULL)

 Residuals:
      Min       1Q   Median       3Q      Max
 -0.29318 -0.09062 -0.01180  0.08460  0.27644

 Coefficients:
       Estimate Std. Error t value Pr(>|t|)
 trend 0.167172   0.002259   74.00   <2e-16 ***
 Q1    1.052793   0.027359   38.48   <2e-16 ***
 Q2    1.080916   0.027365   39.50   <2e-16 ***
 Q3    1.151024   0.027383   42.03   <2e-16 ***
 Q4    0.882266   0.027412   32.19   <2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 0.1254 on 79 degrees of freedom
 Multiple R-squared: 0.9935,     Adjusted R-squared: 0.9931
 F-statistic:  2407 on 5 and 79 DF,  p-value: < 2.2e-16

You can view the model matrix (with the dummy variables) this way:

model.matrix(reg)
    trend Q1 Q2 Q3 Q4     (remember trend is time centered at 1970)
 1  -10.00  1  0  0  0
 2   -9.75  0  1  0  0
 3   -9.50  0  0  1  0
 4   -9.25  0  0  0  1
 5   -9.00  1  0  0  0
 6   -8.75  0  1  0  0
 7   -8.50  0  0  1  0
 8   -8.25  0  0  0  1
 . . .
 81  10.00  1  0  0  0
 82  10.25  0  1  0  0
 83  10.50  0  0  1  0
 84  10.75  0  0  0  1

Now check out what happened. Look at a plot of the observations and their fitted values:

plot(log(jj), type="o")   # the data in black with little dots 
lines(fitted(reg), col=2) # the fitted values in bloody red - or use lines(reg$fitted, col=2)

you get:

[figure: log(jj) (black, with dots) and the fitted values from the regression (red)]

… and a plot of the residuals and the ACF of the residuals:

par(mfrow=c(2,1))
plot(resid(reg))      # residuals - reg$resid is same as resid(reg) 
acf(resid(reg),20)    # acf of the resids

and you get:

[figure: residuals and ACF of the residuals from the regression]

Do those residuals look white? [Ignore the 0-lag correlation, it’s always 1.]

You have to be careful when you regress one time series on lagged components of another using lm(). There is a package called dynlm that makes it easy to fit lagged regressions, and I’ll discuss that right after this example. If you use lm(), then what you have to do is “tie” the series together using ts.intersect. If you don’t tie the series together, they won’t be aligned properly. Here’s an example regressing weekly cardiovascular mortality (cmort.dat) on particulate pollution (part.dat) at the present value and lagged four weeks (about a month). For details about the data set, see Chapter 2.

mort = ts(scan("/mydata/cmort.dat"),start=1970, frequency=52)  # make these time series objects
  Read 508 items                        
part = ts(scan("/mydata/part.dat"),start=1970, frequency=52)    
  Read 508 items
ded = ts.intersect(mort,part,part4=lag(part,-4), dframe=TRUE) # tie them together in a data frame
fit = lm(mort~part+part4, data=ded, na.action=NULL)         # now the regression will work
summary(fit)                            
 Call:
 lm(formula = mort ~ part + part4, data = ded, na.action = NULL)

 Residuals:
      Min       1Q   Median       3Q      Max
 -22.7429  -5.3677  -0.4136   5.2694  37.8539

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) 69.01020    1.37498  50.190  < 2e-16 ***
 part         0.15140    0.02898   5.225 2.56e-07 ***
 part4        0.26297    0.02899   9.071  < 2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 8.323 on 501 degrees of freedom
 Multiple R-Squared: 0.3091,     Adjusted R-squared: 0.3063
 F-statistic: 112.1 on 2 and 501 DF,  p-value: < 2.2e-16

Note: There was no need to rename lag(part,-4) to part4, it’s just an example of what you can do.

An alternative to the above is the package dynlm, which has to be installed [for details, in R type help(INSTALL) or help("install.packages")]. After the package is installed, you can do the previous example as follows:

library(dynlm)                         # load the package
fit = dynlm(mort~part + lag(part,-4))  # assumes mort and part are ts objects
# fit = dynlm(mort~part + L(part,4)) is the same thing.
summary(fit)
 Call:
 dynlm(formula = mort ~ part + lag(part, -4))

 Residuals:
      Min       1Q   Median       3Q      Max
 -22.7429  -5.3677  -0.4136   5.2694  37.8539

 Coefficients:
                Estimate Std. Error t value Pr(>|t|)
 (Intercept)    69.01020    1.37498  50.190  < 2e-16 ***
 part            0.15140    0.02898   5.225 2.56e-07 ***
 lag(part, -4)   0.26297    0.02899   9.071  < 2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 8.323 on 501 degrees of freedom
 Multiple R-Squared: 0.3091,     Adjusted R-squared: 0.3063
 F-statistic: 112.1 on 2 and 501 DF,  p-value: < 2.2e-16

Well, it’s time to simulate. The workhorse for ARIMA simulations is arima.sim(). Here are some examples; no output is shown here so you’re on your own.

# some AR1s 
x1 = arima.sim(list(order=c(1,0,0), ar=.9), n=100) 
x2 = arima.sim(list(order=c(1,0,0), ar=-.9), n=100)
par(mfrow=c(2,1))
plot(x1, main=(expression(AR(1)~~~phi==+.9)))      # ~ is a space and == is equal  
plot(x2, main=(expression(AR(1)~~~phi==-.9)))
x11()                                         # open another graphics device if you wish
par(mfcol=c(2,2))
acf(x1, 20)
acf(x2, 20)
pacf(x1, 20)
pacf(x2, 20)
# you could have, for example, used acf2(x1)  
# to get the ACF and PACF of x1 (or x2)... if you had acf2.R, of course.

# an MA1  
x = arima.sim(list(order=c(0,0,1), ma=.8), n=100)
par(mfcol=c(3,1))
plot(x, main=(expression(MA(1)~~~theta==.8)))
acf(x,20)
pacf(x,20)

# an AR2 
x = arima.sim(list(order=c(2,0,0), ar=c(1,-.9)), n=100) 
par(mfcol=c(3,1))
plot(x, main=(expression(AR(2)~~~phi[1]==1~~~phi[2]==-.9)))
acf(x, 20)
pacf(x, 20)

# an ARIMA(1,1,1) 
x = arima.sim(list(order=c(1,1,1), ar=.9, ma=-.5), n=200)
par(mfcol=c(3,1))
plot(x, main=(expression(ARIMA(1,1,1)~~~phi==.9~~~theta==-.5)))
acf(x, 30)  # the process is not stationary, so there is no population [P]ACF ... 
pacf(x, 30) # but look at the sample values to see how they differ from the examples above

◊ Next, we’re going to do some ARIMA estimation. This gets a bit tricky because R is not useR friendly when it comes to fitting ARIMA models. Much of the story is spelled out in our R Issues page. I’ll be as gentle as I can at first.

First, we’ll fit an ARMA model to some simulated data (with diagnostics and forecasting):

x = arima.sim(list(order=c(1,0,1), ar=.9, ma=-.5), n=100) # simulate some data
(x.fit = arima(x, order = c(1, 0, 1)))   # fit the model and print the results
 Call: arima(x = x, order = c(1, 0, 1))

 Coefficients:
          ar1      ma1  intercept    <-- NOT the intercept - see R Issue 1
       0.8465  -0.5021     0.5006
 s.e.  0.0837   0.1356     0.3150

 sigma^2 estimated as 1.027:  log likelihood = -143.44,  aic = 294.89

… diagnostics:

tsdiag(x.fit, gof.lag=20) # you know the routine- ?tsdiag for details

… and the output

[figure: tsdiag output for x.fit - standardized residuals, their ACF, and Ljung-Box p-values]

… forecast 10 ahead:

x.fore = predict(x.fit, n.ahead=10)  
# plot the forecasts
U = x.fore$pred + 2*x.fore$se
L = x.fore$pred - 2*x.fore$se
minx=min(x,L)
maxx=max(x,U)
ts.plot(x,x.fore$pred,col=1:2, ylim=c(minx,maxx))
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")

… and here’s the plot of the data and the forecasts (with error bounds):

[figure: the data and the 10-step-ahead forecasts with error bounds]

That wasn’t too bad… but hold on. We’re going to work with the global temperature data from Chapter 3. The data are in the file globtemp2.dat. There are three columns in the file; the second column has the yearly global temperature deviations from 1880 to 2004. If you download the data to the mydata directory, you can pull out the global temperatures like this:

u = read.table("/mydata/globtemp2.dat")    # read the data
gtemp = ts(u[,2], start=1880, freq=1)      # yearly temp in col 2
plot(gtemp)                                # graph the series (not shown here) 

… long story short, the data appear to be an ARIMA(1,1,1) with a drift of about +.6 °C per century (and hence the global warming hypothesis). Let’s fit the model:

arima(gtemp, order=c(1,1,1))
 Coefficients:
          ar1      ma1
       0.2545  -0.7742
 s.e.  0.1141   0.0651

So what’s wrong? …. well, there’s no estimate of the drift!! With no drift, the global warming hypothesis is kaput (technical term)… that is, the temps are just basically taking a random walk. How do you get the estimate of drift?… do this:

arima(diff(gtemp), order=c(1,0,1))   # diff the data and fit an arma to the diffed data
 Coefficients:
          ar1      ma1  intercept
       0.2695  -0.8180     0.0061
 s.e.  0.1122   0.0624     0.0030

What happened? The two runs should have given the same results, but the default models for the two cases are different. I won’t go into detail here because the details can be found on the R Issues page. And, of course, this problem continues if you try to do forecasting. There are remedies. One remedy is to do the following:

drift = 1:length(gtemp)
arima(gtemp, order=c(1,1,1), xreg=drift)
 Coefficients:
          ar1      ma1   drift
       0.2695  -0.8180  0.0061
 s.e.  0.1122   0.0624  0.0030

and then make sure you continue along these lines when you forecast. Another remedy is to use the scripts called sarima.R for model fitting, and sarima.for.R for forecasting. You can get those scripts with some details on this page: Examples.

◊ You may not have understood all the details of this example, but at least you should realize that you may get into trouble when fitting ARIMA models with R. In particular, you should come away from this realizing that, in R, arima(x, order=c(1,1,1)) is different from arima(diff(x), order=c(1,0,1)), and that arima calls the estimate of the mean the intercept. Again, much of the story is spelled out in our R Issues page.

And now for some regression with autocorrelated errors. This can be accomplished two different ways. First, we’ll use gls() from the package nlme, which you have to load. We’re going to fit the model Mt = α + βt + γPt + et where Mt and Pt are the mortality and particulates series from a previous example, and et is autocorrelated error.

library(nlme)            # load the package
trend = time(mort)       # assumes mort and part are there from previous examples
fit.lm = lm(mort~trend + part)  # ols
acf(resid(fit.lm))       # check acf and pacf of the resids     
pacf(resid(fit.lm))      # or use acf2(resid(fit.lm)) if you have acf2
# resids appear to be AR(2) ... now use gls() from nlme:
fit.gls = gls(mort~trend + part, correlation=corARMA(p=2), method="ML")
# take 5 ........................................ #................................................ #................................................ #................................................ # done: 
summary(fit.gls)
 Parameter estimate(s):
      Phi1      Phi2
 0.3980566 0.4134305

 Coefficients:
                 Value Std.Error   t-value p-value
 (Intercept) 3131.5452  857.2141  3.653166   3e-04
 trend         -1.5444    0.4340 -3.558021   4e-04
 part           0.1503    0.0210  7.162408   0e+00

# resid analysis- we assumed et = φ1 et-1 + φ2 et-2 + wt where wt is white.
w = filter(resid(fit.gls), filter=c(1,-.3980566, -.4134305), sides=1)  # get resids   
w = w[-2:-1]                   # first two are NA                                         
Box.test(w, 12, type="Ljung")  # check whiteness via Ljung-Box-Pierce statistic                 
  X-squared = 8.6074, df = 12, p-value = 0.736                            
pchisq(8.6074, 10, lower=FALSE)    # the p-value (they are resids from an ar2 fit)   
  [1] 0.569723

Now, we’ll do the same thing using arima(), which is easier and a little quicker.

(fit2.gls = arima(mort, order=c(2,0,0), xreg=cbind(trend, part)))
 Coefficients:
          ar1     ar2  intercept    trend    part
       0.3980  0.4135  3132.7085  -1.5449  0.1503
 s.e.  0.0405  0.0404   854.6662   0.4328  0.0211

 sigma^2 estimated as 28.99:  log likelihood = -1576.56,  aic = 3165.13

Box.test(resid(fit2.gls), 12, type="Ljung")   # and so on ... 

◊ ARMAX: If you want to fit an ARMAX model you have to do it via a state space model… more details will follow on the Chapter 6 page when I have the time. As seen above, using xreg in arima() does NOT fit an ARMAX model, which is too bad, but the help file (?arima) didn’t say it did. For more info, head on over to the R Issues page and check out Issue 2. 

Finally, a spectral analysis quicky:

x = arima.sim(list(order=c(2,0,0), ar=c(1,-.9)), n=2^8) # some data
 (u = polyroot(c(1,-1,.9)))            # x is AR(2) w/complex roots                          
  [1] 0.5555556+0.8958064i 0.5555556-0.8958064i
 Arg(u[1])/(2*pi)                      # dominant frequency around .16:
  [1] 0.1616497 
par(mfcol=c(3,1))
plot.ts(x)                             
spec.pgram(x, spans=c(3,3), log="no")  # nonparametric spectral estimate; also see spectrum()
?spec.pgram                            # some help; the help file says:
  # 'spec.pgram' calculates the periodogram using a fast Fourier transform,
  # and optionally smooths the result with a series of modified Daniell
  # smoothers (moving averages giving half weight to the end values).
spec.ar(x, log="no")                   # parametric spectral estimate

and the graph:

[figure: time plot of x, nonparametric spectral estimate, and parametric (AR) spectral estimate]

Finally, note that R tapers and logs by default, so if you simply want the periodogram of a series, the command is spec.pgram(x, taper=0, fast=FALSE, detrend=FALSE, log="no"). If you just asked for spec.pgram(x), you wouldn’t get the RAW periodogram because the data are detrended, possibly padded, and tapered in spec.pgram, even though the title of the resulting graphic would say Raw Periodogram. An easier way to get a raw periodogram is this: per=abs(fft(x))^2 …. duh. The moral of this story … and the bottom line: pay special attention to the defaults of the functions you’re using.
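For example, here is a quick side-by-side of the untapered spec.pgram call and a periodogram computed by hand (a sketch using the simulated AR(2) series x from above; the by-hand scaling is just one of several conventions in use):

par(mfrow=c(2,1))
spec.pgram(x, taper=0, fast=FALSE, detrend=FALSE, log="no")  # raw periodogram via spec.pgram
per = abs(fft(x))^2 / length(x)               # periodogram by hand (one common scaling)
freq = (0:(length(x)-1)) / length(x)          # Fourier frequencies
plot(freq[1:(length(x)/2)], per[1:(length(x)/2)], type="h",
     xlab="frequency", ylab="periodogram")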

tech

The Internet Queen's D10 Presentation, 2012

Mary Meeker, the well-known Wall Street securities analyst and investment banker often called the "Queen of the Net," released her famous Internet Trends report today at the D10 digital conference.

Highlights from her talk (slides from Mary Meeker's 2012 Internet Trends report):

- Internet growth remains strong; mobile device usage is growing rapidly and is still at an early stage;

- In 2011 there were 2.3 billion Internet users worldwide; emerging markets are driving the growth in users;

- Global mobile 3G subscribers have reached 1.1 billion and are still growing very rapidly;

- iPad adoption is growing faster than iPhone adoption did, which in turn grew faster than the iPod's;

- Android devices are adding users faster than the iPhone;

- 29% of Americans own a tablet or an e-reader;

- Global mobile Internet traffic is growing rapidly and has risen to 10% of total Internet traffic, up sharply from the end of 2008. In the US, mobile e-commerce accounts for 8% of all e-commerce. Monetization of the mobile Internet is improving, but most of it comes from apps rather than advertising;

- Internet advertising spend roughly matches the time users spend on the Internet, but mobile ad spend still lags far behind the time users spend on mobile. Advertising revenue still comes mainly from the desktop Internet;

- In India, mobile Internet usage surpassed desktop Internet usage in May for the first time; the same trend will appear in most markets; mobile CPMs are far below desktop CPMs;

- Global Internet revenue was $55 million in 1995; today it is $73 billion;

- Everything is being re-imagined: lifestyles, news and information flows, note-taking, drawing, photography, scrapbooks, magazines, music, video creation and production, monetization of talent, home entertainment, television, navigation and real-time traffic, sports information, home improvement, hailing a taxi, group buying, cash registers, personal services, idea generation, personal lending, recruiting and hiring, focus groups, signatures, learning, rewards and satisfaction, communication, thermostats, and more;

- Measured by market capitalization, this represents a $36 trillion global market opportunity;

- Economic trends are mixed, with a negative tilt. The stock market has had a poor ten weeks. Consumer confidence is near a four-year high but still well below its 30-year average. 49% of Americans think the economy is weak, better than the 52% of a year ago but still a bad number. Europe is expected to slip into recession;

- On the technology side there is plenty to be excited about, such as iOS, Android, and Windows Mobile;

- US entitlement spending plus interest payments will exceed revenue within 15 years;

- Has an Internet bubble already formed? Recent IPOs have not been impressive, with many stocks trading near their original IPO price ranges. Public-market investors are more skeptical than private-market investors. LinkedIn, the professional social network, has done well: it raised $353 million in its May 2011 IPO at $45 per share, and the stock reached $71 last November.

According to the report, global mobile 3G subscribers reached 1.1 billion in the fourth quarter of last year, up 37% year over year, but penetration is only 18%; by comparison, there were 2.3 billion Internet users worldwide in the fourth quarter, up only 8% year over year.

[Photo: KPCB partner Mary Meeker speaking at the D10 conference]

Below are her remarks and the slides shown at the event:

Figure 1: Global 3G subscribers have reached 1.1 billion, up 37% year over year; 3G penetration is 18%.

Adoption of new smart devices is growing faster than ever before, and the iPad and Android growth curves are far steeper than the iPhone's was. But there is still a long way to go: only 953 million smartphones are in use worldwide, against 6.1 billion mobile phones overall.

Figure 2: Despite enormous growth so far, smartphone adoption still has plenty of room to run.

In May of this year, mobile accounted for 10% of total Internet traffic, up from only about 5% a year earlier.

Figure 3: Global mobile traffic is growing rapidly and has risen to 10% of total Internet traffic.

Mobile monetization is no longer just about advertising. In the US, mobile e-commerce makes up 8% of total e-commerce. Apps and in-app payments already account for 71% of mobile monetization revenue, while mobile advertising accounts for only 29%. Meeker believes mobile advertising has "enormous upside," because the share of ad revenue going to mobile is currently out of line with mobile's share of total media consumption.

Figure 4: Mobile ad spending compared with mobile Internet usage.

Meanwhile, mobile Internet usage is displacing desktop usage. This month, total mobile Internet usage in India surpassed desktop Internet usage for the first time.

Figure 5: Mobile Internet usage is growing rapidly; in May 2012 it surpassed the highly monetized desktop Internet usage in India.

Evidence of the gap between mobile and desktop monetization shows up in many areas. In the US, effective desktop CPMs are $3.50, while mobile CPMs are only $0.75, roughly one fifth as much. Figures reported by the streaming-music service Pandora, by Tencent, and by the social-gaming company Zynga all show that average revenue per user on mobile can be as much as five times lower. Google's and Facebook's results suggest that mobile is holding back their revenue growth.

Meeker did not offer solutions to these problems, but she did point to a silver lining. In the more mature Japanese market, for example, mobile game maker GREE has grown its average revenue per user rapidly, reaching $24 per user per year in early 2012. Another Japanese mobile game company, CyberAgent, has shown a similar curve: its average revenue per paying mobile user has now risen to $418, surpassing its desktop business.

Figure 6: Data from the Japanese mobile game maker GREE show that mobile ARPU (average revenue per user) can grow quickly.

Figure 7: Data from the Japanese mobile game maker CyberAgent suggest that mobile ARPU (average revenue per user) can surpass desktop ARPU.

Based on the Japanese data, Meeker predicted that mobile monetization in the US could surpass the desktop business within one to three years. She sees the trend as inevitable: "it just takes time."

Meeker then sat for an on-stage interview with two well-known Wall Street Journal reporters.

Mossberg said he was surprised at how little advertising contributes to the mobile device experience.

Meeker replied that mobile advertising is still at an early stage of development and mobile screens are still relatively small. Localized and social advertising are likewise in their infancy and will grow much larger; the only question is whether that takes months or years.

Meeker added that the Japanese data show that mobile users can be monetized. The transition will probably begin within three to six months and take roughly another six to twelve months to play out. Companies are at the very earliest stage of monetization, and the monetization trend is starting to move upward in a very powerful way. For companies and for monetization overall, market conditions are very healthy. In the private markets, valuations are at fairly high levels; Meeker's fund did not do any deals last quarter.

Meeker is bullish on Waze, the social mapping and navigation app, calling it fast, simple, fun, and useful; users love it.

On the question of inflated valuations, Meeker said the public markets are throwing cold water on Internet companies that look too good to be true. This means that if you are a private-market investor you need an exit plan and need to know what that plan is; if you intend to go public, you need to be able to justify your valuation. In some cases private companies have raised money at extremely high valuations, and it then becomes very hard for them to go public and justify those valuations.

How does this get corrected? If valuations keep climbing in the private markets, private-market investors will lose money.

Take the Facebook IPO as an example. The company was valued at $104 billion in the secondary market. By valuation, Facebook's IPO is probably the largest Internet IPO in history; its first-day trading volume exceeded that of any previous IPO and was roughly equal to the average daily volume of the entire New York Stock Exchange (NYSE). But the IPO ultimately turned into a "tsunami." Nasdaq is trying to sort out the resulting problems, and Facebook's share price is now approaching the lower end of its original IPO price range.

Meeker noted that Facebook was priced too high; the company and its underwriters were gauging institutional demand minute by minute and constantly taking in feedback. Investors who placed orders for Facebook shares did not even know whether their orders had been filled. Market confidence and momentum matter, and Facebook lost its momentum.

Meeker said nothing like this had ever happened before. She believes Facebook is a great company that will do well over time, but for now it is hard to say whether the stock should be bought or sold. She also noted that the bankers did the best they could with the information they had; had the offering been run as an auction, Facebook's shares would have been bid even higher.
