Java Mailing List Archive

http://www.junlu.com/

Subjects
Home
mod jk2 https
Donation of JAXP 1 3 Sources to Apache
R annoyances
RE: Finding out when the aspnet admin worker process has recycled
Favorite Linux Distribution
eigenvalues of a circulant matrix
Apache Install
Reachin apache from outside
Ant should have an ext directory
Warning: Documentroot doesn 't exist
Can this be Done?
RE: Multilanguage Application
RE: Simple Question On setting up Sub Domain site
Lack of independence in anova()
How to close connection instead of sending 403?
winning the case for ANT
Re: adding php
New Ant GUI 'Ant 's Nest '
Narrowing Down A Strange Problem
Ant Task: sshexec
R Graph Gallery : categorization of the graphs
I 've been hacked, I need some help please
RE: Anyone working with DotNetNuke?
RE: Exception Handling Opinion
hex format
RE: IIS stopped working :(
<for > Build Failed:problem
RE: Separation of Objects from Logic
RE: Tracking pages with long request execution time
sending email to multiple destination
Web Site
ant UI
Easy cut & paste from Excel to R?
Win32 Apache Restart
Improving Tasks
HELP! PLEASE!
RE: Adding Controls to a Page
read table
RE: ASPNET account doesn 't exist!
Best way to uninstall Apache2 on red hat
from win to linux how to web page
XMLParseException changes and creation of XMLLocator2
Re Post: rewrite backslash to forward slash
Target or macrodef?
Page display problem XPSP2
Authentication problems
Dynamic Dictionary Data Type?
Newbie unable access my www from outside
off topic question: Latex and R in industries
Conflict between xtable and Hmisc when using Sweave?
Very old problem without any new solution
mod rewrite help
Basic Authentication question
RE: Code Security
calling ant from java program
prevent double signing
Re: Controlling Copy/Paste/Print
Using R to illustrate the Central Limit Theorem
web server slow too much slow
access to user directories
Links
Home
Official R Project Site
 
Search:  
Power your search with and, or, +, -, or "some phrase" operators.
Lack of independence in anova()

Lack of independence in anova()

2005-07-06       - By Phillip Good
Reply:     1     2     3     4     5     6     7     8     9     10     >>  

Do you or Lumley have a citation for this conclusion?   Most people go
forward with the ANOV on the basis that the various tests are independent.

Phillip Good
P.S.  Tests based on the method of synchronized permutations are
independent.

-----Original Message-----
From: Douglas Bates [mailto:dmbates@(protected)]
Sent: Monday, July 04, 2005 1:44 PM
To: pigood@(protected)
Cc: r-help
Subject: Re: [R] Lack of independence in anova()


I wrote up a note on how to do a more efficient simulation of the
p-values from anova then discovered to my surprise that the chi-square
test for independence of the significance of the F-tests indicated
that they were not independent.  I was stumped by this but fortunately
Thomas Lumley came to my rescue with an explanation.  There is no
reason why the results of the F tests should be independent.  The
numerators are independent but the denominator is the same for both
tests.  When, due to random variation, the denominator is small, then
the p-values for both tests will tend to be small.  If, instead of
F-tests you use chi-square tests then you do see independence.

Here is the note on the simulation.

There are several things that could be done to speed the simulation of
the p-values of an anova like this under the null distribution.

If you examine the structure of a fitted lm object (use the function
str()) you will see that there are components called `qr', `effects'
and `assign'.  You can verify by experimentation that `qr' and
`assign' depend only on the experimental design.  Furthermore, the
`effects' vector can be reproduced as qr.qty(qrstr, y).

The sums of squares for the different terms in the model are the sums
of squares of elements of the effects vector as indexed by the assign
vector.  The residual sum of squares is the sum of squares of the
remaining elements of the assign vector.  You can generate 10000
replications of the calculations of the relevant sums of squares as

> set.seed(1234321)
> vv <- data.frame(c = gl(3,3,18), r = gl(2,9,18))
> vv
  c r
1  1 1
2  1 1
3  1 1
4  2 1
5  2 1
6  2 1
7  3 1
8  3 1
9  3 1
10 1 2
11 1 2
12 1 2
13 2 2
14 2 2
15 2 2
16 3 2
17 3 2
18 3 2
> fm1 <- lm(rnorm(18) ~ c*r, vv)
> fm1$assign
[1] 0 1 1 2 3 3
> asgn <- c(fm1$assign, rep(4, 12))
> system.time(res <- replicate(10000, tapply(qr.qty(fm1$qr, rnorm(18))^2,
asgn, sum)))
[1] 20.61  0.01 20.61  0.00  0.00
> res[,1:6]
      [,1]      [,2]     [,3]      [,4]       [,5]        [,6]
0 0.4783121 0.3048634 0.713689 0.6937838 0.03649023  2.63392426
1 0.5825213 1.4756395 1.127018 0.5209751 1.18697199  3.32972093
2 0.2612723 3.6396106 0.547506 1.1641910 0.37843963  0.03411672
3 2.6259806 3.5504584 1.645215 0.1197238 0.85361018  4.53895212
4 9.1942755 8.2122693 4.863392 5.4413571 2.03715439 22.94815118

The rows of that array correspond to the sum of squares for the
Intercept (which we generally ignore), the factor 'c', the factor 'r',
their interaction and the residuals.

As you can see that took about 21 seconds on my system, which I expect
is a bit faster than your simulation ran.

Because I set the seed to a known value I can reproduce the results
for the first few simulations to check that the sums of squares are
correct.  Remember that the original fit (fm1) is not included in the
table.

> set.seed(1234321)
> fm1 <- lm(rnorm(18) ~ c*r, vv)
> anova(fm2 <- lm(rnorm(18) ~ c*r, vv))
Analysis of Variance Table

Response: rnorm(18)
         Df Sum Sq Mean Sq F value Pr(>F)
c          2 0.5825  0.2913  0.3801 0.6917
r          1 0.2613  0.2613  0.3410 0.5701
c:r        2 2.6260  1.3130  1.7137 0.2215
Residuals 12 9.1943  0.7662

You can continue this process if you wish further verification that
the results correspond to the fitted models.

You can get the p-values for the F-tests as

> pvalsF <- data.frame(pc = pf((res[2,]/2)/(res[5,]/12), 2, 12, low =
FALSE),
+                      pr = pf((res[3,]/1)/(res[5,]/12), 1, 12, low =
FALSE),
+                      pint = pf((res[4,]/2)/(res[5,]/12), 2, 12, low =
FALSE))

Again you can check this for the first few by hand.

> pvalsF[1:5,]
         pc         pr      pint
1 0.69171238 0.57006574 0.2214847
2 0.37102129 0.03975286 0.1158059
3 0.28634939 0.26771167 0.1740633
4 0.57775850 0.13506561 0.8775828
5 0.06363138 0.16124100 0.1224806

If you wish you could then check marginal distributions using
techniques like an empirical density plot.

> library(lattice)
> densityplot(~ pc, pvals)

At this point I would recommend checking the joint distribution but if
you want to choose a specific level and check the contingency table
that could be done as

> xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsF)
           I(pint < 0.16)
I(pr < 0.16) FALSE TRUE
      FALSE  7204 1240
      TRUE   1215  341

The summary method for an xtabs object provides a test of independence

> summary(xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsF))
Call: xtabs(formula = ~I(pr < 0.16) + I(pint < 0.16), data = pvalsF)
Number of cases in table: 10000
Number of factors: 2
Test for independence of all factors:
  Chisq = 51.6, df = 1, p-value = 6.798e-13

for which you can see the puzzling result.  However, if you use the
chisquared test based on the known residual variance of 1, you get

> pvalsC <- data.frame(pc = pchisq(res[2,], 2, low = FALSE),
+                      pr = pchisq(res[3,], 1, low = FALSE),
+                      pint = pchisq(res[4,], 2, low = FALSE))
> pvalsC[1:5,]
        pc         pr      pint
1 0.7473209 0.60924741 0.2690144
2 0.4781553 0.05642013 0.1694446
3 0.5692081 0.45933855 0.4392846
4 0.7706757 0.28059805 0.9418946
5 0.5523983 0.53843951 0.6525907
> xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsC)
           I(pint < 0.16)
I(pr < 0.16) FALSE TRUE
      FALSE  7121 1319
      TRUE   1324  236
> summary(xtabs(~ I(pr < 0.16) + I(pint < 0.16), pvalsC))
Call: xtabs(formula = ~I(pr < 0.16) + I(pint < 0.16), data = pvalsC)
Number of cases in table: 10000
Number of factors: 2
Test for independence of all factors:
  Chisq = 0.25041, df = 1, p-value = 0.6168



On 7/4/05, Phillip Good <pigood@(protected)> wrote:
> To my surprise, the R functions I employed did NOT verify the independence
> property.  I've no quarrel with the theory--it's the R functions that
worry
> me.
>
> pG
>
> -----Original Message-----
> From: Douglas Bates [mailto:dmbates@(protected)]
> Sent: Monday, July 04, 2005 9:13 AM
> To: pigood@(protected); r-help
> Subject: Re: [R] Lack of independence in anova()
>
>
> I have already had email exchanges off-list with Phillip Good pointing
> out that the independence property that he wishes to establish by
> simulation is a consequence of orthogonality of the column span of the
> row contrasts and the interaction contrasts.  If you set the contrasts
> option to a set of orthogonal contrasts such as contr.helmert or
> contr.sum, which has no effect on the results of the anova, this is
> easily established.
>
> > build
> function(size, v = rnorm(sum(size))) {
>     col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
>     rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
>     row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
>     +size[7]+size[8]))
>     return(data.frame(c=factor(col), r=factor(row),yield=v))
> }
> > size
> [1] 3 3 3 0 3 3 3 0
> > set.seed(1234321)
> > vv <- build(size)
> > vv
>    c r      yield
> 1  0 0  1.2369081
> 2  0 0  1.5616230
> 3  0 0  1.8396185
> 4  1 0  0.3173245
> 5  1 0  1.0715115
> 6  1 0 -1.1459955
> 7  2 0  0.2488894
> 8  2 0  0.1158625
> 9  2 0  2.6200816
> 10 0 1  1.2624048
> 11 0 1 -0.9862578
> 12 0 1 -0.3235653
> 13 1 1  0.2039706
> 14 1 1 -1.4574576
> 15 1 1  1.9158713
> 16 2 1 -2.0333909
> 17 2 1  1.0050272
> 18 2 1  0.6789184
> > options(contrasts = c('contr.helmert', 'contr.poly'))
> > crossprod(model.matrix(~c*r, vv))
>             (Intercept) c1 c2 r1 c1:r1 c2:r1
> (Intercept)          18  0  0  0     0     0
> c1                    0 12  0  0     0     0
> c2                    0  0 36  0     0     0
> r1                    0  0  0 18     0     0
> c1:r1                 0  0  0  0    12     0
> c2:r1                 0  0  0  0     0    36
>
>
> On 7/4/05, Phillip Good <pigood@(protected)> wrote:
> >  If the observations are normally distributed and the 2xk design is
> > balanced,  theory requires that the tests for interaction and row
effects
> be
> > independent.  In my program, appended below, this would translate to
cntT
> > (approx)= cntR*cntI/N if all R routines were functioning correctly.
They
> > aren't.
> >
> > sim2=function(size,N,p){
> >   cntR=0
> >   cntC=0
> >   cntI=0
> >   cntT=0
> >   cntP=0
> >   for(i in 1:N){
> >     #generate data
> >      v=gendata(size)
> >     #analyze after build(ing) design containing data
> >      lm.out=lm(yield~c*r,build(size,v))
> >      av.out=anova(lm.out)
> >     #if column effect is significant, increment cntC
> >      if (av.out[[5]][1]<=p)cntC=cntC+1
> >    #if row effect is significant, increment cntR
> >      if (av.out[[5]][2]<=p){
> >            cntR=cntR+1
> >            tmp = 1
> >            }
> >      else tmp =0
> >      if (av.out[[5]][3]<=p){
> >            #if interaction is significant, increment cntI
> >             cntI=cntI+1
> >         #if both interaction and row effect are significant, increment
> cntT
> >             cntT=cntT + tmp
> >             }
> >      }
> >     list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
> > }
> >
> > build=function(size,v){
> > #size is a vector containing the sample sizes
> > col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
> > rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
> > row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
> > +size[7]+size[8]))
> > return(data.frame(c=factor(col), r=factor(row),yield=v))
> > }
> >
> > gendata=function(size){
> >   ssize=sum(size);
> >   return (rnorm(ssize))
> > }
> >
> > #Example
> >  size=c(3,3,3,0,3,3,3,0)
> >  sim2(size,10000,10,.16)
> >
> >
> >
> > Phillip Good
> > Huntington Beach CA
> >
> >
> >
> > ______________________________________________
> > R-help@(protected) mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> >
> >
>
> ______________________________________________
> R-help@(protected) mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

______________________________________________
R-help@(protected) mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

©2008 junlu.com - Jax Systems, LLC, U.S.A.