You're correct. I misread this problem.
I have a new problem in that I have
to guess what the Excel syntax does,
but it looks fairly transparent.
You should -reshape-, I suggest.
. reshape long affxr2tag@_x_at, i(array_id) string
Put the controls in a variable, e.g. with
. egen control = fill(0.25 0.5 1 2 4 0.25 0.5 1 2 4)
or with -repeat()- from -egenmore- on SSC
. egen control = repeat(), v(0.25 0.5 1 2 4)
then
. bysort array_id : regress affxr2tag_x_at control
-statsby- could be useful here, to collect the slope and R-squared
from each regression as variables.
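A minimal sketch of that route, assuming Stata 9 or later (prefix syntax for -statsby-, and -strpos()-) and using the six arrays from the listing quoted below. The names tag, control, slope and rsq are made up for illustration, and control is computed here from the tag letter rather than with -fill()- or -repeat()-.

```stata
* Sketch only: data typed in from the quoted listing.
* tag, control, slope and rsq are illustrative names, not from the original.
clear
input long array_id affxr2taga_x_at affxr2tagb_x_at affxr2tagc_x_at affxr2tagd_x_at affxr2tage_x_at
930877 12.4 22.7 51.5 108   293.5
930878  7.6 13   53.1  99   244.2
930898 17.7 37   90.4 198   436.6
930879 11.5 18.2 55.7 114   277.8
930884 11.3 24.1 56.6 126.7 301.3
930885 13.3 19.8 57   139   270.1
end

* wide -> long: one observation per array and tag letter (a-e);
* the signal ends up in a variable named affxr2tag_x_at
reshape long affxr2tag@_x_at, i(array_id) j(tag) string

* known concentration (pM) doubles from tag a (0.25) to tag e (4)
generate control = 0.25 * 2^(strpos("abcde", tag) - 1)

* one regression per array; keep the slope and R-squared
* (-clear- replaces the data in memory with one observation per array)
statsby slope=_b[control] rsq=e(r2), by(array_id) clear: ///
    regress affxr2tag_x_at control

list, noobs
```

The resulting dataset has one observation per array_id, which could be merged back onto the original wide data if slope and rsq are wanted alongside the intensities.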
Alternatively,
1. Jeroen Weesie wrote a -slope()- for -egen-.
. findit _gslope
I don't think it's what your problem quite needs.
2. Nick Winter wrote a -corr()- for -egen-. That's
in -egenmore- from SSC.
I'd still check the linearity carefully by
looking at a series of graphs.
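One way to get such graphs, as a sketch only: it assumes Stata 8 or later (for -twoway- with by()) and data already in long form, here two arrays typed in from the listing quoted below. The variable name control is an illustrative choice.

```stata
* Sketch only: two arrays from the quoted listing, entered in long form.
* control holds the known concentration in pM (illustrative name).
clear
input long array_id control affxr2tag_x_at
930877 0.25  12.4
930877 0.5   22.7
930877 1     51.5
930877 2    108
930877 4    293.5
930878 0.25   7.6
930878 0.5   13
930878 1     53.1
930878 2     99
930878 4    244.2
end

* one panel per array: raw points plus the least-squares line
twoway (scatter affxr2tag_x_at control) (lfit affxr2tag_x_at control), ///
    by(array_id)
```

Any systematic curvature should show up as points pulling away from the fitted line in the same direction across panels.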
Nick
Wallace, John
>
> Thanks for your reply, Nick.
> I was trying to keep my examples general in the belief that it would
> be more broadly useful for others, but for clarity's sake, here's a
> more explicit example.
>
> Some of the developmental arrays made by my company have probes
> complementary (in the DNA sense) to control reagents at specific
> concentrations in the sample fluid. One way to measure the quality of
> the arrays is to perform a regression of signal for those probes
> against the known concentration of the control reagents in the sample.
> I've found that the slope and r-squared of the least-squares linear
> regression correlate nicely with other measures of array quality, but
> computing the fit isn't trivial. At the moment I export the probe
> intensities from the analysis software into Excel, line them up
> against the concentrations for the control reagents, and use Excel's
> Slope(y,x) and Rsq(y,x) functions to get the parameters I'm looking
> for.
> I would prefer to do that in Stata, for all the reasons we love Stata.
> The data look like:
>
>      array_id   a~a_x_at   a~b_x_at   a~c_x_at   a~d_x_at   a~e_x_at
>  1.    930877       12.4       22.7       51.5        108      293.5
>  2.    930878        7.6         13       53.1         99      244.2
>  3.    930898       17.7         37       90.4        198      436.6
>  4.    930879       11.5       18.2       55.7        114      277.8
>  5.    930884       11.3       24.1       56.6      126.7      301.3
>  6.    930885       13.3       19.8         57        139      270.1
>
> The variable names are truncated from affxr2taga_x_at,
> affxr2tagb_x_at, etc.
>
> The controls are at the following concentrations:
> TagA: 0.25 E-12M (i.e. 250 femtomolar)
> TagB: 0.5 E-12M
> TagC: 1.0 E-12M
> TagD: 2.0 E-12M
> TagE: 4.0 E-12M
>
> So, in Excel I would have cells like
>         A       B       C       D       E
> R1   0.25     0.5     1.0     2.0     4.0
> R2   12.4    22.7    51.5     108   293.5
>
> And in column F I would use =SLOPE(A2:E2,A1:E1) to get the slope of
> the linear regression and =RSQ(A2:E2,A1:E1) to get the coefficient of
> determination.
>
> In Stata terms, each observation would get a value in new variables
> "slope" and "fit". I've seen some egen functions like rmean() or rsd()
> that work at the observation level like that, calculating values in
> new variables from a function performed "across" variables for each
> observation.
>
> One approach I thought about was using -xpose- to switch observations
> with variables, then generating a new variable "conc" and doing a
> plain ol' regression of array_id vs conc. That's less attractive
> though, because -xpose- mangles your dataset (even using the ,varnames
> option, you can't get the original variable names back by running
> -xpose- again).
>
> It seems to me, from reading your earlier replies, that you think I'd
> like to, for example, calculate how much the 6 measures of a~a_x_at
> correlate with a constant of 0.25. That's not the case; I'm interested
> in how the slope of (a-e vs pM) varies from array to array.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/