Sorry if this is a beginner's question. I want to investigate the
likelihood that firm i invests in industry s at time t:
P(Y_ist = 1) = f(X, Z). The explanatory variables X are characteristics of
the investing firm; Z are characteristics of the target industry.
About 1,000 firms made 12,000 investments in 200 industries (at the 3-digit
SIC level) over 20 years.
I want to fit an xtlogit model and use Compustat public-firm data as
controls. Since the pool of potential controls is huge (over 1 million), I
am thinking of using a 1:10 or 1:20 case-to-control ratio. I have two
questions:
1. Is this kind of ‘choice-based sampling’ appropriate for my purpose? I
would appreciate any suggestions or references on the sampling and model
specification.
2. What’s the best way to generate the controls in Stata/SE 8?
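
To make question 2 concrete, here is a minimal sketch of one way a 1:10 draw could be set up. It is only an illustration under stated assumptions, not a recommended design. The file names (cases.dta, controls.dta) and variable names (y, firmid, x1, x2, z1, z2) are hypothetical. It assumes the investment observations and the candidate Compustat control observations are already in two separate files with the same variables, and that neither file contains y yet. The random-effects specification in the last line is likewise just a placeholder.

```stata
* Sketch only: all file and variable names below are hypothetical.
version 8
clear
set seed 12345                  // arbitrary seed, so the draw is reproducible

* Count the investment observations (cases)
use cases, clear
count
local ncases = r(N)
local ncontrols = 10 * `ncases' // 1:10 ratio; use 20 for 1:20

* Draw a simple random sample of that many candidate controls
use controls, clear
sample `ncontrols', count
generate byte y = 0             // controls: no investment

* Stack the cases on top and flag them
append using cases
replace y = 1 if y == .         // appended rows have y missing, so they are cases

* Placeholder panel logit on the combined sample
xtlogit y x1 x2 z1 z2, i(firmid) re
```

This draws the controls from the whole pool at once. Adding the by() option to sample would instead draw a fixed number within each group (for example, within each year), which is a different design from a fixed number of controls per case.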