I'm not a geographer, but I think this is an interesting question.
You could just regress wage on a full set of dummies twice, once for
LAD and once for TTWA, and compare the R-squared values, though that
is unlikely to convince you or anyone else that one division is more
useful than another. I guess I would start by calculating the mean and
standard deviation of log wage for each LAD and each TTWA, along with
the population of each, and then I would make two graphs of the StdDevs
against the means, with marker size given by population, just to get a
sense of what kind of variation in wages each division captures.
Sometimes a picture gives you a better sense of the data than numerous
tabular results.
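
For anyone who wants to try this outside Stata, here is a minimal
Python sketch of that first pass. It assumes two parallel lists, one of
wages and one of area codes. The names area_summary and dummy_r2 and the
toy data are made up for illustration, and the number of observations
per area stands in for population. With nothing but area dummies on the
right-hand side, the fitted values are the cell means, so R-squared is
just the between-area share of the variation in log wage.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def area_summary(wages, areas):
    """Return {area: (n, mean of log wage, SD of log wage)}."""
    groups = defaultdict(list)
    for w, a in zip(wages, areas):
        groups[a].append(math.log(w))
    # pstdev is the population SD, so a one-observation cell gets 0.0
    return {a: (len(v), mean(v), pstdev(v)) for a, v in groups.items()}


def dummy_r2(wages, areas):
    """R-squared of log wage on a full set of area dummies.

    The fitted value for each worker is the cell mean, so
    R-squared = between-area sum of squares / total sum of squares.
    """
    logs = [math.log(w) for w in wages]
    grand = mean(logs)
    total_ss = sum((y - grand) ** 2 for y in logs)
    between_ss = sum(
        n * (m - grand) ** 2 for n, m, _ in area_summary(wages, areas).values()
    )
    return between_ss / total_ss


# Toy data (hypothetical): six workers, each coded both ways.
wages = [10.0, 12.0, 20.0, 22.0, 15.0, 30.0]
lad = ["A", "A", "B", "B", "C", "C"]
ttwa = ["X", "X", "X", "Y", "Y", "Y"]

for name, codes in (("LAD", lad), ("TTWA", ttwa)):
    print(name, "R-squared:", round(dummy_r2(wages, codes), 3))
    for area, (n, m, sd) in sorted(area_summary(wages, codes).items()):
        print("  ", area, n, round(m, 3), round(sd, 3))
```

The (n, mean, SD) triples are what would go into the two scatter plots,
with n used for the marker size.
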
I think your criterion is really a kind of entropy-minimizing one,
since you want neither geocode categories to 8 decimal places (one
category for each worker gives almost no variation within cells, but a
huge number of categories) nor a single country identifier (one cell
containing all of the variation). So the size of the grid, in terms of
population in each LAD/TTWA, is important, not just how homogeneous
people are within each LAD/TTWA.
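
One crude way to make that trade-off concrete is to charge each
division for the number of cells it uses. The sketch below uses ordinary
adjusted R-squared for this. That is a stand-in I am choosing for
illustration, not an entropy measure and not an established geographic
criterion. The function name adjusted_r2 and the toy data are
hypothetical.

```python
import math
from collections import defaultdict


def adjusted_r2(wages, areas):
    """Adjusted R-squared of log wage on a full set of area dummies.

    Computed as 1 - (within SS / (n - k)) / (total SS / (n - 1)),
    where k is the number of cells. Returns NaN when there are no
    residual degrees of freedom (for example, one cell per worker).
    """
    logs = [math.log(w) for w in wages]
    n = len(logs)
    cells = defaultdict(list)
    for y, a in zip(logs, areas):
        cells[a].append(y)
    k = len(cells)
    if n - k <= 0:
        return float("nan")
    grand = sum(logs) / n
    total_ss = sum((y - grand) ** 2 for y in logs)
    within_ss = sum(
        (y - sum(v) / len(v)) ** 2 for v in cells.values() for y in v
    )
    return 1.0 - (within_ss / (n - k)) / (total_ss / (n - 1))


# Toy data (hypothetical): six workers under four different grids.
wages = [10.0, 12.0, 20.0, 22.0, 15.0, 30.0]
grids = {
    "country": ["UK"] * 6,              # one cell, all variation within
    "TTWA": ["X", "X", "X", "Y", "Y", "Y"],
    "LAD": ["A", "A", "B", "B", "C", "C"],
    "per-worker": list("abcdef"),       # one cell per worker
}
for name, codes in grids.items():
    print(name, adjusted_r2(wages, codes))
```

The two extremes get no credit: the country identifier scores zero and
the one-cell-per-worker grid is undefined, so only the divisions in
between can be ranked.
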
I'll be interested in what others with more experience in this area
have to say on how they would approach this problem. Nick--how would
you measure minimal structure in residuals here?
On 3/8/06, Nick Cox <[email protected]> wrote:
> I am a geographer but I don't know much about (what is
> usually called human) geography. I regarded it as my main field
> of interest between 1968 and 1969, but no longer. There aren't
> many geographers on this list, I think.
>
> However, your question is not really geographical. I guess
> from this that you are using lots of dummies in each case
> and for once the answer is whichever set of dummies gives
> you a better model, according to your criteria of model
> excellence (my favourite criterion is usually minimal
> structure in residuals).
>
> In broad terms both LADs and TTWAs are fairly heterogeneous
> as both spring from an idea of an area functioning together
> rather than formal similarity of anything. So knowing the
> area might not help enormously in predicting wage. But
> whichever spatial subdivision has a finer mesh should
> prove better.
>
> Nick
> [email protected]
>
> Ada Ma
>
> > I have a bunch of wage observations and all the observations are
> > attached with two geographical identifiers - local authority districts
> > (LADs) and travel to work areas (TTWAs). I want to find out how wages
> > vary across different areas in UK.
> >
> > Now I can run wage estimations using either one of the two categorical
> > variables as explanatory variable. I would however like to find out
> > which categorical variable fits the data better. How do I compare the
> > two sets of results given that the explanatory variables are quite
> > different?
> >
> > Could you recommend what kind of tests I should use and if you are a
> > geographer, could you tell me are there any criteria that are used by
> > geographers to choose between different definitions of geographies
> > (regions, as opposed to LADs, as opposed to TTWAs, etc.)
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>