I have two data sets that I want to put together to analyze with a
nested model.
The first data set looks like:
group var
1 2
1 3
1 2
2 5
2 4
3 3
3 4
3 5
3 5
The data will be roughly balanced by group. Each group is a different
inbred strain of mice, and each row above is one animal. Thus there are
3 animals in group 1, 2 in group 2, and 4 in group 3. Var is a
continuously distributed dependent variable.
The other data set looks like:
marker group_1 group_2 group_3
1 aa aa bb
2 aa aa bb
3 aa bb bb
In this data set, each row is a genetic marker. The second through
fourth columns give the genetic information at that marker for each
group.
I want to get the data together such that it looks like:
group var marker_1 marker_2 marker_3
1 2 aa aa aa
1 3 aa aa aa
1 2 aa aa aa
2 5 aa aa bb
2 4 aa aa bb
3 3 bb bb bb
3 4 bb bb bb
3 5 bb bb bb
3 5 bb bb bb
Notice that each animal in a group has the same genetic information for
any particular marker, but var may differ between animals. The basic
model to analyze these data by marker will nest animal in marker to
predict var.
My problem is how to write a program that is smart enough to properly
repeat the genetic information by group in bringing the two files
together. The number of animals per group may change from file to file
and the number of markers may change also, but each group will have
genetic information at each marker (no missing genetic information).
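To make the goal concrete, here is a minimal sketch of the reshaping I am after, written in Python purely for illustration. It assumes both files have already been read into memory; the names animals, marker_header, marker_rows, transpose_markers and merge are hypothetical, and the group number is assumed to be the part of each column name after "group_". The idea is to transpose the marker table into a per-group lookup and then attach that lookup to every animal row, so the number of animals and markers can vary freely.

```python
# Hypothetical in-memory copies of the two example data sets.
animals = [  # (group, var), one tuple per animal
    (1, 2), (1, 3), (1, 2),
    (2, 5), (2, 4),
    (3, 3), (3, 4), (3, 5), (3, 5),
]
marker_header = ["marker", "group_1", "group_2", "group_3"]
marker_rows = [
    [1, "aa", "aa", "bb"],
    [2, "aa", "aa", "bb"],
    [3, "aa", "bb", "bb"],
]


def transpose_markers(header, rows):
    """Turn the marker-by-group table into {group: {"marker_N": value}}."""
    by_group = {}
    for row in rows:
        marker = row[0]
        for col_name, value in zip(header[1:], row[1:]):
            # Assumption: column names look like "group_<number>".
            group = int(col_name.split("_", 1)[1])
            by_group.setdefault(group, {})[f"marker_{marker}"] = value
    return by_group


def merge(animal_rows, by_group):
    """Repeat each group's marker values on every animal in that group."""
    merged_rows = []
    for group, var in animal_rows:
        record = {"group": group, "var": var}
        # Raises KeyError if a group has no marker information.
        record.update(by_group[group])
        merged_rows.append(record)
    return merged_rows


by_group = transpose_markers(marker_header, marker_rows)
merged = merge(animals, by_group)

# Print one line per animal: group, var, then one column per marker.
for record in merged:
    print(*record.values())
```

Nothing in the sketch hard-codes the number of groups, animals, or markers; those all come from the input tables.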
Thanks much for any help or example code on similar problems.