> The issue: I have two data sets, one large (gigabytes) with
> individual data including a unique id and a date of occurrence
> of a service being provided.
>
> I need the status of the provider on the date of occurrence
> for each case. I have a second data set that includes, for
> each provider their status as of a period of time (e.g.,
> a beginning and ending date for a contract).
>
> I need to merge the two so that the provider's status is
> included in the individual data set.
>
> The only solution I currently see involves expanding the
> provider data set so each possible data (from the beginning
> of the contract to the end) is covered and then merging the
> two data sets (using the nokeep option). However, this
> strikes me as inefficient.
>
I presume the individual dataset indicates which provider was involved
with the individual record. If so, why not just merge the two datasets
on provider, which should leave you with something like
Then you can just generate status = ( providerStart <= dateService <=
ProviderEnd )
and drop the start and end variables... (Yes, that is not a Stata
statement, but you know what I mean)