Re: st: reading a txt file that loops
From: Robert Picard <[email protected]>
To: [email protected]
Subject: Re: st: reading a txt file that loops
Date: Sun, 17 Apr 2011 12:47:33 -0400
I suspect that the data you are describing can be found at:
http://www.census.gov/population/cencounts/in190090.txt
Since there are only 3 segments of data, you could simply use an
editor to copy and paste each part into a separate file. You would
still have to read the fixed-format data into Stata. Here's an approach
that processes the whole thing all at once:
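*--------------------------- begin example -----------------------
* original data from:
* http://www.census.gov/population/cencounts/in190090.txt
clear

* Read every line as fixed-width strings; header rows are still mixed in
#delimit ;
infix str5  fips  1-5
      str11 y1    6-16
      str11 y2    17-27
      str11 y3    28-38
      str11 y4    39-49
      str20 place 51-70
      using "in190090.txt";
#delimit cr
compress

* Drop lines with an empty fips field, then number each segment by
* counting the "FIPS" header rows seen so far
drop if fips == ""
gen segment = sum(fips=="FIPS")
* Lines before the first header row belong to no segment
drop if segment == 0
tempfile main
save "`main'"

* Loop over each segment and rename data vars
sum segment, meanonly
local n = r(max)
forvalues i = 1/`n' {
    use "`main'", clear
    keep if segment == `i'
    drop segment
    * The first row of a segment is its header, so it holds the years;
    * a column with no year in the header is unused and is dropped
    forvalues j = 1/4 {
        if y`j'[1] != "" rename y`j' pop`=y`j'[1]'
        else drop y`j'
    }
    sort fips
    tempfile part`i'
    save "`part`i''"
}

* Merge the segments side by side on fips
use "`part1'", clear
forvalues i = 2/`n' {
    merge 1:1 fips using "`part`i''", nogen
}

* Remove the header row and convert the counts to numeric
drop if fips == "FIPS"
destring pop*, replace
order pop*, last alpha
*--------------------- end example --------------------------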
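The segment numbering above relies on a running sum of an indicator: sum() accumulates, so the count goes up by one at each "FIPS" header row and stays the same in between. Here is a minimal sketch of that one step on its own. The six rows are made up for illustration and typed in with -input-; they are not read from the Census file.

```stata
* Illustrative rows only: two header rows, each followed by two data rows
clear
input str5 fips
"FIPS"
"00000"
"18000"
"FIPS"
"00000"
"18000"
end

* Running count of header rows seen so far = segment number
gen segment = sum(fips == "FIPS")

* Check that the first three rows form segment 1 and the last three segment 2
assert segment == 1 in 1/3
assert segment == 2 in 4/6

list, sepby(segment)
```

Because the count depends only on where the header rows fall, this works even though the segments in the full file are not all the same length.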
On Sat, Apr 16, 2011 at 8:35 AM, Sears Generic <[email protected]> wrote:
> Are there any shortcuts to reading a data file that has the following format
> other than to reorganize the data before importing? The data file is for
> population by year by geographic location (e.g. United States, Indiana, then
> 3 counties in Indiana). "FIPS" is a unique identifier for each county. The
> problem is that the text file loops (i.e. only provides 4 decades of data
> before starting over) on a new line. In the example below I've reduced the
> issue to the United States, Indiana, and 3 counties, but the full dataset
> has every county for every state so the looping does not recur in a
> consistent way. Any suggestions would be appreciated.
>
>
> FIPS       1990      1980      1970      1960
> 00000 248709873 226545805 203211926 179323175 United States
>
> 18000   5544159   5490224   5193669   4662498 Indiana
> 18001     31095     29619     26871     24643 Adams County
> 18003    300836    294335    280455    232196 Allen County
> 18005     63657     65088     57022     48198 Bartholomew County
>
> FIPS       1950      1940      1930      1920
> 00000 151325798 132164569  12320262 106021537 United States
>
> 18000   3934224   3427796   3238503   2930390 Indiana
> 18001     22393     21254     19957     20503 Adams County
> 18003    183722    155084    146743    114303 Allen County
> 18005     36108     28276     24864     23887 Bartholomew County
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>