Hi Simon,
try this:
*** Import: As far as I see the two vars are tab-seperated. Put the data in
a single text file and use insheet
insheet using "C:\temp\new1.txt", tab clear
** As a result you get two string vars: v1 and v2
*** Generate a group-variable for each food code
*** I'm sorry, my approach is a bit complicated: First I compute the running
number (_n) of each "***"-case,
*** then I give the running number to each case of the following data group
and at least I exchange with a group number
*** There'll be much more elegant ways, but it works.
gen test = _n
gen stars = 0
replace stars = test if v1 =="***"
replace stars = stars[_n-1] if v1 !="***" & test > 1
egen group = group(stars)
drop test stars
* get rid of unneccessary data
drop if v1 =="***" | v1 == "Rectype"
** reshape from long to wide
reshape wide v2, i(group) j(v1) string
*** Delete the v2 stub before every varname
renpfix v2
*The given dataset contains only string vars. Change all "numerical" vars
from string to numerical format
for var CELL- GLUC INSOLNCP -LACT MALT-NA NUMB-SOUTHFIB SUCR-WATER:
destring X, gen(numX) \ drop X
renpfix num
* Now you can "drop", "rename" or "move" the vars as you need it.
>
> Dear All
>
> I have recently purchased a dataset which contains food codes
> in 7-bit
> ASCII format from the UK Food Standards Agency. However,
> they are not
> yet in a format that I can use for any meaningful analysis and their
> technicial support is unable to assist.
>
> The data as they stand fall into 2 columns: varname|value.
> NUMB is the
> food code - i.e. the variable that I want to sort the data by.
>
> I would like to have all of the other variables along the top as
> separate columns, so that I end up with individual food codes
> and their
> nutrient values - see below. With these data I can then do dietary
> analysis.
>
> This format goes on for several thousand different food
> codes, over and
> over. Codes are separated by the line ***
>
> How it is now:
>
> RECTYPE 1
> NUMB 14002
> CONST 50852
> PREV 50852
> GROUP FA
> NAME Apples, cooking, raw, peeled
> EDPOR 1
> WATER 87.7
> TOTNIT 0.05
> PROT 0.3
> FAT 0.1
> CHO 8.9
> KCALS 35
> KJ 151
> STAR Tr
> TOTSUG 8.9
> GLUC 2
> FRUCT 5.9
> SUCR 1
> MALT 0
> LACT 0
> SOUTHFIB 2.2
> ENGFIB 1.6
> CELL 0.6
> SOLNCP 0.6
> INSOLNCP 0.4
> LIGNIN Tr
> NA 2
> ***
> RECTYPE 1
> NUMB 14003
> CONST 50853
> PREV 50853
> GROUP FA
> NAME Apples, cooking, weighed with skin and core
> EDPOR 0.73
> WATER 63.1
>
> etc
>
> What I would like to have:
>
> Code | Description | Group | Water | Fat | CHO | Protein |
> Sodium |
> Potassium
>
> 14001 | Apples | Fruit | 26 | 5.1 | 4.2 | 3.6 |
> 0.1 | 0.04
> 14002 | Oranges | Fruit | etc
>
>
> Does anyone have any suggestions as to how to arrange the data into
> something I can use?
>
> Thanks,
>
> Simon Wheeler
>
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/