The DHS Program User Forum
Discussions regarding The DHS Program data and results
Home » Data » Dataset use in Stata » Clarification on Variables for svyset in STATA and generating stunting variable
Re: Clarification on Variables for svyset in STATA and generating stunting variable [message #4274 is a reply to message #4273] Wed, 29 April 2015 13:44 Go to previous messageGo to previous message
user-rhs is currently offline  user-rhs
Messages: 132
Registered: December 2013
Senior Member
I'll answer the Stata and more general parts of your question, and leave the anthropometry-specific indicators for the experts:

Quote:
Where could I check what the strata details are specifically for my country of interest (ETHIOPIA)?

Check the DHS final report for that country and survey year

Quote:
The following four lines confuse me.
* the cluster id is hv001=hv021
* stata will normalize the weights; no need to divide by 1000000
* hv022 may be the strata variable but always check
* always a good idea to include one of the singleunit options
svyset hv001 [pweight=hv005], strata(hv022) singleunit(centered)

Why is the psu, weight and strata different from what is used in other examples, as illustrated in #1?
In what combination do we divide weights by 1000000 or not divide?


I assume the PSU question has to do with why HV001/HV021 was used instead of V001/V021. Recall that in the DHS datasets, different datasets have different prefixes for the variables. For example, in the women's recode dataset, the woman variables start with V (but different modules have different prefixes, e.g. B for birth hx questions). In the household recode, the prefixes are generally H-something (HV, HC, HW). Therefore, you will have the same variables with different prefixes between the datasets, i.e. HV001==V001, HV002==V002. You will need to create a uniform name for these variables when you want to merge the datasets together.

Re: the weights. The DHS final report for the survey year will have the instructions what to divide the weight variable by (I think I've seen instructions to divide by 100,000 somewhere before). In the grand scheme of things, using the weight variable as-is will give you the same means and LSEs, proportions, parameter estimates, tests of significance, etc. as using weight/100000. The only difference you will note is in the number of observations, where it will be 1,000,000 more than the cell size, number of obsns (in a regression) than if you had divided by 1,000,000.

Example from one of the Indonesia datasets:

Unweighted:
tab v025 rural

   Type of |
  place of |         rural
 residence |         0          1 |     Total
-----------+----------------------+----------
     Urban |     6,994          0 |     6,994 
     Rural |         0      8,268 |     8,268 
-----------+----------------------+----------
     Total |     6,994      8,268 |    15,262 



Weighted with weight-as is (svyset [pw=v005]):
svy: tab v025 rural,count format(%12.1g)
(running tabulate on estimation sample)

Number of strata   =         1               Number of obs      =        15262
Number of PSUs     =      1827               Population size    =  14782036227
                                             Design df          =         1826

----------------------------------------------
Type of   |
place of  |               rural               
residence |          0           1       Total
----------+-----------------------------------
    Urban | 7357950946           0  7357950946
    Rural |          0  7424085281  7424085281
          | 
    Total | 7357950946  7424085281     1.5e+10
----------------------------------------------
  Key:  weighted counts




Weighted with weight/1000000 (svyset [pw=wt]):
svy: tab v025 rural,count format(%12.1g)
(running tabulate on estimation sample)

Number of strata   =         1                  Number of obs      =     15262
Number of PSUs     =      1827                  Population size    = 14782.036
                                                Design df          =      1826

-------------------------------
Type of   |
place of  |        rural       
residence |     0      1  Total
----------+--------------------
    Urban |  7358      0   7358
    Rural |     0   7424   7424
          | 
    Total |  7358   7424  14782
-------------------------------
  Key:  weighted counts


[Updated on: Wed, 29 April 2015 14:47]

Report message to a moderator

 
Read Message
Read Message
Read Message
Read Message
Previous Topic: Difference between v006 and hw18 in the child recode files (KR)
Next Topic: Creating Groups within datasets
Goto Forum:
  


Current Time: Sat Oct 19 18:24:27 Coordinated Universal Time 2024