The DHS Program User Forum
Discussions regarding The DHS Program data and results
Primary sampling units (PSU) [message #1313] Tue, 11 February 2014 23:49
joyp
Dear users,

I would like to check the number of PSUs in round 1, round 2 and round 3. I was looking at variable (V021).

In NFHS 3, the number of unique values matches the 3850 PSUs reported in the final report (I understand that the codes have been randomised, but the variable still takes that many unique values).

But in the earlier rounds (NFHS 2 and NFHS 1) I find far fewer unique values for this variable.
According to the final reports, NFHS 2 should have 3165 PSUs and NFHS 1 should have 3522 area units, so I would expect V021 to have that many unique values in each round.

Can anyone please confirm this? Or am I missing something?

Many thanks,
Jaai
Re: Primary sampling units (PSU) [message #1363 is a reply to message #1313] Mon, 17 February 2014 23:24
Liz-DHS
Dear User,
Here is a response from one of our experts, Fred Arnold:
The PSU numbers are unique within a single state, but they repeat across states, so you should look at the state code in conjunction with the PSU numbers when trying to determine how many PSUs there are.
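
As a rough illustration of the point above, the following R sketch counts PSUs as distinct (state, PSU) pairs rather than as distinct PSU numbers. The data frame and the column names `state` and `psu` are made up for the example; in a real file they would be whichever variables hold the state code and the PSU number.

```r
# Toy data: PSU numbers restart in each state (hypothetical values).
df <- data.frame(
  state = c(22, 22, 22, 24, 24),
  psu   = c(1, 1, 2, 1, 2)
)

# Counting PSU numbers alone undercounts, because 1 and 2 occur in both states.
n_psu_numbers <- length(unique(df$psu))

# Counting distinct (state, psu) pairs gives the number of PSUs.
n_psus <- nrow(unique(df[, c("state", "psu")]))

print(c(psu_numbers = n_psu_numbers, psus = n_psus))
```

In this toy example the first count is smaller than the second, which is the same pattern as finding fewer unique V021 values than the number of PSUs in the final report.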
Re: Primary sampling units (PSU) [message #1415 is a reply to message #1363] Sun, 23 February 2014 23:50
joyp
Many thanks, Liz and special thanks to Fred Arnold.
It is very good to get confirmation from the experts. Thanks.
Re: Primary sampling units (PSU) [message #1417 is a reply to message #1415] Mon, 24 February 2014 07:32
Liz-DHS
You're welcome!
Re: Primary sampling units (PSU) [message #3042 is a reply to message #1363] Mon, 06 October 2014 16:55
vega25
May I ask what "looking at the state code in conjunction with the PSU numbers" means for analysis? If I want to cluster at the PSU level, would the option cluster(psu) at the end of a regression command in Stata not suffice, because the PSU numbers repeat across states? If so, how do I generate a unique number for each PSU across the states?

Thanks very much!
Re: Primary sampling units (PSU) [message #3047 is a reply to message #3042] Tue, 07 October 2014 10:51
Trevor-DHS
You probably need to combine state and psu. An easy way to do that is to use

egen newvar=group(state psu)
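
For readers working in R rather than Stata, a rough counterpart of the grouping idea above can be sketched with base R's `interaction()`. This is only an illustration on made-up data: the column names `state` and `psu` and the new column `psu_group` are hypothetical.

```r
# Toy data: the same PSU numbers occur in two states (hypothetical values).
df <- data.frame(
  state = c(22, 22, 24, 24, 22),
  psu   = c(1, 2, 1, 2, 1)
)

# interaction() builds one factor level per observed (state, psu) combination;
# lex.order = TRUE orders the levels by state first, then by psu.
# as.integer() turns those levels into consecutive group numbers 1, 2, 3, ...
df$psu_group <- as.integer(
  interaction(df$state, df$psu, drop = TRUE, lex.order = TRUE)
)

print(df)
```

Rows that share both state and PSU number receive the same group number, while PSU 1 in state 22 and PSU 1 in state 24 receive different ones.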
Re: Primary sampling units (PSU) [message #3049 is a reply to message #3047] Tue, 07 October 2014 11:39
Liz-DHS
Here is another response from one of our experts, Fred Arnold:
If you're using data for only a single state, you can use just the PSU number, but if you're using data across states or for the whole country, you need to combine the state and PSU codes in order to obtain a unique code for each PSU. For example, every state will have a PSU=001, so that PSU code is not unique. If you add the PSU code at the end of the state code, you will have a unique variable. For example, PSU 001 in state 22 will have the unique PSU number 22001, and PSU 001 in state 24 will have the unique PSU number 24001. One way to generate the unique codes is to create a new variable equal to 1,000 * state code + PSU code.

Thank you for your post.
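
The arithmetic described above (1,000 * state code + PSU code) can be sketched in R as follows. The data and the column names `state`, `psu` and `psu_id` are invented for the example, and the sketch assumes PSU codes of at most three digits, which is what the 22001 and 24001 examples imply.

```r
# Toy data with hypothetical column names; PSU codes restart within each state.
df <- data.frame(
  state = c(22, 22, 24, 24),
  psu   = c(1, 2, 1, 2)
)

# The multiplier 1,000 only gives unique IDs if no PSU code reaches 1000.
stopifnot(max(df$psu) < 1000)

# Append the PSU code to the state code: state 22, PSU 001 becomes 22001.
df$psu_id <- 1000 * df$state + df$psu

print(df$psu_id)
```

If a file had PSU codes of four or more digits, a larger multiplier would be needed for the combined code to stay unique.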

Re: Primary sampling units (PSU) [message #3265 is a reply to message #3049] Thu, 13 November 2014 15:03
vega25
Thank you very much for your response, and please accept my apologies for the delay in communicating my thanks.

Interestingly, I notice that adding the option cluster(psu2) to a regression command in Stata for an all-India analysis, where psu2 has been calculated using the method that Trevor and Dr. Arnold described, does not affect the results, not even the standard errors. Is that normal? At my end, I'll look more closely at the actual data file and the new psu2 variable later this week, and report back.
Re: Primary sampling units (PSU) [message #3268 is a reply to message #3265] Fri, 14 November 2014 19:48
Trevor-DHS
I'm not sure I follow exactly what your question is, but in general I think you want to use the svyset command to describe the sampling parameters and then svy: regress for your regression, rather than adding other parameters to your regression. There are a number of other posts on the forum about the use of svyset and svy.

For NFHS3 for example, you should be able to use:
svyset psu2 [pw=v005/1000000], strata(v022)

and then
svy: regress ...

Re: Primary sampling units (PSU) [message #12124 is a reply to message #3268] Thu, 30 March 2017 10:41
kbietsch

Hello Trevor,
I am looking for the strata variable (or the variables to combine to construct one). You suggest v022, but in the data file this variable is missing. According to the sampling design described in the final report (Volume 2), the NFHSIII is not purely a two-stage cluster sample: it has two stages in rural areas and three stages in urban areas.

This is the code I generally use for analyzing DHS data (in R) with strata of urban/rural.

df$sampleweights <- df$v005/1000000
design <- svydesign(ids=~v021+v002, strata=~v025, weights=~sampleweights, data=df)

How do you set the strata design for NFHSIII?
Thank you,
Kristin
Re: Primary sampling units (PSU) [message #12125 is a reply to message #12124] Thu, 30 March 2017 13:03
Trevor-DHS
1) When I look at the NFHSIII dataset, I see values 1-73 for v022. The strata are urban slum and non-slum area and rural areas in each state. See table C.2 (Sample characteristics) in Volume II page 15 of the report.
2) For svydesign, the ids should be just the cluster-level ids, not including the household id, so it should only be v021.
Thus, I think the command should be:
df$sampleweights <- df$v005/1000000
design <- svydesign(ids=~v021, strata=~v025, weights=~sampleweights, data=df)
Re: Primary sampling units (PSU) [message #12131 is a reply to message #12125] Fri, 31 March 2017 02:58
kbietsch

Thanks Trevor, I just downloaded the latest version of the data, and v022 is no longer missing. However, v021 is now missing, though there is a note to use s021 instead, which works perfectly.

The survey design is now set as:
df$sampleweights <- df$v005/1000000
design <- svydesign(ids=~s021, strata=~v022, weights=~sampleweights, data=df)

Cheers,
Kristin
Re: Primary sampling units (PSU) [message #12137 is a reply to message #12131] Fri, 31 March 2017 11:46
Trevor-DHS
Looks good. I guess I'm behind the latest version.