The DHS Program User Forum: India » Primary sampling units (PSU)


Primary sampling units (PSU) [message #1313]

Tue, 11 February 2014 23:49

joyp
Messages: 5
Registered: January 2014

Member

Dear users,

I would like to check the number of PSUs in round 1, round 2 and round 3. I was looking at variable (V021).

In NFHS 3, V021 takes 3850 unique values, which matches the 3850 PSUs reported in the final report (I understand the codes have been randomised, but the number of unique values is the same).

But in the earlier rounds (NFHS 2 and NFHS 1), this variable has far fewer unique values.
According to the final reports, NFHS 2 should have 3165 PSUs and NFHS 1 should have 3522 area units, so I would expect V021 to have that many unique values in each round.

Can anyone please confirm this? Or am I missing something?

Many thanks,
Jaai


Re: Primary sampling units (PSU) [message #1363 is a reply to message #1313]

Mon, 17 February 2014 23:24

Liz-DHS
Messages: 1516
Registered: February 2013

Senior Member

Dear User,
Here is a response from one of our experts, Fred Arnold:
The PSU numbers are unique within a single state, but they repeat across states, so you should look at the state code in conjunction with the PSU numbers when trying to determine how many PSUs there are.
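
As a minimal sketch of why counting the PSU variable alone falls short, the Python snippet below uses made-up records (the state and PSU values are hypothetical, not NFHS data) and compares the number of distinct PSU codes with the number of distinct (state, PSU) pairs.

```python
# Hypothetical (state, psu) records, one per respondent; not actual NFHS data.
records = [
    (22, 1), (22, 1), (22, 2),
    (24, 1), (24, 2), (24, 3),
]

# Counting PSU codes alone undercounts, because the codes repeat across states.
unique_psu_codes = {psu for _, psu in records}

# Counting (state, psu) pairs gives the number of distinct PSUs.
unique_psus = {(state, psu) for state, psu in records}

print(len(unique_psu_codes))  # 3
print(len(unique_psus))       # 5
```

The same comparison on the real files would use the state variable together with V021 in place of these toy tuples.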


Re: Primary sampling units (PSU) [message #1415 is a reply to message #1363]

Sun, 23 February 2014 23:50

joyp
Messages: 5
Registered: January 2014

Member

Many thanks, Liz and special thanks to Fred Arnold.
It is very good to get confirmation from the experts. Thanks.


Re: Primary sampling units (PSU) [message #1417 is a reply to message #1415]

Mon, 24 February 2014 07:32

Liz-DHS
Messages: 1516
Registered: February 2013

Senior Member

You're welcome!


Re: Primary sampling units (PSU) [message #3042 is a reply to message #1363]

Mon, 06 October 2014 16:55

vega25
Messages: 14
Registered: April 2014
Location: United States

Member

May I ask what "looking at state code in conjunction with the PSU numbers" means for analysis? If I want to cluster at the PSU level, is the option cluster(psu) at the end of a regression command in Stata insufficient because the PSU numbers repeat across states? If so, how do I generate a unique number for each PSU across the states?

Thanks very much!


Re: Primary sampling units (PSU) [message #3047 is a reply to message #3042]

Tue, 07 October 2014 10:51

Trevor-DHS
Messages: 805
Registered: January 2013

Senior Member

You probably need to combine state and psu. An easy way to do that is to use

egen newvar=group(state psu)


Re: Primary sampling units (PSU) [message #3049 is a reply to message #3047]

Tue, 07 October 2014 11:39

Liz-DHS
Messages: 1516
Registered: February 2013

Senior Member

Here is another response from one of our experts, Fred Arnold:
If you're using data for only a single state, you can use just the PSU number, but if you're using data across states or for the whole country you need to combine the state and PSU codes in order to obtain a unique code for each PSU. For example, every state will have a PSU=001 so that PSU code is not unique. If you add the PSU code at the end of the state code you will have a unique variable. For example, PSU 001 in state 22 will have the unique PSU number of 22001. PSU 001 in state 24 will have the unique PSU number of 24001. One way to generate the unique codes is to create a new variable, which =1,000*state code + PSU code.

Thank you for your post.
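
The arithmetic in the response above can be sketched in Python as follows. The function name `unique_psu` is hypothetical, and the range check is an added assumption: multiplying by 1,000 only yields unique values when no PSU code has more than three digits.

```python
def unique_psu(state: int, psu: int) -> int:
    """Combine a state code and a within-state PSU code into one identifier."""
    # Assumption: PSU codes run from 1 to 999; a larger code would spill
    # into the state digits and could collide with another state's PSU.
    if not 0 < psu < 1000:
        raise ValueError("PSU code must be between 1 and 999 for this scheme")
    return 1000 * state + psu

print(unique_psu(22, 1))  # 22001
print(unique_psu(24, 1))  # 24001
```

If a survey round had PSU codes of four or more digits within a state, a larger multiplier or a grouping approach such as egen group() would be the safer choice.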


Re: Primary sampling units (PSU) [message #3265 is a reply to message #3049]

Thu, 13 November 2014 15:03

vega25
Messages: 14
Registered: April 2014
Location: United States

Member

Thank you very much for your response, and please accept my apologies for the delay in communicating my thanks.

Interestingly, I notice that adding the option cluster(psu2) to a regression command in Stata for an all-India analysis, where psu2 was calculated using the method that Trevor and Dr. Arnold described, does not affect the results, not even the standard errors. Is that normal? I'll look more closely at the data file and the new psu2 variable later this week and report back.


Re: Primary sampling units (PSU) [message #3268 is a reply to message #3265]

Fri, 14 November 2014 19:48

Trevor-DHS
Messages: 805
Registered: January 2013

Senior Member

I'm not sure I follow your question exactly, but in general I think you want to use the svyset command to describe the sampling parameters and then svy: regress for your regression, rather than adding other options to the regression itself. There are a number of other posts on the forum about the use of svyset and svy.

For NFHS3 for example, you should be able to use:
svyset psu2 [pw=v005/1000000], strata(v022)

and then
svy: regress ...


Re: Primary sampling units (PSU) [message #12124 is a reply to message #3268]

Thu, 30 March 2017 10:41

kbietsch
Messages: 14
Registered: November 2015
Location: Washington, DC USA

Member

Hello Trevor,
I am looking for the strata variable (or the variables to combine to construct one). You suggest v022, but in my data file this variable is missing. According to the sampling framework described in the final report (volume 2), NFHSIII is not a plain 2-stage cluster sample: it has 2 stages in rural areas and 3 stages in urban areas.

This is the code I generally use for analyzing DHS data (in R) with strata of urban/rural.

df$sampleweights <- df$v005/1000000
design <- svydesign(ids=~v021+v002, strata=~v025, weights=~sampleweights, data=df)

How do you set the strata design for NFHSIII?
Thank you,
Kristin


Re: Primary sampling units (PSU) [message #12125 is a reply to message #12124]

Thu, 30 March 2017 13:03

Trevor-DHS
Messages: 805
Registered: January 2013

Senior Member

1) When I look at the NFHSIII dataset, I see values 1-73 for v022. The strata are urban slum areas, urban non-slum areas, and rural areas in each state. See table C.2 (Sample characteristics) in Volume II, page 15 of the report.
2) For svydesign, the ids should be just the cluster-level ids, not including the household id, so it should be only v021.
Thus, I think the command should be:
df$sampleweights <- df$v005/1000000
design <- svydesign(ids=~v021, strata=~v025, weights=~sampleweights, data=df)


Re: Primary sampling units (PSU) [message #12131 is a reply to message #12125]

Fri, 31 March 2017 02:58

kbietsch
Messages: 14
Registered: November 2015
Location: Washington, DC USA

Member

Thanks Trevor, I just downloaded the latest version of the data, and v022 is no longer missing. However, v021 is now missing, though there is a note to use s021 instead, which works perfectly.

The survey design is now set as:
df$sampleweights <- df$v005/1000000
design <- svydesign(ids=~s021, strata=~v022, weights=~sampleweights, data=df)

Cheers,
Kristin


Re: Primary sampling units (PSU) [message #12137 is a reply to message #12131]

Fri, 31 March 2017 11:46

Trevor-DHS
Messages: 805
Registered: January 2013

Senior Member

Looks good. I guess I'm behind the latest version.

