The DHS Program User Forum: Dataset use in Stata » Reshaping Kenya's DHS dataset

Topic description: need to know how to reshape and form unique IDs


Reshaping Kenya's DHS dataset [message #27894]

Wed, 18 October 2023 02:55


Ritapriya Bandyopadhyay
Messages: 19
Registered: October 2023

Member

Hi, I am trying to reshape the Kenya DHS data.
I believe the data are in wide format, so that hv104_01 through hv104_24 hold the sex of each household member within a household. Is that right, and how should I proceed with the reshaping?

Also, I want to create a unique household ID and a unique individual ID. The problem is that when I create the individual ID before reshaping to long, all individuals in a household get the same ID. Should I create it after reshaping to long instead?

Would greatly appreciate your help

Best
Ritapriya

Attachment: Capture.PNG (64.19KB)
Attachment: Capture.PNG (55.24KB)


Re: Reshaping Kenya's DHS dataset [message #27897 is a reply to message #27894]

Thu, 19 October 2023 08:33

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from Senior DHS Staff Member, Tom Pullum:

You are apparently using the HR file, in which all the household information is on a single very wide record. You should use the PR file, in which cases are individual household members and the household-level information is on each record. The PR file is a reshaped version of the HR file. You do not need to do the reshaping--it has already been done.
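For readers who still want to see what that reshape does, here is a minimal Stata sketch on made-up data rather than a real HR file. It assumes the usual DHS identifiers hv001 (cluster) and hv002 (household number), which are not named in this thread, and the variable names idx, hh_id and person_id are purely illustrative. It also shows why the individual ID has to be built after going long: before the reshape there is only one row per household.

```stata
* Toy wide data: one row per household, up to 3 members (values invented)
clear
input hv001 hv002 hv104_01 hv104_02 hv104_03 hv105_01 hv105_02 hv105_03
1 1 1 2 1 40 38 12
1 2 2 . . 55  .  .
2 1 1 2 . 30 29  .
end

* The suffixes carry leading zeros (01, 02, ...), so read j as a string
reshape long hv104_ hv105_, i(hv001 hv002) j(idx) string
destring idx, replace

* Drop the empty slots of households with fewer than 3 members
drop if missing(hv104_)
rename (hv104_ hv105_) (hv104 hv105)

* Household ID from cluster + household; person ID adds the member index
egen hh_id = group(hv001 hv002)
egen person_id = group(hv001 hv002 idx)

* Stops with an error if person_id is not unique
isid person_id
list, sepby(hh_id)
```

With the PR file none of this is needed; the sketch only illustrates the structure that file already has.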


Re: Reshaping Kenya's DHS dataset [message #27900 is a reply to message #27897]

Thu, 19 October 2023 09:04

Ritapriya Bandyopadhyay
Messages: 19
Registered: October 2023

Member

Thanks a lot! I have another question. I want to find the number of students enrolled at each education level, by age. To apply the weights, I am using the following command: svyset psu [pw=weight], strata(stratum) singleunit(scaled). To view the results, I am using the following command: tab age school_level, count. I have used hv022 for stratum and hv021 for psu, and I have divided hv005 by 1000000 to arrive at the household weights.

However, when the count is displayed, the "number of observations" is greater than the "population size". How is this possible? The population size is supposed to be greater than the number of sample observations, right? I am working with adolescents aged 10-24, but I am using household weights.


Re: Reshaping Kenya's DHS dataset [message #27901 is a reply to message #27900]

Thu, 19 October 2023 10:55

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from Senior DHS Staff Member, Tom Pullum:

When you use svyset, the "number of observations" often shifts to some large number that's much different from the actual sample size. Frankly, I just ignore that number. Whatever it is, it's NOT the number of observations.

For what you are doing, you only need to adjust for the weights, not clustering or stratification. The adjustments for clustering and stratification only affect the standard errors of estimates, not the estimates themselves. If your command is just "tab age school_level [iweight=hv005/1000000]", I think you will get the same results. That table would give the weighted number of cases in the sample. The percentages describe both the sample and the population.
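The comparison described above can be tried on a small made-up dataset. This is only a sketch: the values are invented, wt is an illustrative name for hv005/1000000, and the svy: prefix on the second table is an assumption about how the svyset design was being applied in the earlier post.

```stata
* Toy person-level data: 2 strata, 2 clusters each (values invented)
clear
input hv021 hv022 hv005 age school_level
1 1 1200000 10 1
1 1 1200000 15 2
2 1  800000 10 1
2 1  800000 15 2
3 2  500000 10 1
3 2  500000 15 3
4 2 1500000 10 1
4 2 1500000 15 3
end
generate wt = hv005/1000000

* Weighted cell counts using the weights alone
tabulate age school_level [iweight=wt]

* Same table through the survey design; clustering and stratification
* enter the standard errors, not the weighted counts
svyset hv021 [pweight=wt], strata(hv022) singleunit(scaled)
svy: tabulate age school_level, count
```

In both tables each cell is a sum of weights, so the cell entries should agree up to rounding in the display.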


Re: Reshaping Kenya's DHS dataset [message #27961 is a reply to message #27901]

Thu, 26 October 2023 01:49

Ritapriya Bandyopadhyay
Messages: 19
Registered: October 2023

Member

Hi,
Thank you!

Just to confirm: I have attached a snapshot in which I am checking the weighted frequency of school enrollment. The number of observations decreases when I add iweight (as shown in the snapshot). You're saying this is possible?

Best,
Ritapriya

Attachment: school enrollment.PNG (24.29KB)


Re: Reshaping Kenya's DHS dataset [message #28167 is a reply to message #27961]

Tue, 21 November 2023 09:16

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from Senior DHS Staff Member, Tom Pullum:

Yes, the weighted and unweighted totals are never exactly the same for subpopulations. The weighted total can be smaller or larger than the unweighted total. The two are usually within 10% of each other, but sometimes the difference is larger.
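A small made-up example shows the effect. In this sketch the invented weights average exactly 1 over the full toy sample, so the weighted and unweighted totals agree overall but not within the 10-24 age group. Here hv105 is assumed to hold age in years, and wt and adolescent are illustrative names.

```stata
* Toy data: six people with invented weights and ages
clear
input hv005 hv105
1300000 12
1300000 30
 700000 15
 700000 45
 900000 20
1100000  8
end
generate wt = hv005/1000000
generate byte adolescent = inrange(hv105, 10, 24)

* Unweighted number of cases in the subpopulation
count if adolescent
local n_unweighted = r(N)

* Weighted number of cases = sum of the weights in the subpopulation
summarize wt if adolescent, meanonly
local n_weighted = r(sum)

display "Unweighted: `n_unweighted'   Weighted: " %6.2f `n_weighted'
```

Which way the weighted total moves depends on whether the subgroup happens to have weights above or below the overall average.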

