The DHS Program User Forum
Discussions regarding The DHS Program data and results
District-level sex ratio and district-level age distribution [message #23960] Fri, 21 January 2022 09:54
Lea F
Hi,
I am using the 2015-2016 DHS data for India (NFHS-4), and I would like to calculate the following:

1. District-level sex ratio (not at birth but in the total population)
2. District-level age distribution (e.g. in 5-year categories, or % below 15 years of age, or something similar)

I was wondering: a) how these would be calculated from the dataset, and b) which dataset would be best to use for these estimations - I assume the household survey would be best, but am not sure.

Thank you very much for your help!
Re: District-level sex ratio and district-level age distribution [message #23974 is a reply to message #23960] Sun, 23 January 2022 16:07
Bridgette-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

This would be done with the PR (household member recode) file. The following lines use the "collapse" command to give you the sex distribution and the age distribution (in 5-year intervals), each by district. A file with the joint age-sex distribution by district is saved as well. You need to specify whether you want the de facto or de jure population. Even the collapsed files are large!

* You need to specify a workspace
cd e:\DHS\DHS_data\scratch

use hv005 hv102-hv105 shdistri using "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAPR74FL.DTA" , clear

gen wt=hv005/1000000

* If you want the de jure population
keep if hv102==1

* If, instead, you want the de facto population
* keep if hv103==1

* drop if age (hv105) is missing (about 0.02% of cases). hv105 is 95 for age 95+
drop if hv105>95

* age5 codes the standard 5-year age groups: 1 for age 0-4, 2 for age 5-9, etc., up to 20 for age 95+
gen age5=1+ int(hv105/5)

collapse (sum) wt, by(hv104 age5 shdistri)
save IAtemp_age_sex_district.dta, replace

use IAtemp_age_sex_district.dta, clear
collapse (sum) wt, by(sex district)

* this file gives the weighted numbers of males and females in each district in the NFHS-4
save IAtemp_sex_district.dta, replace

use IAtemp_age_sex_district.dta, clear
collapse (sum) wt, by(age district)

* this file gives the weighted number of persons (both sexes combined) in each five-year age group in each district in the NFHS-4
save IAtemp_age_district.dta, replace
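
From the collapsed age-sex file, the two quantities asked about in the original question could then be derived roughly as follows. This is an untested sketch and not part of the original reply. It assumes the standard DHS coding of hv104 (1 = male, 2 = female), expresses the sex ratio as males per 100 females (a convention chosen here for illustration), and reads "% below 15" as age5 values 1 to 3. The names sex_ratio, wt_u15, pct_under15 and the two output file names are made up for this example.

```stata
* Sex ratio by district (assumes hv104: 1 = male, 2 = female)
use IAtemp_age_sex_district.dta, clear
collapse (sum) wt, by(hv104 shdistri)

* one row per district, with wt1 = weighted males and wt2 = weighted females
reshape wide wt, i(shdistri) j(hv104)
gen sex_ratio = 100*wt1/wt2
save IAtemp_sex_ratio_district.dta, replace

* Percent of the population under age 15, by district
use IAtemp_age_sex_district.dta, clear

* age5 values 1, 2, 3 correspond to ages 0-4, 5-9, 10-14
gen wt_u15 = wt*(age5<=3)
collapse (sum) wt wt_u15, by(shdistri)
gen pct_under15 = 100*wt_u15/wt
save IAtemp_under15_district.dta, replace
```

If females per 1,000 males is the preferred convention, the ratio line would become 1000*wt2/wt1 under the same coding assumption.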


Re: District-level sex ratio and district-level age distribution [message #23986 is a reply to message #23974] Mon, 24 January 2022 15:22
Lea F
Thanks very much for this response. This looks like it's Stata code - I am working in R and have read the Stata data files into R. Would it be possible to explain the calculation steps so that I can write equivalent code for R (instead of running the provided Stata code)?

Thank you very much!
Lena
Re: District-level sex ratio and district-level age distribution [message #23988 is a reply to message #23986] Tue, 25 January 2022 07:42
Bridgette-DHS

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:


I just looked again and I see a couple of typos. I didn't actually test the program because the file is so large.

In these lines:

use IAtemp_age_sex_district.dta, clear
collapse (sum) wt, by(sex district)

* this file gives the weighted numbers of males and females in each district in the NFHS-4
save IAtemp_sex_district.dta, replace

use IAtemp_age_sex_district.dta, clear
collapse (sum) wt, by(age district)

"by(sex district)" should be replaced by "by(hv104 shdistri)" and "by(age district)" should be replaced by "by(age5 shdistri)".

I really think it is very clear what each step of the program does. Stata code is very intuitive. There are no complicated commands or options. I'll just clarify one command. "collapse (sum) wt, by(hv104 shdistri)" means that the file is reduced by adding up the values of "wt" within each combination of sex and district. There will be one line for each combination and it will include the total weighted number of cases.
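
To see what "collapse" does on a small scale, here is a toy illustration. The six records, the district codes 101 and 102, and the weights are invented for this example and have nothing to do with the NFHS-4 data.

```stata
* Toy data: six household members in two made-up districts (101 and 102)
clear
input byte hv104 int shdistri double wt
1 101 0.75
1 101 1.25
2 101 1.00
1 102 0.50
2 102 1.50
2 102 0.50
end

* Add up wt within each combination of sex and district
collapse (sum) wt, by(hv104 shdistri)

* Four rows remain, one per combination, with wt totals of
* 2 (sex 1, district 101), .5 (sex 1, district 102),
* 1 (sex 2, district 101) and 2 (sex 2, district 102)
list
```

In other software the same step is a grouped sum. In R, for example, it could be written with aggregate() or with dplyr's group_by() followed by summarise().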
