The DHS Program User Forum
Discussions regarding The DHS Program data and results
Cluster identifiers and finite population correction [message #8644] Tue, 24 November 2015 05:19
FanuelShem
Messages: 2
Registered: November 2015

I am using R to analyze DHS data (from Zimbabwe, Tanzania and others) on domestic violence, specifically spousal violence. To analyze such complex survey data in R, one needs to use the survey package. My analysis goes as follows:

library(survey)

# Import and subset the data to keep only respondents who received the Domestic Violence module
data <- subset(zim, V044 == "Woman selected and interviewed")
# Domestic violence sampling weight (D005 is stored multiplied by 1,000,000)
data$samplewt <- data$D005 / 1000000

# Initialize the survey design object, design (a)
mydesign <- svydesign(
  id = ~V001 + V002,
  strata = ~V022,
  data = data,
  weight = ~samplewt
)
In this design, id is the cluster identifier, strata is the variable specifying strata, weight is the variable specifying sampling weights and data is the data frame.

Design (a) looks like this:
Stratified 2 - level Cluster Sampling design (with replacement)
With (406, 6542) clusters.
svydesign(id = ~V001 + V002, strata = ~V022, data = data, weight = ~samplewt)
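
The two numbers in the printout appear to be the counts of distinct first-stage and second-stage units. A minimal base-R sketch of that count, on a made-up data frame (`toy` and its values are hypothetical, not DHS data):

```r
# Hypothetical toy data: 3 clusters (V001), households (V002) numbered within each cluster
toy <- data.frame(
  V001 = c(1, 1, 1, 2, 2, 3, 3, 3),
  V002 = c(1, 1, 2, 1, 2, 1, 2, 3)
)

# Number of distinct first-stage units (clusters)
n_stage1 <- length(unique(toy$V001))

# Number of distinct second-stage units (households within clusters)
n_stage2 <- nrow(unique(toy[, c("V001", "V002")]))

c(n_stage1, n_stage2)
```

Running the same two counts on the real data frame should show whether 406 and 6542 correspond to clusters and to households within clusters.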

Going through the Zimbabwe DHS report, I understand that DHS used a stratified, two-stage cluster design. I took this to imply that each sampling stage has its own identifier, V001 for the first stage and V002 for the second, which led to my choice of cluster identifiers in the design above. However, going through other posts on this forum, I realized that most analysts use Stata, and their design, translated to R, looks like this:

# Design (b)
mydesign <- svydesign(
  id = ~V021,
  strata = ~V022,
  data = data,
  weight = ~samplewt
)

Here V021 is the primary sampling unit. Which one of these designs is correct?

Design (b) looks like this:
Stratified 1 - level Cluster Sampling design (with replacement)
With (406) clusters.
svydesign(id = ~V021, strata = ~V022, data = data, weight = ~samplewt)
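
One way to see whether the choice matters is to fit both designs to the same data and compare standard errors. The sketch below uses a made-up data frame (`toy`, `y`, and all values are hypothetical) and assumes V021 equals V001, which should be checked in the real file. My understanding is that, without an fpc, the survey package uses only the first-stage clusters for variance estimation, in which case the two designs should give the same standard errors.

```r
library(survey)

set.seed(1)
# Hypothetical toy data: 8 clusters, 3 households per cluster, 2 women per household
toy <- expand.grid(woman = 1:2, V002 = 1:3, V001 = 1:8)
toy$V021 <- toy$V001                        # assumption: PSU equals cluster number
toy$V022 <- ifelse(toy$V001 <= 4, 1, 2)     # two strata with 4 clusters each
toy$samplewt <- runif(nrow(toy), 0.5, 1.5)  # made-up sampling weights
toy$y <- rbinom(nrow(toy), 1, 0.3)          # hypothetical binary outcome

# Design (a): clusters, then households within clusters
design_a <- svydesign(ids = ~V001 + V002, strata = ~V022,
                      weights = ~samplewt, data = toy)
# Design (b): primary sampling unit only
design_b <- svydesign(ids = ~V021, strata = ~V022,
                      weights = ~samplewt, data = toy)

est_a <- svymean(~y, design_a)
est_b <- svymean(~y, design_b)

# Compare point estimates and standard errors side by side
rbind(a = c(mean = coef(est_a), se = SE(est_a)),
      b = c(mean = coef(est_b), se = SE(est_b)))
```

The same comparison on the real data would show directly whether designs (a) and (b) differ in practice.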

About the finite population correction (fpc): which variable in the DHS data defines the fpc? I have read that the fpc is not often used when analyzing DHS data. Is it okay to ignore it? Ignoring the fpc results in a design in which sampling is treated as being with replacement.
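
For a sense of scale, the fpc multiplies a first-stage variance by roughly (1 - n/N), where n is the number of sampled clusters and N is the number of clusters in the sampling frame. A rough sketch, ignoring stratification, with the 406 clusters reported for my design and a purely hypothetical frame size `N_frame` (not taken from any DHS report):

```r
n_clusters <- 406      # clusters reported in the design printout
N_frame    <- 30000    # hypothetical number of clusters in the sampling frame

sampling_fraction   <- n_clusters / N_frame
fpc_variance_factor <- 1 - sampling_fraction     # multiplies the variance
fpc_se_factor       <- sqrt(fpc_variance_factor) # multiplies the standard error

round(c(sampling_fraction = sampling_fraction,
        variance_factor   = fpc_variance_factor,
        se_factor         = fpc_se_factor), 4)
```

When the sampling fraction is this small, the factor is close to 1, which is presumably why the fpc is often ignored.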

I was able to replicate the values in Table 16.10 on page 263 of the Zimbabwe DHS report using design (a). However, I was not able to replicate the values in Table 16.9 on page 280 of the Tanzania DHS report. Any ideas about this would be great as well.
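
A point worth keeping in mind: percentages in the report tables are weighted point estimates, which depend on the weights and the denominator but not on the cluster or strata specification, so matching a table does not by itself tell designs (a) and (b) apart. A base-R sketch on made-up values (`toy`, `violence`, and the numbers are hypothetical):

```r
# Hypothetical toy data: raw D005-style weights and a 0/1 outcome
toy <- data.frame(
  D005     = c(500000, 1500000, 1000000, 2000000),
  violence = c(1, 0, 1, 0)
)
toy$samplewt <- toy$D005 / 1000000

# Weighted percentage; no cluster or strata information is needed
pct <- 100 * weighted.mean(toy$violence, toy$samplewt)
pct
```

A mismatch with a report table is therefore more likely to come from the weight variable or from the subset used as the denominator than from the id argument.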

Many thanks,