Kenya DHS weights [message #27983] |
Tue, 31 October 2023 06:53  |
Ritapriya Bandyopadhyay
Messages: 18 Registered: October 2023
Member |
Dear all,
I am looking at school enrollment, progression and repetition for children in Kenya - I am looking at these indicators grade-wise. I have a few questions:
1. For school enrollment - I am looking at hv121 and hv125 (member attended school current and previous year respectively) - now for Grade 1 I see there are respondents who said "yes" to hv125 (attended school last year) - what does it mean? Does it mean these children were in pre-primary or early childhood education? I am asking because the number is high.
2. Secondly, to make calculations easier - I have re-coded secondary-level grades and higher level grades. Secondary-level grades have been coded as 9,10,11,12,13,14 and similarly higher-education grades have been coded from 13-19. Nonetheless, I am using the following code:
tab agrdprvyr_i scheprvyr_i [iweight = hv005/1000000]
agrdprvyr_i is previous year grades (coded as hv127 in DHS) and scheprvyr_i is school enrollment last year (coded as hv125 in DHS)
My question is that when I use weights, why is the "number of observations" decreasing? Secondly the data is showing that only 1621 individuals were enrolled in grade 12 last year - but this is a very small number, and doesn't seem right. Am I doing anything wrong with the weights? Please help.
I have attached snapshots of both unweighted and weighted data
Re: Kenya DHS weights [message #27984 is a reply to message #27983] |
Tue, 31 October 2023 09:25   |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
First, weighted and unweighted frequencies are always different, sometimes by large amounts. If, in the PR file, you enter the following lines, you will see that the weights for children (age<18) are usually much less than 1 (ignoring the factor of one million).
gen wt=hv005/1000000
collapse (mean) wt, by(hv105)
graph hbar wt, over(hv105) yline(1)
In this survey, apparently, fertility tended to be higher (that is, there were more children) in the strata (geographic areas) that were over-sampled. In order for the estimates to be unbiased, the weights for these strata tended to be less than one (ignoring the factor of one million).
Second, regarding the education variables, I suggest that you read the questions, the codes, and the tables and text in the report very carefully. I agree that what you are seeing is hard to believe. I looked at the data too. There are many inconsistencies. For example, among the 3-year-olds (hv105=3) I see a child who, according to hv126, attended secondary school the previous school year; there are five 3-year-olds who are in primary school this year AND were in primary school the previous year. Perhaps primary school includes day care?
The school attendance variables (hv121-hv129) were probably not edited as much as they could have been. These variables tend to be somewhat survey-specific, because of differences in standards (e.g. the starting age) and definitions from one country to another and even over time in the same country. You may want to edit the responses yourself for consistency.
Usually it would help to do "tab hv124 hv122" to see how single years line up with levels. However, when I do that, I see a possible coding error involving hv124 for single years 7-12. We will look into this further and may add another post on that.
Re: Kenya DHS weights [message #27990 is a reply to message #27987] |
Tue, 31 October 2023 10:56   |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
The weight includes a factor of 1 million just to get rid of the decimal point. The mean of hv005 (ignoring that factor) in the HR file, with households as units, is 1.
You should always use weights to get unbiased estimates of means, proportions, percentages, ratios, rates, etc. You are using the correct formula.
The frequencies in the sample, unweighted or weighted, are not important--as I said, it's the means, etc., that are important, and you are ok with them.
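As a minimal sketch of the point above (weighted proportions rather than raw frequencies), the lines below declare the survey design in the PR file and estimate the distribution of hv125. The choice of hv021 as the cluster variable and hv022 as the strata variable is an assumption; check the survey's documentation, since some surveys use hv023 for strata.

```stata
* Sketch only: design variables hv021/hv022 are assumed, not confirmed for this survey
gen wt = hv005/1000000

* Declare clusters, sampling weights, and strata
svyset hv021 [pweight=wt], strata(hv022)

* Weighted proportions (with standard errors) for attendance in the previous school year
svy: proportion hv125
```

The estimated proportions are what should be reported; the weighted or unweighted counts behind them are not meaningful on their own.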
Re: Kenya DHS weights [message #28027 is a reply to message #27992] |
Fri, 03 November 2023 08:23   |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
I hope you recognize that grades are numbered within levels. In KEPR8B you can get the combinations of grade and level in the PREVIOUS YEAR with "tab hv127 hv126". These combinations are converted to education in single years (previous year), hv128. To see this, enter "tab hv127 hv126, summarize(hv128) means". This will give the value of hv128 within each combination of hv127 and hv126.
For the CURRENT YEAR you can do "tab hv123 hv122" and "tab hv123 hv122, summarize(hv124) means".
These tabulations do not use weights because I just want to describe the coding pattern. For actual analysis you would use weights.
You will see that year 7 can be either the 7th year of primary or the 1st year of secondary, and year 8 can be either the 8th year of primary or the 2nd year of secondary. I suspect that there was a coding error somewhere in the hv12* variables, affecting "single years" 7-12, and have asked the data processing staff to look into this. We will post more after hearing from them. Until then, I recommend caution.
To be clear, I agree that primary school is not "supposed to" include daycare. When I mentioned that this could happen I was describing potential errors or misinterpretations by the respondents or interviewers. I wanted to understand how very young children could be reported as attending primary school.
Re: Kenya DHS weights [message #28058 is a reply to message #28054] |
Mon, 06 November 2023 13:15   |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
Several variables giving single years of education need to be recoded. They were previously coded with 6 years of primary and 6 of secondary. However, as you say, Kenya has 8 and 4, respectively. For example, hv128 in the PR file was constructed as
gen hv128= hv127 if hv126==1
replace hv128= 6+hv127 if hv126==2
replace hv128=12+hv127 if hv126==3
However, it should have been calculated as follows, with 8 rather than 6 in the second line:
gen hv128= hv127 if hv126==1
replace hv128= 8+hv127 if hv126==2
replace hv128=12+hv127 if hv126==3
The variables affected--that is, that need to be recoded--are hv108, hv124, hv128, v133, v715, and mv715. The next update of the files will include corrected versions of these variables, but you can fix them now.
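A possible way to apply the same fix to the current-year variable is sketched below. It assumes that hv124 was built from hv123 (grade) and hv122 (level) with the same pattern shown for hv128, and that special codes for grade (such as "don't know") are 90 or above; hv124_fix is a hypothetical name chosen so the original variable stays intact. Preschool (level 0) is left missing here because its intended coding is not stated in this thread.

```stata
* Sketch only: assumes hv124 follows the same construction as hv128
* hv124_fix is a hypothetical name; the original hv124 is not overwritten
gen hv124_fix = hv123 if hv122==1 & hv123<90
replace hv124_fix = 8+hv123 if hv122==2 & hv123<90
replace hv124_fix = 12+hv123 if hv122==3 & hv123<90

* Compare the original and corrected single years (unweighted, to inspect coding only)
tab hv124 hv124_fix, missing
```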
Re: Kenya DHS weights [message #28088 is a reply to message #28064] |
Fri, 10 November 2023 05:13   |
Ritapriya Bandyopadhyay
Messages: 18 Registered: October 2023
Member |
Hello, thanks a lot.
I am just wondering that beyond 9,10,11,12 - there are two other grades (13/14) - which I believe are coded as secondary - so what to do with that?
I wanted to confirm two more things:
1. hv121 and hv125 represent school enrollment in current and previous school year right?
2. Secondly, I wanted to confirm - I saw a variable coded as hv129 - it gives the number of students repeated, dropouts and progressed/advanced. Could you please tell me the STATA codes for calculating these? The reason I am asking this is because my definitions of repeats, progression and dropouts are the same as DHS (as I verified from DHS Recode Manual). Now for dropouts my numbers are same as DHS - but it doesn't match for repeats - hence I wanted to know the code for that.
I followed your advice, and have changed education in single years - 8 years primary and 4 years secondary. I renamed education single year previous year to "edusyr_prvyr" and education single year current year to "edusyr_cr". Now I did tab edusyr_prv hv129 and I did tab edusyr_cr hv129 - as you can see the repeat numbers are different in both these commands - but why so? This shouldn't happen - repeat numbers should stay same I believe if I use either current or previous year.
Secondly, if you see grade 7 and 8 particularly - there are very large numbers of students who are repeating - while this is believable, it doesn't match with mine, wondering why.
I have attached snapshots for:
1. tab edusyr_prvyr hv129 [education single year previous year - edusyr_prvyr]
2. tab edusyr_cr hv129 [education single year current year - edusyr_cr]
3. tab edusyr_cr repeat [education single year current year - edusyr_cr] - this is my calculation of repeats
Re: Kenya DHS weights [message #28163 is a reply to message #28154] |
Tue, 21 November 2023 02:34   |
Ritapriya Bandyopadhyay
Messages: 18 Registered: October 2023
Member |
Please clarify one thing.
There are two variables that can give me school attendance in current/previous year - one is hv121/hv125 (school attendance status) and I believe the other is hv124/hv128 (education in single years for each year). As per our last discussion, you asked me to construct the hv128 variable as follows:
gen hv128= hv127 if hv126==1
replace hv128= 8+hv127 if hv126==2
replace hv128=12+hv127 if hv126==3
This was done to ensure 8 years of primary and 4 years of secondary school. I have done the same - I named it hv1281 so that original variable stays intact. Now, if I do tab hv1281 or tab hv1281 scheprvyr_i (latter one is the hv125 variable) - the numbers are same. As you can see, there are 5,113 students in grade 1 - so this is fine - I have attached screenshots for both commands.
However, as you can see, when I do tab hv1281 hv106 - there are students of grade 1 who have replied either no education or completed secondary education to hv106 - So the 5,113 individuals in Grade 1 comprise these individuals too? This is what I am confused about - 5,113 individuals said yes to school enrollment in previous year in grade 1 - but they also include those who said no education and completed secondary education to hv106? - so how many people are enrolled in grade 1 - 5,113 or 5,064?
Secondly, if that's the case then the 1,272 individuals who were in grade 12, secondary and said they completed education last year - can I really count them as graduated? Especially since there were also students in last year grade 12 who said they completed higher education. Plus, if I do tab hv1281 hv129 - the numbers for advanced, dropout, and repeats don't match with the 1,272 (number of students who are supposedly graduates as per tab hv1281 hv106). I have attached screenshots of tab hv1281 hv106 and tab hv1281 hv129.
To avoid confusion - I was looking at tab hv1281 sh19aa for graduates. From the 939 people in class 12 last year who dropped out (that is, not attending this year - you can see from tab hv1281 hv129), I am subtracting those in class 12 last year who said they stopped schooling as they have completed school (I get this number from tab hv1281 sh19aa).
I understand education variables are going through re-checking - but please do help at the moment - I may be incorrect in my approach
Re: Kenya DHS weights [message #28197 is a reply to message #28192] |
Tue, 28 November 2023 11:49   |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
If you open the BR file and enter "tab b2 v190 [iweight=v005/1000000], row" you will get what I believe you are looking for, going back in single calendar years. In the BR and KR files, b2 is the calendar year of birth. Note that the household's wealth quintile at the time of the survey is not necessarily what it was in the past. Also, in the IR file, v209, v238, and v208 give the number of births the respondent had in the past year, 3 years, and 5 years (years ago, not calendar years), respectively.
Re: Number of respondents [message #29887 is a reply to message #29884] |
Fri, 16 August 2024 12:26   |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
The weights compensate for the over-sampling of clusters in smaller strata and the under-sampling of clusters in larger strata, as well as for variations in the number of vacant households and non-response. That's why the weighted numbers are more representative than the unweighted numbers. All households within a cluster have the same weight. All individuals within a cluster have the same weight.
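The statement that weights are constant within a cluster can be checked directly in the PR file. The sketch below assumes hv001 is the cluster identifier; same_wt is a hypothetical name for the flag it creates.

```stata
* Sketch only: sort by weight within cluster, then compare the smallest and largest value
* If they are equal, every record in the cluster has the same hv005
bysort hv001 (hv005): gen byte same_wt = hv005[1] == hv005[_N]

* same_wt should be 1 for every record if weights do not vary within clusters
tab same_wt
```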
In (a), there is NOT an assumption that all household members are in school. Don't worry about that.
For (b), I would say there are 2 ways to rank the counties. Both of them use weights. You can rank them in terms of the number of students currently attending, or in terms of the percentage of eligible students who are currently attending. The following lines do this, using the PR file for the 2014 survey. I use the "collapse" command. There are alternatives, but this seems the easiest.
use "...KEPR72FL.DTA", clear
* measure of school attendance: sh18, "attend school current year"
* county list: shregion
gen cases=1
gen inschool=1 if sh18==1
collapse (sum) cases inschool [iweight=hv005/1000000], by(shregion)
* sort the counties by number currently attending
sort inschool
gen cases_rank=_n
list, table clean
* sort the counties by current attendance rates
gen inschool_pct=100*inschool/cases
sort inschool_pct
gen rate_rank=_n
list, table clean
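The sorts above are ascending, so rank 1 goes to the county with the smallest value. If rank 1 should instead go to the county with the highest attendance rate, one option is gsort, sketched here on the collapsed data still in memory; rate_rank_desc is a hypothetical name.

```stata
* Sketch only: run after the collapse and the inschool_pct line above
* Sort from highest to lowest attendance rate
gsort -inschool_pct
gen rate_rank_desc = _n
list shregion inschool_pct rate_rank_desc, table clean
```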
Re: Number of respondents [message #29898 is a reply to message #29897] |
Mon, 19 August 2024 12:26  |
Messages: 3230 Registered: February 2013
Senior Member |
Following is a response from Senior DHS staff member, Tom Pullum:
Maybe I am misinterpreting what you mean by a percentage, but I am thinking that what you mean is the percentage of eligible children/people who are actually attending. If that's what you mean, then it is definitely possible for the number who are in school to be small and the percentage to be large, or for the number to be large and the percentage to be small. The ranking by number and the ranking by percentage can be very different. Weighting has nothing to do with this.
If someone age 12-17 is in primary school rather than secondary school, they could have started primary school late, or not have passed from one grade to the next. There could be a classification error. I looked at age in more detail with "tab hv105 hv122 if hv105<25". The correspondence between age and current level of school is not what I would expect but I can't explain why primary school seems to cover such a wide range of ages.