Difference DTA and SAV [message #25857]
Tue, 20 December 2022 13:52
victor
I have noticed that the data differ between the SAV and DTA datasets. Is this an error?
Specifically, in the Gambia 2013 household member recode, the hv140 variable in the DTA dataset has an additional value (9) that does not appear in the SAV dataset.
As a result, the calculated share of registered children differs between the two datasets. With the DTA dataset it is possible to reproduce the figures in the official report. With the SAV dataset the value is about a percentage point higher. It seems to me, however, that the SAV dataset gives the better approximation, because with the DTA dataset the respondents with value 9 are included in the calculation.
For the DTA file:
  hv140                                     n
  <dbl+lbl>                             <int>
1 0 [Neither certificate or registered]  3294
2 1 [Has certificate]                    8494
3 2 [Registered]                         1961
4 8 [Don't know]                          208
5 9                                       398
6 NA                                    38336
For the SAV file:
  hv140                                     n
  <dbl+lbl>                             <int>
1 0 [Neither certificate or registered]  3294
2 1 [Has certificate]                    8494
3 2 [Registered]                         1961
4 8 [Don't know]                          208
5 NA                                    38734
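To make the effect of the denominator concrete, here is a minimal Python sketch that uses only the unweighted counts tabulated above. It is illustrative rather than a reproduction of the report: published DHS figures are weighted, so these unweighted shares are not expected to match them. The helper name `share` and the choice to count codes 1 and 2 together are assumptions made for the example.

```python
# Unweighted hv140 counts copied from the DTA tabulation above.
counts = {0: 3294, 1: 8494, 2: 1961, 8: 208, 9: 398}


def share(counts, codes, exclude=()):
    """Percentage of cases whose code is in `codes`.

    Codes listed in `exclude` are dropped from both the numerator and
    the denominator, i.e. they are treated as missing.
    """
    denom = sum(n for code, n in counts.items() if code not in exclude)
    numer = sum(
        n for code, n in counts.items() if code in codes and code not in exclude
    )
    return 100 * numer / denom


# Assumption for illustration: "registered" = has certificate (1) or registered (2).
registered_codes = {1, 2}

with_9 = share(counts, registered_codes)                   # 9 kept in the denominator
without_9 = share(counts, registered_codes, exclude={9})   # 9 treated as missing

print(f"9 kept in denominator: {with_9:.1f}%")
print(f"9 treated as missing:  {without_9:.1f}%")
```

Dropping the 398 cases coded 9 shrinks the denominator while leaving the numerator unchanged, which is why the SAV-based share comes out higher.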
Re: Difference DTA and SAV [message #25915 is a reply to message #25907]
Tue, 03 January 2023 08:34
Bridgette-DHS
The following is a response from senior DHS staff members Tom Pullum and Trevor Croft:
In CSPro, the package in which the data files are originally constructed, we have two different codes:
1. Blank: this is the not-applicable (NA) code, used when a question is not applicable.
2. 9 (or 99, 999, etc.): this is the missing-value code, used when a question is applicable but no response was given.
The 9 (etc.) code is used very rarely nowadays with CAPI, but it was needed for paper questionnaires. Data entry staff used it when a particular question was applicable but the interviewer forgot to enter a response.
Usually in SPSS, blank is the system-missing value, and 9 is the user-missing value. In Stata, usually both cases are converted to missing.
We are surprised that the 9 shows up in the Stata file. Perhaps the DTA file for this survey was not created in our usual way, which would convert the CSPro 'missing' to a dot in Stata.
You are reading the Stata and SPSS files with R, and we believe the conversion to R handles the two formats differently. You will get agreement if, when you read the Stata file with R, you add a line to change the 9 to missing (NA in R, the equivalent of a dot in Stata).
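As a rough illustration of that recode, the Python sketch below rebuilds hv140 from the tallies posted earlier in this thread, replaces the code 9 with a missing marker, and checks that the result matches the SAV tabulation. The function name `recode_missing` is hypothetical, and `None` stands in for R's NA. In R the same idea would be a one-line assignment of NA to the rows where hv140 equals 9.

```python
from collections import Counter

MISSING_CODE = 9  # CSPro "missing" code for a one-digit variable, per the explanation above


def recode_missing(values, missing_code=MISSING_CODE):
    """Return a copy of `values` with the missing code replaced by None."""
    return [None if v == missing_code else v for v in values]


# Tallies copied from the two tabulations in the first post (None = NA).
dta_counts = {0: 3294, 1: 8494, 2: 1961, 8: 208, 9: 398, None: 38336}
sav_counts = {0: 3294, 1: 8494, 2: 1961, 8: 208, None: 38734}

# Expand the DTA tallies into one value per household member, then recode.
dta_values = [code for code, n in dta_counts.items() for _ in range(n)]
recoded = Counter(recode_missing(dta_values))

# After the recode the DTA tabulation should equal the SAV one.
print(recoded == Counter(sav_counts))
```

The check works because the 398 cases coded 9 plus the 38336 cases already missing in the DTA file add up to the 38734 missing cases in the SAV file.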
Re: Difference DTA and SAV [message #25916 is a reply to message #25915]
Tue, 03 January 2023 09:25
victor
Thank you for this explanation. Do I understand correctly that in this case the SAV file provides the data in the correct manner?
If so, that would mean there is a mistake in the final report for Gambia 2013. On page 21, the figures in the total row should be 58.1 rather than 57.1, 15.2 rather than 14.9, and 73.3 rather than 72.0.