The DHS Program User Forum
Discussions regarding The DHS Program data and results
recoding v034X (line number of husband) [message #9853] Tue, 31 May 2016 11:12
mm4599
I am using the 2003/04 Tanzania HIV/AIDS Indicator Survey, the 2007/08 Tanzania AIDS Indicator Survey (AIS), and the 2011/12 AIS.

The sample size in the 2003/04 individual recode (IR) file, tzir4afl.dta, is 12,522. The line number of husband variable (v034X) is stored in wide format, and there are no observations in v034_5 - v034_8. Tabulating v034_1 - v034_4 gives:

Value v034_1 v034_2 v034_3 v034_4 Total
0 677 203 22 4 906
1 3,541 2 3543
2 2,635 36 6 2677
3 199 38 3 240
4 119 4 4 127
5 50 3 1 54
6 40 4 44
7 35 2 37
8 33 2 35
9 22 3 1 26
10 21 1 22
11 14 14
12 10 10
13 5 1 6
14 6 6
15 3 3
16 2 2
17 2 2
20 1 1
22 1 1
31 1 1
35 1 1
36 2 2
39 1 1
42 1 1
43 1 1
0
Total 7,422 300 36 5 7,763
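
One way to see where the column totals above come from is to count, for each woman, how many of the four slots hold a line number. The sketch below uses a small made-up dataset in place of tzir4afl.dta (the five rows and the variable names id and n_entries are assumptions for illustration only); the sum of n_entries plays the role of 7,763 in the real file, and the women with n_entries equal to 0 play the role of the 5,100 women with no entry.

```stata
* Toy data standing in for tzir4afl.dta (values are made up)
clear
input id v034_1 v034_2 v034_3 v034_4
1 1 . . .
2 2 0 . .
3 . . . .
4 0 3 5 .
5 . . . .
end

* Number of non-missing line-number entries per woman
egen n_entries = rownonmiss(v034_1 v034_2 v034_3 v034_4)
tabulate n_entries

* Total number of entries across all women
quietly summarize n_entries
display r(sum)
```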

When I reshape the file (gen id = _n; reshape long v034_, i(id) j(linenum)) and tabulate v034_, I get 7,763 non-missing values, but the dataset grows to 50,088 records (12,522 x 4), of which 42,325 are missing on v034_.

What do I need to do so that the sample size for the reshaped file is NOT 50,088?
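
A possible approach, sketched on made-up data rather than the real file: reshape long creates one row per woman per slot, so the empty slots appear as missing rows. Dropping those rows afterwards leaves one record per reported line number. The toy dataset and the final rename are assumptions for illustration; whether one record per entry is the right unit depends on the analysis.

```stata
* Toy data standing in for tzir4afl.dta (values are made up)
clear
input id v034_1 v034_2 v034_3 v034_4
1 1 . . .
2 2 0 . .
3 . . . .
4 0 3 5 .
5 . . . .
end

* One row per woman per slot: 5 women x 4 slots = 20 rows
reshape long v034_, i(id) j(linenum)
count

* Keep only the slots that actually hold a line number
drop if missing(v034_)
count

* Single long-format variable
rename v034_ v034
```

On the real file, the same two steps would be expected to leave 7,763 records, since that is the number of non-missing entries across v034_1 - v034_4.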

When I create a variable (gen v034 = .) and use the replace command to combine the multiple husband line number categories, I get 12,522 records (7,422 + 5,100), which is not correct:
tab v034_

v034_     Freq.   Percent     Cum.
0           677      9.12     9.12
1         3,541     47.71    56.83
2         2,635     35.50    92.33
3           199      2.68    95.01
4           119      1.60    96.62
5            50      0.67    97.29
6            40      0.54    97.83
7            35      0.47    98.30
8            33      0.44    98.75
9            22      0.30    99.04
10           21      0.28    99.33
11           14      0.19    99.51
12           10      0.13    99.65
13            5      0.07    99.72
14            6      0.08    99.80
15            3      0.04    99.84
16            2      0.03    99.87
17            2      0.03    99.89
20            1      0.01    99.91
31            1      0.01    99.92
35            1      0.01    99.93
36            2      0.03    99.96
39            1      0.01    99.97
42            1      0.01    99.99
43            1      0.01   100.00
Total     7,422    100.00

Another thing I tried was to split the dataset into 4 files, one each for v034_1, v034_2, v034_3, and v034_4, and then merge the files. Doing so, I obtain 12,863 records (7,763 + 5,100 missing, or 12,522 + 341).

Which is the correct way to reformat this dataset so that I only have one variable v034?
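
If the goal is instead one record per woman (12,522 in the real file), one option is to stay in wide format and fill a single variable with the first non-missing slot. This is only a sketch on made-up data, and it assumes that the first reported line number is the one of interest, which may not hold for every analysis.

```stata
* Toy data standing in for tzir4afl.dta (values are made up)
clear
input id v034_1 v034_2 v034_3 v034_4
1 1 . . .
2 2 0 . .
3 . . . .
4 0 3 5 .
5 . . . .
end

* Take the first non-missing line number across the four slots
generate v034 = .
foreach v of varlist v034_1 v034_2 v034_3 v034_4 {
    replace v034 = `v' if missing(v034)
}

* Still one row per woman; v034 stays missing where no slot is filled
count
count if !missing(v034)
```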

I am attaching the do file.

Thanks


MM
 