HELP!: Analysis on youth-specific age group only [message #1627]
Wed, 19 March 2014 11:16
malayaka
Messages: 18 Registered: March 2014
Member
Hello,
First let me start by stating that I am by no means a STATA/statistical software person, and definitely in over my head on this task but it needs to get done. Thus, I apologize for the extremely detailed question/post.
I am attempting the following:
1) Isolate variables on a specific age group(s): 15-19yrs, 15-24yrs, 20-24yrs from the 2010 DHS Malawi STATA dataset.
2) For each age group, I would like a breakdown of:
- Female and Male
- Region of residence
- Wealth quintile
3) I would then want the categories above (#2) to go with the following variables by percent:
- Education (e.g. secondary school progression rate, highest grade completed)
- Literacy (e.g. percent literate)
- Employment (e.g. percent employed in last 12 months, percent in agriculture, control over cash earnings)
- Fertility (e.g. percent married, percent married by 18 yrs, percent with first birth by 18 yrs, exposed to family planning messages in media, method of contraception)
- AIDS/STI (e.g. percent who had an STI in last 12 months)
- Domestic Violence (e.g. percent who experienced physical violence since age 15 yrs, ever experienced sexual violence)
- Health Behaviors (e.g. percent who use tobacco)
Using Fertility as an example: Among 15-19 year olds, what percent of females (15-19 yrs) are married? What percent of males (15-19 yrs) are married? What is the difference (if any) among wealth quintiles (each for females and males)? What is the difference (if any) among regions of residence (northern, central, southern) (each for females and males)?
I am working with STATA-IC and understand that I need to "extract" the variables because the dataset is too large, then merge them to create one dataset with my select variables.
Please help!!
Thank you in advance.
Re: HELP!: Analysis on youth-specific age group only [message #1628 is a reply to message #1627]
Wed, 19 March 2014 12:29
user-rhs
Messages: 132 Registered: December 2013
Senior Member
Dear Malayaka,
I urge you to read the FRQ, FRW, and MAP files in the zip file that also contains the data to determine the names of the variables that you need. The files with extension FRQ, FRW, and MAP can be opened in Notepad (click on the file--> right click--> open with...--> select notepad).
If you are in doubt about what the variables mean, look at the DHS Survey Questionnaire that is at the end of the Malawi Report--Appendix G (Link: http://dhsprogram.com/pubs/pdf/FR247/FR247.pdf ). You can also look at the DHS recode manual for even more detail on what the variables mean.
Then, you can follow the steps I posted in an earlier post on how to extract specific variables when you only have Stata IC. Link to the post here: http://userforum.dhsprogram.com/index.php?t=msg&th=778&goto=1294&S=de16df2b6faf6307871c86871edc98b9#msg_1294
Alternatively, you may be able to create those tables in StatCompiler.
hth,
RHS
Re: HELP!: Analysis on youth-specific age group only [message #1632 is a reply to message #1631]
Wed, 19 March 2014 16:31
user-rhs
Messages: 132 Registered: December 2013
Senior Member
malayaka wrote on Wed, 19 March 2014 15:55: Is it possible for me to merge these two (or is it necessary?).
Depends on what you're trying to answer. If you are trying to see agreement in responses between spouses, then definitely merge these on the household unique identifier. If not, keeping them separate should be fine.
malayaka wrote on Wed, 19 March 2014 15:55: The women variables are different than that of the men, v102 and mv102 respectively.
As you continue to work with DHS data, you will find that DHS is very good about keeping variable naming conventions so that you can figure out whether the variable pertains to the woman (prefix v), man (prefix mv), household (prefix hv), child (prefix b), maternal questions (m), local variables (s), and so on.
malayaka wrote on Wed, 19 March 2014 15:55: how do I make this "new" dataset specifically only for 15-19, 15-24 and 20-24 year olds?
You can use the Stata command -keep- with the age variable to drop observations outside of your age range. For example,
keep if v012>35
keeps only respondents older than 35 and deletes everyone aged 35 or younger from the dataset in memory.
*Important: Stata is case-sensitive. All built-in commands are in lowercase. Most user-written commands are also lowercase.
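For the youth age bands asked about (15-19, 20-24, 15-24), the same idea can be sketched as follows. This is a minimal example on toy data rather than the real file. It assumes v012 holds age in completed years, as in the women's recode (the men's file would use mv012), and the flag names age1519, age2024 and age1524 are made up for illustration.

```stata
* Toy data standing in for the women's file; only v012 (age in years) is used.
clear
input v012
14
15
19
20
24
25
36
end

* Flag each youth age band (hypothetical variable names) instead of
* dropping rows right away, so all three bands come from the same data.
generate byte age1519 = inrange(v012, 15, 19)
generate byte age2024 = inrange(v012, 20, 24)
generate byte age1524 = inrange(v012, 15, 24)

* Then restrict the dataset in memory to the widest band of interest.
keep if inrange(v012, 15, 24)
count
```

Flagging before dropping is a design choice: the 15-24 band overlaps the other two, so one dataset with three flags avoids saving three separate files.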
malayaka wrote on Wed, 19 March 2014 15:55: and be able to get the breakdown of my variables (education, fertility, etc) per wealth quintile as well as region?
The -tab- command is used to get tabulations (and cross-tabulations). The syntax is tab rowvbl colvbl.
tab v130 v106 gets you the tabulation between religion (row) and education (column):
. tab v130 v106

            |          highest educational level
   religion | no educat    primary  secondary     higher |     Total
------------+--------------------------------------------+----------
   orthodox |     2,836      2,582        910        667 |     6,995
   catholic |        77         81          7         12 |       177
 protestant |     1,212      1,366        209        149 |     2,936
     muslim |     3,974      1,777        266        153 |     6,170
traditional |        69         22          1          1 |        93
      other |       107         26          1          2 |       136
         99 |         3          4          1          0 |         8
------------+--------------------------------------------+----------
      Total |     8,278      5,858      1,395        984 |    16,515
Be aware that missing values are coded as 99 here, and you need to recode them to system missing (.) for Stata to recognize them as missing. UCLA has an excellent resource on how to get started on Stata (Link: http://www.ats.ucla.edu/stat/stata/sk/ ), and I encourage you to spend some time on their site to figure out how to do what you need to do.
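The recoding of 99 to system missing could look like the sketch below. It runs on toy data and assumes 99 is the only missing code for v130; other variables may use different codes (for example 9 or 98), so check the value labels or the recode manual before applying it more widely.

```stata
* Toy data: v130 (religion) with one observation coded 99, v106 (education).
clear
input v130 v106
1 0
2 1
99 2
4 3
end

* Assumption: 99 is the only missing code for this variable.
recode v130 (99 = .)

* mvdecode is an alternative that handles several variables at once, e.g.:
*   mvdecode v130, mv(99)

* The missing option shows the recoded observation as "." in the table.
tabulate v130 v106, missing
```

Without the missing option, tabulate simply leaves that observation out of the table.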
Stata has great documentation for its commands. Simply type help followed by the name of the command you want to find out more about, and a window will pop up showing you the syntax, the options for that command, and examples at the bottom of the page.
Again, StatCompiler (http://www.statcompiler.com/) might be able to get you the numbers you need without you having to write any commands, so try that first.
Good luck.
RHS
Re: HELP!: Analysis on youth-specific age group only [message #1769 is a reply to message #1762]
Wed, 02 April 2014 14:17
user-rhs
Messages: 132 Registered: December 2013
Senior Member
OK, try this. Use the -svyset- command to tell Stata to weight point estimates as follows:
svyset [pweight=wgt]
Use the -svy- prefix before your tabulations
svy: tab v013
This will give you the proportions of women in the dataset that are in a particular age group. If you want the actual numbers, you would specify "count" as an option. For cross-tabulations, decide whether you want the row or column percentages, and specify either row or column as an option.
e.g.
svy: tab v013
svy: tab v013, count /* gives counts instead of percentages */
svy: tab v013 v155, row /*gives the row percentages */
svy: tab v013 v155, col /*gives the column percentages */
svy: tab v013 v155 /*without specifying row or column, percentages are taken out of the total N */
Note that the syntax for tabulation is tab rowvbl columnvbl. Stata will give you an error message if your column variable has too many unique values. So for example if you tried to do literacy by single year age, you will get:
svy: tab v155 v012,row
too many values
r(134);
You should swap it and make v012 the row variable and v155 the column vbl (and switch the specification of row or column percentages as necessary)
svy: tab v012 v155,col
Re: HELP!: Analysis on youth-specific age group only [message #1816 is a reply to message #1798]
Thu, 03 April 2014 17:25
user-rhs
Messages: 132 Registered: December 2013
Senior Member
svyset must be executed before you run svy-prefixed tabulations, regressions, etc. Exactly where you put it is a matter of preference. I typically do svyset at the top of my do-file, right after I create the wgt variable by dividing v005 by 1000000 (or as instructed by the DHS final report).
After you run svyset, anything you run with the svy prefix will use the svyset that you specified. If you want to change the specification of svyset, you can do svyset,clear and then re-specify svyset with the new settings.
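A minimal sketch of that setup on toy data is shown here. It goes a step beyond the weight-only svyset by also declaring clusters and strata, which is an assumption on my part: v021 (primary sampling unit) and v022 (strata) are the variables commonly used for DHS files, but the final report for the survey should confirm the right ones, and in some surveys the strata variable has to be built by hand.

```stata
* Toy data: 4 clusters in 2 strata, with age group (v013) and literacy (v155).
clear
input v021 v022 v005 v013 v155
1 1 1200000 1 2
1 1 1200000 2 0
2 1  800000 1 2
2 1  800000 2 1
3 2 1500000 1 0
3 2 1500000 2 2
4 2  500000 1 2
4 2  500000 2 2
end

* DHS stores v005 as the sample weight multiplied by 1,000,000.
generate double wgt = v005/1000000

* Assumption: v021 is the PSU and v022 the stratum for this survey.
svyset v021 [pweight = wgt], strata(v022)

* Every later svy: command picks up these settings automatically.
svy: tabulate v013 v155, row
```

Declaring clusters and strata does not change the weighted percentages, only the standard errors and tests that come with them.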
The svy prefix is done with the command you are trying to execute. So for example if you want to cross-tabulate, do svy: tab variable1 variable2, col. I see you are trying to do the tabs by categories of the variable 'age in 5 year groups.' I don't think you can combine svy with by, so you can just tab for the subpop of interest (the different levels of mv013):
svy: tab mv102 mv149 if mv013==1,col
svy: tab mv102 mv149 if mv013==2,col
svy: tab mv102 mv149 if mv013==3,col
...
and so on.
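One alternative to the if qualifier, sketched here on toy data: the svy prefix has a subpop() option that restricts the estimate to a subpopulation while keeping the full sample design for the variance calculation, which is what Stata's survey documentation recommends. The percentages match the if version; the standard errors can differ. The toy design variables (v021, v022, wgt) and the two age groups are assumptions for illustration only.

```stata
* Toy men's data: cluster, stratum, weight, age group, residence, education.
clear
input v021 v022 wgt mv013 mv102 mv149
1 1 1.2 1 1 2
1 1 1.2 2 1 3
1 1 1.2 1 1 1
2 1 0.8 1 1 3
2 1 0.8 2 1 2
2 1 0.8 2 1 1
3 2 1.5 1 2 1
3 2 1.5 2 2 0
3 2 1.5 1 2 2
4 2 0.5 1 2 0
4 2 0.5 2 2 1
4 2 0.5 2 2 2
end

svyset v021 [pweight = wgt], strata(v022)

* Loop over the age groups instead of typing one command per group.
forvalues g = 1/2 {
    display "Age group (mv013) = `g'"
    svy, subpop(if mv013 == `g'): tabulate mv102 mv149, column
}
```

With the real data, the loop range would run over however many levels of mv013 are of interest.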