Balancing number of observations for the dependent and independent variables [message #26309]
Mon, 06 March 2023 21:54
I am Faithful
Messages: 3 Registered: March 2023
Member
Hello,
I am doing research on maternal empowerment and its effect on under-five child nutrition in Malawi, using the KR dataset.
With that in mind, I am running some descriptive and inferential statistics to identify significant variables for the final model, but for some variables the number of observations does not balance.
For instance, when I run "tab stunting educ", the n for educ is larger or smaller than the n for stunting. Please advise how I can balance the two.
[Updated on: Mon, 06 March 2023 21:55]
Re: Balancing number of observations for the dependent and independent variables [message #26339 is a reply to message #26329]
Wed, 08 March 2023 16:47
Janet-DHS
Messages: 938 Registered: April 2022
Senior Member
Following is a response from DHS staff member, Tom Pullum:
I believe you are running several different regressions and getting different sample sizes (n's). This can happen because different variables may have different numbers of cases that are not applicable or are automatically excluded for different reasons. If you want all the models to have the same number of cases, you have to define a variable "varsmissing" (for example) that is coded "1" if a case is dropped from ANY of the models and "0" otherwise. Then you re-run the models with the qualifier "if varsmissing==0". There are alternative ways to do this, for example with "svy, subpop(X):". (If you use subpop, the variable X in parentheses should be 1 if you want to KEEP the case, the reverse of the coding I suggested for "varsmissing".)
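As a rough illustration of the "varsmissing" approach, here is a minimal Stata sketch. The variable names (stunting, educ, empower, wealth), the helper names (nmiss, keepcase) and the choice of logit are hypothetical placeholders, not taken from the KR file. The sketch also assumes that excluded or not-applicable cases are stored as Stata missing values (.) and that the data have already been svyset.

```stata
* Hypothetical model variables: stunting, educ, empower, wealth.
* Count, for each case, how many of these variables are missing.
egen nmiss = rowmiss(stunting educ empower wealth)

* varsmissing = 1 if the case would be dropped from ANY model, 0 otherwise.
gen varsmissing = (nmiss > 0)

* Option 1: restrict every model to the same complete cases.
logit stunting educ if varsmissing == 0
logit stunting educ empower wealth if varsmissing == 0

* Option 2: subpop() needs the reverse coding (1 = KEEP the case).
* Assumes svyset has already been run for this dataset.
gen keepcase = (varsmissing == 0)
svy, subpop(keepcase): logit stunting educ
svy, subpop(keepcase): logit stunting educ empower wealth
```

With either option, every model should report the same number of cases, because each one is restricted to cases that are complete on the full list of variables.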
There are advantages to having the same n for several models, for example if you want to test one model against another. But if you lose a lot of cases from just one or two of your variables, it may be preferable to drop the variable and keep the cases.
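To judge whether one or two variables account for most of the lost cases, it may help to count the missing values per variable first. The sketch below again uses hypothetical variable names (stunting, educ, empower, wealth) and assumes the excluded cases are coded as Stata missing values.

```stata
* Summary table of missing values for the candidate variables
* (variables with no missing values are not listed in the table).
misstable summarize stunting educ empower wealth

* The same information, one variable at a time.
foreach v of varlist stunting educ empower wealth {
    quietly count if missing(`v')
    display "`v': " r(N) " cases missing"
}
```

A variable whose count is much larger than the others is a candidate for being dropped from the models rather than dropping the cases.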