The DHS Program User Forum
Discussions regarding The DHS Program data and results
Predicting Low Birth Weight with Random Forest in R [message #29914] Thu, 22 August 2024 01:46
Muhammad Naeem
Messages: 1
Registered: August 2024
Member
Hi everyone,

I'm working with data from the 2017-18 Pakistan Demographic and Health Survey (PDHS) and have a variable indicating low birth weight, which I've categorized as follows:

0: Normal weight (2500 grams or above)
1: Low birth weight (<2500 grams)

In my dataset, the proportions are:

Normal weight: 80%
Low birth weight: 20%

I am using a Random Forest classifier in R to predict low birth weight. However, the model is producing very few correct predictions for the low birth weight category (1).

Given this class imbalance, what strategies or adjustments can I make to improve the model's performance for the minority class? Any advice or suggestions would be greatly appreciated!

Thank you!

Attachment: ml_pak.csv (88.46 KB)
Re: Predicting Low Birth Weight with Random Forest in R [message #29940 is a reply to message #29914] Tue, 27 August 2024 14:17
Janet-DHS
Messages: 880
Registered: April 2022
Senior Member
The following is a response from DHS staff member Tom Pullum:

I suggest that you tabulate the responses (m19 in the KR file). You will see that 82% of the children were either "not weighed" or have a "don't know" code. For children with numerical values, most are heaped at multiples of 500 grams, with a substantial bump right at 2500 grams, the boundary for LBW. There is a lot of omission and measurement error. You are further losing information by collapsing an interval-level variable into a binary variable.
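
The tabulation suggested above could be sketched in R roughly as follows. This is only an illustration: the vector m19 below is made-up stand-in data, not values from the PDHS KR file, and it assumes the standard DHS recode convention in which 9996 means "not weighed at birth" and 9998 means "don't know". With the real file you would use the m19 column of the KR data instead.

```r
# Made-up stand-in for the m19 column (birth weight in grams); NOT real PDHS data.
# Assumed DHS special codes: 9996 = not weighed at birth, 9998 = don't know.
m19 <- c(2500, 3000, 9996, 9996, 2500, 3150, 9998, 2000, 9996, 3000)

# Classify each record as weighed, not weighed, or don't know.
status <- ifelse(m19 == 9996, "not weighed",
          ifelse(m19 == 9998, "don't know", "weighed"))
status_share <- prop.table(table(status))
print(status_share)

# Among children with a numerical weight, look for heaping.
weighed <- m19[m19 < 9996]
heaped_share <- mean(weighed %% 500 == 0)  # share at exact multiples of 500 g
at_boundary  <- mean(weighed == 2500)      # share exactly at the LBW cutoff
print(c(heaped = heaped_share, at_2500 = at_boundary))
```

Running the same steps on the real m19 column (after dropping missing values) should show how much of the sample has no usable weight and how strongly the reported weights cluster at round numbers.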

But even if you had no omission or measurement error, it would be very hard to predict (or fit) birthweight, especially using the variables in DHS surveys. For example, the surveys do not include good estimates of gestational age, which has a clear causal relationship with birthweight.

There is a huge literature on this topic. DHS staff cannot help. Perhaps other users can offer suggestions.
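
On the class-imbalance part of the original question, one commonly used option is to have each tree draw a balanced sample, using the strata and sampsize arguments of randomForest() from the randomForest package. The sketch below is a minimal illustration under stated assumptions: the data frame dat, the predictors x1 and x2, and the outcome lbw are all synthetic and hypothetical, not taken from the attached file, and the model call only runs if the randomForest package is installed. It does not address the omission and measurement problems described above.

```r
set.seed(42)

# Synthetic stand-in data (NOT the PDHS file): roughly 80% class 0, 20% class 1.
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$lbw <- factor(rbinom(n, 1, plogis(-1.6 + 0.8 * dat$x1)), levels = c(0, 1))

# Size of the minority class; each tree will draw this many rows from each class.
n_min <- min(table(dat$lbw))

if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(
    lbw ~ x1 + x2, data = dat,
    strata   = dat$lbw,           # sample within each outcome class
    sampsize = c(n_min, n_min),   # equal draws from class 0 and class 1
    ntree    = 200
  )
  # Out-of-bag confusion matrix; check the error rate for class 1,
  # not just overall accuracy.
  print(fit$confusion)
}
```

With an 80/20 split, overall accuracy can look acceptable even when almost no low birth weight cases are identified, so it is usually more informative to compare the per-class error rates before and after balancing.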