The DHS Program User Forum
Discussions regarding The DHS Program data and results
filtering hhid in the HR file using R after importing in haven and using dplyr [message #13235] Sun, 08 October 2017 11:04
newtoDHS2017
Messages: 6
Registered: September 2017
Location: Europe
Member
Dear forum,

I have a basic question on how to correctly select cases based on the hhid field in the HR file. I have imported the Rwanda 2015-16 DHS HR (household) file in Stata format into R using the haven package, and the import seems to have worked well.

However, I am trying to select certain household records by using the case ID, hhid (the first field in the HR file), with no success. I noticed that in R, hhid is a character field with values such as "1 1", "1 2", "1 10", etc., as it is a combination of the cluster number (hv001) and the household number (hv002).

I tried to use the R dplyr package to get just the first case, but it does not work. Code example:

t <- select(HR_dataset, hhid=="11")

It does not return the record.

Then I thought there might be spaces in the hhid field, given the way some hhid values are displayed, so I also tried putting a space between the two 1s, but this makes no difference:

t <- select(HR_dataset, hhid =="1 1")

Can you let me know what I did wrong, or should I perhaps filter the cases using the cluster number and the household number (hv001, hv002) instead?

Thanks a lot for your help,

newtoDHS2017
Re: filtering hhid in the HR file using R after importing in haven and using dplyr [message #13252 is a reply to message #13235] Mon, 09 October 2017 07:54
Bridgette-DHS
Messages: 3035
Registered: February 2013
Senior Member
The following is a response from Senior DHS Stata Specialist Tom Pullum:

The easiest solution will be to use hv001, hv002, and hvidx (cluster id, household id, and line number, respectively), all of which are numeric rather than strings.
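
As an illustration of this suggestion, here is a minimal sketch in R with dplyr. It assumes the imported data frame is called HR_dataset, as in the question; the toy values below, including the space padding in hhid, are hypothetical stand-ins for the real file. Note that the calls quoted in the question use select(), which picks columns, whereas filter() is the dplyr verb that picks rows.

```r
library(dplyr)

# Toy stand-in for the imported HR file (hypothetical values). A real file
# would come from haven::read_dta() on the household recode .dta file.
HR_dataset <- tibble(
  hhid  = c("       1   1", "       1   2", "       1  10"),
  hv001 = c(1, 1, 1),   # cluster number
  hv002 = c(1, 2, 10)   # household number
)

# filter() keeps rows that match; the numeric ids avoid any string padding.
first_hh <- filter(HR_dataset, hv001 == 1, hv002 == 1)

# Alternative on the string id, assuming hhid is padded with spaces:
# trim the ends and collapse runs of spaces before comparing.
same_hh <- filter(HR_dataset, gsub("\\s+", " ", trimws(hhid)) == "1 1")
```

Filtering on hv001 and hv002 is likely the more robust of the two, since it does not depend on how many spaces the string field contains.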



Re: filtering hhid in the HR file using R after importing in haven and using dplyr [message #13253 is a reply to message #13252] Mon, 09 October 2017 10:06
newtoDHS2017
Dear Bridgette and Tom,

Many thanks for this. I used the cluster and household numbers and they work fine now.

Thanks a lot for your response.