PGDA Post-Doctoral Fellows
JOCELYN FINLAY
PGDA Research Associate 2008-
PGDA Fellow 2006-2008
A guide to setting up your computing requirements to conduct analysis in Stata using multiple Demographic and Health Surveys
Jocelyn E. Finlay
jfinlay@hsph.harvard.edu
The following will help you access the computing resources, data, and template code that have been set up for Harvard users (especially those affiliated with the Center for Population and Development Studies) who would like to conduct statistical analysis using multiple Demographic and Health Surveys (DHS). Users not affiliated with Harvard may still find the .do files useful.
In constructing these files I have drawn on work done by other PGDA members (past and present), especially that of Isabel Günther, Sebastian Linnemayr and Günther Fink.
To get set up, you have four core tasks.
- Get permission from the Demographic and Health Survey (DHS) to use the data for your project.
- Contact the Harvard-MIT Data Center (HMDC) to request an account with the Research Computing Environment. They will set you up with FileZilla and NoMachine to use their computing facilities.
- Download the .do files from my website and adapt them for your project.
- Report back any feedback or errors so that we can continue to improve this computing, data, and template code package.
I will go into each of these tasks in detail.
- Get permission from the DHS to use the data for your project.
- Contact the DHS to request permission to access the Individual Recode and Wealth Index data for all countries with unrestricted data: http://www.measuredhs.com/accesssurveys/access_instructions.cfm
- Then contact Martha Fay (mfay@hsph.harvard.edu) or Rachel Boyce (rboyce@hsph.harvard.edu) and provide them with details of your data access permission. They will then ensure that you have access to the DHS data through the HMDC. More on this in the next step.
- Contact the HMDC to request an account with the Research Computing Environment. They will set you up with FileZilla and NoMachine to use their computing facilities.
- Contact the HMDC to request a Research Computing Environment (RCE) account for the purpose of accessing the DHS data files: http://support.hmdc.harvard.edu/contact
- The HMDC will instruct you to download NoMachine to access the RCE from your computer (http://www.nomachine.com/download.php) and FileZilla Client for file transfer between the RCE and your own computer (http://filezilla-project.org/). The HMDC provides detailed instructions for setting up both programs; refer to those at this stage.
- Download the .do files from my website and adapt them for your project. There are several files that work together.
- data_construction.do
- Change the path names to point to your own source and destination folders.
- Change the list of surveys under Global Surveys to suit your project. The list I have set up covers all the Standard DHS surveys. Use DHS_Survey_List.xls to help with your selection.
- The data_construction.do file is made up of many components that often call on other .do files. Go through data_construction.do and check that each component is necessary for your project (wealth index, region, religion, continents, etc.).
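The loop that data_construction.do runs over the Global Surveys list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual file: the global name surveys, the toy survey codes AA and BB, and the tempfiles standing in for real DHS file paths are all hypothetical, chosen so the example runs without the data. Treating file_name_IR as a variable that records each row's source survey is likewise an assumption.

```stata
* Sketch only: the global "surveys", the toy codes AA/BB and the tempfiles
* standing in for real DHS paths are hypothetical.
clear all

* Build two toy "Individual Recode" files so the loop has something to read
tempfile AA BB combined
foreach s in AA BB {
    clear
    set obs 3
    gen str3 v000 = "`s'5"          // country code + phase, e.g. "AA5"
    gen v001 = _n                   // cluster number
    save "``s''", replace           // ``s'' expands to the tempfile path
}

* Survey list, analogous to the Global Surveys section of data_construction.do
global surveys "AA BB"

local first 1
foreach s of global surveys {
    use "``s''", clear
    gen file_name_IR = "`s'"        // hypothetical: tag rows with their source
    * In the real file, the per-survey step would run here, e.g.
    * do do_file_for_each_survey.do
    if `first' {
        save "`combined'", replace
        local first 0
    }
    else {
        append using "`combined'"
        save "`combined'", replace
    }
}

use "`combined'", clear
tabulate file_name_IR
```

Saving the first survey directly and appending from the second onward avoids having to start from an empty dataset.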
- do_file_for_each_survey.do
- At this point in the loop you can either run a keep command (the most basic would be “keep file_name_IR v000-v008”) or run a .do file that you want applied to each survey. I have written other .do files dealing with child histories; if you are interested, please email me.
- If you use a .do file, note a few important elements that you must include:
- capture gen xxxx=.: the “capture” command is very useful here. If the variable does not exist in the survey you are calling on, this line generates it as missing; if it does exist, gen fails and capture suppresses the error, so the existing values are not replaced with missing. This matters for two reasons. First, without it, any operation on a variable that is not in the survey would stop your loop with an error. Second, the “keep” command at the end of do_file_for_each_survey.do would return an error if the variable was not found in the survey.
- You will need a keep command at the end to reduce your dataset to a manageable size. At minimum you will need “keep file_name_IR v000-v008”, because later code in data_construction.do uses these variables.
- Notice the treatment of religion and region in this do_file_for_each_survey.do file.
- If you are working with child or sibling histories, you will need to use the capture command for these variables too. I will upload some sample .do files from other projects I am working on; a quick look at these will show you how to handle the histories.
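The capture gen and keep pattern can be illustrated on a toy dataset. This is a sketch under stated assumptions: the data are made up, v130 (religion in the DHS recode) stands in for a variable that is present, v131 for one that is absent, and b3_01 to b3_20 serve as an example of birth-history slots that may not all exist in every survey. The variable file_name_IR is assumed to have been created earlier in the loop.

```stata
* Sketch on toy data; the variable choices are illustrative assumptions.
clear
set obs 4
gen str3 v000 = "AA5"
forvalues k = 1/8 {
    gen v00`k' = _n                 // stand-ins for v001-v008
}
gen v130 = 1                        // present in this toy survey
gen file_name_IR = "AA"             // assumed to exist from the main loop

* v130 already exists: gen fails (r(110)), capture swallows the error,
* and the original values are kept
capture gen v130 = .

* v131 does not exist: it is created as all missing
capture gen v131 = .

* Birth-history slots b3_01 ... b3_20: create any that are missing
forvalues i = 1/20 {
    local j : display %02.0f `i'
    capture gen b3_`j' = .
}

* Without the capture gen lines above, this keep would fail with r(111)
keep file_name_IR v000-v008 v130 v131 b3_01-b3_20
describe, short
```

The v000-v008 and b3_01-b3_20 ranges rely on variable order, which is why the placeholders are generated in sequence.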
- country_codes.do
- The country numbers are a numerical coding used by the PGDA.
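One way to attach a numeric country identifier to every observation is to start from v000, whose first two characters are the DHS country code. The sketch below illustrates that idea only; the numeric values are placeholders chosen for illustration, not the PGDA coding, which should be taken from country_codes.do itself.

```stata
* Sketch: numeric country identifier from v000.
* The numeric values are placeholders, NOT the PGDA codes.
clear
input str3 v000
"GH5"
"KE5"
"KE5"
"NG4"
end

gen str2 dhs_cc = substr(v000, 1, 2)       // two-letter DHS country code
gen country_num = .
replace country_num = 1 if dhs_cc == "GH"  // placeholder
replace country_num = 2 if dhs_cc == "KE"  // placeholder
replace country_num = 3 if dhs_cc == "NG"  // placeholder
label define country_lbl 1 "Ghana" 2 "Kenya" 3 "Nigeria"
label values country_num country_lbl

* Any country left uncoded shows up here
count if missing(country_num)
tabulate country_num
```

Running the count check after adding surveys to the Global Surveys list flags any country that has not yet been given a code.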
- WI_merge.do
- You could run this as a separate .do file, save the data, and then simply load it for each project. I followed the DHS guide on how to merge the wealth index files. Because all the surveys are combined here, rather than a single wealth index file being merged with a single individual recode file, this step is a possible source of error. If you find any problems, please let me know.
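Once the surveys are stacked, the household identifier in the wealth index (WI) files is only unique within a survey, which is one way this merge can go wrong. A minimal sketch of a stacked merge follows. It assumes that each WI file has been tagged with a survey identifier (the hypothetical variable survey_id) before stacking, and that the first 12 characters of caseid match whhid. The IDs are toy values, and this is not a reproduction of WI_merge.do.

```stata
* Sketch only: survey_id and the toy IDs are hypothetical; whhid, wlthindf
* and wlthind5 are the household ID, wealth index score and quintile.
clear
input str3 survey_id str12 whhid wlthindf wlthind5
"AA5" "CL001HH00001" -0.8 1
"AA5" "CL001HH00002"  0.4 4
"BB5" "CL001HH00001"  1.2 5
end
* whhid repeats across surveys, so only survey_id + whhid is unique
isid survey_id whhid
tempfile wi
save "`wi'"

clear
input str3 survey_id str15 caseid
"AA5" "CL001HH00001L02"
"AA5" "CL001HH00001L03"
"AA5" "CL001HH00002L01"
"BB5" "CL001HH00001L02"
end
* Assumed: caseid = 12-character household ID + 3-character line number
gen str12 whhid = substr(caseid, 1, 12)

* Many women per household, one WI record per household: m:1
merge m:1 survey_id whhid using "`wi'"
tabulate _merge
```

If the merge were done on whhid alone, merge m:1 would stop with an error because whhid does not uniquely identify observations in the stacked WI data. Tabulating _merge afterwards is a quick check that no individual records went unmatched.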
- religion_recoding.do
- Let me know if you find any errors.
- region_recoding.do
- Again, let me know if you find any errors.
- continent_dummies.do
- Report back any feedback or errors so that we can continue to improve this computing, data, and template code package. The best way to contact me is via email: jfinlay@hsph.harvard.edu.
Note that providing the Stata .do files on this site in no way authorizes users to access or use the Demographic and Health Survey data without permission from the rightful organization. It is the researcher's responsibility to obtain permission to use the DHS data and to seek any IRB approval that may be required.