top of page

How to import and export data using R-programming


Introduction

Importing data into R is very simple and easy. For SPSS data file, we will use Hmisc package, for STATA and Systat data file, we will use foreign package. for some reason we will use readr package to import csv and text file into R.

In this notes, we will use Data Exercise.sav, demo.csv, KAP1.dta for demonstration. all these data can be download here.

At first, let us setting up the environment of R. we need to make sure the working directory is correctly setup. and clear all object that we not going to use in the environment. this is because R consume/depend at your physical memory (RAM).

1) to clear all objects in R environment:

rm(list=ls())

2) to check working directory

#to check current working directory

getwd()

#if not satisfied, we can change the working directory using

setwd("~/data/whateveryourfoldernamed")

importing from comma delimited text file

First, we must make sure the original file have these criterias: 1. The first row of the file should contain variable names at the top. 2. The variable names must include normal letters, numbers and underscores (for example: Age, Admission_ date, Score_1). 3. Should not include special characters such as space, @,$,#(for example: “Admission Date”, #race, @prob). 4. The data should begin at the second row and first column onwards. 5. Everything that is not part of the data should be removed (for example: comments, labels, graphs, pivot tables).

This is the example of comma delimited text file mydata <- read.table("demo.csv", header=TRUE, sep=",")

Next, we can check for the dimension of the data set by using this code:

dim(mydata) #To look at the dimensions of the data

# it will display Eg: 6400 28 # means that in the dataset we have: # 6400 observations (rows) # 28 variables (columns)

we also can view some of the observations by:

head(mydata) # it will show first 6 observations for all variables head(mydata[1:4], n=10L) #it will show first 10 observation (10L) #for variable number 1 until number 4.

Importing from an Excel File

one of the best ways to read an excel file is to export it to a comma delimited file and import it using method above. But it kinda stupid way. The best way is to use the xlsx package to access excel file. The first row should contain variable/column names.

As start, we can read in the first worksheet from the workbook demo.xls by double click on the orginal excel file and make sure the first row contains variable names and all the criteria mention in the previous section.

To start importing excel file, we need to install a library named xlsx.

install.packages("readxl")

If you already install the package, please load it up by calling the library.

library(readxl)

Now we can begin to import the data from excel file

mydata1 <- read_excel("demo.xls", sheet='demo')

#We need to specify the sheet name.

#In this case the sheet name is 'demo'.

In the file name, you can mention either it is .xls or .xlsx, There are another method to import data from excel, without specifying the sheet name. But you need to know which number of sheet you want to import,in this case, the sheet is in the 1st sheet.

mydata1 <- read_excel("demo.xls", 1)

#We need to specify the sheet name.#In this case the sheet name is 'demo'

Importing dataset from SPSS file

To import data file from SPSS format file we need to install ‘foreign’ library. In foreign library, we able to import data file from SPSS and STATA.

install.packages("foreign")

If you already install the package, please load it up by calling the library

library(foreign)

Now we can begin to import the data from SPSS file

data.sav = read.spss("Data Exercise.sav", to.data.frame = TRUE) #SPSS #Last option converts value labels to R factors

There are another method to import SPSS data file into R, by using Hmisc library

install.packages("Hmisc") library(Hmisc) mydata1 <- spss.get("Data Exercise.sav", use.value.labels=T)

Importing data from STATA

In this example, we will use foreign library to import data

library(foreign)

Now, we will begin to import STATA data file format .dta into R.

mydata <- read.dta("KAP1.dta")

It actually pretty fast to import STATA data file into R.

Exporting Data to other file format

There are numerous methods for exporting R objects into other formats. For SPSS, SAS and Stata, you will need to load the foreign packages. For Excel, you will need the xlsReadWrite package.

Exporting to a Tab Delimited text file

First, we need to load foreign library to enable us to export to text file.

library(foreign)

To begin exporting, let consider mydata object at the last importing notes (import from STATA) which is KAP1.dta.

write.table(mydata, "text.txt", sep="\t")

The function works are where x is our object data (mydata), then file=“” is the new file that we want to create. Others, quote, eol, na, dec, row.names, col.names, qmethod and fileEncoding definition can be view in the document file of write.table by :

?write.table

I would like to emphasize on ‘sep’ function in the command line. For tab delimited we use sep=“’’, for comma delimited is sep=“,”, For space delimited is sep=" “. Now we can find the text.txt file that we have created in our working directory.

Export to an Excel Spreadsheet

We need to call xlsx library in order to export the data into excel spreadsheet

library(xlsx)

To write into excel spreadsheet

write.xlsx(mydata, "myexcel.xlsx")

To Export into SPSS data file

To write out text datafile and an SPSS program to read it we need to use foreign library.

library(foreign)

Begin to export to SPSS datafile

write.foreign(mydata, "mydata.txt", "mydata.sps", package="SPSS")

To Export into STATA data file

To Export datafile into STATA data file format still under foreign library

library(foreign)

Begin to export to STATA binary datafile format

write.dta(mydata, "mydata.dta")

Comments


bottom of page