R is used to analyze data. Data usually comes in the form of a data frame, a structure with rows and columns. Each column represents a variable to be analyzed and can be treated as a vector, so it is important to know how to work with vectors.
We will use an existing data set to manipulate vectors. This data set is known as "iris" and consists of 50 samples from each of three species of iris flowers (150 rows in total). Each sample has five variables: sepal length, sepal width, petal length, petal width and species. We will be focusing on the variable sepal length.
head(iris) # gives us a look at the first few lines of this data set.
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
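As a quick check of the description of the data set, the following sketch inspects its shape with base R functions only. It assumes the built-in iris data set that ships with R; the variable names iris_dim, iris_vars and species_counts are chosen here for illustration.

```r
# Assumes the built-in iris data set from R's datasets package.
iris_dim <- dim(iris)                   # number of rows and number of columns
iris_vars <- names(iris)                # names of the columns (variables)
species_counts <- table(iris$Species)   # number of samples per species

iris_dim
iris_vars
species_counts
```

Each column listed by names(iris) can be pulled out as a vector, which is what the rest of this section relies on.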
We will use square brackets [ ] to extract information from a vector.
head(Sepal.Length) # fails because Sepal.Length is a column inside iris, not an object R can find on its own
## Error in eval(expr, envir, enclos): object 'Sepal.Length' not found
head(iris$Sepal.Length) # the $ sign tells R to take the Sepal.Length column from the iris data set
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
attach(iris) # makes the iris data set available so that we do not need to use $ every time
Sepal.Length[5] # extracts the 5th value
## [1] 5
ext_1 <- Sepal.Length[5] # extracts the 5th value and assigns it to a variable
Sepal.Length[c(1, 3, 12, 36)] # extracts the 1st, 3rd, 12th and 36th elements of Sepal.Length
## [1] 5.1 4.7 4.8 5.0
Sepal.Length[6:11] # extracts the 6th through 11th elements
## [1] 5.4 4.6 5.0 4.4 4.9 5.4
detach(iris) # do not forget to detach data set when you are done
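The three forms of positional indexing shown above can be tried on a small vector where the result is easy to verify by eye. This is a minimal sketch; the vector v is a made-up example and is not part of the iris data set.

```r
# Hypothetical example vector, not part of the iris data set.
v <- c(10, 20, 30, 40, 50, 60)

v[2]            # a single position
v[c(1, 4, 6)]   # several positions, returned in the order requested
v[3:5]          # a run of consecutive positions

# The same indexing works on a column without attach(), by using $.
iris$Sepal.Length[6:11]
```

Writing iris$Sepal.Length in full is longer, but it makes it clear which data set a column comes from even when nothing is attached.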
We can also use a logical expression to extract data from a vector. This is a really useful way to manipulate a vector.
attach(iris)
Sepal.Length[Sepal.Length > 7] # extracts all elements with a value greater than 7
## [1] 7.1 7.6 7.3 7.2 7.7 7.7 7.7 7.2 7.2 7.4 7.9 7.7
Sepal.Length[Sepal.Length == 5.8] # extracts all elements with a value equal to 5.8
## [1] 5.8 5.8 5.8 5.8 5.8 5.8 5.8
Sepal.Length[Sepal.Length <= 4.5] # extracts all elements with a value less than or equal to 4.5
## [1] 4.4 4.3 4.4 4.5 4.4
Sepal.Length[Sepal.Length > 7 & Sepal.Length != 7.9] # values greater than 7 and not equal to 7.9
## [1] 7.1 7.6 7.3 7.2 7.7 7.7 7.7 7.2 7.2 7.4 7.7
Sepal.Length[Sepal.Length > 7.8 | Sepal.Length < 4.5] # values greater than 7.8 or less than 4.5
## [1] 4.4 4.3 4.4 4.4 7.9
detach(iris)
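It may help to look at what happens inside the brackets. The comparison itself produces a logical vector with one TRUE or FALSE per element, and indexing keeps the positions that are TRUE. The sketch below uses iris$Sepal.Length directly so that attach() is not needed; the names sl and is_long are chosen here for illustration.

```r
sl <- iris$Sepal.Length

is_long <- sl > 7    # a logical vector, one TRUE/FALSE per element of sl
length(is_long)      # same length as sl
sum(is_long)         # number of TRUE values, i.e. how many values exceed 7
which(is_long)       # positions of the TRUE values

# Indexing with the logical vector or with the TRUE positions gives the same result.
identical(sl[is_long], sl[which(is_long)])
```

Here sum() counts the matches and which() reports where they are, which is often useful before extracting or replacing values.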
We can also change the values of some elements in a vector. It is good practice to keep the original vector or data frame unchanged and to work on a copy whenever you want to change some values.
attach(iris)
x1 <- Sepal.Length # creates a copy so that the original vector is kept
x1[4] # displays the 4th value
## [1] 4.6
x1[4] <- 4.8
x1[4] # displays the new value
## [1] 4.8
x1[c(5, 6)] <- 4.9 # replaces the 5th and 6th elements with 4.9
x1[x1 < 4.5] <- 5.1 # replaces values that are less than 4.5 with 5.1
detach(iris)
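To see why working on a copy is safe, the following sketch changes a copy and then compares it with the original column. It is only an illustration; the name x_copy is an assumption made here, and iris$ is used so that attach() is not needed.

```r
x_copy <- iris$Sepal.Length    # work on a copy, not on iris itself

x_copy[4] <- 4.8               # replace one element by position
x_copy[x_copy < 4.5] <- 5.1    # replace elements selected by a condition

iris$Sepal.Length[4]           # the original still holds its old value
x_copy[4]                      # the copy holds the new value
any(x_copy < 4.5)              # FALSE: no values below 4.5 remain in the copy
any(iris$Sepal.Length < 4.5)   # TRUE: the original is untouched
```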
The next thing we will learn is how to order the values in a vector.
attach(iris)
x1 <- sort(Sepal.Length) # sorts the values from lowest to highest
head(x1) # displays the first six values
## [1] 4.3 4.4 4.4 4.4 4.5 4.6
x2 <- sort(Sepal.Length, decreasing = TRUE) # reverses the sort from highest to lowest
head(x2)
## [1] 7.9 7.7 7.7 7.7 7.7 7.6
# The next line does the same thing by using the sort() and rev() functions together
x2 <- rev(sort(Sepal.Length)) # first sorts the vector and then reverses the sorted vector
detach(iris)
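The two ways of sorting from highest to lowest can be compared directly. The sketch below also shows that sort() returns a new vector and leaves its input as it was. The names sl, desc_a and desc_b are chosen here for illustration.

```r
sl <- iris$Sepal.Length

desc_a <- sort(sl, decreasing = TRUE)   # descending sort in one step
desc_b <- rev(sort(sl))                 # ascending sort, then reversed

identical(desc_a, desc_b)   # both approaches give the same vector
is.unsorted(sl)             # TRUE: sl itself was not reordered by sort()
is.unsorted(sort(sl))       # FALSE: the result of sort() is in ascending order
```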
If we want to sort one vector according to the values of another vector, we should use the order() function. The trick with order() is that it returns the positions of the elements in ascending (or descending) order, not the values themselves. We can then use those positions, which form a vector, as an index to see the sorted values of the same vector or of other vectors.
attach(iris)
x1 <- order(Sepal.Length) # creates a vector of the positions of the ascending values
head(x1) # the smallest value of Sepal.Length is its 14th element, the next smallest is its 9th element, and so on
## [1] 14 9 39 43 42 4
x2 <- Sepal.Length[x1] # Sepal.Length is not a function. It is a vector. That's why we use [ ]
head(x2) # displays the ascending values of the vector by using x1
## [1] 4.3 4.4 4.4 4.4 4.5 4.6
x3 <- Petal.Length[x1] # arranges the values of another vector according to the sorted vector
head(x3) # the smallest value (4.3) of Sepal.Length corresponds to the value (1.1) of Petal.Length
## [1] 1.1 1.4 1.3 1.3 1.3 1.5
x4 <- Species[x1] # Species is a factor (a categorical variable with fixed levels)
head(x4) # the smallest 6 values of Sepal.Length belong to the "setosa" species
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
detach(iris)
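The same positions can be used to reorder every column at once by putting them in the row index of the data frame. This is a minimal sketch using iris$ instead of attach(); the names idx, iris_sorted and idx_desc are chosen here for illustration.

```r
# Row positions that put Sepal.Length in ascending order.
idx <- order(iris$Sepal.Length)

# Using idx as the row index reorders all columns together.
iris_sorted <- iris[idx, ]
head(iris_sorted, 3)

# decreasing = TRUE gives the positions for a highest-to-lowest ordering.
idx_desc <- order(iris$Sepal.Length, decreasing = TRUE)
iris$Sepal.Length[idx_desc][1:3]
```

Because whole rows are moved, each sepal length stays next to its own petal measurements and species, which is what the separate x2, x3 and x4 vectors above achieve one column at a time.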