Working With Vectors



R is used to analyze data. Data usually comes in the form of a data frame, a structure with rows and columns. Each column represents a variable to be analyzed and can be treated as a vector. Therefore, it is important to know how to work with vectors.

We will use an existing data set to practice manipulating vectors. This data set is known as "iris" and consists of 50 samples from each of three species of iris flowers, 150 rows in total. Each row records five variables: sepal length, sepal width, petal length, petal width and species. We will focus on the variable sepal length.

head(iris)        # shows the first six rows of the data set
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
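
The size of the data set described above can be checked directly. The following is a minimal sketch using the base R functions dim(), names() and table() on the built-in iris data set:

```r
# iris ships with base R (package "datasets"), so nothing needs to be loaded
dim(iris)            # number of rows and columns: 150 5
names(iris)          # the five column names
table(iris$Species)  # number of samples per species: 50 each
```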

We will use square brackets [ ] to extract information from a vector.

head(Sepal.Length)   # error: R does not know which data set Sepal.Length belongs to
## Error in eval(expr, envir, enclos): object 'Sepal.Length' not found
head(iris$Sepal.Length)      # the $ sign tells R to use the Sepal.Length column of the iris data set
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
attach(iris)                 # makes iris data set available so that we do not need to use $ every time
Sepal.Length[5]              # extracts the 5th value   
## [1] 5
ext_1 <- Sepal.Length[5]     # extracts the 5th value and assigns it to a variable
Sepal.Length[c(1, 3, 12, 36)] # extracts the 1st, 3rd, 12th and 36th elements of Sepal.Length
## [1] 5.1 4.7 4.8 5.0
Sepal.Length[6:11]           # extracts the 6th through 11th elements
## [1] 5.4 4.6 5.0 4.4 4.9 5.4
detach(iris)                 # do not forget to detach data set when you are done
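
The same extractions also work without attach(). The following is a minimal sketch, assuming we first copy the column into a variable called sl (a name chosen here only for illustration):

```r
sl <- iris$Sepal.Length   # sl is a hypothetical name for a copy of the column

sl[5]                     # 5th value: 5
sl[c(1, 3, 12, 36)]       # 1st, 3rd, 12th and 36th values: 5.1 4.7 4.8 5.0
sl[6:11]                  # 6th through 11th values
length(sl)                # 150, one value per row of iris
```

Working on a copy like this avoids having to remember the matching detach() call.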

We can also use a logical expression to extract data from a vector. R keeps only the elements for which the expression is TRUE, which makes this a very useful way to manipulate a vector.

attach(iris)
Sepal.Length[Sepal.Length > 7]   # extracts all elements with a value greater than 7
##  [1] 7.1 7.6 7.3 7.2 7.7 7.7 7.7 7.2 7.2 7.4 7.9 7.7
Sepal.Length[Sepal.Length == 5.8]  # extracts all elements with a value equal to 5.8
## [1] 5.8 5.8 5.8 5.8 5.8 5.8 5.8
Sepal.Length[Sepal.Length <= 4.5]  # extracts all elements with a value less than or equal to 4.5
## [1] 4.4 4.3 4.4 4.5 4.4
Sepal.Length[Sepal.Length > 7 & Sepal.Length != 7.9] # values greater than 7 and not equal to 7.9
##  [1] 7.1 7.6 7.3 7.2 7.7 7.7 7.7 7.2 7.2 7.4 7.7
Sepal.Length[Sepal.Length > 7.8 | Sepal.Length < 4.5] # values greater than 7.8 or less than 4.5
## [1] 4.4 4.3 4.4 4.4 7.9
detach(iris)
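
To see why this works, it can help to look at the logical vector that the expression produces before it is used inside [ ]. The following is a small sketch with a made-up vector v (not part of iris) and a helper variable keep, both introduced here only for illustration:

```r
v <- c(4.4, 7.9, 5.8, 7.1)   # made-up values for illustration

keep <- v > 7                # one TRUE or FALSE per element
keep                         # FALSE  TRUE FALSE  TRUE
v[keep]                      # keeps only the TRUE positions: 7.9 7.1
which(keep)                  # positions of the TRUE values: 2 4
sum(keep)                    # counts the TRUE values: 2
```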

We can also change the values of some elements in a vector. It is good practice to keep the original vector or data frame intact and make any changes to a copy.

attach(iris)
x1 <- Sepal.Length        # creates a copy so that the original vector stays unchanged
x1[4]                     # displays the 4th value
## [1] 4.6
x1[4] <- 4.8              # replaces the 4th value with 4.8
x1[4]                     # displays the new value
## [1] 4.8
x1[c(5, 6)] <- 4.9        # replaces the 5th and 6th elements with 4.9
x1[x1 < 4.5] <- 5.1       # replaces values that are less than 4.5 with 5.1
detach(iris)
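
A quick way to confirm that only the copy changes is to compare it with the original afterwards. The following is a minimal sketch that uses iris$Sepal.Length directly instead of attach():

```r
x1 <- iris$Sepal.Length        # work on a copy
x1[4] <- 4.8                   # change the 4th value of the copy
x1[x1 < 4.5] <- 5.1            # replace every value below 4.5 in the copy

x1[4]                          # 4.8, the copy was changed
iris$Sepal.Length[4]           # 4.6, the original is untouched
sum(x1 < 4.5)                  # 0, no values below 4.5 remain in the copy
sum(iris$Sepal.Length < 4.5)   # 4, the original still has them
```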

Next, we will learn how to order the values in a vector.

attach(iris)
x1 <- sort(Sepal.Length)       # sorts the values from lowest to highest
head(x1)                       # displays the first six values
## [1] 4.3 4.4 4.4 4.4 4.5 4.6
x2 <- sort(Sepal.Length, decreasing = TRUE)     # reverses the sort from highest to lowest
head(x2)
## [1] 7.9 7.7 7.7 7.7 7.7 7.6
# The next line does the same thing by using the sort() and rev() functions together
x2 <- rev(sort(Sepal.Length))     # first sorts the vector and then reverses the sorted vector
detach(iris)
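
The behaviour of sort() and rev() is easy to verify on a short vector. The following is a small sketch with made-up values (the vector v is not part of iris):

```r
v <- c(5.1, 4.3, 7.9, 4.3)       # made-up values for illustration

sort(v)                          # 4.3 4.3 5.1 7.9
sort(v, decreasing = TRUE)       # 7.9 5.1 4.3 4.3
rev(sort(v))                     # same result as decreasing = TRUE
v                                # unchanged: sort() returns a new vector
```

Note that sort() does not modify its argument, so the result has to be assigned to a variable if we want to keep it.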

If we want to sort one vector according to the values of another vector, we should use the order() function. The trick with order() is that it returns the positions of the elements in ascending (or descending) order, not the values themselves. We can then use those positions, which form a vector, to see the values of other vectors in that sorted order.

attach(iris)
x1 <- order(Sepal.Length)        # creates a vector of positions that puts the values in ascending order
head(x1)   # the smallest value is the 14th element of Sepal.Length, the next smallest is the 9th, and so on
## [1] 14  9 39 43 42  4
x2 <- Sepal.Length[x1]     # Sepal.Length is not a function. It is a vector. That's why we use [ ]
head(x2)              # displays the ascending values of the vector by using x1
## [1] 4.3 4.4 4.4 4.4 4.5 4.6
x3 <- Petal.Length[x1]      # reorders another vector according to the sorted positions of Sepal.Length
head(x3)     # the smallest value (4.3) of Sepal.Length corresponds to the value (1.1) of Petal.Length
## [1] 1.1 1.4 1.3 1.3 1.3 1.5
x4 <- Species[x1]     # Species is a factor, which is why the output also lists its levels
head(x4)          # the smallest 6 values of Sepal.Length belong to the "setosa" species
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
detach(iris)
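
The difference between positions and values is easier to see on a very small example. The following is a minimal sketch with two made-up vectors, len and kind, which are not part of iris and are introduced here only for illustration:

```r
len  <- c(5.1, 4.3, 7.9)                 # made-up lengths
kind <- c("medium", "small", "large")    # made-up labels, one per length

pos <- order(len)                        # positions in ascending order: 2 1 3
len[pos]                                 # 4.3 5.1 7.9, same as sort(len)
kind[pos]                                # "small" "medium" "large"
kind[order(len, decreasing = TRUE)]      # "large" "medium" "small"
```

Because pos holds positions rather than values, the same vector can be reused to reorder any other vector of the same length.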