Math on R data structures

# Prologue

We have discussed the different data structures in R. We now discuss arithmetic operations on vectors, matrices and data frames. Apart from such scalar or element-wise operations, we discuss some basic linear algebra operations, such as transposing and cross products. Then we discuss some random number generators in R and play with random numbers using some useful R functions. Finally, we discuss set theoretic operations on R data structures.

# Scalar operations

# Vector operations

# Random numbers in R

# Set theoretic operations

Most of us are aware of basic set theoretic operations, such as union, intersection and set difference. Let's say we have two sets of murine genes: `Immunol1` and `Immunol2`. Let's define them as character vectors as follows:

```r
Immunol1 <- c("Tnf", "Nfkbia", "Il1r", "Dnaja1", "Nr4a3", "Cd52", "Klf6", "Ctla4")
Immunol2 <- c("S100a4", "S100a6", "Klf6", "Pax5", "Jarid2", "Nfkbia", "Ly6d", "Tnf")
```

## Intersection

Now let's say we want to know which genes the two sets have in common. In that case we are looking for the intersection of the two sets, and we use the `intersect()` function in R:

```r
> intersect(Immunol1, Immunol2)
[1] "Tnf" "Nfkbia" "Klf6"
```

So Tnf, Nfkbia and Klf6 are common to both sets of genes.

## Set difference

Further, we might want to know which genes are exclusive to `Immunol1`. That is, we want to remove from `Immunol1` the genes that are members of both `Immunol1` and `Immunol2`. R has a dedicated function for this: `setdiff()`.

```r
> setdiff(Immunol1, Immunol2)
[1] "Il1r" "Dnaja1" "Nr4a3" "Cd52" "Ctla4"
```

We can of course run the same operation for `Immunol2`; only the order of the arguments changes.

```r
> setdiff(Immunol2, Immunol1)
[1] "S100a4" "S100a6" "Pax5" "Jarid2" "Ly6d"
```

Notice that Tnf, Nfkbia and Klf6 were excluded in both cases.
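To see how `intersect()` and `setdiff()` relate to each other, here is a minimal sketch reusing the two gene vectors from above. It uses only base R; the names `common` and `only1` are illustrative and not part of the original example.

```r
# Gene sets from the example above
Immunol1 <- c("Tnf", "Nfkbia", "Il1r", "Dnaja1", "Nr4a3", "Cd52", "Klf6", "Ctla4")
Immunol2 <- c("S100a4", "S100a6", "Klf6", "Pax5", "Jarid2", "Nfkbia", "Ly6d", "Tnf")

# Illustrative names for the two pieces of Immunol1
common <- intersect(Immunol1, Immunol2)  # genes present in both sets
only1  <- setdiff(Immunol1, Immunol2)    # genes present only in Immunol1

# Swapping the arguments of intersect() returns the same genes,
# but in the order in which they appear in the first argument
intersect(Immunol2, Immunol1)
#> [1] "Klf6"   "Nfkbia" "Tnf"

# Swapping the arguments of setdiff() gives a different answer
identical(setdiff(Immunol1, Immunol2), setdiff(Immunol2, Immunol1))
#> [1] FALSE

# The shared and the exclusive genes never overlap ...
length(intersect(common, only1))
#> [1] 0

# ... and together they account for every gene in Immunol1
length(common) + length(only1) == length(Immunol1)
#> [1] TRUE
```

In other words, `intersect()` is symmetric in its arguments (up to ordering), whereas `setdiff()` is not.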
## Union

Finally, we might want to find all genes in `Immunol1` and `Immunol2` without repeating the common genes. For that, we use the `union()` function.

```r
> union(Immunol1, Immunol2)
[1] "Tnf" "Nfkbia" "Il1r" "Dnaja1" "Nr4a3" "Cd52" "Klf6" "Ctla4" "S100a4" "S100a6" "Pax5" "Jarid2" "Ly6d"
```

Using `union()` is similar to concatenating the two vectors and then taking the unique entries with the `unique()` function.

```r
> Immunol <- c(Immunol1, Immunol2)
> Immunol
[1] "Tnf" "Nfkbia" "Il1r" "Dnaja1" "Nr4a3" "Cd52" "Klf6" "Ctla4" "S100a4" "S100a6" "Klf6" "Pax5" "Jarid2" "Nfkbia" "Ly6d" "Tnf"
> unique(Immunol)
[1] "Tnf" "Nfkbia" "Il1r" "Dnaja1" "Nr4a3" "Cd52" "Klf6" "Ctla4" "S100a4" "S100a6" "Pax5" "Jarid2" "Ly6d"
```

When we concatenate `Immunol1` and `Immunol2` into a single vector `Immunol`, the common genes (Tnf, Nfkbia and Klf6) appear twice. Running `unique()` on it then removes the repeats.

These set theoretic functions can be used on vectors as well as on data frames. However, the two inputs should be of the same class.

## Operations on more than two sets

These functions accept only two inputs. If you have more than two sets, you have to apply the functions to your sets serially.

## Conditional operations on sets

There are two conditional functions for set theoretic operations that are very useful for writing quick conditional statements: `setequal()` and `is.element()`.

`setequal()` quickly tests whether two sets contain the same elements. It still works if one set has repeats; conceptually, however, `unique()` should be applied before doing set operations.

`is.element()` tests whether an item is a member of a set.
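Applying the two-argument functions serially can be written compactly with base R's `Reduce()`, and `setequal()` and `is.element()` fit naturally into `if` conditions. The sketch below assumes a third gene set, `Immunol3`, which is hypothetical and made up purely for illustration; it is not part of the original data.

```r
Immunol1 <- c("Tnf", "Nfkbia", "Il1r", "Dnaja1", "Nr4a3", "Cd52", "Klf6", "Ctla4")
Immunol2 <- c("S100a4", "S100a6", "Klf6", "Pax5", "Jarid2", "Nfkbia", "Ly6d", "Tnf")
# Hypothetical third set, for illustration only
Immunol3 <- c("Tnf", "Klf6", "Cd52", "Pax5")

sets <- list(Immunol1, Immunol2, Immunol3)

# Serial application: same as intersect(intersect(Immunol1, Immunol2), Immunol3)
Reduce(intersect, sets)
#> [1] "Tnf"  "Klf6"

# Union of all three sets; Immunol3 adds no new genes here
length(Reduce(union, sets))
#> [1] 13

# setequal() ignores order and repeats
Immunol <- c(Immunol1, Immunol2)            # contains duplicated genes
setequal(Immunol, union(Immunol1, Immunol2))
#> [1] TRUE
setequal(Immunol1, Immunol2)
#> [1] FALSE

# is.element() is vectorised over its first argument (same as %in%)
is.element(c("Tnf", "Pax5"), Immunol1)
#> [1]  TRUE FALSE

# Typical use inside a quick conditional statement
if (is.element("Klf6", Immunol1) && is.element("Klf6", Immunol2)) {
  message("Klf6 is shared by both sets")
}
```

`Reduce()` simply folds the chosen function over the list from left to right, so it works the same way for `intersect`, `union` or any other two-argument set function.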