Intro to coding

## Prologue

This chapter is our first foray into the world of R coding. If you are new to writing code, we suggest you take it slow: work through this chapter along with all its examples. And definitely make sure you try some of them on your own in R, since your R coding environment is waiting for you to write some code (Chapter 1). We start the journey of writing R code with data structures, which, in computer science parlance, means the different ways data can be stored and accessed in a programming language. We shall learn about the different data structures in R and how they can be useful to tame different types of biological data. We shall follow up on R coding in the next two chapters, on control flow and graphics, which simply mean telling R how to process the data stored in data structures and then generating cool plots with it, respectively. Let's begin our journey with data structures.

## Some general remarks on Coding

A computer program is usually a list of steps that a computer executes in the given order. The programmer uses one of many programming languages to write those steps. Although there are many programming languages, a few fundamental aspects are common to all of them. A thorough understanding of these fundamentals enables a programmer to use any programming language of choice efficiently. In the following sections we shall discuss these aspects. We discuss them in the context of R here, but they are broadly applicable to many programming languages.

## Variable assignment

Variable assignment is possibly the most fundamental idea in writing computer code. A variable is a space in the computer's memory that is given a name and a value using the assignment sign '='. Let's say we wish to assign a value of 42 to the variable x, and a value of 84 to the variable y.
To that end, we write the following code:

```r
x = 42
y = 84
```

Now you can check the value of the variable x in the console by placing the cursor beside the command prompt, typing x, and pressing the Enter key. When you try it yourself, you should see the following results:

```r
> x
[1] 42
```

```r
> y
[1] 84
```

So R now knows what the variables x and y are. What is that 1 in square brackets before the value of the variable? Hold on, it is not important right now, but it will be important later - we shall get there in the next few sections. What happens if we now write the following code?

```r
x = y
```

We assigned the value of y to x. Now if we check the value of x, the following happens:

```r
> x
[1] 84
```

Now R knows that the value of x is 84, and it does not remember the value that was assigned to x before this (i.e., 42). To reiterate, computer code is a series of instructions executed one after another, and only the latest assignment to a variable is accessible to the interpreter; previous values are overwritten and are no longer available for use.

Note that the following symbols are used for assignment in R: `=` and `<-`. The latter is more frequently encountered than the former. We suggest you choose one and stick to it when writing your own code.

## Algorithms

We hear this term almost everywhere. What does it actually mean? It means recipe. Yes, you read that right. It is really that simple. The word 'recipe' is most prominently associated with - yes, you are right - food. We would like to be very clear about something: we are really serious about food, and we hope that you are too. So, let's talk about recipes. A recipe is a step-by-step manual to cook a dish. It starts with something like: take basmati rice, dried grapes, almonds, etc. Then it says something like: boil the rice for 30 minutes on medium heat and add the almonds at the end.
The recipe here enables you to make pulao, given that you have the ingredients and other resources, like a working oven and some utensils. A well-written recipe does not have any hidden or secret steps. It is exact. It tells you how many kilograms of rice to add, in how large a pot of which type, and in how many liters of water. A good recipe write-up is detailed. It is so detailed that even if you are cooking pulao for the first time in your life, you are not going to fail. It does not assume any prior culinary training, let alone talent. It is so systematic and quantitative that you simply cannot fail if you follow it blindly. Now, that is what we call a good recipe.

Algorithms are recipes for computers to cook something with the input data and give you some outputs. And you write this recipe in a programming language that your machine interprets. The only difference is that, unlike human beings, machines are unable to extrapolate or infer anything that is not stated in the recipe. For instance, if a cooking recipe says to boil the rice for 30 minutes and then serve hot, it means all of that and also implies that you turn your oven off before you leave your kitchen to serve your guests. Even the most detailed recipes will not tell you to turn off your oven when you are done, or to put the knife back on the stand or in the cupboard once you have chopped the carrots. These are implied. In programming recipes nothing is implied. You have to be explicit about all of it. If you miss even one instruction in a series of a hundred, or mistakenly alter the order of the actions even slightly, rest assured - your machine will fail to serve you your desired outputs.

So, to summarize, the core attribute of computer algorithms is rigor. If you enjoy achieving perfection in your daily activities, you are going to excel in programming.

## Commenting

One aspect of writing good code is commenting.
Commenting is writing sentences in a human-readable language within your code to describe how the code works, while telling the interpreter not to execute those sentences. In R, such human-readable parts of the code start with the '#' sign.

```r
# This is a comment.
# This is also a comment. This line will not be executed.
```

The '#' sign is the way of telling the interpreter not to look at these lines - they are for humans only. Commenting is important because it increases the readability, and hence the reusability, of your code. Every piece of code has a lifetime. With better readability your code can be used by others, and thus commenting prolongs the life of your code. However, the '#' sign before the human-readable text is a must. Otherwise R gives you an error saying that the input is unexpected, since it recognizes only the R language.

```
> This is a comment.
Error: unexpected symbol in "This is"
```

We shall discuss sharing your code through GitHub and other platforms. Well-commented code is an asset for the community beyond its merits as an R script. This is true for any programming language. Note that you comment your code not just out of altruism toward the R community. You will routinely dig out scripts that you wrote yourself ten years back (you will know how often that happens when you have been in the business that long). We don't know about you, but we generally forget the intricacies of a hundred-line script within a week. We are pathetic - we know. But trust us on this: remembering the flow of a hundred-line script without having used it for a decade is close to impossible. So, your comments help you - they are a time capsule for your future self.

## Accumulative power

Keep accumulating the code you write. Do not throw anything away. Individual scripts are generally small files, so backing them up for the future is cheap.
Well, we are not saying you should keep what you think is absolute garbage - yes, even professional programmers routinely generate that, and they know it when they revise and edit it. Much like writing fiction, all coding is editing. You start with something and make it better and more useful for your purpose. Where you start can be your previous script or a script written by others. A script you wrote from a blank page will possibly go through several versions before it turns into something worth sharing with others as a publication, a patent or a web app. In many cases it will appear to you that the current code will never be usable by you, let alone others, simply because the problem you are working on is too niche. But it is quite possible that you will find a similar use case a couple of years in the future. You will not have to start from scratch then, and the archived script will save you several hours.

## ChatGPT programmer

We are definitely not against it, but we think it works best when the user knows the syntax and logic of the language for which they are seeking help from ChatGPT. Clear fundamentals of programming combined with ChatGPT make a cool and killer combination. However, copying code from ChatGPT without having the fundamentals clear can be deadly - definitely not killer.

## Debugging

> To bug is human, to debug is divine.
> \-Anonymous

In programmers' jargon, a bug is an error in the code. There can be several types of bugs, and we shall discuss them in a moment, but in the business of coding the real ninja mode is the act of debugging. Debugging means finding out what the error (i.e., the bug) is and why it occurs, and then eliminating it. In regular scientific and engineering coding practice it gets *slightly* more complex than the last sentence suggests. Sarcasm in italics. Bugs in code have caused missiles to misfire and have made satellites crash.
In scientific calculations they are more deadly and often silent. They quietly mask the real signals and make you throw away your data. Bugs are your real enemies - stay alert, flag them and kill them whenever you can - for the rest of your coding life.
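
To make the idea of a silent bug concrete, here is a minimal sketch that uses only variable assignment, as introduced earlier in this chapter. The goal is to swap the values of x and y. The first attempt runs without any error message but gives the wrong answer, because the order of the instructions overwrites a value before it is saved. The names `tmp`, `buggy_x` and `buggy_y` are hypothetical helpers chosen for this illustration, and printing intermediate values is only one simple way to debug, not the only one.

```r
# Goal: swap the values of x and y.
x = 42
y = 84

# Buggy version: R reports no error, but the result is wrong.
x = y          # x becomes 84; the old value 42 is overwritten
y = x          # y is assigned 84 again, so nothing was swapped
buggy_x = x    # keep copies (hypothetical names) for inspection
buggy_y = y

# Debugging step: print the values and compare them with what we expected.
print(c(buggy_x, buggy_y))   # prints [1] 84 84, but we expected 84 and 42

# Fixed version: save the old value of x before overwriting it.
x = 42
y = 84
tmp = x        # tmp (hypothetical name) holds 42
x = y          # x becomes 84
y = tmp        # y becomes 42
print(c(x, y))               # prints [1] 84 42
```

Nothing in the buggy version looks broken to R; only comparing the printed values with the expected ones reveals the problem, which is why checking intermediate results is worth the habit.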