You might think that using a CSV file is as easy as putting the file name inside the read.csv() function. In most cases, however, we also need to set a few extra arguments to get things right and make the work less frustrating.
Use the following syntax:
my_data <- read.csv("filename.csv", stringsAsFactors = FALSE, strip.white = TRUE, na.strings = c("NA", ""))
stringsAsFactors = FALSE tells R to keep character variables as they are rather than convert them to factors. (Since R 4.0.0 this is already the default, but setting it explicitly does no harm.)
strip.white = TRUE removes spaces at the start and end of unquoted character elements. R treats "game" and " game" as different values, which is not usually desired.
na.strings = c("NA", "") tells R that, in addition to the usual NA, empty strings in columns of character data are also to be treated as missing values.
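To see the three arguments working together, here is a minimal sketch. The CSV contents and the column names name and score are made up for illustration, and the file is written to a temporary location so the example does not depend on any file of your own.

```r
# Write a small, made-up CSV to a temporary file (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,score",
  "game,10",
  " game,20",   # leading space before "game"
  ",30"         # empty name field
), csv_path)

my_data <- read.csv(csv_path,
                    stringsAsFactors = FALSE,
                    strip.white = TRUE,
                    na.strings = c("NA", ""))

str(my_data)              # name is a character column, not a factor
unique(my_data$name)      # "game" and " game" have collapsed into one value
is.na(my_data$name)       # the empty field is now reported as missing
```

Try removing strip.white = TRUE or na.strings and rerunning to see how the name column changes.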
The lapply() function is used when you want to apply a function to each element of a list in turn and get a list back.
x <- list(a = 1, b = 1:3, c = 10:100)
lapply(x, FUN = length)
$a
[1] 1

$b
[1] 3

$c
[1] 91
You can use other functions like max, min, sum, etc.
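As a small sketch of swapping in other functions, the snippet below applies max, sum, and an anonymous function to the same kind of list. The variable names maxima, sums, and ranges are only illustrative.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# Each call returns a list with one result per element of x.
maxima <- lapply(x, FUN = max)
sums   <- lapply(x, FUN = sum)

# An anonymous function works too: here, the spread of each element.
ranges <- lapply(x, FUN = function(v) max(v) - min(v))

str(maxima)
str(sums)
str(ranges)
```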
The sapply() function is used when you want to apply a function to each element of a list in turn but want a vector back rather than a list.
A vector is often more convenient because it gives you a plain set of values on which you can easily perform further operations.
x <- list(a = 1, b = 1:3, c = 10:100)
# Compare with above; a named vector, not a list
sapply(x, FUN = length)
 a  b  c
 1  3 91
sapply(x, FUN = sum)
   a    b    c
   1    6 5005
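Because the result is an ordinary named vector, it can be fed straight into vectorised operations. The following is a minimal sketch of that idea; the names lens, total, longest, and doubled are made up for the example.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

lens <- sapply(x, FUN = length)   # named integer vector of lengths

total   <- sum(lens)                     # add up all the lengths
longest <- names(lens)[which.max(lens)]  # name of the longest element
doubled <- lens * 2                      # arithmetic applies element-wise

print(total)
print(longest)
print(doubled)
```

Doing the same with the list returned by lapply() would first require converting it, for example with unlist().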
A scatterplot is a graph that shows many points plotted in the Cartesian plane. Each point represents a pair of values: one read off the x-axis and one read off the y-axis. A simple scatterplot is drawn using the plot() function.
The syntax for a scatterplot is:
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
x is the data set whose values are the horizontal coordinates
y is the data set whose values are the vertical coordinates
main is the title of the graph
xlab and ylab are the labels of the horizontal and vertical axes
xlim and ylim are the limits of the values of x and y used in the plotting
axes indicates whether the axes should be drawn on the plot
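The arguments above can be put together as in the sketch below. It uses the mtcars data set that ships with R. The choice of columns, the title, the labels, and the axis limits are assumptions made for illustration only.

```r
# Car weight (in 1000 lbs) against fuel economy, from the built-in mtcars data.
wt  <- mtcars$wt
mpg <- mtcars$mpg

plot(x = wt, y = mpg,
     main = "Weight vs. mileage",   # title of the graph
     xlab = "Weight (1000 lbs)",    # label on the horizontal axis
     ylab = "Miles per gallon",     # label on the vertical axis
     xlim = c(1, 6),                # range shown on the x-axis
     ylim = c(10, 35),              # range shown on the y-axis
     axes = TRUE)                   # draw both axes
```

Setting axes = FALSE suppresses both axes, which can be useful if you want to add custom ones afterwards with axis().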