Linear models are among the most useful, versatile, and ubiquitous data analysis tools available. R provides an enormous number of functions for building, manipulating, and analyzing linear models. This talk will focus on the basics, but will also touch on a few relatively advanced topics, such as mixed-effects models and general hypothesis testing frameworks. The goal is to give beginners insight into how to approach common data analysis problems with linear models, and to give seasoned practitioners a few new tricks to add to their toolkits.
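As a rough illustration of the basics mentioned above, the sketch below fits two nested linear models with base R's lm() and compares them with an F-test. The built-in mtcars data and the choice of predictors are assumptions made for this example; they are not taken from the talk.

```r
# Minimal sketch (not the talk's own code): fit and compare linear models
# using the built-in mtcars data set as a stand-in.

# Full model: fuel economy as a function of weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient estimates, standard errors, and t-tests
print(summary(fit))

# Reduced model with weight only
fit_small <- lm(mpg ~ wt, data = mtcars)

# F-test of the nested models: does adding hp improve the fit?
comparison <- anova(fit_small, fit)
print(comparison)
```

The same formula interface carries over to many other model-fitting functions in R, which is part of what makes linear models a convenient starting point.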
This script, presented at the April 2015 Meetup, demonstrates how to carry out basic social network analysis (SNA) using the network and sna packages. Topics covered include preparing data for analysis, interpreting output, and generating social network graphs.
The script assumes no prior knowledge of SNA but does assume a basic working knowledge of R. Most code is commented with an explanation of its purpose.
After this demonstration, Meetup participants should understand how to begin and carry out a basic social network analysis in R.
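To give a flavor of the workflow described above, here is a minimal sketch that builds a small undirected network, computes two centrality measures, and draws the graph. The five-person toy network is invented for illustration and is not the data used in the Meetup script; it assumes the network and sna packages are installed.

```r
# Minimal sketch with made-up data: a five-person undirected friendship network.
library(network)
library(sna)

# Prepare the data as a symmetric adjacency matrix
people <- LETTERS[1:5]
adj <- matrix(0, nrow = 5, ncol = 5, dimnames = list(people, people))
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1 # mirror each tie so the matrix is symmetric

# Build the network object
net <- network(adj, directed = FALSE)

# Centrality: number of ties per person, and how often each person
# lies on shortest paths between others
deg <- degree(net, gmode = "graph")
btw <- betweenness(net, gmode = "graph")
print(data.frame(person = people, degree = deg, betweenness = btw))

# Draw the network graph with vertex labels
gplot(net, gmode = "graph", displaylabels = TRUE)
```

Setting gmode = "graph" tells the sna functions to treat the ties as undirected; for directed data the default "digraph" would apply.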
library(dplyr)        # Data manipulation & magrittr pipe
library(ggplot2)      # General plotting
library(NMF)          # aheatmap()
library(gplots)       # heatmap.2()
library(RColorBrewer) # Brewer palettes
set.seed(123)

############################
# 2D histograms            #
############################

# Simulate data that consists of paired observations in two experiments
covar_mat <- matrix(c(5, 4, 4, 5), ncol = 2) # Covariance matrix
data <- MASS::mvrnorm(n = 10000, mu = c(0, 0), Sigma = covar_mat) %>% # Simulate correlated data
  rbind(matrix(rnorm(20000, sd = 0.
This demonstration of the caret package was given by Mark Lawson, bioinformatician at Hemoshear LLC, Charlottesville VA. The caret package (short for Classification And REgression Training) is a set of functions that streamline the process for creating predictive models. The package contains tools for data splitting, pre-processing, feature selection, model tuning using resampling, and variable importance estimation. Read more about the caret package here.
This demonstration uses the caret package to split data into training and testing sets, and run repeated cross-validation to train random forest and penalized logistic regression models for classifying Fisher’s iris data.
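The workflow described above might look roughly like the following sketch. It is not the presenter's script: the 80/20 split, the 5-fold cross-validation repeated 3 times, and the use of method = "glmnet" for the penalized logistic regression are all assumptions made for illustration. It also assumes the randomForest and glmnet packages are installed alongside caret.

```r
# Minimal sketch (assumed settings, not the original demo code)
library(caret)

set.seed(123)

# Stratified split of the iris data into training and testing sets
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Repeated cross-validation: 5 folds, repeated 3 times (arbitrary choices)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)

# Random forest classifier; caret tunes mtry over a default grid
rf_fit <- train(Species ~ ., data = training, method = "rf", trControl = ctrl)

# Penalized (elastic net) multinomial logistic regression via glmnet
glmnet_fit <- train(Species ~ ., data = training, method = "glmnet", trControl = ctrl)

# Evaluate the random forest on the held-out test set
rf_pred <- predict(rf_fit, newdata = testing)
print(confusionMatrix(rf_pred, testing$Species))
```

Because both models go through the same train() interface and the same trainControl object, their resampled performance can be compared on equal footing.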
Web Scraping
Take data formatted for display in a web browser and reformat it for analysis.
It helps to know…
a little about HTML and XML
how to manipulate strings in R
a little something about regular expressions
how to write a function and do some basic conditional looping
Web scraping is mostly cleaning data.
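The skills listed above can be seen working together in a small sketch like the one below. It uses only base R on an inline HTML fragment that was made up for this example; the helper name extract_cells is hypothetical. For a real page, the HTML would typically be read in first (for instance with readLines()), and a proper HTML parser is usually more robust than regular expressions.

```r
# Minimal sketch with an invented HTML fragment; extract_cells() is a
# hypothetical helper, not part of any package.
html <- c(
  "<html><body>",
  "<table>",
  "<tr><td class=\"name\">Alice</td><td class=\"score\"> 91 </td></tr>",
  "<tr><td class=\"name\">Bob</td><td class=\"score\">78</td></tr>",
  "</table>",
  "</body></html>"
)
page <- paste(html, collapse = "\n")

# Pull the text out of every <td> cell that has the given class
extract_cells <- function(page, class) {
  pattern <- sprintf("<td class=\"%s\">[^<]*</td>", class)
  cells <- regmatches(page, gregexpr(pattern, page))[[1]]
  text <- gsub("<[^>]+>", "", cells) # strip the surrounding tags
  trimws(text)                       # clean up stray whitespace
}

# Most of the work is cleaning: trim, convert types, assemble a data frame
scores <- data.frame(
  name = extract_cells(page, "name"),
  score = as.numeric(extract_cells(page, "score")),
  stringsAsFactors = FALSE
)
print(scores)
```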
Strategy
Every web page is different, but a basic procedure in R (for a single web page) is as follows:
This is an R script ported from here.
# Load packages
library("dplyr")
library("ggplot2")
library("nycflights13")
library("lubridate")

# it's a data.frame, but also a tbl_df.
# doesn't print entire thing to screen.
flights
## # A tibble: 336,776 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1  2013     1     1      517            515         2      830
##  2  2013     1     1      533            529         4      850
##  3  2013     1     1      542            540         2      923
##  4  2013     1     1      544            545        -1     1004
##  5  2013     1     1      554            600        -6      812
##  6  2013     1     1      554            558        -4      740
##  7  2013     1     1      555            600        -5      913
##  8  2013     1     1      557            600        -3      709
##  9  2013     1     1      557            600        -3      838
## 10  2013     1     1      558            600        -2      753
## # .
In the first example we use college admissions data to model GPA as a function of class rank, ACT score, and year of admission. Three linear models are fit and summarized in a single table using the texreg functions screenreg, texreg, and htmlreg. These functions create nicely formatted tables for the R console, LaTeX, and HTML, respectively. The second example uses NRC data on research-doctorate programs to fit beta regressions of PhD completion rates on faculty publications, citation rate, faculty grants, and institution type.
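A minimal sketch of the first pattern, several linear models summarized side by side, could look like this. The admissions data are not reproduced here, so the built-in mtcars data and the chosen predictors are stand-ins assumed for illustration; the texreg package must be installed.

```r
# Minimal sketch using mtcars as a stand-in for the admissions data
library(texreg)

# Three nested linear models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
models <- list(m1, m2, m3)

# One table with a column per model, formatted for the R console
tab <- screenreg(models)
print(tab)

# The same list can be passed to texreg() for LaTeX or htmlreg() for HTML
latex_tab <- texreg(models)
html_tab <- htmlreg(models)
```

Keeping the models in a single list means the same object feeds all three output formats.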