Content: References, Learning Guides, Etc. / Tutorial

Correlation and Linear Regression

From datascience+: “Before going into complex model building, looking at data relation is a sensible step to understand how your different variables interact together. Correlation looks at trends shared between two variables, and regression looks at relation between a predictor (independent variable) and a response (dependent) variable.”
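The distinction drawn in that excerpt can be made concrete with a small sketch. The tutorial itself works in R; the version below uses plain Python, and the data (`hours`, `score`) and the helper names `pearson_r` and `fit_line` are made up for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: strength of the linear trend shared by x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x (x is the predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: predictor (hours) and response (score).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
slope, intercept = fit_line(hours, score)
print(f"r = {r:.3f}, score ~ {intercept:.1f} + {slope:.1f} * hours")
```

Correlation is symmetric in the two variables, whereas the fitted line changes if predictor and response are swapped.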

Principal Component Analysis using R

From R-bloggers: “Curse of Dimensionality: One of the most commonly faced problems while dealing with data analytics problems such as recommendation engines, text analytics is high-dimensional and sparse data. At many times, we face a situation where we have a large set of features and fewer data points, or we have data with very high feature vectors. In such scenarios, fitting a model to the dataset, results in lower predictive … Continue Reading

Bayesian machine learning

From FastML: “So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together – we know it took us a while. This article is an introduction we wish we had back then.” This article covers the following topics: Bayesians and Frequentists; Priors, updates, and posteriors; Inferring model parameters from data; Model vs inference; Statistical modelling … Continue Reading
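As a rough illustration of the "priors, updates, and posteriors" topic, here is a minimal sketch of a conjugate Beta-Binomial update in Python. It is not taken from the FastML article; the coin-flip setup, the uniform prior, and the function name `update_beta` are assumptions chosen for the example.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior on a success probability, the posterior
    is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical example: uniform prior Beta(1, 1), then 7 heads in 10 flips.
prior = (1, 1)
posterior = update_beta(*prior, successes=7, failures=3)

print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", round(beta_mean(*posterior), 3))
```

The posterior mean lands between the prior mean (0.5) and the observed frequency (0.7), and moves closer to the data as more observations arrive.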

Getting Started in Open Source

From Rebecca Bilbro: “The phrase ‘open source’ evokes an egalitarian, welcoming niche where programmers can work together towards a common purpose–creating software to be freely available to the public in a community that sees contribution as its own reward. But for data scientists who are just entering into the open source milieu, it can sometimes feel like an intimidating place…. And yet, open source development does have a lot going … Continue Reading

Fun With Plotly

From Len Kiefer: “Plotly enables you to make interactive html widgets that you can embed in your webpage or view from within R. I’ve been having a lot of fun converting existing visualizations I have made with ggplot2 into plotly visualizations using ggplotly…. I’m going to include the code and discussion for several graphs I’ve been using. I will use updated data that we used in our Cross talk dashboard. … Continue Reading

R for Excel Users

This post covers why and how to switch from Excel to R for managing data and undertaking analysis. It covers the following topics: Four Fundamental Differences Between R and Excel; Example: Joining two tables together; Iteration; Generalizing through functions. “Excel users have a strong mental model of how data analysis works, and this makes learning to program more difficult. However, learning to program will allow you to do things … Continue Reading

What’s Wrong With My Time Series

From MultiThreaded: “What’s wrong with my time series? Model validation without a hold-out set. Time series modeling sits at the core of critical business operations such as supply and demand forecasting and quick-response algorithms like fraud and anomaly detection. Small errors can be costly, so it’s important to know what to expect of different error sources. The trouble is that the usual approach of cross-validation doesn’t work for time series … Continue Reading

Getting Started with tidyverse in R

From Storybench: “The tidyverse is a collection of R packages developed by RStudio’s chief scientist Hadley Wickham. These packages work well together as part of a larger data analysis pipeline. To learn more about these tools and how they work together, read R for data science…. The following tutorial will introduce some basic functions in tidyverse for structuring and analyzing datasets.”

Neural Networks from Scratch (in R)

From Medium: This tutorial “is for those of you with a statistics/econometrics background but not necessarily a machine-learning one and for those of you who want some guidance in building a neural-network from scratch in R to better understand how everything fits (and how it doesn’t).” The author’s motivation for writing the tutorial: “Understanding (by writing from scratch) the leaky abstractions behind neural-networks dramatically shifted my focus to elements … Continue Reading

Emulating R plots in Python

This tutorial covers the equivalent plotting commands in R and Python, with the goal of replicating R plots in Python: “Some commands can be straightforward replicated in Python, some are surprisingly hard to find equivalents without using custom functions etc.” The tutorial includes: residual plots, QQ plots, scale-location plots, and leverage plots.
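To give a sense of what sits behind one of these diagnostics, the sketch below computes the points of a normal QQ plot using only the Python standard library. It is an illustration, not the tutorial's code: the residual values are invented, the helper name `qq_points` is hypothetical, and the plotting position `(i - 0.5) / n` is one common convention among several.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs.

    Sorted standardized residuals are paired with standard normal quantiles
    at plotting positions (i - 0.5) / n. Points near the line y = x suggest
    the residuals are roughly normal.
    """
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    standardized = sorted((r - mu) / sigma for r in residuals)
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    theoretical = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))

# Hypothetical residuals from some fitted model.
residuals = [-1.2, 0.4, 0.1, -0.3, 1.0]
points = qq_points(residuals)
for q, z in points:
    print(f"theoretical {q:+.3f}   sample {z:+.3f}")
```

The resulting pairs can be passed to any plotting library as x and y coordinates of a scatter plot.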