Don’t trust Rasmussen polls!

Political scientist Alan Abramowitz brings us some news about the notorious pollster: In the past 12 months, according to Real Clear Politics, there have been 72 national polls matching Clinton with Trump—16 polls conducted by Fox News or Rasmussen and 56 polls conducted by other polling organizations. Here are the results: Trump has led or […]

The post Don’t trust Rasmussen polls! appeared first on Statistical Modeling, Causal Inference, and Social Science.

Deep learning architecture diagrams

Like a wild stream that, after a wet season in the African savanna, diverges into many smaller streams, lakes, and puddles, deep learning has diverged into a myriad of specialized architectures. Each architecture has a diagram. Here are some of them.

This is a draft. Come back later for the final version.

Neural networks are conceptually simple, and that’s their beauty. A bunch of homogeneous, uniform units, arranged in layers, with weighted connections between them, and that’s all. At least in theory. Practice turned out to be a bit different. Instead of feature engineering, we now have architecture engineering, as described by Stephen Merity:

The romanticized description of deep learning usually promises that the days of hand crafted feature engineering are gone – that the models are advanced enough to work this out themselves. Like most advertising, this is simultaneously true and misleading.

Whilst deep learning has simplified feature engineering in many cases, it certainly hasn’t removed it. As feature engineering has decreased, the architectures of the machine learning models themselves have become increasingly more complex. Most of the time, these model architectures are as specific to a given task as feature engineering used to be.

To clarify, this is still an important step. Architecture engineering is more general than feature engineering and provides many new opportunities. Having said that, however, we shouldn’t be oblivious to the fact that where we are is still far from where we intended to be.

Not quite as bad as the doings of architecture astronauts, but not too good either.

LSTM diagrams

How to explain those architectures? Naturally, with a diagram. A diagram will make it all crystal clear.

Let’s first inspect the two most popular types of networks these days, CNN and RNN/LSTM. You’ve already seen a convnet diagram, so let’s turn to the iconic LSTM:


It’s easy, just take a closer look:


As they say, in mathematics you don’t understand things, you just get used to them.

Fortunately, there are good explanations, for example Understanding LSTM Networks and
Written Memories: Understanding, Deriving and Extending the LSTM.
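Those write-ups walk through the gate equations in detail; as a rough companion, here is a single LSTM step in plain NumPy. The function names, gate ordering, and shapes are illustrative assumptions on my part, not taken from any particular diagram above:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. Shapes: W (4H, D), U (4H, H), b (4H,);
    stacked gate order assumed to be input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # all four pre-activations at once
    i = sigmoid(z[:H])               # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:])           # candidate cell update
    c = f * c_prev + i * g           # cell state: forget some old, add some new
    h = o * np.tanh(c)               # hidden state: gated read of the cell
    return h, c

# Toy usage with random, untrained weights (sizes made up)
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

The two interacting states, `c` and `h`, and the three sigmoid gates are exactly what makes the diagrams above look busy; in code it is four matrix rows and three element-wise multiplies.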

LSTM still too complex? Let’s try a simplified version, GRU (Gated Recurrent Unit). Trivial, really.


Especially this one, called minimal GRU.

Minimal GRU
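The single-gate variant shown here can be sketched the same way. I’m assuming the MGU-style update below, where one forget gate does double duty (resetting the old state inside the candidate and interpolating old vs. new); the names and sizes are illustrative:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def minimal_gru_step(x, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One step of a single-gate minimal GRU (assumed MGU-style form)."""
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)             # the lone gate
    h_cand = np.tanh(Wh @ x + Uh @ (f * h_prev) + bh)  # candidate state
    return (1.0 - f) * h_prev + f * h_cand             # leaky interpolation

# Toy usage with random, untrained weights
rng = np.random.default_rng(1)
D, H = 3, 4
Wf, Wh = rng.normal(size=(H, D)), rng.normal(size=(H, D))
Uf, Uh = rng.normal(size=(H, H)), rng.normal(size=(H, H))
bf, bh = np.zeros(H), np.zeros(H)
h = minimal_gru_step(rng.normal(size=D), np.zeros(H), Wf, Uf, bf, Wh, Uh, bh)
```

Compared to the LSTM step above, this has two weight blocks instead of four and no separate cell state, which is what earns it the “minimal” label.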

More diagrams

Various modifications of LSTM are now common. Here’s one, called deep bidirectional LSTM:



The rest are pretty self-explanatory, too. Let’s start with a combination of CNN and LSTM, since you have both under your belt now:

Convolutional Residual Memory Network, 1606.05262

Dynamic NTM, 1607.00036

Evolvable Neural Turing Machines, PDF

Recurrent Model Of Visual Attention, 1406.6247

Unsupervised Domain Adaptation By Backpropagation, 1409.7495

This diagram of a multilayer perceptron with synthetic gradients scores high on clarity:

MLP with synthetic gradients, 1608.05343

Every day brings more. Here are two fresh ones:

Google’s Neural Machine Translation System, 1609.08144

Deeply Recursive CNN For Image Super-Resolution, 1511.04491

And Now for Something Completely Different

Drawings from the Neural Network Zoo are pleasantly simple but, unfortunately, serve mostly as eye candy. For example:


These look like not-quite-fully-connected perceptrons, but they are supposed to represent a Liquid State Machine, an Echo State Network, and an Extreme Learning Machine.

How does an LSM differ from an ESN? That’s easy: it has green neurons with triangles. But how does an ESN differ from an ELM? Both have blue neurons.

Seriously, while similar, an ESN is a recurrent network and an ELM is not. And this kind of thing should probably be visible in an architecture diagram.
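That distinction is easy to show in code. In this toy sketch (random untrained weights, sizes made up), the ELM’s hidden features depend only on the current input, while the ESN’s reservoir state also depends on the previous state:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, T = 2, 5, 4                  # input size, hidden size, sequence length
W_in = rng.normal(size=(H, D))     # fixed random input weights (shared below)
x_seq = rng.normal(size=(T, D))

# ELM: the hidden layer is a fixed random feed-forward map; each input is
# transformed independently, so time order does not matter at all.
elm_features = np.tanh(x_seq @ W_in.T)

# ESN: the same random input weights plus a fixed random recurrent reservoir;
# the state carries information forward from earlier time steps.
W_res = rng.normal(size=(H, H)) * 0.5   # (spectral radius left untuned here)
h = np.zeros(H)
esn_states = []
for x in x_seq:
    h = np.tanh(W_in @ x + W_res @ h)   # recurrence: depends on the past h
    esn_states.append(h)
esn_states = np.array(esn_states)
```

In both models only a readout layer on top of these features/states would be trained; the recurrence term `W_res @ h` is the entire architectural difference, and it is exactly what the zoo drawings fail to show.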

Profiling Top Kagglers: Walter Reade, World’s First Discussions Grandmaster

Not long after we introduced our new progression system, Walter Reade (AKA Inversion) offered up his sage advice as the first and (currently) only Discussions Grandmaster through an AMA on Kaggle’s forums. In this interview about his accomplishments, Walter tells us how the Dunning-Kruger effect initially sucked him into competing on Kaggle and how building his portfolio in the years since has meant big moves in his career.

Introducing sparklyr, an R Interface for Apache Spark

Earlier this week, RStudio announced sparklyr, a new package that provides an interface between R and Apache Spark. We republish RStudio’s blog post below (see original) for your convenience.


Over the past couple of years we’ve heard time and time again that people want a native dplyr interface to Spark, so we built one! sparklyr also provides interfaces to Spark’s distributed machine learning algorithms and much more. 

Read More

The post Introducing sparklyr, an R Interface for Apache Spark appeared first on Cloudera Engineering Blog.

Why the garden-of-forking-paths criticism of p-values is not like a famous Borscht Belt comedy bit

People point me to things on the internet that they’re sure I’ll hate. I read one of these a while ago—unfortunately I can’t remember who wrote it or where it appeared, but it raised a criticism, not specifically of me, I believe, but more generally of skeptics such as Uri Simonsohn and myself who keep bringing […]

The post Why the garden-of-forking-paths criticism of p-values is not like a famous Borscht Belt comedy bit appeared first on Statistical Modeling, Causal Inference, and Social Science.

The “One of Many” Fallacy

I’ve been on book tour for nearly a month now, and I’ve come across a bunch of arguments pushing against my book’s theses. I welcome them, because I want to be informed. So far, though, I haven’t been convinced I made any egregious errors. Here’s an example of an argument I’ve seen consistently when it comes […]

NPR’s gonna NPR

I was gonna give this post the title, Stat Rage More Severe in the Presence of First-Class Journals, but then I thought I’d keep it simple. Chapter 1. Background OK, here’s what happened. A couple weeks ago someone pointed me to a low-quality paper that appeared in PPNAS (the prestigious Proceedings of the National Academy […]

The post NPR’s gonna NPR appeared first on Statistical Modeling, Causal Inference, and Social Science.

Apache Spark 2.0 Beta Now Available for CDH

Today, Cloudera announced the availability of an Apache Spark 2.0 Beta release for users of the Cloudera platform.

Apache Spark 2.0 is tremendously exciting (read this post for more background) because (among other things):

  • The Dataset API further enhances Spark’s claim as the best tool for data engineering by providing compile-time type safety along with the benefits of a query-optimization engine.
  • The Structured Streaming API enables the modeling of streaming data as a continuous DataFrame and expresses operations on that data with a SQL-like API.

Read More

The post Apache Spark 2.0 Beta Now Available for CDH appeared first on Cloudera Engineering Blog.