Snowsuit Zine // issue 01

Table of Contents

  • Go, Swift, and Corporate Culture
  • RPCs: The Failed Abstraction
  • Monthly Consumption

Welcome to the first issue of the Snowsuit Zine. Snowsuit is a small group of people brought together mostly by a shared interest in technology. Our discussions take place over email, for collectively we span nine time zones.

This zine is an experiment to see what happens if we say our thoughts more loudly. As with all firsts, it's crude, simple, and likely to change. But the writing is genuine and the presentation will mature.



Go, Swift, and Corporate Culture

An interesting thing happened recently in mobile computing. Both Apple and Google got what they wanted. The smart phone wars are finished. Apple is basically a luxury device (phone?) company and Google wants you to use their services so they can show you ads. Simple enough, but consider how this plays out in their behavior.

Apple cares a lot about what phone you buy because they want to sell you a specific one: theirs. They are also the expensive choice, so a user must opt into the higher cost. These users buy more apps, they click on more ads, and they buy expensive phones every two years.

Google cares a lot about people using their services because they want to use that data to put an ad in front of you. The device isn't as important. There are many more Android users in the world than there are iOS users. Interestingly, Android's user base is over 1.5 billion, and according to Ben Evans, 500 million of those devices are running without any Google services at all. Google has also created many more Internet users: Android reached a billion people in its first five years alone, something that took the PC industry 15 years.

I realized something this year. It comes from a growing belief that industries begin in a flurry of experimentation and then mature into a big market held by just a few companies, roughly two to four major players. Consider that there were once 1,800 US car companies; today there are two. There were once over 500 steel companies; today there are two.

In some cases, you have a closed system and an open system, and many companies coexist in the open system. Android's ecosystem is like this: iOS is closed, and Android means anything from Samsung, HTC, Google, and whatever else runs the OS. Amex makes an interesting case study for this model. They have a closed system and go specifically for the luxury side of American spending. One can then think of Visa, Mastercard, Chase, NYCE, Interlink, etc., as different layers of the more open system.

What I realized, or think I realized, is that the network patterns we've seen in software, and especially on the web, exist just as clearly in current industries as in the industries of the past.

It's been said that building a great technology company is about helping developers do amazing things. It then follows that a technology company should build tools that help developers build things that serve the company's needs. Apple wants you to build apps native to only iOS because it helps them sell phones. Google wants everyone using the Internet because it helps them sell ads.

Both companies now have new programming languages: Google has been working on Go, and Apple has been working on Swift. I spent time learning both this year and realized that each language captures the personality of the company that created it.

Apple's Languages

I started learning Objective-C in February. I felt like I had taken a major step backwards in terms of expressiveness, but I wanted to build for my iPhone badly enough that I shrugged and opened a book.

Initially, Objective-C felt like OO languages I had used in the past, especially Java: singletons, factory patterns, and boilerplate code. There are also separate header and implementation files, and on top of that, the type system is very lightweight. The language can be a drag, but the library support for just about everything is generally quite amazing.

Apple's ecosystem has this way of making you feel like all your needs are met. They've got tools for making lists, building clean navigation systems, gorgeous buttons, etc. It's not too different from something like Bootstrap, but it's for native apps and Apple also built it.

The language, in my opinion, is a drag. It feels like it's from a prior era, like 1983, and needs an overhaul. It takes a lot of code to express ideas and requires memorizing a huge list of objects you might use.

And then Apple released Swift. This language reintroduced many of the paradigms I didn't want to go without. You can work almost exclusively with functions and data structures if that's what you want; and I do!

I consider myself a functionally leaning programmer. I can wrangle objects if I must but I don't like it. This is the core of why I was disgruntled with Objective-C. Swift fixes that.

Even though Objective-C is a drag, the Apple ecosystem is vibrant and full of creativity. Lots of people have done amazing things with it. Many are happy with Objective-C and didn't feel the need for a new language. People like myself, however, were relieved.

By introducing a new language, Apple expanded the network of developers that might build something that helps sell an Apple device. It's not just people who like Objective-C, it's them and also people who like Haskell or Python. It also makes a statement that building native software will feel new again. Apple wants you to forget about building for the web and build native apps instead.

Google's Language

Google employs some of the best minds in computing and they have the resources to let these minds wander. Google Go is what happens when some of the people who designed C and Unix have had a long time to think about what kind of language they'd design if they tried again.

After years of Python, Go felt like a huge upgrade in terms of clarity. There are things that show the language is still early, but they're moving fast. One thing I like a lot is that concurrency is built into the language, and everyone who uses Go uses that system. On top of that, a minimal web server ships with the standard library, too. They clearly spent time reducing the opportunities for developers to disagree, which seems to me like a very wise move.
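
To make that concrete, here is a minimal sketch of both pieces. The goroutines, channels, and net/http server are standard Go; the handler and port are arbitrary placeholders:

    package main

    import (
        "fmt"
        "net/http"
    )

    func main() {
        // Concurrency is part of the language itself: goroutines and channels.
        results := make(chan string)
        go func() { results <- "computed in another goroutine" }()
        fmt.Println(<-results)

        // A minimal web server ships with the standard library.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "hello from net/http")
        })
        http.ListenAndServe(":8080", nil)
    }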

For comparison, consider that the Python community has a long history of web frameworks and concurrency systems, all built by the community as libraries. As a result, lots of experimentation takes place, but there is much debate over the right method and, in my opinion, too much fragmentation. Consider that a database often required a separate driver for each concurrency library. That's a lot of duplicated effort!

Go provides all of that stuff out of the box, and people no longer argue about those topics. Realizing this was eye-opening. It seems kinda obvious in retrospect, but maybe that's just how far gone I was in the Python debates, even offering my own web framework, Brubeck.

The momentum behind Go is just staggering. A massive amount of infrastructure has been replaced with better tools, often written in Go. It's as though the server hackers of the world thought, "finally… the things servers always need are easier to build," and off they went. Docker is written in Go. CoreOS. Lots of Heroku, YouTube, and Google are in Go. I've heard it described as the best C, and that seems fair to me, tho not perfect. I've also heard it described as the best Python, which also seems fair to me, tho not perfect.

One thing is clear. Google wants developers building more clouds. The language is designed specifically to make that easier.

Apple & Google

Both Apple and Google have won the smart phone war. They're solidifying their moats by making it hard for developers to resist using their tools. History shows this strategy works: get the developers and you get the users, who use the stuff the developers built.

Apple and Google coexist on many iPhones. Apple is happy they sold you the luxury device, and Google is happy they know so much about you. In a similar way, Swift could be there with you, paying attention to when your users tap the screen, and Go could be there fulfilling your cloud needs behind the Apple screen.

The personalities of both Apple and Google have led directly to two programming languages that uncompromisingly support their worldviews and can coexist happily in the minds of developers.

RPCs: The Failed Abstraction

When learning about distributed systems, Remote Procedure Calls come up in discussion constantly, and they are commonly called, in one way or another, a failed abstraction. However, many good papers refer to RPCs as a mechanism for communication. The Raft paper, for example, talks about the AppendEntries and RequestVote RPCs. What, then, do distributed systems people mean when they call RPCs a failed abstraction?

The difference really comes down to what someone means when they talk about Remote Procedure Calls. But first, let's define what "remote" means. In this context it refers to executing work in a different process than the code that calls it. That process could be on the same physical machine or on a different one, involving a network hop.

Given that, on one hand there is the abstract concept of executing some work in a different process than the one performing the RPC. In the case of AppendEntries in Raft, this means telling another machine to perform the work of appending entries to its log. On the other hand, there is the use of RPCs as an abstraction for making remote calls look semantically the same as local calls. Java Remote Method Invocation, or RMI, is an example of this.

It should be clear that, in a distributed system, one needs to execute work on a remote machine somehow. In this sense, RPCs come down to a straightforward way to express that need, in a way pretty much everyone understands. Using the Raft paper as an example again, it calls AppendEntries and RequestVote RPCs simply because it is clear that these are executed on remote machines. It doesn't require an implementor of Raft to build the calls in terms of Java RMI or similar functionality offered by their language of choice. RPCs are a common and accepted way to describe how processes in a distributed system interact.

When RPCs are referred to as a failure, then, one must be referring to the second case: the actual implementation and use of a framework that attempts to make remote calls look local. Why is this bad? The root issue is that remote calls behave very little like local calls.

First, performance. A remote call has a very different performance profile from a local call. Good design of a system generally involves lots of small, clear function calls; iterating a data structure is a good example. In most languages this means getting something like an iterator and calling a function on it that yields the next entry. Local calls are cheap. Remote calls, on the other hand, are expensive. The general number thrown around is about 1 ms for communication between two machines inside a datacenter, while, according to an article from Plumbr, an unoptimized method call in Java costs about 0.1 ms and an optimized one about 0.0015 ms. Performing an RPC for each step of an iteration is prohibitively expensive. APIs must therefore be designed differently to amortize the overhead of the network: iterating a remote data structure should actually involve one RPC to fetch the entire structure (or a reasonable chunk of it), then iterating locally over that chunk. Note that fetching the whole structure might be too expensive as well; while passing references to large chunks of memory is perfectly fine inside one address space, moving a lot of data between processes takes time. The network imposes design constraints on a remote call that are simply not present in a local call.
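
As a sketch of that design pressure, compare paying one round trip per entry with paying one per chunk. The RemoteStore interface here is hypothetical, not any real library:

    package remote

    // RemoteStore is a hypothetical client for a data structure that
    // lives in another process.
    type RemoteStore interface {
        // Get costs one RPC per entry: roughly 1 ms in a datacenter.
        Get(i int) (string, error)
        // GetRange costs one RPC for a whole chunk of entries.
        GetRange(start, count int) ([]string, error)
    }

    // totalNaive pays the network round trip on every iteration: n RPCs.
    func totalNaive(s RemoteStore, n int) (int, error) {
        total := 0
        for i := 0; i < n; i++ {
            v, err := s.Get(i)
            if err != nil {
                return 0, err
            }
            total += len(v)
        }
        return total, nil
    }

    // totalChunked fetches a reasonable chunk per RPC and iterates it
    // locally: n/chunk RPCs plus cheap local calls. (Assume the server
    // clamps a range that runs past the end.)
    func totalChunked(s RemoteStore, n, chunk int) (int, error) {
        total := 0
        for i := 0; i < n; i += chunk {
            vs, err := s.GetRange(i, chunk)
            if err != nil {
                return 0, err
            }
            for _, v := range vs {
                total += len(v)
            }
        }
        return total, nil
    }

At roughly 1 ms per round trip, the naive version spends a full second iterating 1,000 entries; the chunked version, fetching 1,000 at a time, pays for one round trip plus cheap local calls.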

Second, failures. A remote call can fail in ways that a local call cannot. What happens if the remote process is not running? What happens if it's just really slow, whether because the process is overloaded or because the network is acting funny? A local call has no concerns about the substrate it runs on: if the machine dies during a local call, the program itself is no longer running either, and if the processor is acting up, the whole program is acting up. If a local call takes a long time, it's because the call is doing something that takes a long time, since the cost of entering and exiting a procedure locally is more or less constant. Performing work in a different process than the calling process introduces numerous failure cases that do not happen in local calls.
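
In code, this difference shows up as error handling that no local call site needs. Here is a sketch in Go; the Remote interface and the two-second deadline are hypothetical stand-ins, not any particular RPC framework:

    package remote

    import (
        "context"
        "errors"
        "fmt"
        "time"
    )

    // Remote is a hypothetical client for work done in another process.
    type Remote interface {
        DoWork(ctx context.Context) error
    }

    // doWorkOnce handles failures that a local call site never sees:
    // the remote process may be down, the network may be partitioned,
    // and a deadline is needed to turn "really slow" into an explicit
    // error instead of a hang.
    func doWorkOnce(r Remote) error {
        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()

        err := r.DoWork(ctx)
        switch {
        case errors.Is(err, context.DeadlineExceeded):
            // Overloaded process or flaky network? We can't tell, and we
            // don't even know whether the work actually happened.
            return fmt.Errorf("remote work timed out: %w", err)
        case err != nil:
            // For example, the remote process isn't running at all.
            return fmt.Errorf("remote work failed: %w", err)
        }
        return nil
    }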

Finally, partial failures. These get their own section, distinct from failures in general, because they have far-reaching ramifications for the overall design of a distributed system. The previous section talked about remote calls having different failure cases than local calls, but partial failures go deeper. To demonstrate, consider the problem of incrementing a counter by 1. To simplify things, consider the case of only one actor incrementing the counter. In a program running in one process, this is trivial: add 1 to the counter. Now move the counter to another process and use an RPC to increment it. There are three cases to consider:

  1. The call succeeds, the counter is incremented.
  2. The call fails prior to incrementing the counter.
  3. The call fails after incrementing the counter but before returning to the calling process.

The first case is boring and doesn't need much explanation. In the latter two cases, though, the calling node can see that the call failed but has no way to know whether the counter was actually incremented. How do we solve this?

The common way is to turn a call that modifies something as a side effect into an idempotent operation. Idempotence Is Not a Medical Condition by Pat Helland is a good introduction, but the basic idea is an operation that can be executed multiple times while having the effect of being executed only once. How would the counter example be implemented, then? We can turn the single increment operation into two operations, both of which can be called multiple times safely; in the case of failure, just call again. Here are the two calls:

  1. Ask the remote machine for a unique ID.
  2. Call increment with the unique ID.

The client performs these two RPCs; only after both have executed successfully is the counter incremented. The server implementation is simple. When the first call is made, the server generates a unique ID, stores it locally, and returns it to the client. On the second call, it checks whether the given ID is in its list of stored IDs; if so, it performs the increment and forgets the ID.

What happens during failure? If the first call fails, the client can just make it again; the counter is not modified. (The server might accumulate a bunch of IDs that are never used, but let's not worry about that now.) Once the client has successfully received an ID, it can call the increment RPC with that ID, and if the call fails it can call again. If it receives an error saying the ID does not exist on the server, it knows that a previous attempt only failed to return and the increment has been applied. Even though increment has been called multiple times, only one increment has happened.
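
Here is a minimal sketch of the server side of that protocol in Go. The names are mine and the transport is omitted; a real implementation would also expire IDs that are never used:

    package counter

    import (
        "errors"
        "sync"
    )

    // ErrUnknownID tells a retrying client that its increment was already
    // applied and only the reply was lost.
    var ErrUnknownID = errors.New("unknown or already-used ID")

    // Server owns the counter and the set of issued-but-unused IDs.
    type Server struct {
        mu      sync.Mutex
        counter int
        nextID  int
        pending map[int]bool
    }

    func NewServer() *Server {
        return &Server{pending: make(map[int]bool)}
    }

    // NewID is the first RPC: generate a unique ID, remember it, and
    // return it. Retrying after a failure just issues another ID; the
    // counter is untouched.
    func (s *Server) NewID() int {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.nextID++
        s.pending[s.nextID] = true
        return s.nextID
    }

    // Increment is the second RPC: it applies the increment only if the
    // ID is still pending, then forgets the ID, so retrying with the same
    // ID can never increment twice.
    func (s *Server) Increment(id int) error {
        s.mu.Lock()
        defer s.mu.Unlock()
        if !s.pending[id] {
            return ErrUnknownID
        }
        delete(s.pending, id)
        s.counter++
        return nil
    }

A client that hits a transport error on Increment simply retries with the same ID; receiving ErrUnknownID back is the signal that an earlier attempt succeeded and only the reply was lost.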

Dealing with partial failures of remote calls imposes significant design decisions on the creator of an API that one does not have to worry about in local calls. With local calls, one usually just has to remember to clean up after oneself; if a local call succeeds, it returns successfully to its caller.

While RPCs refer to performing a call outside the executing process, they are almost always used in the context of performing calls over the network. The network is not an implementation detail that can be hidden. The network imposes performance constraints as well as failure cases that are not present in local calls. Ignoring the difference will result in a system that is both slow and fragile.

A Remote Procedure Call, as an abstraction, is a useful tool for explaining the work done by different processes in a distributed system. RPCs, as a way to hide that a call is happening remotely, too often lead to fragile systems that don't work well and are unnecessarily difficult to understand. Experience tells us to design APIs that clearly distinguish between local and remote calls.

Monthly Consumption


  • The Founder's Dilemmas: Anticipating and Avoiding the Pitfalls That Can Sink a Startup by Noam Wasserman link
  • The Everything Store: Jeff Bezos and the Age of Amazon by Brad Stone link
  • Young Money: Inside the Hidden World of Wall Street's Post-Crash Recruits by Kevin Roose link
  • Failure Is Not an Option: Mission Control From Mercury to Apollo 13 and Beyond by Gene Kranz link
  • The Information: A History, A Theory, A Flood by James Gleick link
  • The Death and Life of Great American Cities by Jane Jacobs link
  • Capitalism, Socialism, and Democracy by Joseph Schumpeter link
  • Functional Programming in Swift by Chris Eidhof, Florian Kugler, and Wouter Swierstra link


  • PNUTS: Yahoo!'s Hosted Data Serving Platform by Cooper et al. link
  • Program design in the UNIX environment by Pike et al. link