Neural artistic style: Trials and tribulations

If you haven’t seen it yet, the algorithm from A Neural Algorithm of Artistic Style is pretty cool. In essence, it takes a photograph and a painting and produces a version of the photograph that looks as though it was painted by the artist behind the painting. For example:

[Original headshot] + [Jackson Pollock painting] = [Combined output]

You can try this yourself at a small site I created: neuralimage.com. Or, just Google “neural artistic style” to find several other implementations.

The rest of this post is not about the algorithm or Deep Learning or AI or any of that jazz. It’s about how damned hard it is to set up an image processing pipeline of any complexity using AWS.

It’s more a brain dump than anything else, but should be useful as a map of the minefield that awaits would-be implementors.

The long road

What follows is a list of the roadblocks and speed bumps I encountered while standing up the neuralimage.com backend.

Constraint: Price

The hip, new deep learning algorithms need a beefy GPU to run efficiently. The go-to instance type is g2.2xlarge, which currently costs $0.65/hour ($468/month!). Clearly, you do not want to keep this thing running 24x7.

Fortunately, spot instance prices usually hover around $0.10/hour ($72/month), which is appreciably cheaper. Even so, you probably only want to spin up an instance when there’s work to do and shut it down when there isn’t. This is where the troubles begin.

Roadblock: No love for spot instances with Elastic Beanstalk

Simply put: spot instances with Elastic Beanstalk aren’t supported. When you factor in that you pretty much need a custom AMI to run any of the deep learning packages, Elastic Beanstalk is a non-starter, which means you are stuck configuring auto-scaling on your own.

Speed bump: Auto-scaling is a mess

Making sense of the plethora of auto-scaling knobs and config options can drive you crazy. After a bit of thrashing, here’s what I came up with (there’s a rough code sketch of all this after the list):

  • Create an SQS queue (with an associated dead letter queue).
  • Create an AMI with your required libraries.
  • Create a launch configuration that uses the custom AMI, with the user-data set to the appropriate script for installing the CodeDeploy agent. Do not create the CodeDeploy deployment group yet.
  • Create the Auto Scaling group, choosing spot instances at $0.12/hour.
  • Set a scale-up policy that sets the number of instances to 1 when ApproximateNumberOfMessagesVisible on your SQS queue is above 1 for, say, 2 minutes.
  • Set a scale-down policy that sets the number of instances to 0 when NumberOfMessagesReceived is <= 0 for, say, 1 hour (at least 2 times the expected processing time). Note that it would be more appropriate to scale down when the queue is empty, but “empty” means that both ApproximateNumberOfMessagesVisible and ApproximateNumberOfMessagesNotVisible are 0; NumberOfMessagesReceived is a reasonable substitute.

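If it helps to see it end to end, here’s a rough boto3 sketch of those steps. Treat it as a sketch under assumptions rather than a drop-in script: the queue and group names, the AMI and subnet IDs, the bootstrap script, and the alarm thresholds are all placeholders you’d swap for your own.

```python
import json

import boto3

sqs = boto3.client('sqs')
autoscaling = boto3.client('autoscaling')
cloudwatch = boto3.client('cloudwatch')

# Work queue plus a dead letter queue for jobs that repeatedly fail.
dlq_url = sqs.create_queue(QueueName='style-jobs-dlq')['QueueUrl']
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=['QueueArn'])['Attributes']['QueueArn']
sqs.create_queue(
    QueueName='style-jobs',
    Attributes={
        'VisibilityTimeout': '1800',   # longer than one style-transfer job
        'RedrivePolicy': json.dumps(
            {'deadLetterTargetArn': dlq_arn, 'maxReceiveCount': '3'}),
    })

# User-data that installs the CodeDeploy agent on boot (Amazon Linux assumed).
codedeploy_bootstrap = """#!/bin/bash
yum install -y ruby wget
cd /home/ec2-user
wget https://aws-codedeploy-us-east-1.s3.amazonaws.com/latest/install
chmod +x ./install
./install auto
"""

# Launch configuration: custom AMI, spot bid, CodeDeploy bootstrap.
autoscaling.create_launch_configuration(
    LaunchConfigurationName='neural-style-lc',
    ImageId='ami-00000000',            # custom AMI with the deep learning stack
    InstanceType='g2.2xlarge',
    SpotPrice='0.12',
    UserData=codedeploy_bootstrap,
)

# Auto Scaling group that normally sits at zero instances.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName='neural-style-asg',
    LaunchConfigurationName='neural-style-lc',
    MinSize=0,
    MaxSize=1,
    DesiredCapacity=0,
    VPCZoneIdentifier='subnet-00000000',
)

# Scale up to exactly one instance when messages are waiting...
scale_up = autoscaling.put_scaling_policy(
    AutoScalingGroupName='neural-style-asg',
    PolicyName='scale-up-to-one',
    AdjustmentType='ExactCapacity',
    ScalingAdjustment=1,
)
cloudwatch.put_metric_alarm(
    AlarmName='style-jobs-waiting',
    Namespace='AWS/SQS',
    MetricName='ApproximateNumberOfMessagesVisible',
    Dimensions=[{'Name': 'QueueName', 'Value': 'style-jobs'}],
    Statistic='Sum',
    Period=60,
    EvaluationPeriods=2,               # roughly "for 2 minutes"
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    AlarmActions=[scale_up['PolicyARN']],
)

# ...and back down to zero after an idle hour.
scale_down = autoscaling.put_scaling_policy(
    AutoScalingGroupName='neural-style-asg',
    PolicyName='scale-down-to-zero',
    AdjustmentType='ExactCapacity',
    ScalingAdjustment=0,
)
cloudwatch.put_metric_alarm(
    AlarmName='style-jobs-idle',
    Namespace='AWS/SQS',
    MetricName='NumberOfMessagesReceived',
    Dimensions=[{'Name': 'QueueName', 'Value': 'style-jobs'}],
    Statistic='Sum',
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='LessThanOrEqualToThreshold',
    AlarmActions=[scale_down['PolicyARN']],
)
```

The ExactCapacity adjustment type is what lets the group snap directly between zero and one instance instead of stepping up and down.
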
Phew! Got all that? Great. Now let’s try to actually get our code to run on one of these suckers.

Speed bump: CodeDeploy kind of sucks

Remember that we didn’t create the CodeDeploy deployment group earlier while creating the Auto Scaling group. Now is the time to do that, since adding an Auto Scaling group to an existing CodeDeploy deployment group doesn’t really work correctly. It looks like it does, but it doesn’t. (ed: as of Feb. 22nd, 2016)

Once you’ve created the deployment group, spin up at least one instance in your Auto Scaling group, then deploy your code. I haven’t dug into this too extensively, but deploying to an empty Auto Scaling group appears to work, yet new instances don’t receive the updated code.
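
If you’re scripting it, the sequence looks roughly like this in boto3. It’s only a sketch: the application name, service role ARN, and S3 bucket are placeholders, and the CodeDeploy application itself is assumed to already exist.

```python
import boto3

codedeploy = boto3.client('codedeploy')

# Create the deployment group only after the Auto Scaling group exists;
# attaching an ASG to a pre-existing deployment group is what misbehaves.
codedeploy.create_deployment_group(
    applicationName='neural-style',                  # assumed to exist already
    deploymentGroupName='neural-style-workers',
    serviceRoleArn='arn:aws:iam::123456789012:role/codedeploy-service-role',
    autoScalingGroups=['neural-style-asg'],
    deploymentConfigName='CodeDeployDefault.OneAtATime',
)

# With at least one instance running in the group, push the revision.
codedeploy.create_deployment(
    applicationName='neural-style',
    deploymentGroupName='neural-style-workers',
    revision={
        'revisionType': 'S3',
        's3Location': {
            'bucket': 'my-deploy-bucket',
            'key': 'neural-style/app.zip',
            'bundleType': 'zip',
        },
    },
)
```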

Some day CodeDeploy might actually work like you would expect it to.

Speed bump: Logs, anyone?

So, now you’ve got your code on those spot instances, but what happens if something goes wrong? Good luck logging in to the instance before it disappears again with all its precious log data.

The best solution I’ve come across is to install the CloudWatch Logs daemon on the instance (through the CodeDeploy appspec.yml) and publish your log files to CloudWatch. A word of warning here as well: I’ve seen the awslogs daemon die if you try any funny business with your log files, like truncating them via echo '' > my-log-file, so keep it simple and vanilla.
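
For reference, the awslogs agent’s configuration is just an INI file dropped at /etc/awslogs/awslogs.conf. A minimal version looks something like the following; the log path and log group name here are made up for illustration.

```ini
[general]
state_file = /var/lib/awslogs/agent-state

# One section per log file to ship (path and group name are placeholders).
[/var/log/neural-style/worker.log]
file = /var/log/neural-style/worker.log
log_group_name = neural-style-worker
log_stream_name = {instance_id}
datetime_format = %Y-%m-%d %H:%M:%S
initial_position = start_of_file
```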

The destination

So, after all that nonsense (and, believe me, there are still some rough edges that need to be worked out in my implementation), was it worth it?

Totally. When I can wake up in the morning and Kandinsky my child, that’s priceless.

Rosalind