
Building a Binary Tree with Enumerable


I believe that the Enumerable module is the most important thing to understand if you want to go from a beginner to intermediate Rubyist. It requires you to understand two fundamental parts of Ruby: modules and blocks.

Ruby’s standard library includes hashes, arrays, sets and thread-safe queues. One structure missing is a generic binary tree. Binary trees are great general purpose data structures: they aren’t super fast for any operations (e.g. lookup, insert, delete) but they aren’t super slow for those operations either. Databases typically implement indexes as a tree structure; every time you insert a row into a table, a node is inserted into the index’s binary tree structure too. Here’s what a binary tree “looks” like.

binary tree

Let’s build a binary tree in Ruby; you will be amazed at how little code it actually takes.

A funny thing about a binary tree is that every part of the tree looks the same: a node with some data, along with left and right pointers to the child nodes.

class Node
  attr_accessor :data, :left, :right

  def initialize(data)
    @data = data
  end
end

# build the first two levels of the tree pictured above
root = Node.new(7)
root.left = Node.new(3)
root.right = Node.new(12)

The amazing thing about Enumerable is this: you implement one method, each, and you get dozens of useful methods in return! each knows how to iterate through elements in your data structure and so Ruby can leverage that to implement lots of other functionality.

Remember I said that every part of a binary tree looks the same: that’s a hallmark of a recursive data structure. We’ll use recursion to iterate through the tree in our each method:

class Node
  include Enumerable

  attr_accessor :data, :left, :right

  def initialize(data)
    @data = data
  end

  def each(&block)
    left.each(&block) if left
    block.call(self)
    right.each(&block) if right
  end
end

root = Node.new(7)
root.left = Node.new(3)
root.right = Node.new(12)

root.each { |x| puts x.data } # will print "3 7 12"
puts root.inject(0) { |memo, node| memo += node.data }

The final trick to Enumerable is to implement a comparison operator so Ruby can compare two Nodes and tell which one is greater. This allows it to implement sorting, min and max operations. This comparison operator is commonly called the “spaceship” operator because <=> kinda looks like a spaceship if you squint. Note we delegate the <=> call to the data itself. We assume the tree is storing comparable data: integers, strings, or a value object which itself implements <=>.

class Node
  include Enumerable

  attr_accessor :data, :left, :right

  def initialize(data)
    @data = data
  end

  def each(&block)
    left.each(&block) if left
    block.call(self)
    right.each(&block) if right
  end

  def <=>(other_node)
    data <=> other_node.data
  end
end

root = Node.new(3)
root.left = Node.new(2)
root.right = Node.new(1)
root.each { |x| puts x.data }

# just a few of the various operations Enumerable provides
puts "SUM"
puts root.inject(0) { |memo, val| memo += val.data }
puts "MAX"
puts root.max.data
puts "SORT"
puts root.sort.map(&:data)

This is pretty incredible and really shows off the power of Ruby: we've built a powerful data structure in just a few lines of code. All is not wine and roses though; there are several hard parts we didn't implement (inserting a new node, deleting a node, rebalancing). I'll leave those as an exercise for the reader to steal from a StackOverflow post.
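If you want to try the insert exercise, here's a naive sketch to get you started. It assumes the tree holds comparable data and does no rebalancing, so inserting pre-sorted data degrades the tree into a linked list:

class Node
  # walk down the tree, comparing as we go, until we find an
  # empty left or right slot for the new value
  def insert(value)
    if value < data
      left ? left.insert(value) : (self.left = Node.new(value))
    else
      right ? right.insert(value) : (self.right = Node.new(value))
    end
  end
end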

Conclusion

Understanding and implementing Enumerable and the spaceship operator is the key to making Ruby data structures “feel” normal. In this example, the binary tree looks like any old Ruby code using an Array but is completely different under the covers.


Go 1.4 runtime visualized


Go 1.4 is out! I won’t reiterate the changes, read the release notes for that. But it’s one thing to read that things have improved – what if I could show you the improvement?

These snapshots are from an upcoming feature for Inspeqtor Pro: real-time memory and GC monitoring for any Go-based daemon. They show the first 3 minutes of Inspeqtor Pro itself starting up. After 1 minute, it sends an alert email. That big spike you see is all the code required to send email to Gmail (net/smtp, TLS, etc) being paged into memory. The small but regular bumps in Allocated Memory are from Inspeqtor Pro gathering system metrics every 15 seconds.

Some notes: GC in 1.4 looks faster than in 1.3.3; no pause in 1.3.3 comes in under 1ms. Total memory shrank about 1MB, from 4 to 3 MB, and the total size at startup is noticeably smaller. Awesome!

Interested in seeing this for your own Go daemons? Chime in on the GitHub issue and give Inspeqtor a test drive.

Go 1.3.3: go-1.3.3

Go 1.4: go-1.4

The expvar package - Metrics for Go


Last week I discovered a mysterious package in the Go standard library, expvar. A Google search turned up little content on it. Undiscovered APIs to explore? How exciting! I immediately dove in and what I found was neat yet unsurprising.

The expvar package allows a Go process to expose variables to the public via an HTTP endpoint that emits JSON. The simplest usage requires you to do two things in your custom code:

1. Import the package - this has the side effect of registering the /debug/vars HTTP endpoint.

import"expvar"

2. Start up an HTTP server for your process to handle HTTP requests:

sock, err := net.Listen("tcp", "localhost:8123")
if err != nil {
    return err
}
go func() {
    fmt.Println("HTTP now available at port 8123")
    http.Serve(sock, nil)
}()

If you hit http://localhost:8123/debug/vars, you should see something like this:

{"cmdline":["/var/folders/bc/27hv15_d2zvcc3n3s9dxmfg00000gn/T/go-build421732654/command-line-arguments/_obj/exe/main","-l","debug","-s","i.sock","-c","realtest"],"counters":{"a":10,"b":10},"memstats":{"Alloc":1076016,"TotalAlloc":1801544,"Sys":5966072,"Lookups":209,"Mallocs":7986,"Frees":4528,"HeapAlloc":1076016,"HeapSys":2097152,"HeapIdle":327680,"HeapInuse":1769472,"HeapReleased":0,"HeapObjects":3458,"StackInuse":212992,"StackSys":917504,"MSpanInuse":21504,"MSpanSys":32768,"MCacheInuse":8800,"MCacheSys":16384,"BuckHashSys":1441160,"GCSys":1183744,"OtherSys":277360,"NextGC":1436032,"LastGC":1418102095002592201,"PauseTotalNs":2744531,"PauseNs":[480149,171430,603839,288381,494934,522995,182803,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"NumGC":7,"EnableGC":true,"DebugGC":false}}

(Put that blob into JSONLint if you want to see a more readable but verbose version.)

By default, Go’s runtime exposes data about command line arguments, memory usage and garbage collection with very little effort, all built into the standard library. Simple and easy yet powerful, like most Go functionality.

Adding your own Metrics

As the expvar name implies though, you can expose your own variables into the mix. In fact, Datadog recently announced that their monitoring agent can pull your custom expvar values into their system for monitoring purposes.

Here's how to declare a map of counters and then increment them as actions happen in your daemon; the JSON blob above shows how they appear when exported:

var (
    counts = expvar.NewMap("counters")
)

func init() {
    counts.Add("a", 10)
    counts.Add("b", 10)
}

Wherein I Mix in some Awesome

Well, I'm even more excited because the next version of Inspeqtor Pro will have a Web UI for visualizing the memory and GC data which the Go runtime exposes. This is exactly the type of functionality I've always wanted in production, and it comes with almost no effort. Sweet.

Here’s a prototype I’m working on right now. You modify your Go daemon to expose the expvar memory data and Inspeqtor Pro can give you this real-time memory visualization.

memory and gc visualizer

The Future

I’d love to see other runtimes expose similar data via HTTP/JSON. Can Ruby or Python expose similar data? What about the JVM? Rubinius recently discussed their VM metrics support, let’s see other runtimes do the same! Make it easy to expose and tooling will appear to support it.

CGI: Ruby's Bare Metal


How simple can you make a web request?

Happy 2015 everyone! For 2015, I wanted to spend some time documenting and automating my business as much as possible. Ezra Z’s and James Golick’s recent passing was a reminder to myself about life: hope for the best but plan for the worst.

My biggest technical task was then to automate the onboarding of a new Sidekiq Pro customer. If I pass away next week, I want people to still be able to purchase and renew their subscription so my wife and child have recurring income they can count on for the next few years. Essentially I want to automate day-to-day operations.

My Sidekiq Pro server is as simple as humanly possible: it’s running only Apache. Perfect for serving static files but how do I handle an arbitrary request? That’s when I asked myself: How simple can you make a web request? The requirements are straightforward: Stripe will call my server with a subscription event when someone starts or stops their Sidekiq Pro subscription. I need a script to perform the magic to grant/revoke access and send the customer an email with access details. This call will only happen a few times a day, max.

This is a perfect case for going down to the bare metal and using the oldest web technology: CGI.

Common Gateway Interface

CGI was the first standard for tying Unix and the Web together. The Unix programming model says a process should take input on STDIN and output on STDOUT. CGI allows a webserver like Apache to call an external script with the details of a web request as STDIN. The script then outputs the HTTP response back as STDOUT. Ruby’s cgi library will parse the request coming from STDIN and provides some response output helpers your code can use to generate HTML responses.

In my case, Stripe POSTs a blob of JSON in the request body. Since I’m responding to the Stripe robot, it only needs to see a 200 OK response — no fancy view rendering layer required.

#!/usr/bin/env ruby
require 'json'
require 'cgi'

cgi = CGI.new

# CGI tries to parse the request body as form parameters so a
# blob of JSON awkwardly ends up as the one and only parameter key.
stripe_event = JSON.parse(cgi.params.keys.first)

do_the_magic(stripe_event) # magic happens right here

cgi.out("status" => "OK", "type" => "text/plain", "connection" => "close") do
  "Success"
end

I configured Apache to know to execute my CGI script by adding this inside the vhost configuration:

ScriptAlias /stripe/ /opt/stripe/
<Directory /opt/stripe/>
  Require all granted
</Directory>

Now if I request http://server/stripe/event.rb, Apache will call /opt/stripe/event.rb.

Look at what I'm not running: puma or unicorn, rails or sinatra, redis or memcached, postgres or mysql, bundler, capistrano, etc. The whole thing uses three or four gems. That's it. The script runs in a few seconds and then exits. Nothing to keep running 24/7 and nothing to monitor. Deployment means using scp to copy the .rb file to the server. I don't even have to restart anything upon deploy because nothing was running in the first place!

Reality Check

CGI certainly isn't the right solution for every problem: each request starts a new Ruby process, so there's a small bit of overhead. But for systems which expect little traffic yet require maximum reliability, it's worth considering. There's a higher performance variant of CGI called FastCGI which avoids the per-request overhead by keeping a process running 24/7.

Ultimately plain old CGI solved my requirements: only Apache is running 24/7 and new Sidekiq Pro customers now get their license information within seconds of purchase, making everyone happy!

Inspeqtor 0.8.0 released


Inspeqtor 0.8.0 is now available for your monitoring needs. If you are using Monit, bluepill, god, eye or any other process monitoring system, I hope you’ll check out Inspeqtor. It’s different because it’s simple and opinionated.

What’s New in Inspeqtor?

Plain old Inspeqtor sees a dozen bugfixes for 0.8.0 with no new notable features. I’m pretty happy with the feature set as it stands today. Consider this a pre-1.0 release.

What’s New in Inspeqtor Pro?

I’ve spent a good amount of the last month working on a really cool new feature for Inspeqtor Pro: real-time memory monitoring and visualization for Go applications. The new feature is covered in depth in this new wiki page but who can resist a little eye candy?

memory monitoring

The Web UI is a combination of Bootstrap and Mozilla’s new MetricsGraphics.js library. It’s easy to use and recommended if you need some metrics visualization in JS.

Of course you can use this feature to pipe memory data from your daemon into statsd and view it within your existing metrics visualization system and not use the Web UI at all.

Ready to try it? Go here to download and install Inspeqtor.

Indie Developers in Ruby, 2015 Ed.


Ruby is great for programmers: you can download the source for so many different libraries to solve many different problems. But there's a dark side: there's no incentive for the library author to help or support you as a user. So many open source projects become ghost towns over the course of a year or two because of the ongoing support and time commitment. If you're building a long-term business with these tools, it's in your best interest to ensure they remain supported, with a steady flow of features and bug fixes, for years to come.

Many developers are trying to find a happy medium between the Open Source "give it all away" approach and the traditional commercial approach of "talk to this sales guy to get a quote" by selling Open Source-based products. Steady income makes the ongoing time commitment much easier to handle.

Note that I’m focused on bootstrapped or “indie” developers selling Ruby-related products and ignoring VC-backed companies or SaaSes. Indie developers don’t have the benefit of big marketing budgets or sales staff. They might blog, go to conferences and/or give away swag but they often don’t get much public attention. Let’s change that!

Class of 2015

Without further ado, here's the list I came up with. Thanks to my Twitter followers for many suggestions!

Dresssed

Dresssed sells premium site templates for Rails. Get a beautiful template for your new site with lots of form variants, error pages, SASS integration, responsive design, etc already built.

GitLab

GitLab offers an open source alternative to GitHub Enterprise. If you need to self-host your own source code repositories, this is one way to do it. They offer GitLab Enterprise with extra features and support.

Passenger

Passenger is the canonical commercial/OSS hybrid product and Phusion does a fantastic job pushing the envelope in features and performance. (I’m happy to admit I borrowed several of their ideas when bootstrapping my own project.) They sell Passenger Enterprise: an upgrade of their free Passenger OSS library with additional features and support. If you have a business running on Rails, this should be your first purchase.

Payola

Pete Keen has carved out a niche as the expert on Rails + Stripe. If you need top-notch Stripe integration, Payola is your solution. His commercial variant, Payola Pro, adds a number of additional features and a support contract.

RailsKits

RailsKits offers Rails app templates with common functionality already built-in. Their SaaS kit gives you a full Rails 4.1 app skeleton with monthly subscriptions, pricing tiers and multi-tenant database storage pre-built for one low price; sounds awesome!

RailsLTS

RailsLTS offers long-term support for very popular Rails versions, backporting critical security fixes to older Rails versions which aren’t maintained anymore by Rails Core. They’ve released 12 versions of 2.3.18 so far and will be adding Rails 3.2 support soon.

RubyMotion

How awesome is RubyMotion? Write native iOS applications with Ruby! Not technically a Ruby product since it’s all Mac/iOS native, but it’s focused on making iOS development for us Rubyists as easy as possible.

Sidekiq

I’m happy to be part of this list too! Sidekiq Pro extends Sidekiq with more features and a support contract. If you want to build a scalable Ruby website for the long-term, use Passenger Enterprise for your app server and Sidekiq Pro for processing background jobs. Since its launch three years ago, Sidekiq has grown from a side project to my full-time job.

TwoFactorAuth

Peter Harkins built TwoFactorAuth, a gem which adds U2F key fob (e.g. Yubikey) support to your Rails app. It's AGPL licensed for free; you must pay for commercial usage. I love OSS projects which have the courage to use GNU + commercial licensing: everything is out in the open but the author built something valuable, businesses need to pay for that value and the AGPL enforces that. It makes for a dead simple way to allow free trials while ensuring sales with no effort on the developer's part. When you think about all the government, health care and financial apps which can be built in Rails and need robust authentication, I bet there's a nice niche market for Peter to capture here!

Conclusion

That’s only nine - I wish there were more! If I missed anyone, please let me know.

Sidekiq Pro 2.0!


I'm happy to announce that Sidekiq Pro 2.0 is ready for general use. There are two major features and some refactoring you need to know about.

Batches

Sidekiq allows you to fire off a set of jobs to process asynchronously but you don't know when the whole set of jobs is complete:

no job workflow

Sidekiq Pro provides the Batch abstraction: it represents a set of jobs, and you can attach callbacks to be fired when the set of jobs has finished. This allows more complex job workflows:

simple job workflow

For 2.0, the Batch implementation and data model were overhauled for higher performance and much smaller size, and Batches can now be nested: a job within a Batch can itself create a child Batch of jobs. The callbacks for the parent batch will only fire once the child batch callbacks have finished.

complex job workflow
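If you haven't used Batches before, the basic usage is only a few lines; here's a sketch (HardWorker and MyCallback are placeholder names):

batch = Sidekiq::Batch.new
batch.description = "Nightly import"
# MyCallback#on_success fires once every job in the batch succeeds
batch.on(:success, MyCallback)
batch.jobs do
  # any jobs pushed inside this block belong to the batch
  100.times { |i| HardWorker.perform_async(i) }
end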

This was some of the hardest code I’ve ever written (it requires a bag of buzzwords: asynchronous, distributed, transactional, threadsafe and fast!) but I’m really proud of the outcome. Sidekiq Pro should be able to handle job workflows of any depth now.

As part of 2.0, the batch data model changed significantly but have no fear: existing 1.x batches will process as normal.

Scheduler

Sidekiq's scheduler scans the set of scheduled jobs for any that are due. Unfortunately the way it scans and enqueues jobs is not atomic: there is a very, very small but real chance a scheduled job could be lost.

For 2.0, I’ve written a Lua-based scheduler which is atomic and enqueues 50-100x faster than the existing scheduler since it does not require any network round trips. It’s very easy to enable:

Sidekiq.configure_server do |config|
  config.reliable_scheduler!
end

It’s optional because it does require Redis 2.6.

Miscellaneous Changes

  • Various deprecated APIs were removed
  • Reliability features are now enabled via methods, not require:

    Sidekiq::Client.reliable_push!

    Sidekiq.configure_server do |config|
      config.reliable_fetch!
    end

Conclusion

Here’s the Sidekiq Pro 2.0 Upgrade Notes. If you’re not a Sidekiq Pro customer yet, you can buy Sidekiq Pro here. Enjoy!

Diagrams courtesy of the very nifty draw.io.

Timeout: Ruby's Most Dangerous API


In the last 3 months, I’ve worked with a half dozen Sidekiq users plagued with mysterious stability problems. All were caused by the same thing: Ruby’s terrible Timeout module. I strongly urge everyone reading this to remove any usage of Timeout from your codebase; odds are very good you will see an increase in stability.

You, using Timeout.

You might think I'm overreacting or hyping up the problem: I'm not. Here's Charles Nutter, lead developer of JRuby, writing back in 2008 about how Timeout is fundamentally broken and cannot be used safely.

The Problem

Timeout is typically used to ensure a block of code executes within a given time. It does this by raising an error within the Thread executing that block. Relevant to Sidekiq: this will corrupt shared network connections. Imagine this sequence of events:

  1. Code makes request A to Redis
  2. Timeout triggers, block stops executing
  3. Redis connection is returned to connection pool
  4. Network receives response A for request A
  5. Code checks out same connection and makes request B
  6. Code reads response A instead of waiting for response B!

That shared Redis connection has been corrupted due to Timeout skipping response A handling.
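Here's a contrived sketch of the failure mode; pool is assumed to be a connection_pool instance and the code is illustrative, not from Sidekiq itself:

require 'timeout'

pool.with do |redis|
  begin
    Timeout.timeout(1) do
      # if the timeout fires after the command is written but before
      # the reply is read, the reply stays buffered on the socket...
      redis.get("request-a")
    end
  rescue Timeout::Error
    # ...and the connection goes back to the pool corrupted
  end
end

pool.with do |redis|
  # the next thread to check out that connection reads the stale
  # reply to request A as if it answered this command
  redis.get("request-b")
end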

The Solution

The only safe timeouts to use are lower-level network timeouts. The underlying operating system understands them and ensures everything is cleaned up properly. All good network APIs will expose those timeouts so you can set them in your application code. Here’s a few examples:

# Sidekiq's redis connection pool
Sidekiq.configure_server do |config|
  config.redis = { :network_timeout => 2, :url => 'redis://localhost:3970/12' }
end

# Generic redis-rb
$redis = Redis.new(:url => '...', :connect_timeout => 5, :timeout => 5)

# Dalli
$memcached = Dalli::Client.new('...', :socket_timeout => 5)

# Net::HTTP
Net::HTTP.start(host, port, :open_timeout => 5, :read_timeout => 5) do |http|
  http.request(...)
end

If your favorite network library does not document its timeout options, be a sport and open a new issue or send them a PR with updated documentation. I just did that for Redis.

Conclusion

Ruby’s Timeout is a giant hammer and will only lead to a big mess. Don’t use it.


Sidekiq Pro Gem Server Outage


On Friday at 6:30 PM PST, the Sidekiq Pro gem server became unavailable due to a Linode data center outage in Fremont, CA. Connectivity was restored 15 hours later, at 9:30 AM PST on Saturday. I'm terribly sorry not only for the outage but also for its extended length.

The Past

I knew that the gem server was a single point of failure and had plans to build a redundant server later this summer. I had spent some time in Oct 2014 building a new, simpler gem server but hadn't automated the process - the gem server was still a "unique snowflake" with a number of files which couldn't easily be recreated to bring up a replacement server.

The Present

While I was waiting for the server to become available, I started the process of automating the gem server build Saturday morning. Once connectivity was restored, I immediately copied the irreplaceable files off-site. As of today (Monday) I was able to build a new DigitalOcean droplet with a simple shell script and a repository of the necessary critical files (TLS cert, .gem binary files, etc). Within 10 minutes of creation it was successfully serving Sidekiq Pro's gems to my test app.

The Future

By the end of the month I will have two servers available, one in SF and one in NY. Each server will be rebuilt annually, staggered six months apart, to ensure the build process stays tested and usable. I hope this will get us closer to the goal of 100% uptime.

Postscript

If you are interested in app deploys that don’t depend on any gem servers, take a look at the bundle package command.

Inspeqtor Pro now Open Source


Last year I created Inspeqtor as a fresh take on process monitoring, to improve upon monit and similar tools. Part of that process was creating a commercial version, Inspeqtor Pro, to provide long-term viability for the project. I have little desire to debate why but Inspeqtor Pro has not sold well enough to continue as is; something has to change.

logo-inspeqtor

Today I'm open sourcing Inspeqtor Pro. It is still not free software - you must purchase a license to use it in production - but being open source means it will be easier and cheaper for me to support via issues and pull requests. Today's leading companies and developers want their infrastructure to be open source; this covers both bases: the source is available to inspect, and licensing and long-term support are easy to purchase.

Installing Inspeqtor Pro on your Linux server is now dead simple. Try it out - I hope you like it!

Sidekiq and Upstart


The best and most reliable way to manage multiple Sidekiq processes is with Upstart. Many developers know little to nothing about Upstart so I wanted to write up how to integrate Sidekiq with Upstart. With Upstart doing the hard work, it becomes easy to manage deployments with Capistrano or another similar tool.

Starting Sidekiq

The Sidekiq repo has example .conf files you can use as a template to create your own services. Customize the .conf files as necessary and place them in /etc/init; they tell Upstart when and how to start the associated processes.

The workers service is a “fake” service which knows how to start/stop N sidekiq processes. Upon machine boot, Upstart will start workers which will start those N processes. Within workers.conf we’ve declared how many processes we want to start:

env NUM_WORKERS=2

If you want to quickly shut down all Sidekiq processes, run stop workers. Start them back up with start workers. Of course you can do both with restart workers. It literally can’t be any easier!

The sidekiq service is an “instance” service, allowing you to create N processes. It requires an index parameter to define which instance you are controlling:

$ start sidekiq index=0
$ start sidekiq index=1
$ stop sidekiq index=2
etc...
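For reference, the instance job has roughly this shape. This is a simplified sketch - the real templates live in the Sidekiq repo, and the paths and deploy user here are assumptions:

# /etc/init/sidekiq.conf - simplified sketch
description "Sidekiq instance"
instance $index

respawn
setuid deploy
chdir /var/www/app/current

exec bundle exec sidekiq -e production -i $index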

Deployment

Deployment should do two things: quiet Sidekiq as early as possible and restart as late as possible.

Quieting Sidekiq

During a deployment, we want to signal Sidekiq to stop processing jobs as early as possible so Sidekiq has as much time as possible to finish any jobs in progress. We do this by sending each process the USR1 signal; here's a Capistrano task which does that:

task :quiet do
  on roles(:worker) do
    puts capture("sudo pgrep -f 'sidekiq' | xargs kill -USR1")
  end
end

Note that workers does not support reload since it doesn’t map to a single process so we have to use that pgrep hack.

You can use Upstart’s reload command to quiet a specific instance:

$ reload sidekiq index=X

Restarting Sidekiq

Restarting is easy: restart workers. This will actually stop and then start the processes.

task :restart do
  on roles(:worker) do
    puts capture("sudo restart workers")
  end
end

Notes

  • We don’t need to daemonize. Modern daemons should never daemonize themselves.
  • We don’t need PID files. PID files are legacy from years ago and their use should signal that something is wrong.
  • We don’t need to specify our own log files. Sidekiq will output to stdout; Upstart will direct stdout to a file within /var/log/upstart/ and automatically rotate those log files for you - no logrotate setup necessary!

In other words, stop reinventing the wheel and let the operating system do the hard work for you! I hope this makes your Sidekiq deployment cleaner and more stable.

Sidekiq Enterprise


After many months of development and preparation, I'm proud to announce the newest member of the Sidekiq family: Sidekiq Enterprise. Sidekiq Enterprise is targeted at large companies and businesses which are building and scaling their operations with Sidekiq. It offers a whole new level of functionality beyond what Sidekiq and Sidekiq Pro contain.

What’s New?

Four major new features:

Rate Limiting

Many Sidekiq users and customers have asked how to throttle or limit their concurrency so a 3rd party API is not crushed by a huge number of Sidekiq workers at the same time. The new Sidekiq::Limiter API allows you to declare and enforce rate limits across all your Ruby processes, Sidekiq or not:

# Allow up to 50 concurrent operations to the ERP service
ERP_LIMITER = Sidekiq::Limiter.concurrent(:erp, 50)

def perform(...)
  ERP_LIMITER.within_limit do
    Erp.do_something
  end
end

The Limiter API allows you to limit based on concurrency or a rate limit (e.g. 5 ops per sec).
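Rate-based limiting looks similar; here's a sketch with hypothetical parameters (check the Limiter documentation for the exact signature):

# Allow at most 5 operations per second across all processes
STRIPE_LIMITER = Sidekiq::Limiter.window(:stripe, 5, :second)

def perform(...)
  STRIPE_LIMITER.within_limit do
    Stripe::Charge.create(...)
  end
end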

If the operation cannot be executed due to the rate limit, it will raise an error by default. If this error is raised within a Sidekiq job, Sidekiq will catch the error and reschedule the job to execute in the near future.

The Web UI has a new “Limits” tab containing an overview of registered limiters along with usage metrics and history for each.

Documentation: Rate Limiting

Periodic Jobs

Possibly the most popular 3rd party plugins are ones which add cron job-like functionality. Cron is also a common single point of failure since you typically pick one machine to run cron jobs.

Sidekiq Enterprise offers an officially supported solution for periodic jobs. Jobs are created according to the specified schedule and any Sidekiq process can pick up the job. As a side benefit, your system will no longer have that cron machine as a single point of failure. It’s dead simple to register a periodic job:

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.periodic do |mgr|
    mgr.register "* * * * *", MinutelyWorker, retry: 1
    mgr.register "*/4 * 10 * *", OddTimedWorker, queue: 'critical'
  end
end

Your Worker must take no arguments.

The Web UI has a new “Cron” tab containing an overview of registered periodic jobs along with their recent execution history.

Documentation: Periodic Jobs

Unique Jobs

“How do you catch a unique rabbit?”
“Unique up on him!”

Sidekiq Enterprise’s new unique jobs support will automatically de-duplicate any jobs already pending within Redis. If you create a job every time your user presses a button, you might not want a storm of clicks to create a storm of jobs.

To activate the feature, add this line:

# config/initializers/sidekiq.rb
Sidekiq::Enterprise.unique! unless Rails.env.testing?

Your workers must declare their uniqueness TTL with the unique_for option:

class MyWorker
  include Sidekiq::Worker
  sidekiq_options unique_for: 10.minutes

  def perform(...)
  end
end

The uniqueness will remain in effect until the job is successfully processed or the TTL expires. Uniqueness is based on (class, args, queue) so you can push the same class/arguments to two different queues.

Documentation: Unique Jobs

Leader Election

If you have a swarm of N Sidekiq processes, how can you run some code on a single Sidekiq? Many customers schedule a special job to run over and over but if there’s a Redis networking issue, the job can be lost and the cycle broken. With Sidekiq Enterprise you can run an infinite loop on a single Sidekiq “leader” process, elected randomly from your processes. If the leader disappears, a follower will be promoted to leader within a minute.
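Conceptually it lets you register work that only the current leader runs; here's a hypothetical sketch (the :leader hook name is an assumption - see the documentation for the actual API):

Sidekiq.configure_server do |config|
  # hypothetical hook: fires on the process elected leader
  config.on(:leader) do
    Thread.new do
      loop do
        do_singleton_work # placeholder for your leader-only task
        sleep 60
      end
    end
  end
end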

Documentation: Leader Election

Onboarding

Each Sidekiq Enterprise customer gets a one hour onboarding video chat session with me to help with any questions they might have and discuss any problems they might see in their application. I can help optimize Sidekiq for your application and environment.

Licensing

Sidekiq Pro’s low price means I cannot accept license changes which the lawyers at larger corporations often demand. These corporations can now purchase Sidekiq Enterprise and negotiate custom terms.

As part of this release, my lawyer has drawn up a new commercial license for Sidekiq Pro and Sidekiq Enterprise. New customers will use those licenses.

Pricing

Sidekiq Enterprise is priced on a sliding scale, based on number of workers running in your production environment. Pricing is $1750/yr per 250 workers.

  • 250 workers - $1750/yr
  • 251-500 workers - $3500/yr
  • More? Volume discounts available.

A worker is a thread within a Sidekiq server process. Ten processes with the default concurrency of 25 = 250 workers.

Existing Pro subscribers can contact me to upgrade to Enterprise for the prorated difference in price.

Older Pro lifetime customers will need to purchase a new Enterprise subscription in order to upgrade.

Purchasing

Many large companies have contacted me privately, asking if they can purchase without a credit card. Sadly, until now my answer was “no” because I didn’t have any other purchase workflow. Today I’m happy to say that companies can purchase Sidekiq Enterprise via the more traditional quote/purchase order/invoice workflow. Because of its lower price, Sidekiq Pro remains credit card only.

Sidekiq Pro

Sidekiq Pro is now the entry-level commercial version, with unlimited workers for $950/yr. This unmetered pricing remains a great value and something I want to maintain for smaller startups out there with limited funding. Purchasing is via credit card only but is completely automated so you can purchase and have Sidekiq Pro running in minutes.

sidekiq.org

The sidekiq.org website has been completely redesigned for the Enterprise release.

Conclusion

Sidekiq Enterprise offers not only a whole new set of features for serious Sidekiq users but also legal and support options important to large companies.

My goal here is to offer a product for all types of users: from hobbyists using Sidekiq to startups using Sidekiq Pro and larger companies using Sidekiq Enterprise. I hope one of them fits your needs too.

Storing Data with Redis


Would you stuff all of your data into one database table? That’s crazy, Mike, don’t be silly! What if I told you most people do just that with Redis?

Redis users often have several distinct datasets in Redis: long-lived transactional data, background job queues, ephemeral cached data, etc. At the same time I see lots of people using Redis in the most naive way possible: putting everything into one database.

There are several questions to answer when determining how to use Redis for different datasets:

  1. Can I flush the dataset without affecting other datasets?
  2. Can I tune the persistence strategy per dataset? For transactional data, you want real-time persistence with AOF. For cache data, you want infrequent RDB snapshots or no persistence at all. (See the sketch after this list.)
  3. Can I scale Redis per dataset? Redis is single-threaded and can perform X ops/sec, so consider that your performance “budget”. Datasets in the same Redis instance share that budget. What happens when your traffic spikes and the cache data uses the entire budget? Now your job queue slows to a crawl.
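To make (2) concrete, persistence is tuned per instance in redis.conf; a cache instance and a transactional instance might be configured like this sketch:

# cache.conf - throwaway data, no persistence
save ""
appendonly no
maxmemory 2gb
maxmemory-policy allkeys-lru

# transactional.conf - durable data, near-real-time persistence
appendonly yes
appendfsync everysec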

Data Partitioning

You have several different options when it comes to splitting up your data:

Namespaces

This is the most naive option. With namespaces, the Redis client prefixes every key with the namespace, e.g. “cache”, so the keys for “cache” don’t conflict with the keys for “transactional”. Namespacing increases the size of every key by the size of the prefix. You don’t get to tune Redis for the individual needs of “cache” and “transactional”. I strongly recommend avoiding namespaces. I see people use namespaces to share a single Redis across multiple apps and/or multiple environments. Consider this a hobbyist-only option for those who want to pay for a single Redis database from a SaaS; you do not want to build a business on top of this hack. Answers: No, No, No.

Databases

Out of the box, every Redis instance supports 16 databases. The database index is the number you see at the end of a Redis URL: redis://localhost:6379/0. The default database is 0 but you can change that to any number from 0-15 (and you can configure Redis to support more databases, look in redis.conf). Each database provides a distinct keyspace, independent from the others. Use SELECT n to change databases. Use FLUSHDB to flush the current database. You can MOVE a key from the current database to another database.
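Those commands map directly to redis-rb; a quick sketch:

require 'redis'

cache = Redis.new(url: 'redis://localhost:6379/1')
jobs  = Redis.new(url: 'redis://localhost:6379/4')

cache.set("greeting", "hello") # lands in DB 1's keyspace
jobs.get("greeting")           # nil - DB 4 is a separate keyspace

cache.flushdb                  # wipes DB 1 only; DB 4 is untouched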

Want to put all your Sidekiq job data in a separate database?

# Use DB 4 for all job data
redis = { url: 'redis://localhost:6379/4' }

Sidekiq.configure_client do |config|
  config.redis = redis
end
Sidekiq.configure_server do |config|
  config.redis = redis
end

Using separate databases is an easy way to put a “firewall” between datasets without any additional administrative overhead. Now you can FLUSHDB one dataset without affecting another dataset. Protip: configure your test suites to use a different database than your development environment so the test suite can FLUSHDB without destroying development data. Answers: Yes, No, No.

Instances

Running separate instances of Redis on different ports is the most flexible approach but adds significant administrative overhead. If you are using Redis for caching (and you should probably use memcached [1] instead), use a separate instance so you can tune the configuration and dedicate 100% of Redis’s single thread to serving high-traffic cache data. Configure another Redis instance to handle lower-traffic transactional and job data with more appropriate persistence. Answers: Yes, Yes, Yes.

Conclusion

My main goal with this blog post is to educate people on the drawbacks of stuffing everything into one Redis database. Namespaces are a poor solution for splitting up datasets in almost every case.


  1. I recommend memcached because it is designed for caching: it performs no disk I/O at all and is multithreaded so it can scale across all cores, handling 100,000s of requests per second. Redis is limited to a single core so it will hit a scalability limit before memcached. Using Redis for caching is totally reasonable if you want to stick with one tool and are comfortable with the necessary configuration and lower scalability limit per process. Redis does have a nice advantage that it can persist the cache, making it much faster to warm up upon restart.

Optimizing Sidekiq


Sidekiq has a reputation for being much faster than its competition but there’s always room for improvement. I recently rewrote its internals and made it six times faster. Here’s how!

It's been quite a while since I've touched Sidekiq's core design. That was intentional: for the last year Sidekiq has stabilized and become reliable infrastructure that Ruby developers can trust when building their applications. That didn't stop me from wondering, though: what would happen if I changed this or tweaked that?

Recently I decided to embark on an experiment: how hard would it be to remove Celluloid? Could I convert Sidekiq to use bare threads? I like Celluloid and how much easier it makes concurrent programming but it's also Sidekiq's largest dependency; as Celluloid changes, Sidekiq must accommodate those changes.

1. Get a Baseline

First thing I did was write a load testing script to execute 100,000 no-op jobs so I could judge whether a change was effective or not. The script creates 100,000 jobs in Redis, boots Sidekiq with 25 worker threads and then prints the current state every 2 seconds until the queue is drained. This script ran in 125 seconds on MRI 2.2.3.
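The script boils down to something like this sketch: enqueue no-op jobs in bulk, then poll until the queue drains while a Sidekiq process with 25 threads runs alongside:

require 'sidekiq'

class LoadWorker
  include Sidekiq::Worker
  def perform(idx)
    # no-op: we're measuring framework overhead, not job work
  end
end

# enqueue 100,000 no-op jobs, 1,000 at a time
100.times do
  Sidekiq::Client.push_bulk("class" => LoadWorker,
                            "args"  => (1..1_000).map { |i| [i] })
end

start = Time.now
queue = Sidekiq::Queue.new
# print progress every 2 seconds until the queue drains
until queue.size == 0
  puts "#{(Time.now - start).round}s: #{queue.size} jobs remaining"
  sleep 2
end
puts "Drained in #{Time.now - start} seconds"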

2. Bare Threads

Once a baseline was established, I spent a few days porting Sidekiq’s core to use nothing but plain old Threads. This wasn’t easy but after a few days I had a stable system and the improvement was impressive: the load testing script now ran in 57 seconds. Every abstraction has a cost and benefit; Celluloid allows you to reason about and build a concurrent system much quicker but does have a small runtime cost.

3. Asynchronous Status

Once the rewritten core was stable and tests passing again, I ran ruby-prof on the load testing script to see if there was any low hanging fruit. The profiler showed that the processor threads were spending most of their time sending job status data to Redis. Sidekiq has 25 processor threads to execute jobs concurrently and each thread called Redis at the start and finish of each job; you get precise status but at the cost of two network round trips. To optimize this, I changed the processor threads to update a global status structure in memory then changed the process’s heartbeat, which contacts Redis every few seconds, to update the status as part of the heartbeat. If Sidekiq is processing 1000 jobs/sec, this saves 1999 round trips! Result? The load testing script ran in 20 seconds.

4. Parallel Fetch

The last major change came when I noticed that MRI was using 100% of CPU and JRuby was using 150% during the script execution. Only 150%??? I have four cores in this laptop; why isn’t it using 300% or more? I had a hunch: Sidekiq has always used a single Fetcher thread to retrieve jobs from Redis one at a time. To test my theory, I introduced 1ms of latency into the Redis network connection using Shopify’s nifty Toxiproxy gadget and immediately the script execution time shot up to over five minutes! The processor threads were starving, waiting for that single thread to deliver jobs to them one at a time over the slow network.

I refactored things to move the fetch code into the processor thread itself. Now all 25 processor threads will call Redis and block, waiting for a job to appear. This, along with the async status change, should make Sidekiq much more resilient to Redis latency. With fetch happening in parallel, the script ran in 20 seconds again, even with 1ms of latency. JRuby 9000 uses >300% CPU now and processes 7000 jobs/sec!

Bonus: Memory and Latency!

I also ran the script with GC disabled. With no optimizations, Sidekiq executed 10,000 jobs using 1257MB of memory. With all optimizations, Sidekiq executed the same number of jobs in 151MB of memory. In other words, the optimizations result in 8.3x less garbage.

But that’s not all! I measured job execution latency before and after: the time required for the client in one process to create a job, push it to Redis, Sidekiq to pick it up and execute the worker. Latency dropped from 22ms to 10ms.

Version | Latency | Garbage created when processing 10,000 jobs | Time to process 100,000 jobs | Throughput
3.5.1   | 22ms    | 1257 MB                                     | 125 sec                      | 800 jobs/sec
4.0.0   | 10ms    | 151 MB                                      | 22 sec                       | 4500 jobs/sec

Data collected with MRI 2.2.3 running on my MBP 13-inch w/ 2.8Ghz i7.

Drawbacks?

There are a few trade offs to consider with these changes:

  • more Redis connections in use. Previously only the single Fetcher thread would block on Redis. Now each processor thread will block on Redis, meaning you must have more Redis connections than your concurrency setting. Sidekiq’s default connection pool sizing of (concurrency + 2) will work great.
  • job status on the Busy tab in the Web UI isn’t real-time; when the page renders, status may be delayed by up to a few seconds
  • Celluloid is no longer required by Sidekiq so if your application uses it, you will need to pull it in and initialize it yourself

Conclusion

Keep in mind what we are talking about: the overhead of executing no-op jobs. This overhead is dwarfed by application job execution time so don’t expect to see radical speedups in your own application jobs. That said, this dramatic lowering of job overhead is still a nice win for all Sidekiq users, especially those with lots of very fast jobs.

This effort will become Sidekiq 4.0 coming later this Fall. All of this is made possible by sales of Sidekiq Pro and Sidekiq Enterprise. If you rely on Sidekiq every day, please upgrade and support my work.

See the GitHub pull request for all the gory detail.

Should you use Celluloid?


I've used Celluloid from day one. More importantly I've evangelized Celluloid and advised Rubyists to use it. So it came as a shock to several people that I recently overhauled Sidekiq to remove Celluloid. What does that mean? I must be a huge hypocrite!

Engineering is about trade offs.

  • To make something easier or safer to use, create an abstraction layer.
  • To make something faster, remove one or more abstraction layers.

Multithreading is extremely hard to get right and the APIs that Ruby exposes for threading are rudimentary at best. Celluloid is an abstraction layer designed to make multithreading easier and safer to develop. If you are building your own threading, use an abstraction layer! Celluloid is fantastic, Michael Grosser’s parallel gem is great, etc.

Using threads typically gets you a huge increase in throughput per process. This increase usually dwarfs any overhead which an abstraction layer introduces.

But there’s an exception for every rule. Sidekiq has gone from a young, quickly moving project to a mature, stable project over the last two years. Celluloid makes redesigning a system easier but Sidekiq doesn’t really need that ease anymore. Celluloid does add a fixed overhead to every job execution, which thousands of apps running billions of jobs pay every day. The overhead is small but noticeable when running no-op jobs:

sidekiq 4 metrics

Celluloid is also Sidekiq’s biggest dependency. By removing it, I shrink the surface area of 3rd party gems I have to monitor and stay compatible with. Not a problem if you are using Celluloid in your app (you can just lock versions) but Sidekiq can’t stay on an old version without limiting people who are trying to use Celluloid APIs within Sidekiq.

Conclusion

My opinion has not changed: if I were building a new concurrent system today, I’d start with Celluloid. The abstraction is quite valuable when building something new but Sidekiq itself is at a point where it can do without that abstraction layer.


Advanced Data Structures in Ruby

$
0
0

Ruby provides several complex data structures out of the box; hash, array, set, and queue are all I need 99% of the time. However, knowing about more advanced data structures means you know when to reach for something more esoteric. Here are two examples I've used recently.

ConnectionPool::TimedStack

My connection_pool gem implements a thread-safe Stack with a time-limited pop operation. This can be very useful when coordinating time-sensitive operations between threads. Recently I used it as an alternative to sleep so I could wake up a sleeping thread immediately:

def initialize
  @done = false
  @sleeper = ConnectionPool::TimedStack.new
end

def terminate
  @done = true
  @sleeper << nil
end

def start
  @thread ||= Thread.new do
    while !@done
      do_something
      # normally we'll sleep 60 seconds.
      # terminate will wake up the sleeper early so it can exit
      # immediately.
      begin
        @sleeper.pop(60)
      rescue Timeout::Error
      end
    end
  end
end

The algorithms gem

Although somewhat misnamed, the algorithms gem contains a large set of advanced data structures for use in Ruby code. Sidekiq Enterprise’s cron feature uses a Heap to store jobs in-memory, sorted based on their next occurrence; this makes checking for the next job to run a constant time operation, no matter how many jobs are defined.
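For a flavor of the API, here's a sketch using the gem's MinHeap (verify method names against your version of the gem):

require 'algorithms'

# key is the next scheduled run time, value identifies the job
heap = Containers::MinHeap.new
heap.push(Time.now + 60,  "hourly_report")
heap.push(Time.now + 5,   "heartbeat")
heap.push(Time.now + 300, "backup")

# peeking at the soonest entry is constant time regardless of size
heap.next # => "heartbeat"
heap.pop  # removes and returns "heartbeat"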

I'd suggest reading over the API documentation; this gem has a lot of good structures that can turbocharge your Ruby code: trees, deques, and many others.

Time Complexity

When should you use a Heap? A Queue? A Stack? A Tree?

Part of understanding advanced data structures is understanding the complexity of their operations: how long will it take to add an element, remove an element, change an element? The time complexity of an operation can be constant (O(1), great), logarithmic (O(log N), good), linear (O(N), meh), or worse, where N is the number of elements in the data structure. Read more about Time complexity on Wikipedia.

Knowing about more advanced data structures and time complexity will make you a better developer. If you understand the operations that your code will perform frequently and the expected data structure size, you can pick a structure which best suits your own needs.

Sidekiq 4.0!


I’m happy to announce that Sidekiq 4.0 is now available!

I’m happy to announce that Sidekiq Pro 3.0 is now available!!

I’m happy to announce that Sidekiq Enterprise 1.0 is now available!!!

Sidekiq

Sidekiq 4.0 is a major optimization release. Sidekiq's core has been redesigned to remove dependencies and now goes down to the bare metal. Benchmarks show job overhead reduced six times, garbage creation reduced eight times and job latency cut in half.

Redis 2.8 or greater is now required.

There are no public Sidekiq API changes so this version upgrade should be very easy.

Please read the Sidekiq 4.0 release notes for all the detail.

Version          | Latency | Garbage created when processing 10,000 jobs | Time to process 100,000 jobs | Throughput
Sidekiq 4.0.0    | 10ms    | 151 MB                                      | 22 sec                       | 4500 jobs/sec
Sidekiq 3.5.1    | 22ms    | 1257 MB                                     | 125 sec                      | 800 jobs/sec
Resque 1.25.2    | -       | -                                           | 420 sec                      | 240 jobs/sec
DelayedJob 4.1.1 | -       | -                                           | 465 sec                      | 215 jobs/sec

Data collected with MRI 2.2.3 running on my MBP 13-inch w/ 2.8Ghz i7. Resque started via COUNT=25 QUEUE=default rake resque:workers

Sidekiq Pro

Sidekiq Pro 3.0 is designed to work with Sidekiq 4.0's new core design. Reliable fetch has been reimplemented but the semantics should remain identical. Pausing and unpausing queues now takes effect in real time due to the redesign - no more polling or 10-second delay.

Platforms without persistent hostnames, notably Heroku and Docker, get official support for reliable fetch through the new ephemeral_hostname option.

Read the Sidekiq Pro 3.0 release notes.

Sidekiq Enterprise

The newest member of the Sidekiq family, Sidekiq Enterprise, has solidified over the last three months with a handful of bugs fixed and almost one hundred customers running it in production. At this point I think it’s stable enough to call 1.0. As with Sidekiq Pro, some features have been re-implemented to work with the new Sidekiq 4.0 core.

There are no release notes because there’s nothing to note: no new features and the semantics are identical to 0.x.

Support

Sidekiq 3.x and Sidekiq Pro 2.x are stable and now in maintenance mode; they will get critical bug fixes through 2016.

Conclusion

The demand for both Sidekiq and its commercial siblings has amazed me ever since I released Sidekiq Pro three years ago. Today Sidekiq has passed 5 million downloads on Rubygems, Sidekiq Pro has many hundreds of customers and Sidekiq Enterprise approaches its first hundred. Thank you to my customers; you make it possible for me to support and work full-time on Sidekiq.

You can buy Sidekiq Pro or Sidekiq Enterprise here and be up and running in minutes.

How to Charge for your Open Source


Every month or so we hear of another high-profile open source developer who flips the table and rages about open source because of burnout; most recently Ian Cordasco and Ryan Bigg. Does this sound familiar?

  1. Spend many hours writing software.
  2. Spend many hours supporting users and solving their problems.
  3. Make $0.

If you were a freelancer and did that with clients, you’d be:

  1. Overwhelmed with people asking for your help.
  2. Quickly starve with an empty bank account.

So why is it noble for OSS developers to do it? If a library would cost a client $5000 in your time to develop, why not sell the library as a product and charge each user one hour of your time, say $150? Money doesn't solve burnout but think of it this way: almost every issue email you get is a net negative on your mental health; almost every sale email you get is a net positive.

“Corporations need to step up.”

99% of corporations will do what they are legally obligated to do, no more or less. So make them do something! The easiest way to make someone value your time and software is to charge for it. The goal of this blog post is not to rant about hypotheticals but rather show you how easy it is to charge for your work.

Sidenote

I don’t want to argue freedom and capitalism in the comments. If software freedom comes at the expense of burning out software developers, that’s not freedom, that’s tragedy of the commons. By charging for your work, you stop its exploitation.

Remember: Open Source != Free Software. The source may be viewable on GitHub but that doesn’t mean anyone can use it for any purpose. There’s no reason you can’t make your source code accessible but also charge to use it. As long as you are the owner of the code, you have the right to license it however you want.

  1. Make sure your project documents the fact that contributors must assign all rights to you. This allows you to accept pull requests. Some projects require a CLA; I think that’s overly conservative but IANAL.
  2. License the project as AGPL by default by adding a LICENSE file with the AGPL in it. The AGPL requires users to open source any software which uses your library; businesses will pay to avoid it. This is exactly what we want.
  3. Add an MIT license in a COMM-LICENSE file.
  4. Note in your README that the licensing is AGPL by default but an MIT license is available for purchase.

Commerce

This assumes you are in the United States.

  1. Open a Plasso account. You’ll also open a Stripe account as part of the signup, connected to your bank account. (I have no connection with Plasso aside from being familiar with their platform as I use it to sell Sidekiq Pro. I’ve also seen people use Wufoo or FastSpring.)
  2. Determine your pricing. Decide if you want to charge one-time or as a subscription. More complex projects can easily justify ongoing payment. You can also charge per major version, like books do with first edition, second edition, etc. For small libraries with no guaranteed support, I think charging your hourly rate per major version is entirely reasonable.
  3. Once you have created a product page in Plasso, put the link in your README. Here’s Sidekiq Pro’s product page on Plasso for example.
  4. Money will appear in your bank account two days after purchase. You’ll get an annual 1099-K from Stripe which (IIRC) you report as miscellaneous income on your annual tax return.

That’s it! Upon purchase, the customer gets an email receipt from Plasso that says they bought the commercial license option for their records.

Well done, you just became a professional open source developer and thought leader in our industry. Tweet at @mperham so I can congratulate you personally!

Just because you charge for something doesn’t guarantee you will make any money. But if you don’t charge for it, I guarantee you will make no money.

Denouement

Consider what we’ve done here: your project is still Open Source but we’ve added a method for grateful users to support you and a legal mechanism forcing businesses to pay if they want to use your library and not open source their application due to the AGPL.

Now it's up to you how much time and effort you want to spend evangelizing and supporting your work. If you think having users feels good, just wait until you get a few customers and can pay for a vacation each year with your OSS work. Good luck!

Notes

  • “But what about…” Money doesn’t solve all problems. This won’t work for every developer, every library or every project. The world is complex and your mileage may vary.
  • Having unpaid collaborators can feel a little weird at first but the reality is that most smaller OSS projects have a single person doing 95% of the work. If this is true for you, be grateful for unpaid help but don’t feel guilty about keeping 100% of the income.
  • “Then users will just use some other free library” - and you just reduced your support load. You don’t want free users who don’t value your effort. Let someone else deal with them and suffer the burnout. Focus on making the best, most useful library you can that solves your own need.
  • “Great, now I’ve got two jobs!” You have just as much commitment as you had when building a free library. You aren’t guaranteeing any support.

How to Test Multithreaded Code


Multithreaded code is hard to write and even harder to test. Since much of my work is dedicated to making Ruby threading easier for my users and customers, I thought some might be interested in the patterns I’ve developed to make multithreaded code as simple and testable as possible.

Separate Threading from Work

If you can’t test a big block of code, break it into a set of smaller testable pieces.

Sidekiq::Processor is an object which is designed to run in its own thread and doesn’t have any public API aside from starting/stopping the thread.

p = Sidekiq::Processor.new
p.start

Internally it has quite a bit of complexity - think of it like an iceberg. In order to test those complex internals, I make its internal API public so that the test suite has full access to the methods. The start method spins up a thread which calls a very simple run loop similar to this:

def run
  while !@done
    job = fetch
    process(job) if job
  end
end

I’ve kept the run method as simple as possible since we can’t call it in the test suite but we can call fetch and process in order to test them:

def test_process_fake_job
  p = Sidekiq::Processor.new
  result = p.process(some_fake_job)
  # asserts...
end

In this case, I’ve kept the thread management code as simple as possible and pushed as much of the code complexity into separate methods which can be called directly by the test suite and deterministically verified.

How do I test the thread management code? Simple: in some cases I don’t. 100% test coverage is for fundamentalists. Keep the code simple, verify it manually and then don’t change it. Code complexity leads to churn which leads to bugs. Since most of the complexity in Sidekiq::Processor is in the process and fetch methods, they are most likely to change so we test those methods directly.

Use Callbacks

If you must test multithreaded code, you’ll want to design testability into the API. Ever seen or written a test littered with sleep calls? We’ve all been there but you can test threaded code without sleep calls, I swear! Generally the pattern is:

  1. Start the other thread
  2. Tell the other thread to process something
  3. Wait for the result
  4. Assert results

Most people don’t know how to do (3) properly so they use sleep as a hack. Here’s a complete example of how to do it in Ruby:

require 'thread'

# We want to test Upcaser by exercising its full API,
# including the internal threading.
class Upcaser
  Request = Struct.new(:args, :block)

  def initialize
    @queue = Queue.new
  end

  def start
    @thread = Thread.new(&method(:run))
  end

  def process(*args, &block)
    @queue << Request.new(args, block)
    true
  end

  def terminate
    @queue << nil
  end

  private

  def run
    loop do
      req = @queue.pop
      break unless req
      # perform the actual work
      result = req.args[0].upcase
      # call the block with the result
      req.block.call(result)
    end
  end
end

def test_upcaser
  m = Mutex.new
  cv = ConditionVariable.new
  a = Upcaser.new

  # Step 1
  # tell Upcaser to start its internal thread
  a.start

  results = nil
  # the main thread will lock the mutex so it can pass data
  # to Upcaser and then wait for the results
  m.synchronize do
    # Step 2
    # pass "something" to Upcaser for its internal thread to process
    # the internal thread must call the block with results when done
    a.process("something") do |res|
      results = res
      m.synchronize do
        cv.signal
      end
    end

    # Step 3
    # the main thread will wait here for Upcaser's thread to finish.
    cv.wait(m)
  end

  # Step 4
  # assert whatever you want about the results
  assert_equal "SOMETHING", results

  # shut down Upcaser's internal thread
  a.terminate
end

The “trick” is the callback block passed to the process method. That callback will save the results and unlock the main thread once Upcaser’s thread is finished processing. If your API exposes a similar callback mechanism, it can be properly tested across threads.
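As an aside, Ruby's thread-safe Queue can stand in for the mutex/condition variable pair, since a blocking pop gives you step 3 for free; here's a sketch of the same test:

def test_upcaser_with_queue
  results_q = Queue.new
  a = Upcaser.new
  a.start

  # the callback runs on Upcaser's thread; Queue#push is thread-safe
  a.process("something") { |res| results_q << res }

  # pop blocks the main thread until the result arrives
  assert_equal "SOMETHING", results_q.pop

  a.terminate
end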

I hope this helps people untangle some of their messy threading. Got any other patterns for making threading easier to manage? Please link to them in the comments.

Contributed Systems: the 2015 wrapup


In July 2014 I started my company Contributed Systems and began working on Sidekiq full-time. Let's review how 2015 went. The end of January will be Sidekiq's 4th birthday so here are some birthday numbers:

              | 1st Birthday | 2nd Birthday | 4th Birthday
Downloads     | 214,300      | 1,192,259    | 5,505,145
Stars         | 2144         | 3535         | 5846
Closed Issues | 663          | 1420         | 1887
Forks         | 266          | 563          | 1003
Closed PRs    | 228          | 380          | 836
Versions      | 44           | 74           | 110
Customers     | 25           | 200          | 675
Employees     | 0            | 0            | 1


Good News

In 2015, I:

  • released Sidekiq Pro 2.0 with nested batches and expiring jobs
  • introduced Sidekiq Enterprise, with rate limiting, cron job and unique job features
  • released Sidekiq 4.0 with a simpler and higher performance core
  • released Sidekiq Pro 3.0 and Enterprise 1.0 major upgrades
  • attended Railsconf and Rubyconf, passing out t-shirts and stickers

As for business numbers, my revenue is up 2.6x YoY from 2014. Average sales price has doubled due to the introduction and demand for Sidekiq Enterprise.

When I started Sidekiq and Sidekiq Pro in 2012, I set three goals, never mentioned to anyone, to measure my own success:

  1. change the standard advice of “use resque or DJ” to “use resque, sidekiq or DJ”
  2. pass resque as the most popular job framework
  3. make $1 million from Sidekiq Pro by selling 2000 copies for $500 over five years.

The latest State of the Rails Stack 2016 shows that I’ve blown away (1) and (2). Sidekiq now has more market share than Resque and DJ combined. Market leader achievement unlocked!

market share

(3) has been a learning experience: note the slowdown in customer growth from years 1 -> 2 -> 4. Like any business-focused niche software product, a small customer pool demands a higher price (versus the original $500); it turns out to be easier to convert 500 customers at $2000 each than 2000 customers at $500. Given the amount of functionality in Pro and Enterprise, it doesn't take long for businesses to realize that the value is there: no engineering time and effort necessary to build and maintain their own gems, a steady stream of fixes and improvements, expert support only an email away, etc.

(3) will be met: 2016 will be the year I pass $1 million in total revenue, right on schedule.

Bad News

At the end of 2014 I introduced Inspeqtor and Inspeqtor Pro as a second product line for Contributed Systems with the vision of building a modern process monitoring system.

I still believe the product is amazing – it’s the replacement for monit that I’ve always wanted – but it’s hard to argue with the facts: uptake and sales have been minimal. What’s there is stable and people are welcome to use it; I won’t be adding any more features.

Conclusion

Sidekiq has been a success, Inspeqtor a failure. In baseball, a .500 batting average is superhuman. I’ll take it. I hope your 2015 went well and cheers to 2016!
