Rails Freak


Be pragmatic with your time

So you’re trying to write that awesome new feature you’ve been chomping at the bit for, right? You’re gonna get it done and it’s gonna have so many bells and whistles! Let’s say you give yourself two weeks to get it done, but at the end of one week you realize you just can’t finish it in one more week, right?

Been there, done that.

Here are a few things that I’ve found that really help keep me on track and keep code flowing even when a feature seems so far away.

Keep it simple (commit)

So I have this tendency to go off and work, work, work. Code changes pile up until I finish a feature or a ticket, and then I try to sort it all out at the end with git diff and git add -p. This is bad, and I wouldn’t advise working this way. If you practice BDD, you should set up a describe block with some tests, make them pass, and then do a commit. In the end, it will help you feel you are accomplishing things while making things like a rebase much smoother. Be atomic.

Hocus Focus

With so many distractions around today, it is a wonder people still get work done. You know what I am talking about: email, IM, and Twitter! Maybe it is YouTube, gaming, or Huffington Post? Either way, I find I work best when I have a focus time. I set aside 1 to 2 hours a couple times a day in which I shut down all distracting programs (iChat, Campfire, IRC, email, Twitter, etc.) and make sure I really focus on the tasks I need to do.

This focus time has really helped me increase the flow of code, and I would recommend something similar, maybe even the Pomodoro Technique if that is your thing. Focus on coding.

Don’t work in a vacuum

I work on a team of four developers and it is very important that we review each other’s code, discuss how to approach problems, and help out with pair programming. We have a 2.5 week iteration cycle, and then production is branched off from master, so that we can keep working on master. It is important that we push code daily to ensure we can easily work with each other.

The added benefit is that your topic branches will stay up to date with master and you will minimize git conflicts. Pushing at least once a day also adds to that snowball “accomplishment” effect. Feature or bug not completely fixed? That’s ok, push what you have with passing tests - then go back to being heads down for a few more hours. Share your progress.

Spec, spec, spec

I know this might sound goofy, but BDD really does work. Write your failing spec and make it green. Writing your specs forces you to think about how your code works and what the interaction interfaces look like - this means you have clear goals when you start to write actual code.

I had a friend IM me to ask how to write a non-database model and he laughed when I told him to write specs for it. Don’t get me wrong, he tests sometimes, but he wanted to spike this. I told him to start with the tests, the DSL would write itself as he put tests together.

BDD is important because it makes you think about the code you write, but it is also important as a bug finding tool. I found myself writing a simple MVC stack the other day and I wrote the model specs to handle some special update methods. Lo and behold, the specs uncovered a bug I hadn’t accounted for in my code. I wasn’t testing for that bug in particular, nor was I coding to make sure it worked, but I still found it by having those specs.

This one is really important. Spec, spec, spec.

Thanks for the mem…

I know some of this may seem like common sense, but this stuff kind of emerged when I started changing my patterns and habits to be more BDD and more team oriented as a coder. This is new for me because despite having worked in teams before, we always seemed to work separately.

If you glean nothing else from this article, please consider this: Spec first, it works. Commit early, commit often.


Fun with ActiveRecord

Recently, I’ve been working on a medical provider (doctors, pharmacies, hospitals) matching system at change:healthcare. We have a clean/pristine medical provider table, but we have claims from insurance companies that don’t match up right. Most of the unmatched claims come from misspellings, mismatched unique identifiers, or other small typos. By matching the claim to a provider in our table, we can then use the claim in our average cost and our savings calculations.

Currently this is being done by hand: we export the results of a SQL query to a CSV file and distribute that to a team of people, who manually try to find matches through a simple search interface that already exists on the site. The goal is to make this less manual by offering an “admin area” where the team can view these unmatched claims, then click one and see suggestions for matches from our provider table. While this is still a manual process, it is much less time intensive and it also provides the first step towards possible automation in the future.

Let’s take a look at some of the difficulties I’ve had to address while working on this new feature:

Unions

The original SQL query I was given consisted of three UNION select statements, the most complex one containing GROUP BY and HAVING in the query. My original goal was to break the UNION statements into named scopes joined by OR statements. We already had the searchlogic plugin installed, so we went with that; it supports using OR for named scopes.

My approach was to break the three select queries into individual queries and run a count on the results. I then created the three named scopes and ran each of those and compared the counts. Everything was perfect! I was pretty excited… Until I joined them all with the OR statements. It turns out when you use a named scope and OR, the GROUP BY and HAVING statements apply to the whole query, not just the one part. The counts were off.

So I went back to the drawing board, but I had new information in hand: the queries were taking WAY too long to run during a normal web request cycle. On my small dataset the UNION took 50+ seconds and the OR statement took around 45 seconds.

Caching long queries

The new approach took shape around find_by_sql: we would run the query nightly via cron to fill a new table that housed all the unmatched medical providers and the medical claims they came from. Creating a separate table and model to fill allowed us to add database indexes and nice named scopes.

Pre-caching into a regular model also allowed us to create a “normal” controller to interact with our unmatched providers. The benefit with that is we quickly built out the easy actions/views so we could focus on search and the match making.

Locking

The last key concern was to avoid double working. Medical Provider matching is mostly done in an automated fashion, but some of the data is so mangled that we have to manually match the rest. It is a whole team effort and it is dreaded by most in the office, so we wanted to make sure the list of unmatched providers we show you is a list you can work on. The intended idea was to use pessimistic locking and SELECT FOR UPDATE to ensure we didn’t show an unmatched provider in the index if someone was working on it.

That didn’t work like we wanted. We still kept the SELECT FOR UPDATE on the edit query to ensure integrity, but we also added an in_use timestamp and a named scope (not_in_use, I know, real original) to ensure we don’t show any records that are being matched up by someone else.
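
The filtering idea behind not_in_use can be sketched in plain Ruby. This is only an illustration, not the actual named scope: the UnmatchedProvider struct, the LOCK_TIMEOUT value, and the rule that a stale timestamp frees a record again are all assumptions made for the example.

```ruby
# Plain-Ruby sketch of the not_in_use idea (no Rails required).
# UnmatchedProvider, LOCK_TIMEOUT, and the expiry rule are assumptions.
UnmatchedProvider = Struct.new(:name, :in_use)

LOCK_TIMEOUT = 15 * 60 # seconds; hypothetical value

# Returns the records nobody is currently working on: either never
# claimed (in_use is nil) or claimed longer ago than LOCK_TIMEOUT.
def not_in_use(records, now = Time.now)
  records.select { |r| r.in_use.nil? || r.in_use < now - LOCK_TIMEOUT }
end

now = Time.now
records = [
  UnmatchedProvider.new("Dr. Smith", nil),              # never claimed
  UnmatchedProvider.new("General Hospital", now - 60),  # claimed a minute ago
  UnmatchedProvider.new("Main St Pharmacy", now - 3600) # claim has gone stale
]

puts not_in_use(records, now).map(&:name).inspect
```

In a Rails app the same condition would live in a named scope so the database does the filtering; the in-memory version just makes the rule easy to read.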

Know your tools

I think the great part about working on a more out of the box feature like this is that you get to explore different ideas and try new things, ultimately learning how to use everyday tools better. This is only my second time using locking, and I had no idea that MySQL didn’t offer a row level lock that stopped reads from happening.


Mocking and Stubbing can be evil

Lately I’ve been seeing some things that make me worry. Mocking and stubbing can be very helpful tools when you use them right, but I am not seeing them used right!

Why stubbing is dangerous

Don’t get me wrong, stubbing is good when you need to decouple one class from another, or your app from a service. My problem is with functional testing and view testing with mocks and stubs - what’s the point? If I make a change to a model and the controller throws a 500 because of the change, don’t you want to know about it? If you set up a stub and then add expectations for model calls and fake return values, you won’t get any red flags.

Twitter Poll

Before we get into the nitty gritty with some real code examples, let’s take a minute to look at some exchanges on this very subject via twitter:

  • @joshowens - “Do you stub model calls in your controller tests? I don’t think you should, if the model breaks, so does the controller, tests should fail.”
  • @aeden - “@joshowens if your controller tests are functional tests then you should consider it, if they’re integration then you shouldn’t.”
  • @joshowens - “@aeden They are functional, but if the action breaks on the controller level and not in the view, shouldn’t the test catch that?”
  • @aeden - “@joshowens a functional test should catch broken controller logic…not broken model logic (that’s reserved for unit tests)”
  • @dougalcorn - “@joshowens @aeden with how little logic should be in your controller, I’ve quit writing functional tests and only do integration with webrat”

Some code for your enjoyment

Let’s assume we have a Saving Model and a Savings Controller. Saving has a state field and uses a state machine plugin to offer transition methods.

Now let’s look at a quick spec sample with some stubs
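
A minimal plain-Ruby sketch of what such a stubbed test might look like. Everything here is a hypothetical stand-in: the use! transition name, the StubbedSaving class, and the finder lambda passed to the controller are assumptions for illustration, not real Rails or RSpec code.

```ruby
# Hypothetical stand-in for SavingsController#update.
# The use! transition name is an assumption (pending -> used).
class SavingsController
  def initialize(finder)
    @finder = finder # anything that responds to call(id)
  end

  def update(id)
    saving = @finder.call(id)
    saving.use!
    :redirect
  end
end

# Hand-rolled stub: records calls and returns a canned value.
# It never touches the real Saving model.
class StubbedSaving
  attr_reader :calls

  def initialize
    @calls = []
  end

  def use!
    @calls << :use!
    true
  end
end

# The "spec": green as long as the controller calls use! on the stub.
stub = StubbedSaving.new
controller = SavingsController.new(->(_id) { stub })

result = controller.update(1)
raise "expected a redirect" unless result == :redirect
raise "expected use! to be called" unless stub.calls == [:use!]
puts "green"
```

Nothing in this test knows whether the real Saving model actually responds to use!, which is exactly the weakness.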

Alright, everything is looking good, we are green and the pages are loading in the browser. But wait, the client changes their mind, they don’t like the states “pending” and “used”, they want to use “new” and “completed”. With this change, we are going to change the transition methods to match, we will now use @saving.complete! in our update action.

I change our model specs, they go red, then I fix the code to support the new states and methods. I run the full suite and get green… Wait, green?!? Yep, because we stubbed out the calls, we never catch the broken controller.

A better shade of green

Here is how I would approach a better test that would catch our failure:
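
A plain-Ruby sketch of the idea, under stated assumptions: the Saving class below stands in for the model after the rename, the two controller classes are hypothetical, and use! is the assumed old transition name. The check exercises the real object instead of a stub, so a controller that still calls the old method goes red.

```ruby
# Stand-in for the model after the client's change: new -> completed.
class Saving
  attr_reader :state

  def initialize
    @state = "new"
  end

  def complete!
    @state = "completed"
  end
end

# Hypothetical controller that was never updated and still calls
# the old transition (use! is an assumed name).
class StaleSavingsController
  def update(saving)
    saving.use!
    :redirect
  end
end

# Hypothetical controller updated to match the model.
class FixedSavingsController
  def update(saving)
    saving.complete!
    :redirect
  end
end

# Runs the update action against a real Saving and reports the result.
def check_update(controller)
  saving = Saving.new
  controller.update(saving)
  saving.state == "completed" ? :green : :red
rescue NoMethodError
  :red # the model no longer responds to the old transition
end

puts check_update(StaleSavingsController.new)
puts check_update(FixedSavingsController.new)
```

In a Rails functional or integration test the same effect comes from creating a real record (for example with a factory) and asserting on its state after the request.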

Let’s get real!

So now that I’ve given you a few examples, let me give you some real experience on why false passing tests are bad. At change:healthcare we have 19,000 LOC with a 1:2.5 code to test ratio - that doesn’t include our custom plugin across shared apps. Even with a full six months of working on the app, doing a major overhaul of the system can be problematic if you don’t have good tests. How can you be 100% confident of rolling out your new code when you haven’t seen and touched 100% of the app?

At change:healthcare we rely on a robust test suite, a hand-rolled continuous integration suite, and vigorous staff testing to ensure we get everything right. Even with all these efforts, we miss occasional bugs. Better tests (avoiding false passing tests) and better code coverage (81.9% right now) are the best way to catch these bugs! With a code base so large, the test suite is used as much for regression testing as it is for anything you gain with TDD/BDD.

Personally, I am with Doug’s tweet above: forget functional testing and even view testing and head straight for the integration testing. If you aren’t doing it yet, look at webrat and cucumber. After all, we want a rails app that works from top to bottom, right?


Building an api

Ever thought about building an api for your web app? Not sure where to start, or perhaps the best practices for a rails app? This post will explore some of the choices I made when building a new API for Change:Healthcare.

Versioning

One of the key things I knew needed to happen was versioning. Having used many other APIs before, I’ve found that un-versioned APIs like twitter’s can be very frustrating. They require you to stay in lockstep with the development team, especially when breaking API changes land. I won’t go into the how too much; I followed John Barnette’s advice.

Versioning covers a lot of pain points in dealing with APIs, but it definitely falls flat in a few spots. RESTifarians argue that using versioning in your url means you break with strict RESTful routing and principles. There is also the matter of doing a straight copy from one version to the next; it creates a lot of extra code to get that versioned benefit.
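
To make the trade-off concrete, here is a small plain-Ruby sketch of URL-based versioning. It is not the routing setup described above: the /api/vN/savings/:id path, the Api module, and the response fields are all hypothetical, chosen only to show how each version keeps its own copy of the code.

```ruby
# Sketch of URL-based API versioning. Paths, module names, and the
# response fields are hypothetical.
module Api
  module V1
    def self.saving(id)
      { "id" => id, "state" => "pending" }
    end
  end

  # V2 starts life as a copy of V1 and then diverges. Old clients
  # keep the V1 response shape; the cost is duplicated code.
  module V2
    def self.saving(id)
      { "id" => id, "status" => "new" }
    end
  end

  VERSIONS = { "v1" => V1, "v2" => V2 }.freeze

  # Picks the handler from the version segment of the path and
  # returns a [status, body] pair.
  def self.dispatch(path)
    match = %r{\A/api/(v\d+)/savings/(\d+)\z}.match(path)
    return [404, nil] unless match

    handler = VERSIONS[match[1]]
    return [404, nil] unless handler

    [200, handler.saving(match[2].to_i)]
  end
end

puts Api.dispatch("/api/v1/savings/7").inspect
puts Api.dispatch("/api/v2/savings/7").inspect
```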

Encrypt for security

Another key concern when writing this API was to ensure passing sensitive data was as secure as possible. We already require SSL for all API interactions, but we needed to take it further. We quickly settled on a two way encryption scheme, using AES 256 bit.

Why AES 256 bit? We wanted something well supported and well documented. We use ruby to build our apps, but not everyone does. A quick search turned up easy ways to deal with AES in Java, PHP, etc. One other key point to remember is to use AES-256-CBC, as it requires an initialization vector (IV) to start the encryption, and a random IV is more secure than just using all 0’s to start encrypting the blocks.
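
A minimal sketch of AES-256-CBC with a random IV, using Ruby’s standard OpenSSL bindings. The helper names, the hex encoding, and the choice to prepend the IV to the ciphertext are assumptions for the example, not the scheme our API actually uses. The sketch also does not authenticate the ciphertext.

```ruby
require "openssl"

# Encrypts plaintext with AES-256-CBC and a fresh random IV.
# Returns hex of (IV + ciphertext). Helper names and the wire format
# are assumptions for this example.
def encrypt_payload(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = key         # must be exactly 32 bytes
  iv = cipher.random_iv    # generates a random 16-byte IV and sets it
  ciphertext = cipher.update(plaintext) + cipher.final
  (iv + ciphertext).unpack1("H*")
end

# Reverses encrypt_payload: splits off the IV, then decrypts the rest.
def decrypt_payload(payload, key)
  raw = [payload].pack("H*")
  iv = raw[0, 16]
  ciphertext = raw[16..-1]

  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  (cipher.update(ciphertext) + cipher.final).force_encoding("UTF-8")
end

key = OpenSSL::Random.random_bytes(32) # shared secret, 256 bits
payload = encrypt_payload("member_id=12345", key)
puts decrypt_payload(payload, key)
```

Because the IV is random, encrypting the same plaintext twice gives two different payloads. The IV itself is not secret, which is why it can travel alongside the ciphertext.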

Supporting JSON

The decision of what formats to support was left up to me, and I decided we would start with JSON, as it is so easy to implement and support. Since rails 2.3 switched to rack, it is very easy to switch out your own Param Parser if you need it, but the built in one works nicely.

In some spots we used the built in .to_json support for objects, and it worked out very nicely - we implemented some of those methods very quickly. In other areas we needed custom output, so we wrote helpers to take the object and output the strict JSON we needed. Note the use of .to_json in our helper method below - we used it to easily output a proper null when ruby had a nil piece of data.
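
A sketch of what such a helper might look like, using Ruby’s standard json library. The provider_json name and the name/phone fields are hypothetical. The point is only that calling .to_json on each value handles quoting and turns nil into null.

```ruby
require "json"

# Hypothetical helper that builds a strict JSON object by hand.
# Calling .to_json on each value handles quoting and escaping, and
# turns nil into null.
def provider_json(provider)
  fields = [
    %("name": #{provider[:name].to_json}),
    %("phone": #{provider[:phone].to_json})
  ]
  "{#{fields.join(', ')}}"
end

puts provider_json({ name: "Dr. Smith", phone: nil })
```

Interpolating the raw values instead would emit an empty string for nil and unquoted text for strings, neither of which is valid JSON.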

Write documentation

API adoption relies on a few different factors, but none is more important than good documentation. Having worked with other APIs from eBay, amazon, and twitter, I found it easy to realize you can’t get started without good docs. I made the decision to follow a format similar to twitter’s wiki documentation for their API.

We really liked twitter’s inclusion of example return values and curl examples. The example return values are great because you can easily copy and paste them into tests/fixtures/factories when writing your tests. The curl examples are super helpful as well because you can actually figure out how to play around with the API without writing a single line of code.

Bringing it all together

If you glean only one thing from this post, I hope it is that writing an API should be something you put a real effort into. Don’t half ass it, people will be able to tell. Spend the time to figure out your approach, your input and output formats, and above all write documentation!
