Why I love everything you hate about Java

  Nick Kallen, 2011-11-29 08:48:15

If you’re one of those hipster programmers who loves Clojure, Ruby, Scala, Erlang, or whatever, you probably deeply loathe Java and all of its giant configuration files and bloated APIs of AbstractFactoryFactoryInterfaces. I used to hate all that stuff too. But you know what? After working for all these months on these huge pieces of Twitter infrastructure I’ve started to love the AbstractFactoryFactories.

Let me explain why. Consider this little Scala program. It uses “futures”, which are a way to schedule computation to be done in parallel with the main flow of a program. They are sometimes a natural way of modeling the most efficient scheduling of program execution. Usually you schedule in advance some expensive work that can be done in parallel and then you do something else in the meantime. Only when you really need the result of the original computation do you block and wait (and hopefully only very briefly, since you scheduled the work way in advance!). Here is a “typical” Java-ish Futures library used from Scala:

    private val executor = new ThreadPoolExecutor(
      poolSize, maxPoolSize,
      keepAlive.inSeconds, TimeUnit.SECONDS,
      new LinkedBlockingQueue[Runnable],
      new NamedPoolThreadFactory(name))

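    // FutureTask has no no-argument constructor; it wraps a Callable (or a Runnable)
    val future = new FutureTask[Unit](new Callable[Unit] {
      def call(): Unit = doSomeWork
    })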

    executor.execute(future)

If you come from a dynamic language like Ruby or Python you will probably have a visceral reaction like “Yeck! Look at all that horrible boilerplate. Convention over configuration!” Wouldn’t it be nice if you could just do something like:

    val future = new Future {
      doSomeWork
    }

It seems nice but its nicety is just an illusion. All that boilerplate is really important when you work at massive scale and where efficiency really matters. These magic numbers like the thread pool size and the kind of queue you use to schedule work can vastly impact the performance of your application. And the “right” configuration depends entirely on the nature of the problem you’re solving and how callers of this code behave. What all of this weird boilerplate provides is a way to configure the behavior of the system; it doesn’t assume there’s one right way of doing things. And that is precisely how modular software behaves: modular code is code designed to grow past the assumptions of just one user. Modularity really matters when your software isn’t a little throw-away program.
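
To make that concrete, here is a minimal sketch of a futures helper that takes its executor as a parameter instead of hiding one inside. The names `ConfigurableFutures`, `submit`, `smallPool`, and `demo` are made up for illustration, and the pool sizes are arbitrary assumptions; only `java.util.concurrent` classes are used.

```scala
import java.util.concurrent.{Callable, ExecutorService, FutureTask, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Hypothetical helper: the caller, not the library, decides pool size and queue type.
object ConfigurableFutures {
  // Wraps a by-name block in a FutureTask and hands it to the injected executor.
  def submit[A](executor: ExecutorService)(work: => A): FutureTask[A] = {
    val task = new FutureTask[A](new Callable[A] { def call(): A = work })
    executor.execute(task)
    task
  }

  // One possible configuration; every number here is an illustrative assumption.
  def smallPool(): ThreadPoolExecutor =
    new ThreadPoolExecutor(
      2, 4,                                // core and maximum pool size
      60L, TimeUnit.SECONDS,               // keep-alive for idle threads beyond the core size
      new LinkedBlockingQueue[Runnable]()) // unbounded work queue

  // Usage: schedule the work, then block only when the result is needed.
  def demo(): Int = {
    val pool = smallPool()
    try {
      val future = submit(pool) { 21 * 2 }
      future.get()
    } finally pool.shutdown()
  }
}
```

A caller with different needs passes a differently configured executor (a bounded queue, say) and `submit` does not change.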

Twitter recently open-sourced Querulous, a minimal database querying library for Scala. We use it in several projects at Twitter, but it was designed principally to meet the extreme demands of FlockDB, our distributed, fault-tolerant graph database. FlockDB demands extremely low-latency (sub-millisecond) response times for individual queries. Any excessive indirection from an ORM would be unacceptable. Furthermore, because FlockDB processes tens of thousands of queries per second across dozens of shards, FlockDB must collect extensive statistics on the performance and health of the various shards in order to direct traffic to the most efficient place.

So Querulous was designed for querying databases at low latency, massive scale, and with easy operability. It has flexible timeouts, extensive logging, and rich statistics. But as FlockDB became more mature and sophisticated, the demands grew greater. We needed different health-check and timeout strategies in different contexts. It became clear that Querulous would need to be made extremely modular and extremely configurable to work at all.

So we set about to re-write Querulous using my favorite modularity techniques: Dependency Injection, Factories, and Decorators. In other words, everything you hate about Java.

The design patterns of modularity

In order for code to be modular it must have few hard-coded assumptions. In Object-Oriented software this means something very particular since the essence of an Object-Oriented program is that its structure is organized around the types of objects. Therefore, the most fundamental, anti-modular assumption in Object-Oriented software is the concrete type of objects. Any time you write new MyClass in your code you’ve hardcoded an assumption about the concrete class of the object you’re allocating. This makes it impossible, for example, for someone to later add logging around method invocations of that object, or timeouts, or whatever isn’t anticipated a priori.

In a very dynamic language like Ruby, open classes and method aliasing (e.g., alias_method_chain) mitigate this problem, but they don’t solve it. If you manipulate a class to add logging, all instances of that class will have logging; you can’t take a surgical approach and say “just objects instantiated in this context”. (Update: some people are asking “what about metaclasses?” Metaclasses do not solve this problem at all, because if you do not have control over the caller of Foo.new then you cannot later add new behavior to the metaclass; it has to be hardcoded at the site of manufacture. The point of this technique is to avoid knowing in advance what behavior you will add in, to make it configurable!)

There are standard design patterns to mitigate this, namely Dependency Injection, Factories, and Decorators. By injecting a Factory (a function that manufactures objects) as a parameter to a function that needs to create objects, you allow a programmer to later change his mind about what Factory to inject; and this means the programmer can change the concrete types of objects as his heart desires. And by using Decorators, the programmer can mix and match functionality easily, stack one thing on top of another like so many legos. Let’s look at an example.

Here I have a Query object, with methods like #execute(). I want to add timeouts around all queries. I start by creating a QueryProxy that routes all method invocations through an overridable method, #delegate:

    abstract class QueryProxy(query: Query) extends Query {
      def select[A](f: ResultSet => A) = delegate(query.select(f))
      def execute() = delegate(query.execute())
      def cancel() = query.cancel()

      protected def delegate[A](f: => A) = f
    }

Then, to implement timeouts, I create a Query Decorator:

    // Parameter order matches the call sites below: new TimingOutQuery(query, timeout)
    class TimingOutQuery(query: Query, timeout: Duration) extends QueryProxy(query) {
      override def delegate[A](f: => A) = {
        try {
          Timeout(timeout) {
            f
          } {
            cancel()
          }
        } catch {
          case e: TimeoutException =>
            throw new SqlTimeoutException
        }
      }
    }

This Decorator delegates to the underlying query object the execution of the query, but it wraps that execution in a Timeout.

As an aside, it is interesting to note that the Decorator pattern is just the Object-Oriented equivalent of function composition in a functional language. Scala makes this especially explicit since everything is both an Object and a Function (it is a function if it is an object that responds to the method #apply()). A Decorator around an object that only implements #apply() is pure Function-composition as you would see in Haskell, ML, and so forth. I might phrase this as: function composition is a degenerate case of the Decorator pattern.
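
Here is a small sketch of that equivalence. It is a toy and not part of Querulous: `DecoratorAsComposition`, `parse`, and `Doubling` are hypothetical names. The same behavior is written once as a Decorator class around a function object and once as plain composition with `andThen`.

```scala
object DecoratorAsComposition {
  // The undecorated behavior: a function object whose only method is apply().
  val parse: String => Int = s => s.trim.toInt

  // Object-Oriented style: a Decorator class that wraps another Function1.
  class Doubling(inner: String => Int) extends (String => Int) {
    def apply(s: String): Int = inner(s) * 2
  }

  // Functional style: the same behavior written as function composition.
  val doubled: String => Int = parse andThen (_ * 2)

  // Both styles applied to the same input.
  def demo(): (Int, Int) = {
    val decorated = new Doubling(parse)
    (decorated(" 21 "), doubled(" 21 "))
  }
}
```

Both values returned by `demo()` are computed from the same `parse` function; the only difference is whether the wrapping is spelled as a class or as `andThen`.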

The implementation of the Timeout function is shown for the curious. It uses threads and is weird but cool.

    object Timeout {
      val timer = new Timer("Timer thread", true)

      def apply[T](timeout: Duration)(f: => T)(onTimeout: => Unit): T = {
        @volatile var cancelled = false
        val task = if (timeout.inMillis > 0) Some(schedule(timeout, { cancelled = true; onTimeout })) else None
        try {
          f
        } finally {
          task foreach { t =>
            t.cancel()
            timer.purge()
          }
          if (cancelled) throw new TimeoutException
        }
      }

      private def schedule(timeout: Duration, f: => Unit) = {
        val task = new TimerTask() {
          override def run(): Unit = f
        }
        timer.schedule(task, timeout.inMillis)
        task
      }
    }

(An alternative implementation of Timeout could use Futures, but that’s a subject for another blog post.)
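
For the impatient, here is one rough guess at what a Futures-based variant might look like. It is a sketch, not the Querulous implementation. It assumes the standard library's `scala.concurrent` and `scala.concurrent.duration` rather than the `Duration` type used above, and the names `FutureTimeout` and `FutureTimeoutDemo` are hypothetical. Note one behavioral difference: the body runs on a thread from the execution context rather than on the calling thread.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical variant of Timeout built on scala.concurrent instead of a Timer thread.
object FutureTimeout {
  def apply[T](timeout: FiniteDuration)(f: => T)(onTimeout: => Unit)
              (implicit ec: ExecutionContext): T = {
    val work = Future(f)            // run the body on the injected execution context
    try {
      Await.result(work, timeout)   // block the caller for at most `timeout`
    } catch {
      case e: TimeoutException =>
        onTimeout                   // give the caller a chance to cancel the underlying work
        throw e
    }
  }
}

object FutureTimeoutDemo {
  import scala.concurrent.ExecutionContext.Implicits.global

  // Returns (result of a fast body, whether the slow body's onTimeout hook ran).
  def run(): (Int, Boolean) = {
    val fast = FutureTimeout(1.second)(42)(())
    var cancelled = false
    try {
      FutureTimeout(50.millis) { Thread.sleep(500); 0 } { cancelled = true }
    } catch {
      case _: TimeoutException => () // expected: the body outlives its 50 ms budget
    }
    (fast, cancelled)
  }
}
```

The abandoned body keeps running on its pool thread after the timeout fires, which is exactly why the `onTimeout` hook matters: it is the only place to tell the underlying work to stop.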

Modularity and testing techniques

One of the principal advantages of (or stated another way, one of the principal motivations for) writing Decorator-oriented code is how easy it is to write isolated unit tests of that code. To test the timeout functionality of the TimingOutQuery we don’t need to interact with a database at all. We can write behavioral/mockish tests like this:

    val latch = new CountDownLatch(1)
    val query = new FakeQuery(List(resultSet)) {
      override def cancel() = { latch.countDown() }

      override def select[A](f: ResultSet => A) = {
        latch.await(2.second.inMillis, TimeUnit.MILLISECONDS)
        super.select(f)
      }
    }
    val timingOutQuery = new TimingOutQuery(query, timeout)

    timingOutQuery.select { r => 1 } must throwA[SqlTimeoutException]
    latch.getCount mustEqual 0

If the timeout functionality were just inlined into the #select() method of the source code of the Query class, or “bolted on” as an alias_method_chain in Ruby (or added as “advice” in some AOP shit), you could not write this test without talking to the database and somehow finding a query that takes long enough that it will actually hit the timeout. Because we instead use Decorators, to test the code we can use a fake query that implements the Query interface but that doesn’t talk to the database at all. Here we use a CountDownLatch to “halt” execution for a bounded amount of time, thus triggering the timeout.

Tying it together with Factories

Back to our original mission. So now we have a way of layering on timeout functionality on top of a Query object. But how do we ensure that Timeouts get used when we want them to? The thing that glues this all together is to make sure that everybody that needs to instantiate a Query object never ever calls new Query directly. We provide instead a Factory as a parameter to the method that needs to manufacture the object. The programmer chooses which Factory to provide at runtime. Here is a Factory that makes TimingOutQueries:

    class TimingOutQueryFactory(queryFactory: QueryFactory, timeout: Duration) extends QueryFactory {
      def apply(connection: Connection, query: String, params: Any*) = {
        new TimingOutQuery(queryFactory(connection, query, params: _*), timeout)
      }
    }

Since TimingOutQueries are decorators around regular Queries, to manufacture a TimingOutQuery you have to first manufacture a regular Query. In this example, the TimingOutQueryFactory takes another Factory as an argument. This could be a simple QueryFactory or something more complex, allowing Factories to be composed indefinitely. With this we stack together timeouts, logging, statistics gathering, and debugging like so many pieces of Lego. This smacks of the oft-ridiculed Java AbstractFactoryFactoryInterface. But let me put it bluntly: AbstractFactoryFactoryInterfaces are how you write real, modular software, not little fart applications.
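
The stacking can be sketched with a stripped-down model. Everything below is a stand-in: the `Query` and `QueryFactory` traits are simplified (no Connection, no params), and `LoggingQueryFactory` and `CountingQueryFactory` are hypothetical names, not the classes Querulous ships.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ListBuffer

object FactoryStacking {
  // Simplified stand-ins for the real interfaces.
  trait Query { def execute(): Int }
  trait QueryFactory { def apply(sql: String): Query }

  // Innermost factory: makes the "real" query (here it just reports one row).
  object SimpleQueryFactory extends QueryFactory {
    def apply(sql: String): Query = new Query { def execute(): Int = 1 }
  }

  // Factory Decorator: every query it makes records its SQL before executing.
  class LoggingQueryFactory(inner: QueryFactory, log: ListBuffer[String]) extends QueryFactory {
    def apply(sql: String): Query = {
      val query = inner(sql)
      new Query { def execute(): Int = { log += sql; query.execute() } }
    }
  }

  // Factory Decorator: every query it makes bumps a counter before executing.
  class CountingQueryFactory(inner: QueryFactory, count: AtomicInteger) extends QueryFactory {
    def apply(sql: String): Query = {
      val query = inner(sql)
      new Query { def execute(): Int = { count.incrementAndGet(); query.execute() } }
    }
  }

  // Stack the layers; the caller only ever sees a QueryFactory.
  def demo(): (Int, List[String], Int) = {
    val log = ListBuffer.empty[String]
    val count = new AtomicInteger(0)
    val factory: QueryFactory =
      new CountingQueryFactory(new LoggingQueryFactory(SimpleQueryFactory, log), count)
    val rows = factory("SELECT 1").execute()
    (rows, log.toList, count.get)
  }
}
```

Reordering or dropping a layer is a change to the single expression that builds `factory`; none of the classes involved need to know about each other.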

This seems like a bit of a mind-fuck because we here have Factory Decorators that take Decorated Factories that make Decorated Queries. It’s so meta! (Actually, “meta” in Greek means nothing like “meta” in English. “Meta” plus the accusative means “after” so Aristotle’s Metaphysics is actually just a book “after [the book on] physics”. Anyway.) So all these crazy FactoryFactoryDecorators sound kind-of scary at first but it is just the kind of abstraction on top of abstraction and closure under composition that allows complex software to be made simple. Manage complexity by taking many things and re-conceiving of them as just one thing; this one thing is then combined with many other things and the process is repeated up the ladder of abstraction until you reach the Godhead.

Taking this to the next level

To take this even further, let’s add a new feature: per-query timeouts. At one point in the history of FlockDB, there was a global 3-second timeout. This was really stupid given that our most common query has a latency of 0.5ms and a standard deviation of 2ms. If you have a global timeout you must set your timeout around your most expensive query, not your most common query (otherwise, your most expensive query will always time out!). But for a production system, cheap frequent queries, if they start exceeding 2 standard deviations, can take down your site. So a sensible timeout for these frequent queries is like 5ms. But we had it set to 3,000 ms!! Yikes. So let’s change it!

    class PerQueryTimingOutQueryFactory(queryFactory: QueryFactory, timeouts: Map[String, Duration])
      extends QueryFactory {

      def apply(connection: Connection, query: String, params: Any*) = {
        new TimingOutQuery(queryFactory(connection, query, params: _*), timeouts(query)) // YAY
      }
    }

That’s it. We’ve now implemented a new Timeout strategy in one line of code! And wiring it all together is a piece of cake! Querulous makes no assumptions about how best to implement a timing-out strategy; it doesn’t even assume you’ll want timeouts (in fact, there are some cases where you don’t want any timeouts). Querulous achieves modularity by providing an “injection point” for the programmer to layer on custom functionality. It takes QueryFactories as a parameter to the method, which can return arbitrarily decorated Queries.
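
Here is a sketch of what such an injection point might look like from the consumer's side. The types are simplified stand-ins and `EdgeCounter` is a hypothetical consumer, not a FlockDB class. It also adds one thing the one-liner above does not have: a fallback timeout, since calling a Scala `Map` with a missing key throws a NoSuchElementException.

```scala
import scala.concurrent.duration._

object InjectionPoint {
  // Simplified stand-ins; the real interfaces take a Connection and params.
  trait Query { def timeout: Option[FiniteDuration]; def execute(): Int }
  trait QueryFactory { def apply(sql: String): Query }

  // Undecorated factory: queries with no timeout at all.
  object PlainQueryFactory extends QueryFactory {
    def apply(sql: String): Query = new Query {
      val timeout: Option[FiniteDuration] = None
      def execute(): Int = 1
    }
  }

  // Per-query strategy with a fallback for statements missing from the map.
  class PerQueryTimeoutFactory(inner: QueryFactory,
                               timeouts: Map[String, FiniteDuration],
                               default: FiniteDuration) extends QueryFactory {
    def apply(sql: String): Query = {
      val query = inner(sql)
      new Query {
        val timeout: Option[FiniteDuration] = Some(timeouts.getOrElse(sql, default))
        def execute(): Int = query.execute() // a real decorator would enforce `timeout` here
      }
    }
  }

  // The injection point: this class never calls `new` on a concrete Query.
  class EdgeCounter(queryFactory: QueryFactory) {
    def query(sql: String): Query = queryFactory(sql)
  }

  // Same consumer, two wirings; returns the timeouts in milliseconds.
  def demo(): (Option[Long], Option[Long], Option[Long]) = {
    val plain = new EdgeCounter(PlainQueryFactory)
    val tuned = new EdgeCounter(
      new PerQueryTimeoutFactory(PlainQueryFactory, Map("SELECT hot" -> 5.millis), 3.seconds))
    def millis(q: Query): Option[Long] = q.timeout.map(_.toMillis)
    (millis(plain.query("SELECT hot")),
     millis(tuned.query("SELECT hot")),
     millis(tuned.query("SELECT rare")))
  }
}
```

`EdgeCounter` is identical in both wirings; only the factory handed to its constructor differs.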

I love this example because it’s so simple but yet it’s no toy. It also emphasizes the value of Dependency Injection more generally than just with Factories. We could have written the TimingOutQuery with a static global constant (probably the most common programming technique):

    class TimingOutQuery(query: Query) extends QueryProxy(query) {
      val TIMEOUT = 3.seconds

But instead it is injected as a parameter to the constructor of the TimingOutQuery, and the Factory supplies the value:

    class TimingOutQueryFactory(queryFactory: QueryFactory, timeout: Duration) extends QueryFactory {
      def apply(connection: Connection, query: String, params: Any*) = {
        new TimingOutQuery(queryFactory(connection, query, params: _*), timeout)
      }
    }

This enables a Factory to invoke a function to choose the appropriate timeout for each query. In the PerQueryTimingOutQueryFactory, we just look some shit up in a hash table (timeouts(query)) and we’re done.

Yes, all this FactoryFactory bullshit is exactly what you hate about Java. But it’s amazing not just how short this code is but that it could be configured by any programmer anywhere, regardless of whether they have access to the source code that actually instantiates and executes queries. Any user of Querulous can decide whether she wants timeouts or not, and she can decide whether she also wants debugging, stats gathering, and so forth; Querulous hard-codes no assumptions. So, yay modularity.

Source: http://magicscalingsprinkles.wordpress.com/2010/02/08/why-i-love-everything-you-hate-about-java/

