JRuby Internal Design

From JRubyWiki


A quick writeup for those interested. This is from memory, so I'm surely missing some details. Disclaimer: Tom's the parser guy, so direct questions in that area to him.

This is based on what we plan to ship for JRuby 1.0...

Parsing/lexing:

  • JRuby has a YACC/Bison-based parser, basically Ruby's parser ported to Java and built with the Jay parser generator.
  • The lexer is hand-written, as in Ruby.
  • The resulting AST has positioning information for both start/end lines and start/end offsets. This is to support IDE use.
  • We also save comments as they're parsed, to allow round-tripping the AST.
  • We're considering an experimental migration to XRuby's ANTLR-based parser post 1.0. I don't expect it would take more than a week.

Runtime layout:

  • The "Ruby" class is the entry point for the runtime. All objects created have a reference to the runtime, and do not / will not migrate across runtimes without marshalling.
  • Every thread using a given runtime is assigned a ThreadContext object, stored in a threadlocal. We minimize threadlocal lookup by frequently passing the ThreadContext on the call stack as a parameter.
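The threadlocal-plus-parameter idea above can be sketched roughly as follows. This is a minimal illustration, not JRuby's real code: the class names (SketchRuntime, SketchThreadContext) and the callDepth field are made up for the example. The point is only that the threadlocal is consulted once at the entry point and the context is then handed down the Java call stack.

```java
// Hypothetical names throughout; this is not JRuby's actual API.
final class SketchThreadContext {
    final String threadName;
    int callDepth; // example of per-thread state kept on the context

    SketchThreadContext(String threadName) {
        this.threadName = threadName;
    }
}

final class SketchRuntime {
    // One context per thread using this runtime, created lazily on first use.
    private final ThreadLocal<SketchThreadContext> current =
        ThreadLocal.withInitial(
            () -> new SketchThreadContext(Thread.currentThread().getName()));

    // The comparatively expensive path: a threadlocal lookup.
    SketchThreadContext getCurrentContext() {
        return current.get();
    }

    // Entry point: look the context up once...
    int callFromOutside(int n) {
        return recurse(getCurrentContext(), n);
    }

    // ...then pass it as a parameter instead of looking it up again.
    private int recurse(SketchThreadContext context, int n) {
        context.callDepth++;
        try {
            return n == 0 ? 0 : 1 + recurse(context, n - 1);
        } finally {
            context.callDepth--;
        }
    }
}
```

Calling getCurrentContext() twice on the same thread returns the same object, while a different thread would get its own context from the same runtime.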

More on threading:

  • Ruby threads are mapped 1:1 to Java threads, except in thread pooling mode (experimental), where Ruby threads are drawn from a maintained pool of threads. The pooling was added to offset the overhead of APIs that spin up many threads on the assumption that they're green/lightweight.
  • Java threads entering the system from "outside" the runtime are assigned a ThreadContext and "RubyThread" object. In our terminology, they are "adopted" by the runtime.
  • Ruby's unsafe thread operations (criticalization of a single thread, Thread#kill, Thread#raise) are supported, but we make no deterministic guarantees about them. Criticalization does not guarantee other threads have stopped before proceeding, and kill/raise require that the target thread eventually reach a checkpoint. Kill may wait forever if the target thread never reaches the "die yourself" checkpoint.
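To make the checkpoint requirement for kill concrete, here is a small Java sketch of cooperative termination. Everything in it is an assumption made for illustration (SketchRubyThread, checkpoint(), the ThreadKill exception); it only shows the general shape: kill sets a flag, and the target thread dies when it next polls that flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical names; a sketch of checkpoint-style cooperative kill.
final class SketchRubyThread {
    // Thrown inside the target thread when it notices a kill request.
    static final class ThreadKill extends RuntimeException {
        ThreadKill() { super("killed at checkpoint"); }
    }

    private final AtomicBoolean killRequested = new AtomicBoolean(false);
    private final Thread javaThread;

    SketchRubyThread(Consumer<SketchRubyThread> body) {
        javaThread = new Thread(() -> {
            try {
                body.accept(this);
            } catch (ThreadKill expected) {
                // The target thread unwinds and exits here.
            }
        });
        javaThread.setDaemon(true); // keep the sketch from pinning the JVM
    }

    void start() { javaThread.start(); }

    // Called by the target thread itself at safe points.
    void checkpoint() {
        if (killRequested.get()) throw new ThreadKill();
    }

    // Called from another thread; it only sets a flag and does not stop anything.
    void kill() { killRequested.set(true); }

    // Waits up to timeoutMillis; returns true if the thread has died.
    boolean joinWithin(long timeoutMillis) {
        try {
            javaThread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !javaThread.isAlive();
    }
}
```

A body that loops over checkpoint() dies soon after kill() is called. A body that never calls checkpoint() would never notice the request, which is the "may wait forever" case described above.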

Interpreter:

  • The current interpreter engine is a "big switch", with each AST node having a switch value. Each case calls out to a separate method; this layout appeared to work best for HotSpot (presumably because it can inline the small per-case methods into the switch, whereas one huge method containing every case body would be too complicated to optimize).
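A toy version of the "big switch" layout might look like this in Java. The node classes, IDs, and the use of plain long values are invented for the example and do not reflect JRuby's real AST; the sketch only shows one switch keyed on a per-node value, with every case delegating to its own small method.

```java
// Hypothetical node types and IDs; not JRuby's real AST.
abstract class SketchNode {
    static final int FIXNUM = 0, ADD = 1, IF = 2;
    final int nodeId; // the "switch value" for this node type

    SketchNode(int nodeId) { this.nodeId = nodeId; }
}

final class SketchFixnumNode extends SketchNode {
    final long value;
    SketchFixnumNode(long value) { super(FIXNUM); this.value = value; }
}

final class SketchAddNode extends SketchNode {
    final SketchNode left, right;
    SketchAddNode(SketchNode left, SketchNode right) {
        super(ADD);
        this.left = left;
        this.right = right;
    }
}

final class SketchIfNode extends SketchNode {
    final SketchNode condition, thenBody, elseBody;
    SketchIfNode(SketchNode condition, SketchNode thenBody, SketchNode elseBody) {
        super(IF);
        this.condition = condition;
        this.thenBody = thenBody;
        this.elseBody = elseBody;
    }
}

final class SketchInterpreter {
    // The switch itself stays tiny; each case calls out to a separate method.
    static long eval(SketchNode node) {
        switch (node.nodeId) {
            case SketchNode.FIXNUM: return fixnumNode((SketchFixnumNode) node);
            case SketchNode.ADD:    return addNode((SketchAddNode) node);
            case SketchNode.IF:     return ifNode((SketchIfNode) node);
            default:
                throw new IllegalStateException("unknown node id " + node.nodeId);
        }
    }

    private static long fixnumNode(SketchFixnumNode n) { return n.value; }

    private static long addNode(SketchAddNode n) { return eval(n.left) + eval(n.right); }

    // In this toy language, any non-zero value counts as true.
    private static long ifNode(SketchIfNode n) {
        return eval(n.condition) != 0 ? eval(n.thenBody) : eval(n.elseBody);
    }
}
```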

Methods and dynamic dispatch:

  • Java-based methods are generally bound to names using bytecode-generated adapter classes that invoke the target method directly. This showed a substantial gain over e.g. reflection. We tried a number of other binding mechanisms, and this was easiest to maintain and gave the best performance.
  • There is a per-class method cache with a global flushing mechanism. In experiments, adding an inline cache has not shown any gains over a per-class cache, and since we control the class data structure we've had no motivation to use an inline cache.
  • We also support a fast dispatch mechanism based on Selector Table Indexing. All core classes are assigned an index, and all STIable method names are assigned an index. We then maintain an in-memory table mapping class index and method index to a switch value. Core classes then have a fast dispatch path based on that switch value, otherwise falling back to hash-lookup and method caches. STI shows a major performance boost for core method implementations. Reassigning/redefining any STI method zeros that position in the table, reverting to slow dispatch.
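The Selector Table Indexing bullet can be illustrated with a stripped-down Java sketch. The indices, the single Fixnum-like class, and all names here are assumptions for the example rather than JRuby's actual tables. It shows a table lookup by class index and method index yielding a switch value, a fast path on that value, a hash-lookup fallback, and redefinition zeroing the table slot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongBinaryOperator;

// Hypothetical indices and names; a sketch of selector-table-indexed dispatch.
final class SketchStiDispatcher {
    static final int FIXNUM_CLASS = 0;            // class index
    static final int OP_PLUS = 0, OP_MINUS = 1;   // method-name indices
    private static final int SW_NONE = 0, SW_FIXNUM_PLUS = 1, SW_FIXNUM_MINUS = 2;

    // table[classIndex][methodIndex] -> switch value; 0 means "use slow dispatch".
    private final int[][] table = { { SW_FIXNUM_PLUS, SW_FIXNUM_MINUS } };
    private final String[] methodNames = { "+", "-" };

    // Slow path: ordinary name -> method lookup (only one class in this sketch).
    private final Map<String, LongBinaryOperator> fixnumMethods = new HashMap<>();

    int slowDispatches; // counter so callers can see which path ran

    SketchStiDispatcher() {
        fixnumMethods.put("+", (a, b) -> a + b);
        fixnumMethods.put("-", (a, b) -> a - b);
    }

    long call(int classIndex, int methodIndex, long self, long arg) {
        switch (table[classIndex][methodIndex]) {
            case SW_FIXNUM_PLUS:  return self + arg; // fast path, no hash lookup
            case SW_FIXNUM_MINUS: return self - arg;
            default:
                slowDispatches++;
                return fixnumMethods.get(methodNames[methodIndex]).applyAsLong(self, arg);
        }
    }

    // Redefining an STI method zeros its slot, reverting it to slow dispatch.
    void redefine(int classIndex, int methodIndex, LongBinaryOperator body) {
        fixnumMethods.put(methodNames[methodIndex], body);
        table[classIndex][methodIndex] = SW_NONE;
    }
}
```

After redefine(FIXNUM_CLASS, OP_PLUS, ...) only "+" goes through the hash lookup; "-" still takes the fast path because its table slot is untouched.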

Compiler:

  • The compiler's AOT mode compiles one .rb file to one class. The resulting class is primarily a "bag of functions", with all methods static. When the methods are actually bound to specific names, the same adapter-generation happens to bind them.
  • Blocks and class bodies are compiled in the same way; one method in the resulting class per closure.
  • The compiler's JIT mode watches for interpreted calls to a method to exceed a simple threshold; 20 calls seemed to be a good cutoff. It then attempts to compile the method, permanently using the compiled code if compilation succeeds or reverting to the interpreter if it fails.
  • The compiler is not optimized in any major way. It's almost entirely a duplication of the interpreter logic.
  • The resulting bytecode does not decompile to anything very Java-like. This is primarily because I only use Ruby local variables (in a local variable data structure) and just juggle items on the stack for the rest. The complexity of managing Java local variables across a visitor-like compiler design outweighed any gains.
  • Perhaps 50% or more of the interpreter code is now represented in the compiler. The remaining 50% is harder constructs I just haven't tackled yet.
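The JIT threshold behavior described above could be sketched like this. The class name, the use of a Supplier to stand in for "the compiler", and UnsupportedOperationException as the failure signal are all assumptions for the example. Only the 20-call cutoff and the compile-once-or-stay-interpreted policy come from the description.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

// Hypothetical names; a sketch of call-count-triggered compilation with fallback.
final class SketchJitMethod {
    static final int THRESHOLD = 20; // the cutoff mentioned in the text

    private final IntUnaryOperator interpreted;
    // Stand-in for the compiler; assumed to throw when it hits an unsupported construct.
    private final Supplier<IntUnaryOperator> compiler;

    private IntUnaryOperator compiled; // non-null once compilation has succeeded
    private boolean compileFailed;     // set once, so compilation is never retried
    private int callCount;

    SketchJitMethod(IntUnaryOperator interpreted, Supplier<IntUnaryOperator> compiler) {
        this.interpreted = interpreted;
        this.compiler = compiler;
    }

    int call(int arg) {
        // Count interpreted calls; try to compile once the count exceeds the threshold.
        if (compiled == null && !compileFailed && ++callCount > THRESHOLD) {
            try {
                compiled = compiler.get();
            } catch (UnsupportedOperationException e) {
                compileFailed = true; // stay interpreted permanently
            }
        }
        return (compiled != null ? compiled : interpreted).applyAsInt(arg);
    }

    boolean isCompiled() { return compiled != null; }
}
```

In this sketch the 21st call triggers the compile attempt. A method whose compile attempt fails simply keeps running through the interpreted path with no further attempts.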

Compatibility:

  • JRuby 1.0 is very close to 100% compatible with the Ruby language. The core libraries we *can* support are close to 100% compatible. There are libraries we can't support, generally in the areas of POSIX functions not supported by Java or platform-specific features like fork and symlinks.
  • JRuby on Rails is very close to passing all tests now. By most accounts JRoR runs extremely well.
  • A number of C-based extensions have been ported to JRuby, with more on the way.

Pain:

  • When classes are generated at runtime, they are each wrapped with a classloader to allow them to be garbage collected. Super gross, but no other way to safely generate arbitrary numbers of classes without exceeding permgen. We need a "dynamic method" or "anonymous method" concept, or even a super lightweight class/classloader that would support the same semantics.
  • In order to support Ruby's wilder features, we need to be able to decorate Java's call stack with additional information. The only way to do this is to construct our own stack frames per-call and either keep them on ThreadContext (threadlocal) or pass them on the stack. This is pain point #1 for performance.
  • The other big performance pain is representing scope as an external data structure rather than as local variables. This is primarily to support closures and eval, which need to capture variables in scope. I'm not fond of Groovy's mechanism for this, which copies variables in and out. But our mechanism does mean more object overhead per-call.
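As a rough picture of "scope as an external data structure", the following Java sketch keeps locals in a heap object that a closure can hold on to, instead of copying values in and out. SketchDynamicScope and its index/depth addressing are assumptions made for this example, not a description of JRuby's actual classes.

```java
import java.util.function.IntSupplier;

// Hypothetical names; a sketch of heap-allocated variable scopes.
final class SketchDynamicScope {
    private final Object[] values;           // slots for this scope's locals
    private final SketchDynamicScope parent; // enclosing scope, or null

    SketchDynamicScope(int size, SketchDynamicScope parent) {
        this.values = new Object[size];
        this.parent = parent;
    }

    // depth 0 is this scope, depth 1 its parent, and so on.
    Object get(int index, int depth) {
        SketchDynamicScope scope = this;
        for (int i = 0; i < depth; i++) scope = scope.parent;
        return scope.values[index];
    }

    void set(int index, int depth, Object value) {
        SketchDynamicScope scope = this;
        for (int i = 0; i < depth; i++) scope = scope.parent;
        scope.values[index] = value;
    }
}

final class SketchClosureDemo {
    // Roughly the Ruby: count = 0; increment = proc { count += 1 }
    static IntSupplier makeCounter(SketchDynamicScope methodScope) {
        methodScope.set(0, 0, 0); // count = 0
        return () -> {
            // Each block call gets its own scope whose parent is the captured one.
            SketchDynamicScope blockScope = new SketchDynamicScope(0, methodScope);
            int next = (Integer) blockScope.get(0, 1) + 1;
            blockScope.set(0, 1, next); // writes through to the method's variable
            return next;
        };
    }
}
```

Because the block writes through to the shared scope object, the enclosing method sees the updated value without any copy-out step. The cost is the extra scope allocation on each call, which is the per-call object overhead mentioned above.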