At its latest I/O conference, Google finally announced its plans for its new Android runtime publicly. The Android RunTime, ART, is the successor to and replacement for Dalvik, the virtual machine on which Android's Java code is executed. We've had traces and previews of it on KitKat devices since last fall, but there was little information about its technical details or the direction Google was heading with it.

In contrast to other mobile platforms such as iOS, Windows, or Tizen, which run software compiled natively for their specific hardware architecture, the majority of Android software is distributed as generic byte-code, which is translated into native instructions for the hardware on the device itself.

From the earliest Android versions, Dalvik started out as a simple VM with little complexity. With time, however, Google felt the need to address performance concerns and to keep up with the industry's hardware advances. It eventually added a JIT compiler to Dalvik with Android's 2.2 release, introduced multi-threading capabilities, and generally improved the runtime piece by piece.

Over the last few years, however, the ecosystem had been outpacing Dalvik's development, so Google sought to build something new to serve as a solid foundation for the future, one that could scale with the performance of today's and tomorrow's 8-core devices, large storage capacities, and large working memories.

Thus ART was born.

Architecture

First, ART is designed to be fully compatible with Dalvik's existing byte-code format, "dex" (Dalvik executable). From a developer's perspective, nothing changes: applications are written the same way for either runtime, and there is no need to worry about compatibility.

The big paradigm shift that ART brings is that instead of compiling application code Just-in-Time (JIT), it compiles it Ahead-of-Time (AOT). The runtime goes from having to compile from byte-code to native code each time an application is run to having to do it only once; every subsequent execution from that point forward runs from the existing compiled native code.
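To make the difference concrete, here is a minimal warm-up probe, a hypothetical sketch in plain Java rather than anything ART-specific: on a JIT runtime the first passes include interpretation and compilation overhead before the loop settles into native speed, whereas AOT-compiled code is already native on its very first execution.

```java
// Hypothetical warm-up probe (plain Java, for illustration only).
// On a JIT runtime, the early passes are slower because the loop is
// still being interpreted and compiled; AOT code starts at full speed.
public class WarmupProbe {
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += (long) i * i;
        return acc;
    }

    public static void main(String[] args) {
        for (int pass = 0; pass < 5; pass++) {
            long t0 = System.nanoTime();
            long result = work(10_000_000);
            long t1 = System.nanoTime();
            // Watch the per-pass time converge as compilation kicks in.
            System.out.println("pass " + pass + ": "
                    + (t1 - t0) / 1_000_000 + " ms (result " + result + ")");
        }
    }
}
```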

Of course, these native translations of applications take up space, and this new methodology has become feasible only because of the vast increases in available storage on today's devices, a big shift from the early days of Android.

This shift opens up a large number of optimizations which were not possible in the past: because code is optimized and compiled only once, it is worth optimizing it really well that one time. Google claims that it is now able to achieve higher-level optimizations across the whole of an application's code-base, as the compiler has an overview of the totality of the code, as opposed to the JIT compiler, which only optimizes in local/method-sized chunks. Overhead such as exception checks in code are largely removed, and method and interface calls are vastly sped up. The process which does this is the new "dex2oat" component, replacing the Dalvik equivalent, "dexopt". Odex files (optimized dex) also disappear in ART, replaced by ELF files.
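To illustrate the kind of whole-program optimization this enables, consider the hypothetical Java sketch below (Shape, Circle, and Devirtualize are invented names for this example): a compiler that can see the entire code-base can prove that Circle is the only implementor of Shape and devirtualize, or even inline, the interface call, something a method-local JIT generally cannot do.

```java
// Hypothetical sketch: invented types used only to illustrate
// whole-program devirtualization, not ART's actual internals.
interface Shape {
    double area();
}

// With visibility over the entire code-base, an AOT compiler can see
// that Circle is the sole implementor of Shape.
final class Circle implements Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    @Override
    public double area() { return Math.PI * radius * radius; }
}

public class Devirtualize {
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            // Nominally a virtual interface call; a whole-program
            // compiler can replace it with a direct (or inlined) call
            // to Circle.area(), since no other implementor exists.
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Circle(2.0) };
        System.out.println(totalArea(shapes));
    }
}
```

The same whole-program view is also what lets a compiler elide redundant checks, such as exception or null checks it can prove will never fire.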

Because ART compiles to an ELF executable, the kernel can now handle the paging of code pages - this should result in much better memory management, and lower memory usage too. I'm curious what effect KSM (kernel same-page merging) has on ART; it's definitely something to keep an eye on.
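The underlying mechanism is the same one behind any read-only memory-mapped file: the kernel faults pages in on demand, evicts clean pages under memory pressure, and can share them between processes. A rough, non-ART-specific Java sketch of demand-paged read-only data:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Rough illustration (not ART-specific): mapping a file read-only lets
// the kernel page its contents in on demand and evict or share the
// clean pages freely - the same property ART's ELF code pages gain.
// Usage: java MapDemo <some-file>
public class MapDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Touching a byte faults its page in; pages never touched
            // never consume physical memory.
            System.out.println("first byte: " + buf.get(0));
        }
    }
}
```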

The implications for battery life are also significant: since there is no more interpretation or JIT work to be done while an app runs, CPU cycles are saved directly, and with them, power.

The only downside to all of this is that the one-time compilation takes more time to complete: a device's first boot and an application's first start-up will take noticeably longer than on an equivalent Dalvik system. Google claims that this is not too dramatic, as it expects the finished shipping runtime to be equivalent to or even faster than Dalvik in these respects.

The performance gains over Dalvik are significant, as pictured above: roughly a 2x improvement in speed for code running on the VM. Google claimed that benchmarks such as Chessbench, which show an almost 3x increase, are a more representative projection of the real-world gains that can be expected once the final release of Android L is made available.

Comments

  • name99 - Wednesday, July 2, 2014

    The obvious thing to do would be for at least some Android vendors to take responsibility for this, liaise with Google, and deliver this as a value-add for their phones.
    But that would require an Android vendor to think more creatively than "for my next phone, I shall triple the screen resolution, double the number of cores, halve the battery life, and thereby take over the market".

    Maybe Xiaomi could do this? They seem to be the only Android vendor that isn't 100% idiot. Amazon is the other possibility, and they actually have the tech skills to do it --- but of course there is the tricky problem of their negotiating with Google... (Do Amazon and other forkers even get ART or is that now a Google exclusive?)
  • tipoo - Thursday, July 3, 2014

    What I don't get is why people say the Windows Phone store's cloud back-end compiles to native every time someone downloads an app. Can't they just keep the native code for each device, and compile it with full optimization just once? Yes, they have more devices than Apple, but they also tightly control which SoCs WP uses.
  • dealcorn - Tuesday, July 1, 2014

    Does ART eliminate residual x86 compatibility issues? If so, ARM loses home field advantage.
  • johncuyle - Tuesday, July 1, 2014

    Unlikely. Java applications should be the ones that worked on x86 anyway. The applications least likely to work are native ones, which a developer may not compile and distribute for x86. Those are most likely to be games, particularly since Google (bafflingly) discourages use of the NDK.
  • Flunk - Wednesday, July 2, 2014

    Why do you find Google discouraging the use of the NDK baffling? The whole reason is the subject of your conversation: poor compatibility with multiple ISAs.
  • johncuyle - Wednesday, July 2, 2014

    It's not a great reason to discourage the NDK. Many applications are written to be cross-platform and run successfully on multiple architectures. It doesn't even increase your test load, since even if you're writing your application in Java you still need to test it fully on every platform, and testing is usually the expensive part. The exact wording of the page is just odd: it says that preferring C++ isn't a good reason to write your application in C++. That's a pretty obviously false assertion. Preferring C++ is a great reason to write your application in C++.
  • Exophase - Monday, July 7, 2014

    Not directly, since this applies to DEX, which never had a compatibility issue. But it might convince some app developers to stop using the NDK due to the improved performance of DEX binaries.
  • johncuyle - Tuesday, July 1, 2014

    The frame drop counts seem very odd with respect to the total milliseconds delayed (or I'm bad at math). At 60fps a frame is 16ms. A 4ms GC sweep might drop a single frame at 60fps, but the output indicates it dropped 30 frames. That's 750fps. Plausible if you're running without vsync or framerate limiting on a static screen like a splash screen, but that's not really a meaningful example, nor is it especially noticeable to the end user. More interesting would be the frequency of frame drops in an application with extensive animation running at an average of 30fps. That's a situation where you'd notice every dropped frame.
  • kllrnohj - Tuesday, July 1, 2014

    It's a mistake in the article. Those log lines have nothing to do with each other. The Choreographer is reporting a huge number of dropped frames because <unknown> took a really long time on the UI thread, *NOT* specifically because the GC took that time. This is actually pretty normal: when an application is launched, the UI loading & layout all happen on the UI thread, which the Choreographer reports as "dropped" frames even though there weren't actually any frames to draw in the first place, as the app hadn't loaded yet. So the 30 dropped frames there mean the application took about 500ms to load, which isn't fantastic but is far from bad.
  • Stonebits - Tuesday, July 1, 2014

    "Overhead such as exception checks in code are largely removed, and method and interface calls are vastly sped up"

    This doesn't make sense to me--are exceptions handled some other way, or do you just keep executing?
