Michal Cierniak – "In my opinion modularity will not bring significant performance wins." I agree, but that’s not the point. Java needs modularity and ahead-of-time compilation so that it can start fast and share memory between processes; otherwise Steve Jobs has a point.
I don’t hear many folks who run Java applications complaining that Java’s performance is poor once the application is running. They do, however, complain that the applications take a long time to start and that running many Java applications consumes a LOT of memory. I work for a company that makes use of three significant Java applications. No one would claim that they start fast, but my biggest complaint is that these three applications, all built on the same code base, share no memory at runtime: none, zip, nada, nothing.
It’s as if we have gone back 40 years in computing history. There was a time when programs didn’t share any memory, but that went away with the introduction of dynamic linking and shared libraries. Most programs use DLLs (Windows) or shared objects (Unix) to ensure fast startup and efficient memory sharing, and all of this is managed by the OS. If two concurrently running programs both decide to use the same spell-checking library, the operating system memory-maps that library into the address space of both programs, i.e. there is only one copy resident in memory. In Java this is not true: there is a unique copy resident in each address space.
So why can’t Java programs share memory, given that shared libraries are a technology that’s been in widespread use for at least 25 years? There are really two reasons. The first is that up until this point there has been no way to define a shared library in Java, i.e. here’s a bunch of code and here is its external interface. JSR 294 (modularity for Java) addresses this point. The second is that Java has no efficient AOT compilation, and so unlike most other statically typed languages it can’t use the traditional OS mechanisms to load and share its libraries. I have belabored this point in the other "Worlds Apart" posts here and here, pointing out that this is a key differentiator for CLI implementations such as .NET or Mono. Miguel (Mono lead) also has an excellent post that provides a good overview of Mono’s AOT support and the rationale behind it. JSR 294 could be designed in such a way as to provide for efficient AOT, and I belabor this point here.
It’s also not as if the problem of memory sharing and startup isn’t well understood in the Java world. Sun produced a paper five years ago trying a complex dynamic memory-sharing scheme, and followed it with a paper the very next year concluding that shared libraries were the way to go, as the first "dynamic" approach "required complex engineering" while using shared libraries was "simpler" and "relies on existing software and OS mechanisms".
However, the opportunity to produce shared libraries from Java code has not been seized (IMO it’s too hard with custom classloaders), and there have been many other attempts at sharing memory in Java, none of which have yet proven successful. C# and the CLI implementations, which were developed with the benefit of experience with JVMs, made efficient AOT a priority and reap the benefits of faster startup and better sharing.
So why not take the opportunity that JSR 294 provides and level the playing field with CLIs, by using it to enable efficient AOT for Java that in turn allows Java code to be compiled to shared libraries? Java would then start faster and share memory between processes. Without this, CLIs will maintain a significant advantage, and they have enough of those already.
http://www-128.ibm.com/developerworks/java/library/j-ibmjava4/index.html
Hi Rob,
I think that we’re in agreement. In the email you quoted I didn’t claim that AOT makes no sense. I was just commenting on a really minor point that many people miss: in many AOT implementations (and I gave early CLR implementations as examples), AOT doesn’t provide noticeable performance improvements IN THE STEADY STATE (in fact it provides worse perf), but it provides huge wins for startup times, and that is considered by many people (including, apparently, you and me) a great tradeoff for the client platform.
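This tradeoff can be sketched with a toy cost model. All the numbers below are invented for illustration (they are not measurements from any real VM): the JIT pays a large warm-up cost but ends up with slightly faster steady-state code, while the AOT build starts almost immediately but runs marginally slower afterwards.

```java
// Toy cost model of the startup-vs-steady-state tradeoff between JIT and
// AOT execution.  All constants are hypothetical, for illustration only.
public class JitVsAot {
    // Imaginary costs, in microseconds.
    static final double JIT_STARTUP = 2_000_000; // warm-up: profiling + runtime compilation
    static final double JIT_STEADY  = 1.0;       // per call, profile-guided code
    static final double AOT_STARTUP =   200_000; // mostly mapping precompiled code
    static final double AOT_STEADY  = 1.1;       // per call, no runtime profile

    static double jitTime(long calls) { return JIT_STARTUP + calls * JIT_STEADY; }
    static double aotTime(long calls) { return AOT_STARTUP + calls * AOT_STEADY; }

    public static void main(String[] args) {
        // Short-lived desktop action: AOT wins outright.
        System.out.printf("10k calls:  jit=%.0fus  aot=%.0fus%n",
                jitTime(10_000), aotTime(10_000));
        // Long-running server: the JIT's better steady state eventually
        // repays its startup cost.  Break-even where the two lines cross:
        long breakEven = (long) ((JIT_STARTUP - AOT_STARTUP)
                / (AOT_STEADY - JIT_STEADY));
        System.out.println("break-even at ~" + breakEven + " calls");
    }
}
```

Under these made-up numbers the lines cross at about 18 million calls: below that, AOT is the better deal, which is exactly the client-platform argument.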
Keep writing the great posts!
Michal
P.S. I can’t help but note that you managed to misspell both my first and last name. :)
Hi Michal,
For client platforms, I would not reduce the benefit of AOT-compiled code solely to start-up time.
You’re talking about the STEADY STATE, and you are right. But you did not mention HOW FAST this state can be reached for __desktop__ apps. (Obviously, that’s not an issue for server applications that run for hours and days.)
A simple counterexample is a FLAT application profile: instead of a few hot methods which can be easily compiled by an adaptive optimizer, there are lots of "warm" methods which must be
– detected (profiled)
– compiled, one by one, if at all
Both tasks take time, and the app runs slowly because it has not yet reached the steady state (which is in the eye of the beholder anyway). Moreover, such warm methods may simply stay below the "optimization threshold". An example is the Swing API, and I used to write about it:
http://linux.sys-con.com/read/46901.htm
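The flat-profile effect can be shown with a tiny simulation of an invocation-counter compilation policy. The threshold and call counts below are made up; real VMs use more elaborate heuristics, but the shape of the problem is the same: identical total work, spread thinly, never trips the compile trigger.

```java
// Sketch of a simple invocation-counter JIT policy (all numbers invented)
// showing why a flat profile defeats it: the same total number of calls,
// spread evenly, leaves every method below the compile threshold.
public class FlatProfileDemo {
    static final int THRESHOLD = 10_000; // hypothetical compile threshold

    // How many methods would this policy compile, given per-method call counts?
    static int compiledMethods(int[] invocationCounts) {
        int compiled = 0;
        for (int count : invocationCounts) {
            if (count >= THRESHOLD) compiled++;
        }
        return compiled;
    }

    public static void main(String[] args) {
        int methods = 100, totalCalls = 900_000;

        // Hot profile: one method dominates, gets compiled, and covers
        // almost all execution time.
        int[] hot = new int[methods];
        java.util.Arrays.fill(hot, 1, methods, 100);
        hot[0] = totalCalls - 99 * 100;

        // Flat profile: the same 900k calls spread evenly, 9,000 each --
        // every method stays interpreted forever.
        int[] flat = new int[methods];
        java.util.Arrays.fill(flat, totalCalls / methods);

        System.out.println("hot profile:  " + compiledMethods(hot)  + " method(s) compiled");
        System.out.println("flat profile: " + compiledMethods(flat) + " method(s) compiled");
    }
}
```

An AOT compiler sidesteps this entirely, since it compiles everything ahead of time regardless of how the calls are distributed.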
————-
>>in fact it (AOT) provides worse perf
Are you pretty sure? ;)
In theory, profile information enhances inlining, basic-block placement and register allocation to some extent. In practice, by and large, it mostly saves compilation time, which is a critical resource for dynamic compilers.
In practice, on many tests, the AOT compiler I’ve been working on (for a long, long time ;) outperforms dynamic JVMs after they have reached the STEADY STATE, including such a "monster of rock" as HotSpot 6 Server.
Of course, there are tests on which the JVMs beat AOT compiled code too. Not to mention that they beat each other on different tests in various combinations. ;)
Summary:
There’s no "world’s fastest JVM" (despite what one large vendor used to claim ;).
"JIT is better than AOT" and "AOT is better than JIT" are two popular myths.
Both technologies will meet at some point in the future: AOT will use off-line profiles (yes, we have a prototype ;), and JITs will use AOT-compiled code, which IBM J9 has already done since Java 5.0 for certain platform classes.
Take care,
–Vitaly
P.S. BTW, did you try to implement the AOT approach for StarJIT like IBM did for J9?
Hi Vitaly,
I think that this is again a case of our having the same point of view. I’m not sure why my position on this issue keeps being pushed in some direction. You quote me as saying:
>>in fact it (AOT) provides worse perf
but taking it out of context like this makes no sense. Here’s my full statement:
> in many AOT implementations (and I gave early
> CLR implementations as examples), AOT doesn’t
> provide noticeable performance improvements
> IN THE STEADY STATE (in fact it provides
> worse perf)
and yes, I am sure of this statement. The slides by one of the Microsoft architects that I referred to in my original email clearly say that for the early CLR implementations, the steady state performance is worse. Those slides even give numbers for slowdowns of specific applications. I don’t see how this simple fact can be debated.
Clearly this doesn’t generalize to comparisons of two arbitrary AOT and JIT implementations, and I fully expect that one can come up with examples showing better perf for either approach. So I have no problem believing your claim that your AOT system is better than HotSpot 6.
Furthermore, I really do agree with your more general comments. I think that we absolutely need more R&D on AOT implementations, and I’m glad that it is being done. I think that in the foreseeable future there will be room for both implementation strategies. Specifically, the PreJIT approach for the CLR made a lot of sense. I do not have much experience with AOTs for Java, but I certainly believe it is likely a great implementation strategy there as well.
> did you try to implement the AOT approach for
> Star JIT like IBM did for J9?
I didn’t actually work on StarJIT, so I will let someone from that team answer that question.
Michal
FWIW, we AOT-compile all the Java code shipped in Fedora Core using GNU gcj.
For various reasons gcj doesn’t perform as well as the best available JITs, and given the somewhat dynamic nature of Java, I doubt it could ever be as good as, say, HotSpot in some domains. E.g., supporting binary compatibility incurs a heavy performance penalty in our approach.
However for some areas it is better in other ways: startup time is quicker (and could be made even faster), and use of shared libraries makes it a bit friendlier for use in desktop applications.
Hi Michal,
>>but taking it out of context like this makes no sense. Here’s my full statement:
Correct, my fault.
>>The slides by one of the Microsoft architects that I referred to in
>>my original email clearly say that for the early CLR implementations,
>>the steady state performance is worse
Of course I read this preso, and I noticed that the two cited performance tests are bytemark (a simple microbenchmark) and ASP.NET (an ASP engine, that is, a server app). On such tests, JVMs easily reach the steady state, but this is not always the case for desktop apps.
I also noticed that the author of this preso added "Future: PreJit will have better performance" as the last point on that slide.
>>So, I have no problem believing your claim that your AOT system
>> is better than HotSpot6
Please do not take it out of context. I said "on some tests". ;)
–Vitaly
I’m all for AOT in Java; perhaps it will solve this problem: http://singlethreaded.org/blog/2007/06/05/how-fast-is-fast-enough/