background: 10K+ compiled class files and sources that got out of sync1; need to figure out which sources are valid, and which ones are not.
decompiling things is the last resort, since sources produced are not easily
diff‘able against the sources you’ve got. the likes of diffj is not much help either, and i did not even want to go down the rabbit hole of normalization through obfuscators.
so if you do not feel like wielding antlr or javacc to normalize two sources, the obvious approach is to simply recompile and compare with the existing class files (just beware of missing class files that might not have any sources at all).
however, keep in mind that
javac by default includes line number table in the class file it produces2. this means that even if you added or removed a line of comments or even a blank line before any sources, it would result in a classfile that is different from the original.
sometimes you have another class inside the .java file (not to be confused with inner classes). in this case it gets compiled into a separate class file. so if your main class’ source code has changed, it will affect the line number table of the other class as well. this means that even though another class’ source has not changed, its generated class file will be different.
in my case i also had to check which jdk compiler produced the class files. one can always opt for
javap that prints minor and major versions, or if you are feeling manly enough, whip out your favorite hex editor and check bytes 6 and 7 (according to the vm spec). in general,
javap is the easiest way to check the internal structure of the class file.
finally, to diff files and directories i simply used svn – check the originals into the local repo, then put your stuff in a working copy – it will do the rest. after all, this is what it’s good at.
oh and why
CAFE BABE? look it up
1 this is a whole different and interesting topic – how any sort of generated content creates a possibility of this disconnect. all these xdoclets, jaxb-generated sources, and even compiled classfiles create artifacts that now have a potential of getting out of sync with the sources. yes, proper engineering practices mitigate the risks, but all things being equal, i like the fact that with scripting languages this problem is largely non-existent. what you see is what you run.
2 you could always run
-g:none to get rid of line numbers, but it was no help in my case.