Game engines benefit only moderately from multithreading. The main reason is that all of the work for a frame still has to be finished before the next frame is due, otherwise the framerate drops, and only so much of that work can be done in parallel (running at 60 fps means doing all of it in about 16.7 ms or less). Most game engines are written in C or C++, and on Android a JNI layer calls through to the native code. In fact, the Android SDK has tutorials on how to use the Native Development Kit (NDK) to write an app without ever really having to go through the Java layer. A secondary benefit is that much of the logic can sometimes be reused across platforms (cf. Unity).
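To put rough numbers on the frame budget and on why extra threads only help so much, here is a minimal sketch. The function names are my own, and the second function uses Amdahl's law, which is my choice of illustration rather than anything specific to App Inventor or a particular engine.

```python
def frame_budget_ms(fps):
    """Time available to finish one frame, in milliseconds."""
    return 1000.0 / fps


def max_speedup(parallel_fraction, workers):
    """Amdahl's law: best-case speedup when only part of the work is parallel.

    parallel_fraction is the share of the frame's work (0.0 to 1.0) that can
    be spread across threads; the rest still runs serially.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    print(f"Budget at 60 fps: {frame_budget_ms(60):.2f} ms")
    # Hypothetical case: half of the frame's work can run in parallel.
    print(f"Best case with 4 threads: {max_speedup(0.5, 4):.2f}x")
```

Even in that hypothetical case, going from one thread to four gives well under a 2x gain, because the serial half of the frame still has to fit inside the same budget.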
The second major thing that game engines try to avoid is creating garbage that has to be collected. The spikes that you're observing happen because the virtual machine running the code needs to pause to clean up the results of intermediate computations. My guess is that on your device it takes about 30-40 seconds for the memory available to the app to be exhausted, and then the garbage collector (GC) runs. This frees enough memory to last a few more seconds, at which point the freed memory has been exhausted again and the GC runs again. Rinse and repeat. The likely culprit is the ChangeFrame function, where you are toggling between two different images every other frame. App Inventor does not cache images, so this causes the image to be read from the assets every time, scaled, etc., and then drawn. The previous image hangs around in memory until the GC runs to clean it up, which produces the spikes while the world is paused for the GC operation. Some devices use a parallel GC so that this pause is not needed, but the actual behavior depends on the Android version, etc.
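As a back-of-the-envelope illustration of why the pauses show up at a fairly regular interval, here is a small sketch. Every number in it is made up (image size, free memory, number of sprites, reload rate), and it assumes 4 bytes per pixel for a decoded image; a real collector is more complicated than a single "heap is full" threshold.

```python
def decoded_image_bytes(width_px, height_px, bytes_per_pixel=4):
    """Approximate memory used by one decoded image (assumes 4 bytes/pixel)."""
    return width_px * height_px * bytes_per_pixel


def seconds_until_heap_is_full(free_heap_bytes, image_bytes, loads_per_second):
    """How long discarded images take to fill the free memory."""
    garbage_per_second = image_bytes * loads_per_second
    return free_heap_bytes / garbage_per_second


if __name__ == "__main__":
    # Hypothetical values: 64x64 images, 128 MiB free, and 10 sprites that
    # each reload their image 30 times per second.
    image = decoded_image_bytes(64, 64)
    seconds = seconds_until_heap_is_full(128 * 1024 * 1024, image, 10 * 30)
    print(f"Roughly {seconds:.1f} s of animation before the GC has to run")
```

The point is only the shape of the relationship: the more image reloads per second, the sooner the free memory runs out and the more often the GC has to pause the app.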
One optimization you could do, although it is a bit of work,* would be to double the number of image sprites. This probably sounds counterintuitive, but the way it would work is that you would have two image sprites per entity, one with the rock_anim1.png image and the other with rock_anim2.png. Then, rather than swapping the images in and out, you toggle the visibility of the two image sprites. This way, the amount of memory in use stays fairly constant once all of the image sprites have been initialized, and the GC won't need to run as frequently.
We use a similar technique in the PICaboo tutorial when swapping images of the crying/happy babies. With the images already preloaded in memory, it is much faster to change the visibility of the images between frames than to reload the images on every frame.
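The difference between the two approaches can be sketched in a few lines. This is a simulation only, not App Inventor code: FakeImageSprite is a hypothetical stand-in for an image sprite, and it simply counts how often a picture would have to be read and decoded.

```python
class FakeImageSprite:
    """Hypothetical stand-in for an image sprite; counts picture loads."""

    def __init__(self, picture, visible):
        self.loads = 0
        self.visible = visible
        self.set_picture(picture)

    def set_picture(self, name):
        # Without a cache, every picture change means reading and decoding
        # the asset again, and the old image becomes garbage.
        self.loads += 1
        self.picture = name


def swap_pictures(frames):
    """One sprite per entity; the picture is replaced on every frame."""
    names = ["rock_anim1.png", "rock_anim2.png"]
    sprite = FakeImageSprite(names[0], visible=True)
    for frame in range(1, frames + 1):
        sprite.set_picture(names[frame % 2])
    return sprite.loads


def toggle_visibility(frames):
    """Two sprites per entity; only their visibility flips each frame."""
    first = FakeImageSprite("rock_anim1.png", visible=True)
    second = FakeImageSprite("rock_anim2.png", visible=False)
    for _ in range(frames):
        first.visible, second.visible = second.visible, first.visible
    return first.loads + second.loads


if __name__ == "__main__":
    print("loads when swapping pictures:", swap_pictures(600))
    print("loads when toggling visibility:", toggle_visibility(600))
```

With swapping, the number of loads grows with the number of frames; with toggling, it stays at the two initial loads. One thing the sketch leaves out is that in a real project both sprites of a pair would presumably also need to be kept at the same position so the entity doesn't appear to jump.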
* Technically all optimizations are work so it might go without saying...