i suspect this is at least partially responsible for the unexpected results. see: https://forum.inaturalist.org/t/seems-like-the-cv-suggestions-in-the-web-upload-screen-might-be-based-on-a-cropped-image/51000
i suspect that some of these images are already challenging for the computer vision to analyze in the first place for various reasons (ex. that first caterpillar really does sort of look like a cactus even to human eyes), and adding some unexpected cropping on top of that makes things even more challenging for the CV.
for example, compare the results that the CV returns for the two images below. the uncropped image shows the entire outline of the beetle, but the square center crop makes the bottom edge of the beetle fall just outside of the bounds of the image, and look how much that changes the results. in particular, notice how the score of the first suggestion drops dramatically. (i assume this is because losing that bottom edge of the beetle introduces some uncertainty – ex. does it have a tail?)
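to make the cropping effect concrete, here is a minimal sketch of how a square center crop could cut off part of a subject. i don't know how the upload screen actually computes its crop, so the function names (`center_square_box`, `is_fully_inside`) and the pixel numbers are all hypothetical, just for illustration.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    Assumption: the crop is a plain center crop using the shorter side.
    This is a guess at the behavior, not the actual iNaturalist code.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def is_fully_inside(subject_box, crop_box):
    """True if the subject's bounding box lies entirely within the crop."""
    s_left, s_top, s_right, s_bottom = subject_box
    c_left, c_top, c_right, c_bottom = crop_box
    return (
        s_left >= c_left
        and s_top >= c_top
        and s_right <= c_right
        and s_bottom <= c_bottom
    )


# hypothetical portrait photo, 300 px wide by 400 px tall
crop = center_square_box(300, 400)

# hypothetical beetle outline whose bottom edge sits near the bottom of the photo
beetle = (100, 120, 200, 370)

print(crop)
print(is_fully_inside(beetle, crop))
```

in this made-up case, the crop keeps only rows 50 through 350, so the beetle's bottom edge at row 370 falls outside the image, similar to what happens in the cropped example.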
uncropped:
square center cropped: