Hm… I honestly don't see the point of this. Any human would immediately spot the cat in your example without that bounding box, no? There are certainly harder cases where the animal is very small in the picture, but wouldn't a crop be preferable in those cases anyway?
AI can be useful, but there is always a risk that it hallucinates, and it has an immense energy footprint; personally, I would not use it where I don't see a substantial benefit.