First, a technical note about the image processing in the pipeline for the AI model.
In an image recognition project I was working on, we were able to improve the accuracy of the model by padding the image to a square with transparent pixels before cropping it down to the requested size. We converted the images to PNG. The only issue we had was the hardware upgrade that became necessary, because the larger image sizes and the additional transformation step increased the workload.
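For illustration, a minimal sketch of that kind of preprocessing with Pillow could look like this (the function name, file names, and target size are my own choices, not what we actually used):

```python
from PIL import Image

def pad_to_square(path, target_size=299):
    # Open in RGBA so the padding can be transparent.
    img = Image.open(path).convert("RGBA")
    side = max(img.size)
    # Fully transparent square canvas.
    canvas = Image.new("RGBA", (side, side), (0, 0, 0, 0))
    # Centre the original image on the canvas.
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    # Downsize to the size the model expects.
    return canvas.resize((target_size, target_size), Image.LANCZOS)

pad_to_square("photo.jpg").save("photo_padded.png")  # PNG keeps the alpha channel
```

Saving as PNG is what preserves the transparent padding; JPEG has no alpha channel.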
I also don’t know whether our data scientists came up with this idea themselves or read it somewhere and it is already known to the AI community.
Hope this helps.
I would love to learn more about what “Cropping Change” entails, as it seems to have a significant positive effect on arachnid IDs, and maybe some other groups that I don’t care as much about ;)
Each bar shows the accuracy from Computer Vision alone (dark green), Computer Vision + Geo (green), and Computer Vision + Geo + Cropping Change (light green). “Cropping Change” is a slight modification to the way images are prepared before they are sent to the CV model that resulted in an average 2.1% improvement.
What kind of project was it, and how much did the change improve the accuracy of the model?
The cropping change resulted from some method improvements to how images are processed before they are sent to the computer vision model, which we made between v2.11 and v2.12. We didn’t have capacity to make additional method improvements between v2.12 and v2.13, but that cropping improvement is still in place and is reflected in the accuracy of v2.13 (compared to what it would have been had we not made those changes).
The cropping change stems from the fact that the computer vision model needs to examine a square image, which means that when dealing with non-square images we have options like squeezing and clipping. Based on some experiments, we made some small changes to this processing pipeline which yielded the ~2% improvement.
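To make the trade-off concrete, here is a rough sketch of the two options mentioned, using Pillow; this is just an illustration, not how the iNat pipeline actually does it:

```python
from PIL import Image

def squeeze_to_square(img, size):
    # "Squeezing": resize both sides independently; the aspect ratio is distorted.
    return img.resize((size, size), Image.LANCZOS)

def clip_to_square(img, size):
    # "Clipping": centre-crop the largest possible square; the edges are lost.
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size), Image.LANCZOS)
```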
We agree there would be certain advantages to using the same 1000 photos to evaluate all models, but we’re not currently doing that because of the complexity involved in holding out that test set, given the dynamic nature of iNat. There’s also significant taxonomic drift between data/model versions, which adds complexity and is why we’re currently focused on just comparing 2 models (the previous model to the new model) rather than trying to track improvements across multiple models - though we agree that would be ideal.
The project was about the identification of objects in images that were taken mostly with cellphones. (I can’t get into more details, as I had to sign a confidentiality agreement :-( ) The point is that the images could be taken in landscape or portrait format, so we could not use a rectangular image format for the model.
The recognition task required the original proportions, so squeezing could not be used. That was the key reason to make the image square before downsizing it to the model’s required size.
I am not a data scientist (though I have worked with neural networks since my PhD) and was not too deeply involved in the tuning process of the model. I was dealing more with the management of the AI team, but I followed the daily standups and the technical discussions.
The system is in production now, and a different company is responsible for running and retraining it, so I have no more information about fine-tuning results.
During fine-tuning in system development, we got the system from around 93% to nearly 96%. The SLA was > 95%, so we did not continue (time is money :-( ), but this improvement was not only due to our preprocessing; other hyper-parameters were tuned as well.
The key point was that the preprocessing step almost always had a positive impact, ranging from around -0.1% up to around +0.8%.
But as I stated earlier, this is a rather difficult forum with too many options, and it works far differently from the other forums I use… so in general I should not post on this forum (but only read).
From the discussions here, I suspect that by cropping you mean data augmentation. Especially in AI for image analysis, you transform the images before you feed them into the input layer of the network. The AI frameworks provide functions for that. I don’t know what is used here, most likely TensorFlow or PyTorch, but they are all quite similar. In most of our AI projects we use TensorFlow (in academia, PyTorch is more common).
What you are doing is transforming the image: for example, you crop a part of it, you can rotate it, mirror it (usually horizontally), and so on. During training, this is applied to increase the number of pictures and reduce overfitting. To a human, these transformed images will look pretty much the same, but not to the network. If you have a color image of, for example, 512x512 pixels, one neuron “sees” one pixel, so you have 512x512x3 neurons in the input layer (times 3 for the RGB colors), which means each single neuron will always “see” something different.
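As a rough sketch of what such an augmentation step can look like in TensorFlow/Keras (the layer choices and parameters here are made up for illustration, not what iNat uses):

```python
import tensorflow as tf

# Augmentation layers are only active while training.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirror horizontally
    tf.keras.layers.RandomRotation(0.05),      # small random rotations
    tf.keras.layers.RandomZoom(0.1),           # slight zoom in/out
])

# Input layer for a 512x512 RGB image: 512 x 512 x 3 values per sample.
inputs = tf.keras.Input(shape=(512, 512, 3))
x = augment(inputs)
# ... the rest of the network would follow here
```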
If you want to improve the quality of the images, you transform them additionally before feeding them into the network, the same as you would do in Photoshop to improve your pictures (I found an example: https://www.geeksforgeeks.org/ai-in-image-processing/).
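For example, a simple “Photoshop-like” enhancement pass could look like this with Pillow’s ImageEnhance module (the factors are arbitrary and would need tuning for a real pipeline):

```python
from PIL import Image, ImageEnhance

img = Image.open("photo.jpg")
img = ImageEnhance.Contrast(img).enhance(1.2)     # slight contrast boost
img = ImageEnhance.Sharpness(img).enhance(1.5)    # mild sharpening
img = ImageEnhance.Brightness(img).enhance(1.05)  # tiny brightness lift
img.save("photo_enhanced.jpg")
```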
Yes, but on the iNat forum, use your mouse to select text, then hit “Quote” in the popup that appears. Then write what you want underneath the quote.
To that post: technically, I don’t understand the point that it is generally difficult to always select the same images as test data. I don’t know your system architecture, but I suspect that you have a database to store either the images themselves or the locations of the images. If you always want the same images, I would flag them in the database with an additional boolean column, like isTestImage.
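Roughly like this, assuming for the sake of the example an SQLite table images(id, path) - I obviously don’t know the real schema:

```python
import sqlite3

conn = sqlite3.connect("images.db")  # hypothetical database
conn.execute("ALTER TABLE images ADD COLUMN isTestImage INTEGER DEFAULT 0")

# Flag a fixed set of images as the held-out test set (ids made up).
conn.executemany("UPDATE images SET isTestImage = 1 WHERE id = ?",
                 [(101,), (102,), (103,)])
conn.commit()

# Every model version can then be evaluated on exactly the same images.
test_paths = [row[0] for row in
              conn.execute("SELECT path FROM images WHERE isTestImage = 1")]
```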