Better binning for seasonality plots

bradklee · March 29, 2020, 3:05pm

For species with high observation counts, it is fine to bin by day. Doing so may reveal interesting features, such as spikes in the above data during the last week of April. As the bin width increases row by row (W = 1, 5, 10, 15, 20, 30 days), small-scale variations are smoothed away. Is there a scientific rationale for smoothing the data this heavily?

My suggestion would be to make bin width a user-controlled parameter, and/or to let the graph adapt to sample density, plotting by week or by day depending on sample size. I also voted for the previous request:

https://forum.inaturalist.org/t/plot-seasonality-by-week-instead-of-by-month/5559

But I think that allowing binning by week is only a start on what can be improved.
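To make the suggestion concrete, here is a minimal sketch of both ideas: binning by a user-chosen width, and picking a width from the sample size. The function names, the candidate widths reused from the plots above, and the threshold of 20 observations per bin on average are all assumptions for illustration, not anything iNaturalist implements.

```python
def bin_counts(days_of_year, width):
    """Count observations in consecutive bins of `width` days.

    `days_of_year` holds integers from 1 to 366; day 1 falls in bin 0.
    """
    if width < 1:
        raise ValueError("width must be at least 1 day")
    n_bins = -(-366 // width)  # ceiling division; 366 covers leap years
    counts = [0] * n_bins
    for d in days_of_year:
        if not 1 <= d <= 366:
            raise ValueError(f"day of year out of range: {d}")
        counts[(d - 1) // width] += 1
    return counts


def adaptive_width(n_obs, candidates=(1, 5, 10, 15, 20, 30), min_per_bin=20):
    """Return the narrowest candidate width whose bins hold, on average,
    at least `min_per_bin` observations (an assumed threshold).

    Falls back to the widest candidate when the sample is too small.
    """
    for w in candidates:
        n_bins = -(-366 // w)
        if n_obs / n_bins >= min_per_bin:
            return w
    return candidates[-1]
```

With these defaults, a species with 10,000 observations would be plotted by day, while one with 1,000 would fall back to 10-day bins.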

–Brad

astra_the_dragon · March 29, 2020, 7:02pm

Hey, welcome to the forum! :)

deboas · March 29, 2020, 10:13pm

@bradklee nice suggestion! Just a reminder that you can and should vote for your own feature requests.

tiwane · April 8, 2020, 5:46am

Adding week-of-year binning is something we might provide as an option or default, but it’s unlikely binning by day will be implemented. Are there cases where you think binning by day would be truly necessary?

FWIW, the spike in the last week of April should be due to the City Nature Challenge.

bradklee · April 11, 2020, 1:36pm

Hi tiwane,

Thanks for the info about the spike.

I don’t know of specific scenarios that favor binning
by day, but I’m sure there are some out there. Whenever there
is enough data, that would be my default. If there is even more
data, then binning by hour might be interesting.

The casual user planning a sight-seeing adventure might not
care for fine-grained statistics, but science users probably will.

For example, on the data above: if you baseline the papilionids
from Eurytides marcellus, you’re left with a late-season peak.
The peak has essentially one parameter, its width. To see the
similarity, the most direct way is to bin the data by
{11, 20, 13, 17} for glaucus, polyxenes, troilus, cresphontes,
respectively. All four have quickly emerging logistic peaks with
roughly the same shape. (You can sort of see this in the data
above, but it’s more prominent after baseline correction.)
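A rough sketch of the per-species binning: each species is binned with its own width and then scaled to its tallest bin, so the profiles can be overlaid against bin index. Treating the four widths as days is an assumption, as are the function names and the input layout (a dict of day-of-year lists keyed by species epithet). The baseline correction step is not included here.

```python
# Bin widths quoted in the post; the unit (days) is an assumption.
WIDTHS = {"glaucus": 11, "polyxenes": 20, "troilus": 13, "cresphontes": 17}


def rescaled_profile(days_of_year, width):
    """Bin by `width` days, then scale so the tallest bin equals 1.0."""
    n_bins = -(-366 // width)  # ceiling division; 366 covers leap years
    counts = [0] * n_bins
    for d in days_of_year:
        counts[(d - 1) // width] += 1
    peak = max(counts)
    if peak == 0:
        return [0.0] * n_bins
    return [c / peak for c in counts]


def profiles(obs_by_species, widths=WIDTHS):
    """Build one rescaled profile per species that has an assigned width."""
    return {
        name: rescaled_profile(obs_by_species[name], w)
        for name, w in widths.items()
        if name in obs_by_species
    }
```

Plotting each profile against its bin index, rather than against calendar day, is what stretches or compresses the time axis by the species' own width and lets the shapes be compared directly.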

–Brad

jwidness · April 11, 2020, 1:45pm

Sure, but science users (and anyone else for that matter) can download the data and do any sort of manipulations they please with it. I think when you start talking about logistic peaks and baseline corrections, you may be going beyond what to expect from a website interface.

bradklee · April 11, 2020, 1:54pm

Yes, that is a good point. I don’t want to complain too much,
especially because data download is easy. Ultimately it may
be possible to have algorithmic analysis that automates
statistical summaries, e.g. number of broods observed, peak
observation day, and peak width. That would add value.
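As a sketch of what two of those summaries could look like on downloaded data: the function below reports the peak observation day and a peak width, here taken as the contiguous run of days around the peak with at least half the maximum count. That definition of width, the function name, and the input (a list of daily counts, index 0 being day 1) are assumptions; counting broods would need more than this.

```python
def peak_summary(daily_counts):
    """Return (peak_day, width_in_days) for counts indexed by day of year - 1.

    Width is the length of the contiguous run of days around the peak
    whose counts are at least half of the maximum count.
    """
    if not daily_counts or max(daily_counts) == 0:
        raise ValueError("no observations")
    # First index holding the maximum count.
    peak_idx = max(range(len(daily_counts)), key=daily_counts.__getitem__)
    half = daily_counts[peak_idx] / 2
    left = peak_idx
    while left > 0 and daily_counts[left - 1] >= half:
        left -= 1
    right = peak_idx
    while right < len(daily_counts) - 1 and daily_counts[right + 1] >= half:
        right += 1
    return peak_idx + 1, right - left + 1
```

On noisy daily counts this would likely need smoothing first, which is the same bin-width trade-off discussed at the top of the thread.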

–Brad
