Better binning for seasonality plots

For species with high observation counts, it is fine to bin by day. Doing so may reveal interesting features, such as spikes in the above data during the last week of April. As the bin width increases by row, W = 1, 5, 10, 15, 20, 30, small-scale variations are lost in smoothing. Is there a scientific rationale for excessively smoothing the data?

My suggestion would be to make bin width a user-controlled parameter, and/or allow the graph to adapt to sample density, plotting by week or by day depending on sample size. I also voted for the previous request:

But I think that allowing binning by week is only a start to what can be done in terms of improvement.
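
To make the density-adaptive idea concrete, here is a minimal Python sketch. It is not how the site works; the function names, the 365-day year, and the target of 30 observations per bin are all assumptions chosen for illustration. The candidate widths are the W values listed above.

```python
from collections import Counter


def choose_bin_width(n_obs, widths=(1, 5, 10, 15, 20, 30), target_per_bin=30):
    """Return the smallest candidate width whose average count per bin
    reaches target_per_bin (an assumed threshold, not a site setting)."""
    for w in widths:
        n_bins = -(-365 // w)  # ceiling division: bins needed to cover a year
        if n_obs / n_bins >= target_per_bin:
            return w
    return widths[-1]  # sparse data: fall back to the widest bin


def bin_by_day_of_year(days, width):
    """Count observations per bin; each key is the first day of its bin.
    `days` holds day-of-year values starting at 1."""
    counts = Counter(((d - 1) // width) * width + 1 for d in days)
    return dict(sorted(counts.items()))
```

With these assumptions, a species with 20,000 observations would be plotted by day, one with 3,000 by 5-day bins, and one with 100 by 30-day bins. A user-controlled parameter would simply bypass `choose_bin_width` and pass the chosen width directly.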


Hey, welcome to the forum! :)


@bradklee nice suggestion! Just a reminder that you can and should vote for your own feature requests.


Adding a week of year binning is something we might provide as an option or default, but it’s unlikely binning by day will be implemented. Are there cases where you think binning by day would be truly necessary?

FWIW, the spike in the last week of April should be due to the City Nature Challenge.

Hi tiwane,

Thanks for the info about the spike.

I don’t know about interesting scenarios that favor binning
by day, but I’m sure there are some out there. Whenever there
is enough data, that would be my default. If there is even more
data, then binning by hour might be interesting.

The casual user planning a sight-seeing adventure might not
care for fine-grained statistics, but science users probably will.

For example, on the data above: If you baseline the papilionids
from Eurytides marcellus, you’re left with a late season peak.
The peak has essentially one parameter, its width. To see the
similarity, the most direct way is to just bin the data by
{11, 20, 13, 17} for glaucus, polyxenes, troilus, cresphontes
respectively. All four have quickly emerging logistic peaks, with
roughly the same shape (You can sort of see this in the data
above, but it’s more prominent after baseline correction).


Sure, but science users (and anyone else for that matter) can download the data and do any sort of manipulations they please with it. I think when you start talking about logistic peaks and baseline corrections, you may be going beyond what to expect from a website interface.


Yes, that is a good point. I don’t want to complain too much,
especially because data download is easy. Ultimately it may
be possible to have algorithmic analysis, which automates
statistical summaries, e.g. number of broods observed, peak
observation day, and peak width. That would add value.
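
As a rough sketch of what such a summary could look like, the Python below
reports the peak observation day and a peak width from daily counts. It is
only an illustration: `peak_summary` is a hypothetical helper, the width is
taken as the run of days around the peak at or above half the peak count
(one possible definition among several), and counting broods is left out.

```python
def peak_summary(counts):
    """Summarize the tallest peak in a list of daily observation counts.
    Index 0 is day 1. Width = number of contiguous days around the peak
    whose count is at least half the peak count (an assumed definition)."""
    peak_idx = max(range(len(counts)), key=counts.__getitem__)
    half = counts[peak_idx] / 2

    # Walk outward from the peak while counts stay at or above half maximum.
    left = peak_idx
    while left > 0 and counts[left - 1] >= half:
        left -= 1
    right = peak_idx
    while right < len(counts) - 1 and counts[right + 1] >= half:
        right += 1

    return {
        "peak_day": peak_idx + 1,
        "peak_count": counts[peak_idx],
        "width_days": right - left + 1,
    }
```

On noisy daily data the half-maximum walk can stop early at a single low
day, so some smoothing or coarser binning beforehand would probably be needed.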

