hmmm… my non-expert reading of the description of doParallel is that it’s intended to split compute work across your CPU cores, which isn’t exactly what i have in mind. i think a package like furrr might do more of what i’m thinking of, which is splitting tasks (API requests) across workers.
…
for example, suppose it takes the API 0.5 sec to respond to each of 5 requests…
if you execute the requests sequentially, with a 1 sec delay between response and the next request, it would take you 6.5 secs to finish executing the requests:
however, if you stagger the requests by 1 sec (and run them as multiple parallel threads), then you can complete the requests in 4.5 secs (savings = 2 secs):
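making that arithmetic explicit (just plain base R, using the numbers above):

```r
5 * 0.5 + (5 - 1) * 1   # serial: 2.5 secs of responses + 4 secs of delays = 6.5 secs
(5 - 1) * 1 + 0.5       # staggered: last request starts at 4 secs, done at 4.5 secs
```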
…
the difference in total execution time becomes more apparent if it takes the API 2 secs to respond to each of the requests…
if executed serially, with 1 sec delay between response and next request, 5 requests would take 14 secs:
if executed in parallel, with start times staggered by 1 sec, the requests would take 6 secs (savings = 8 secs):
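same arithmetic for the 2-sec case:

```r
5 * 2 + (5 - 1) * 1     # serial: 10 secs of responses + 4 secs of delays = 14 secs
(5 - 1) * 1 + 2         # staggered: last request starts at 4 secs, done at 6 secs
```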
in the 2-sec response parallel case, you’re not hitting the server (scheduler) with multiple requests at the same time, and the server (workers) is never running more than 2 of your requests at once, so you’re not in danger of overloading the server. it’s possible the server takes longer to respond to each request, in which case there’s more overlap between threads, but i think the server is smart enough to limit its work(ers) to, say, 4 of your threads at a time.
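the amount of overlap is basically the ratio of response time to stagger interval (this is just my own back-of-envelope rule of thumb, not anything from a package):

```r
ceiling(2 / 1)   # 2-sec responses, 1-sec stagger: at most 2 requests in flight
ceiling(4 / 1)   # if responses stretched to 4 secs, at most 4 in flight
```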
anyway, this parallel approach is how i do it in JS. so i was just wondering if there was an equivalent in R.
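something along these lines is roughly what i have in mind, as a rough sketch with furrr + future (untested against a real API; `call_api()` is a made-up placeholder for whatever httr/curl call actually hits the endpoint, and the 0.5-sec sleep just fakes the response time):

```r
library(future)
library(furrr)

plan(multisession, workers = 5)      # one background R session per request

call_api <- function(i) {
  # placeholder for the real request, e.g. httr::GET(...) or curl::curl_fetch_memory(...)
  Sys.sleep(0.5)                     # pretend the API takes 0.5 sec to respond
  paste("response to request", i)
}

results <- future_map(1:5, function(i) {
  Sys.sleep(i - 1)                   # stagger the start times by ~1 sec per request
  call_api(i)
})

plan(sequential)                     # shut the workers back down
```

(the `Sys.sleep(i - 1)` trick only approximates the stagger, since the workers don’t all start at exactly the same instant, and spinning up the multisession workers has some one-time overhead, but the shape of it is what i’m describing above.)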