Code to extract annotations from exported JSON

hmmm… my non-expert reading of the doParallel description is that it’s intended to split compute work across your CPU cores, which isn’t exactly what i have in mind. i think a package like furrr might be closer to what i’m thinking of, which is splitting tasks (API requests) across workers.

for example, suppose it takes the API 0.5 sec to respond to each of 5 requests…

if you execute the requests sequentially, with a 1 sec delay between response and the next request, it would take you 6.5 secs to finish executing the requests:
image

however, if you stagger the requests by 1 sec (and run them as multiple parallel threads), then you can complete the requests in 4.5 secs (savings = 2 secs):
image

the difference in total execution time becomes more apparent if it takes the API 2 secs to respond to each of the requests…

if executed serially, with 1 sec delay between response and next request, 5 requests would take 14 secs:
image

if executed in parallel, with start times staggered by 1 sec, the requests would take 6 secs (savings = 8 secs):
image
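
the four totals above follow from two small formulas, sketched here in JS. the function and parameter names are made up for illustration, and the sketch assumes every response takes the same amount of time: with n requests, response time r, and gap d, the serial total is n*r + (n-1)*d, while the staggered total is (n-1)*d + r.

```javascript
// Total time when each request waits for the previous response plus a delay.
// Assumes every response takes the same `responseSecs` (illustrative only).
function serialTotal(n, responseSecs, delaySecs) {
  return n * responseSecs + (n - 1) * delaySecs;
}

// Total time when request i starts at i * staggerSecs, whether or not
// earlier requests have finished.
function staggeredTotal(n, responseSecs, staggerSecs) {
  return (n - 1) * staggerSecs + responseSecs;
}

console.log(serialTotal(5, 0.5, 1), staggeredTotal(5, 0.5, 1)); // 6.5 4.5
console.log(serialTotal(5, 2, 1), staggeredTotal(5, 2, 1));     // 14 6
```

the savings (serial minus staggered) work out to (n-1)*r, which is why they grow with the response time: 2 secs at r = 0.5, 8 secs at r = 2.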

in the parallel case with 2-sec responses, you’re not hitting the server (scheduler) with multiple requests at the same time, and the server (workers) is never running more than 2 of your requests at a time. so you’re not in danger of overloading the server. it’s possible that the server may take longer to respond to each request, in which case more of the threads overlap, but i think in that case the server is smart enough to limit its work(ers) to, say, 4 of your threads at a time.

anyway, this parallel approach is how i do it in JS. so i was just wondering if there was an equivalent in R.
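
for reference, a minimal sketch of the staggered idea in JS. this is only an illustration, not necessarily how any real script does it: `fakeRequest` is a hypothetical stand-in for the API call, and `runStaggered` and `sleep` are made-up helper names. (strictly speaking, in JS these are concurrent promises on one thread rather than real threads.)

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for an API call: "responds" after responseMs.
async function fakeRequest(id, responseMs) {
  await sleep(responseMs);
  return `response ${id}`;
}

// Start task i at i * staggerMs without waiting for earlier tasks to finish,
// then wait for all of them. Results keep the order of `tasks`.
function runStaggered(tasks, staggerMs) {
  return Promise.all(
    tasks.map((task, i) => sleep(i * staggerMs).then(() => task()))
  );
}

// Usage: 5 requests, started 1 sec apart, each taking 2 secs to "respond".
// const tasks = [1, 2, 3, 4, 5].map((id) => () => fakeRequest(id, 2000));
// runStaggered(tasks, 1000).then(console.log);
```

note that this sketch does not cap how many requests are in flight at once; if responses get slow, more of them overlap, so a real version might want a concurrency limit on top of the stagger.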
