When partitioning the work amongst threads, dividing the number of
objects by the number of threads may yield 0 when there are fewer
objects than threads; this will cause the subsequent code to segfault
when accessing list[sub_size-1]. Allow some threads to have
zero objects to work on instead of barfing, while letting others
have more.
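
A minimal sketch of one way such a partitioning could look, assuming each
thread's share is computed by dividing the objects still unassigned by the
threads still unassigned. The name partition_work() and its parameters are
hypothetical and used only for illustration; they are not taken from the
code this patch touches.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: split nr_objects among nr_threads and store
 * each thread's share in sizes[0..nr_threads-1].
 */
void partition_work(size_t nr_objects, unsigned nr_threads, size_t *sizes)
{
	size_t remaining = nr_objects;
	unsigned i;

	assert(nr_threads > 0);

	for (i = 0; i < nr_threads; i++) {
		/* Divide what is left by the threads not yet assigned. */
		size_t sub_size = remaining / (nr_threads - i);

		/*
		 * sub_size may legitimately be 0 here. A caller must
		 * check for that before touching list[sub_size-1].
		 */
		sizes[i] = sub_size;
		remaining -= sub_size;
	}
}
```

With 3 objects and 8 threads, the first five threads get nothing and the
last three get one object each, so no object is lost and no thread ever
indexes list[-1].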