Why does starting training take a while?

I’ve noticed that there’s always a pause between when the Trainer logs that training has started and when training actually begins. You can see this in the stderr of a training run, like so:

2019-09-25 02:41:45,616 - INFO - allennlp.training.trainer - Training
  0%|          | 0/1245 [00:00<?, ?it/s]
span_tag_accuracy: 0.9644, pos_accuracy: 0.0000, loss: 0.1180 ||:   0%|          | 1/1245 [00:35<12:25:58, 35.98s/it]
span_tag_accuracy: 0.9571, pos_accuracy: 0.0000, loss: 0.2097 ||:   2%|2         | 26/1245 [00:46<8:34:11, 25.31s/it]
span_tag_accuracy: 0.9588, pos_accuracy: 0.0000, loss: 0.1950 ||:   4%|4         | 51/1245 [00:59<5:55:40, 17.87s/it]
span_tag_accuracy: 0.9591, pos_accuracy: 0.0000, loss: 0.1972 ||:   6%|5         | 71/1245 [01:12<4:08:42, 12.71s/it]
span_tag_accuracy: 0.9596, pos_accuracy: 0.0000, loss: 0.1909 ||:   7%|7         | 88/1245 [01:22<2:54:59,  9.07s/it]
span_tag_accuracy: 0.9601, pos_accuracy: 0.0000, loss: 0.1857 ||:   8%|8         | 105/1245 [01:32<2:04:06,  6.53s/it]
span_tag_accuracy: 0.9603, pos_accuracy: 0.0000, loss: 0.1824 ||:  10%|9         | 122/1245 [02:11<1:38:13,  5.25s/it]
span_tag_accuracy: 0.9597, pos_accuracy: 0.0000, loss: 0.1815 ||:  11%|#1        | 142/1245 [02:21<1:10:19,  3.83s/it]
span_tag_accuracy: 0.9595, pos_accuracy: 0.0000, loss: 0.1819 ||:  13%|#3        | 163/1245 [02:31<51:02,  2.83s/it]  
span_tag_accuracy: 0.9589, pos_accuracy: 0.0000, loss: 0.1843 ||:  15%|#5        | 190/1245 [02:42<36:49,  2.09s/it]
span_tag_accuracy: 0.9581, pos_accuracy: 0.0000, loss: 0.1970 ||:  17%|#7        | 217/1245 [02:52<27:02,  1.58s/it]
span_tag_accuracy: 0.9577, pos_accuracy: 0.0000, loss: 0.2000 ||:  20%|#9        | 244/1245 [03:02<20:22,  1.22s/it]
span_tag_accuracy: 0.9574, pos_accuracy: 0.0000, loss: 0.2001 ||:  22%|##1       | 270/1245 [03:24<17:58,  1.11s/it]
span_tag_accuracy: 0.9575, pos_accuracy: 0.0000, loss: 0.1998 ||:  23%|##3       | 291/1245 [03:35<14:43,  1.08it/s]
span_tag_accuracy: 0.9574, pos_accuracy: 0.0000, loss: 0.2017 ||:  26%|##5       | 320/1245 [03:46<11:45,  1.31it/s]
span_tag_accuracy: 0.9578, pos_accuracy: 0.0000, loss: 0.1995 ||:  28%|##7       | 347/1245 [04:00<10:26,  1.43it/s]
span_tag_accuracy: 0.9581, pos_accuracy: 0.0000, loss: 0.1980 ||:  29%|##9       | 366/1245 [04:12<09:46,  1.50it/s]
span_tag_accuracy: 0.9579, pos_accuracy: 0.0000, loss: 0.1978 ||:  32%|###1      | 393/1245 [04:22<08:14,  1.72it/s]
span_tag_accuracy: 0.9580, pos_accuracy: 0.0000, loss: 0.1979 ||:  34%|###3      | 420/1245 [04:34<07:27,  1.84it/s]
span_tag_accuracy: 0.9581, pos_accuracy: 0.0000, loss: 0.1958 ||:  36%|###5      | 443/1245 [04:47<07:13,  1.85it/s]
span_tag_accuracy: 0.9581, pos_accuracy: 0.0000, loss: 0.1983 ||:  38%|###7      | 469/1245 [04:57<06:30,  1.99it/s]

Note how the first batch takes ~36 seconds (the time between the tqdm bar initializing and the first iteration completing), while subsequent batches are much faster. The displayed rate is a running average, which is why the s/it estimate only gradually decays from 35.98 s/it as fast batches dilute the slow first one. What causes that pause on the first batch?

Probably relevant section of code: https://github.com/allenai/allennlp/blob/30c4271f7f04babb1cb546ab017a104bda011e7c/allennlp/training/trainer.py#L309
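One way to narrow this down is to time the data iterator on its own, outside the Trainer: if the first item is slow to arrive even without a model in the loop, the pause is in dataset reading/iterator setup rather than in the forward pass or CUDA initialization. This is just a generic sketch (the `slow_first` generator below is a stand-in, not AllenNLP code):

```python
import time

def time_batches(batch_iterable, n=3):
    """Return wall-clock durations for the first n items of an iterable.

    A large gap between the first duration and the rest suggests a
    one-time startup cost (e.g. lazy dataset reading, index building,
    or worker spin-up) rather than per-batch compute.
    """
    durations = []
    start = time.perf_counter()
    for i, _batch in enumerate(batch_iterable):
        now = time.perf_counter()
        durations.append(now - start)
        start = now
        if i + 1 >= n:
            break
    return durations

# Hypothetical iterator whose first item is slow, mimicking the log above.
def slow_first():
    time.sleep(0.2)  # stand-in for one-time setup cost
    for i in range(5):
        yield f"batch-{i}"

durations = time_batches(slow_first())
```

Running the same timing over the actual batch iterator the Trainer consumes would show whether the ~36-second pause is reproducible without the model at all.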


I don’t know, but I think this is something we should look into at some point. @brendan has been looking at efficiency stuff lately, and this is definitely related. @nfliu, can you open an issue on github for this, so we can track it?