Here’s the problem: I have 1 timeline, but 2 types of events appear on this timeline. I want to use events of type 1 as borders in my timeline, these borders will separate events of type 2 into groups.
For example: I want to know which bird sings first every morning after sunrise. I have a list of sunrises for the last month and a list of singing events. How do I get the result without doing a 2-level nested loop?
A 2-level loop will always work (but it will be the slowest)
I want to end up with a list of pairs of the sunrise time and the first song time.
A loop like this will always work:
for sunrise in sunrises: for bird_song in bird_songs: # if the song happened before the sunrise, check the next song if bird_song.start_at < sunrise.at: continue # if the song happened on the next day, then let's move to the next sunrise if bird_song.start_at > sunrise.day_ends_at: break sunrise.first_bird_song = bird_song # once we found our first song, we can move on to the next sunrise break
But it is looping over the
bird_songs too many times. For each sunrise, we start looping over
bird_songs from the 1st song on. Time-wise a classical 2-level loop is considered to take about $O(N^2)$ time.
True, in our case, we loop over 2 different lists, we have S number of sunrises and B number of bird songs, so we get O(S * B). Then we also
break out from the loop when we find the correct song, so we don’t really do the full B number of loops for every sunrise. But these estimations are always made with the worst-case scenario in mind, which is in our case: we have 10 million bird songs and all of them happen on the first day and before the sunrise, which brings us right into O(S * B).
One loop, two counters moving it forward
To make the above loop faster, we would need to loop over
bird_songs in alternation. We need to stop the
bird_songs loop when we find the desired song, move
sunrises 1 iteration further and then continue looping over
bird_songs at the exact location, where we stopped looping before.
sunrise_counter = 0 song_counter = 0 while sunrise_counter < len(sunrises) and song_counter < len(bird_songs): bird_song = bird_songs[song_counter] sunrise = sunrises[sunrise_counter] # if the song happened before the sunrise, check the next song if bird_song.start_at < sunrise.at: song_counter += 1 continue # if the song happened on the next day, then let's move to the next sunrise if bird_song.start_at > sunrise.day_ends_at: sunrise_counter += 1 continue sunrise.first_bird_song = bird_song # once we found our first song, we can move on to the next sunrise and next song song_counter += 1 sunrise_counter += 1
while-loop is extremely similar to the 2-level loop from above. It has the same
if-statements and the same
comments, but its time complexity is $O(S + B)$ instead of $(S * B)$. With 30 sunrises and 10M bird songs, we land at ~10M operations instead of 300M operations.
The biggest drawback of this approach: it’s much more difficult to read and maintain. This too is an important aspect of the code.
Choose the one, which brings you more benefits.