Aimred Developer Blog February 2010 Archive
Neat Ruby Tricks: take_while and drop_while
Enumerable is the Swiss Army chainsaw of Ruby: the functionality it provides to classes that include it is probably used countless times a day in your code. So when new methods are added to Enumerable, it is worth learning them, because they have the potential to make your coding a little easier. In this edition of Neat Ruby Tricks we’ll look at two methods that were introduced to Enumerable in Ruby 1.8.7: take_while and drop_while.
take_while is a method that takes a block and returns the leading elements of the enumerable object that satisfy the block. It stops iterating as soon as it finds an element that does not satisfy the block. The following example illustrates this.
    >> [ 1, 2, 5, 4, 3 ].take_while{ |num| num < 5 }
    => [1, 2]
Only the first two elements are returned, as they satisfy the condition specified in the block. Although the last two elements (4 and 3) also satisfy the block, iteration stops at the third element (5), which does not.
drop_while is the inverse of take_while. It skips the leading elements that satisfy the block and returns every element from the first failing one onward, as shown:
    >> [ 1, 2, 5, 4, 3 ].drop_while{ |num| num < 5 }
    => [5, 4, 3]
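One way to see how the two methods relate is that, given the same block, they split a list at the first element that fails the condition. The following is a small sketch of that idea; the helper name split_at_first_failure is made up for this illustration and is not part of Enumerable.

```ruby
# Hypothetical helper (not part of Ruby): returns the leading run that
# satisfies the block and everything from the first failing element onward.
def split_at_first_failure(list, &condition)
  [list.take_while(&condition), list.drop_while(&condition)]
end

numbers = [1, 2, 5, 4, 3]
head, tail = split_at_first_failure(numbers) { |num| num < 5 }

p head                    # => [1, 2]
p tail                    # => [5, 4, 3]
p head + tail == numbers  # => true
```

Putting the two halves back together always gives the original list, which is a handy sanity check when using the methods as a pair.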
So why use drop_while and take_while? They are useful for certain types of queries. For instance, if you had a list of temperature readings and wanted to find the run of readings that stayed above a certain threshold before dropping back below it, a combination of drop_while and take_while is much easier than using select. Given the list of readings [22, 24, 25, 26, 27, 24, 25], if I want the first run of temperature readings above 24 I would use
    readings.drop_while{ |temp| temp <= 24 }.take_while{ |temp| temp > 24 }
which returns [25, 26, 27].
If we tried to use a naive select to do the same thing
    readings.select{ |temp| temp > 24 }
we would get [25, 26, 27, 25], which is incorrect because the final 25 occurs after the readings have already fallen back to the threshold of 24. It would take a bit of extra code using select to actually get the same result as using drop/take_while.
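To give a feel for what that extra code might look like, here is a rough sketch that compares the two approaches. Both method names are invented for this example, and the select version is only one of several ways it could be written.

```ruby
# Hypothetical helper: the drop_while/take_while approach from above.
def first_run_above(readings, threshold)
  readings.drop_while { |temp| temp <= threshold }.take_while { |temp| temp > threshold }
end

# Hypothetical helper: the same query forced through select, which needs
# two flags to remember whether the run has started and whether it has ended.
def first_run_above_with_select(readings, threshold)
  in_run = false
  finished = false
  readings.select do |temp|
    next false if finished          # ignore everything after the first run
    if temp > threshold
      in_run = true
      true
    else
      finished = true if in_run     # the run has just ended
      false
    end
  end
end

readings = [22, 24, 25, 26, 27, 24, 25]
p first_run_above(readings, 24)              # => [25, 26, 27]
p first_run_above_with_select(readings, 24)  # => [25, 26, 27]
```

The select version still visits every reading, whereas take_while stops as soon as the run ends.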
Another advantage of using drop_while and take_while is performance with sorted data. If we are interested in elements at the beginning of a sorted list, drop/take_while can cut down substantially on the number of comparisons required. In the following benchmark I created an array containing one million random integers between 0 and 9999 and sorted it. I then used select and take_while to find all numbers less than 1000, 5000 and 10000, and did the inverse with reject and drop_while.
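    require 'benchmark'

    # One million random integers in the range 0..9999, sorted ascending.
    numbers = Array.new( 1000000 ){ rand( 10000 )}.sort

    [ 1000, 5000, 10000 ].each do |index|
      puts "\nBenchmark for x < #{ index }"
      # The argument 7 is the label width, so the header lines up with the rows.
      Benchmark.bm( 7 ) do |bmark|
        # select and take_while keep the elements below the threshold.
        bmark.report('select:'){ numbers.select{ |x| x < index }}
        bmark.report('take:'){ numbers.take_while{ |x| x < index }}
        # reject and drop_while keep the elements at or above it.
        bmark.report('reject:'){ numbers.reject{ |x| x < index }}
        bmark.report('drop:'){ numbers.drop_while{ |x| x < index }}
      end
    end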
Running the benchmark produces the following results:
    Benchmark for x < 1000
                 user     system      total        real
    select:  0.460000   0.270000   0.730000 (  0.724475)
    take:    0.060000   0.010000   0.070000 (  0.068649)
    reject:  0.480000   0.260000   0.740000 (  0.748344)
    drop:    0.050000   0.030000   0.080000 (  0.068893)

    Benchmark for x < 5000
                 user     system      total        real
    select:  0.490000   0.230000   0.720000 (  0.724887)
    take:    0.190000   0.150000   0.340000 (  0.340946)
    reject:  0.500000   0.250000   0.750000 (  0.740833)
    drop:    0.220000   0.120000   0.340000 (  0.341785)

    Benchmark for x < 10000
                 user     system      total        real
    select:  0.500000   0.220000   0.720000 (  0.732840)
    take:    0.450000   0.240000   0.690000 (  0.683183)
    reject:  0.420000   0.300000   0.720000 (  0.729714)
    drop:    0.430000   0.260000   0.690000 (  0.684751)
Because the data is sorted, we know that once x < 1000 is no longer true it won’t be true for the rest of the data, so take_while returns the same result as select but does it in roughly one tenth of the time (about 0.07 seconds versus 0.72 seconds in the run above). As the index gets larger, the performance of take/drop_while tends towards that of select/reject, as expected.
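The timing difference comes down to how many times the block is called. As a rough illustration, the sketch below counts block invocations on a small sorted array; count_block_calls is a made-up helper for this post, and the array is a simple 0...10000 range rather than the random data used in the benchmark.

```ruby
# Hypothetical helper: runs the named Enumerable method with an "x < limit"
# block and returns how many times that block was invoked.
def count_block_calls(list, method_name, limit)
  calls = 0
  list.public_send(method_name) do |x|
    calls += 1
    x < limit
  end
  calls
end

sorted = (0...10_000).to_a  # already sorted: 0, 1, 2, ..., 9999

puts count_block_calls(sorted, :select, 1_000)      # 10000
puts count_block_calls(sorted, :take_while, 1_000)  # 1001
puts count_block_calls(sorted, :reject, 1_000)      # 10000
puts count_block_calls(sorted, :drop_while, 1_000)  # 1001
```

take_while and drop_while each call the block 1001 times here: once for every element below 1000, plus once for the first element that fails the test. select and reject have to check every element.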
While take/drop_while will probably not be used as often as the workhorses of select/reject, in certain cases remembering that they are available can add a bit of speed to your code.
2010 Cape Town Ruby Brigade Dates
Mark your calendars for the following Cape Town Ruby Brigade meeting dates:
- 10th March 2010
- 14th April 2010
- 12th May 2010
- 9th June 2010
- 14th July 2010
- 11th August 2010
- 8th September 2010
- 13th October 2010
- 10th November 2010
Announcements for each meeting will be made closer to the time via the Cape Town Ruby Brigade mailing list.