Rediscovering Ruby: Abbrev
The Ruby API is full of hidden-in-plain-sight libraries that you would probably find extremely useful, if only you knew they were there in the first place. For instance, if you go to the Ruby Doc Standard Library page, right at the top, the only entry under ‘A’ is Abbrev, a one-function API that might save you some trouble when it comes to text processing.
The single function of the Abbrev library is to “calculate the set of unique abbreviations for a given set of strings”. Put slightly more simply: given an array of strings, Abbrev returns every prefix that identifies exactly one of those strings, so no prefix can be mistaken for two different strings. Given the words ‘fox’ and ‘fig’, the unique abbreviations for ‘fox’ are ‘fo’ and ‘fox’, and for ‘fig’ they are ‘fi’ and ‘fig’. The prefix ‘f’ is ambiguous and is therefore excluded.
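To make the idea concrete, here is a minimal sketch of the same calculation in plain Ruby. The method name unique_prefixes is hypothetical and is not part of the Abbrev library. It counts how many words share each prefix and keeps only the prefixes claimed by a single word. It also assumes that a full word should always stand as its own abbreviation, even when it is a prefix of a longer word.

```ruby
# Hypothetical helper for illustration only; not the Abbrev implementation.
# Maps every unambiguous prefix to the one word it identifies.
def unique_prefixes(words)
  words = words.uniq

  # Count how many distinct words share each prefix.
  counts = Hash.new(0)
  words.each do |word|
    (1..word.length).each { |len| counts[word[0, len]] += 1 }
  end

  words.each_with_object({}) do |word, table|
    # Keep only the prefixes that belong to exactly one word.
    (1..word.length).each do |len|
      prefix = word[0, len]
      table[prefix] = word if counts[prefix] == 1
    end
    # Assumption: a full word is always a valid abbreviation of itself.
    table[word] = word
  end
end

prefixes = unique_prefixes(%w[fox fig])
# 'fo' and 'fox' map to 'fox', 'fi' and 'fig' map to 'fig', 'f' is dropped.
p prefixes
```

The counting pass is what makes ‘f’ disappear: it is the only prefix claimed by both words.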
Abbrev has one method, abbrev, that can either be called as a module method on the Abbrev module or mixed in to Array. This method returns a hash with each abbreviation as a key and the original word as its value.
>> require 'abbrev'
>> [ 'Fig', 'Fox' ].abbrev
=> {"Fi"=>"Fig", "Fo"=>"Fox", "Fig"=>"Fig", "Fox"=>"Fox"}
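The module-method form mentioned above can be sketched like this. It assumes the abbrev library is available to your Ruby installation (newer Ruby versions may ship it as a separate gem). The variable names are only for illustration.

```ruby
require 'abbrev'

words = %w[Fig Fox]

# Module-method form: pass the array of words explicitly.
via_module = Abbrev.abbrev(words)

# Array form, available once 'abbrev' has been required.
via_array = words.abbrev

# Both calls build the same abbreviation => word hash.
puts via_module == via_array   # prints true
puts via_module["Fo"]          # prints Fox
puts via_module.key?("F")      # prints false, since "F" is ambiguous
```

Which form to use is mostly a matter of taste; the module method avoids relying on the extension to Array.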
My first use for Abbrev was generating short codes to use as graph labels from a list of names. Usually every label begins with a different letter, so a single letter is enough; in the odd case where two labels share a first letter, a two-letter short label will suffice. First we get the list of abbreviations:
require 'abbrev'

labels = [ "Fox", "Fax", "Dog" ]
abbreviations = labels.abbrev
Group the abbreviations by the label:
grouped_abbreviations = abbreviations.group_by{ |abbreviation, label| label }
Then build a hash that maps each label to its list of abbreviations:
label_abbreviations = grouped_abbreviations.inject( {} ) do |hash, grouped_abbrevs|
  label = grouped_abbrevs[ 0 ]
  # Each entry is an [abbreviation, label] pair; keep only the abbreviation.
  abbrevs = grouped_abbrevs[ 1 ].map{ |a| a[ 0 ] }
  hash.merge( label => abbrevs )
end
which produces the following hash:
{"Fox"=>["Fo", "Fox"], "Fax"=>["Fa", "Fax"], "Dog"=>["Do", "D", "Dog"]}
For each group of abbreviations, find the shortest one and create a hash that maps each label to its short abbreviation:
short_abbreviations = label_abbreviations.inject( {} ) do |hash, label_abbrevs|
  label = label_abbrevs[ 0 ]
  abbrevs = label_abbrevs[ 1 ]
  # Sort by length so the shortest abbreviation comes first.
  shortest_abbrev = abbrevs.sort_by{ |abbrev| abbrev.length }[ 0 ]
  hash.merge( label => shortest_abbrev.upcase )
end
which produces the hash we want:
{"Fox"=>"FO", "Fax"=>"FA", "Dog"=>"D"}
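For comparison, the three steps can be folded into a single pass. This is only a sketch of an alternative, not a replacement for the walkthrough; the name short_labels is made up for the example, and it assumes the abbrev library is available.

```ruby
require 'abbrev'

labels = %w[Fox Fax Dog]

# Walk the abbreviation => label pairs once, keeping the shortest
# abbreviation seen so far for each label.
short_labels = labels.abbrev.each_with_object({}) do |(abbrev, label), shortest|
  current = shortest[label]
  if current.nil? || abbrev.length < current.length
    shortest[label] = abbrev.upcase
  end
end

puts short_labels["Fox"]   # prints FO
puts short_labels["Fax"]   # prints FA
puts short_labels["Dog"]   # prints D
```

The single pass avoids building the intermediate grouped hashes, at the cost of being a little less easy to inspect step by step.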
In terms of the visual appeal of abbreviated labels, I’ve found that it is sometimes best to strip the vowels out of words. For instance, for ‘Fox’ and ‘Fig’, ‘FX’ and ‘FG’ are much more readable than ‘FO’ and ‘FI’.
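One way to try the vowel-stripping idea is sketched below. The helper consonant_label is hypothetical. It assumes the first letter should always be kept, even when it is a vowel, and that only the ASCII vowels a, e, i, o and u are removed.

```ruby
# Hypothetical helper: keep the first letter, drop the vowels from the
# rest of the word, and upcase the result.
def consonant_label(word)
  return '' if word.empty?

  (word[0] + word[1..-1].delete('aeiouAEIOU')).upcase
end

puts consonant_label('Fox')   # prints FX
puts consonant_label('Fig')   # prints FG

# Stripping vowels can bring collisions back, so check for uniqueness.
stripped = %w[Fox Fax Dog].map { |label| consonant_label(label) }
puts stripped.uniq.length == stripped.length   # prints false: Fox and Fax both give FX
```

When the stripped labels collide, as they do for ‘Fox’ and ‘Fax’, the prefix-based abbreviations are still the safer choice.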
Farrel Lifson is a lead developer at Aimred.