7 daily use cases of Ruby String

This is an archive of a blog post I wrote during my third venture (PullReview).

Strings are everywhere. You deal with String instances not only every day, but probably every minute. They come from files, databases, and REST APIs, or you simply use them to print results. It's a pervasive representation, and Ruby provides plenty of ways to ease its manipulation. But String comes with its own share of problems, and you won't always find a quick solution in the docs, for example how to deal with an invalid byte sequence or how to convert a String back to a Date with an uncommon format.

Below, I share 7 common use cases of String that I run into very often and that should be useful to you.

  1. How to remove enclosing characters like parentheses?
  2. How to compare two Strings case-insensitively?
  3. How to clean a String by removing newline, carriage return, leading and trailing spaces?
  4. How to convert a String to a Regexp?
  5. How to convert a String to a Date with uncommon format?
  6. How to remove accents?
  7. How to clean up invalid byte sequences?

1. How to remove enclosing characters like parentheses?

You retrieve data from a file:

line = '(this is a data from the file)'

The data is always wrapped in parentheses that you don't want. You can get the data without them as follows:

line[1..-2]
# => 'this is a data from the file'

This actually removes the first and last characters of the String.
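Because the slice drops the first and last characters unconditionally, it would also eat real data from a line that is not wrapped. A minimal sketch of a guarded variant follows; `unwrap` is a hypothetical helper name, not something from the original post, and `delete_prefix`/`delete_suffix` assume Ruby 2.5 or later.

```ruby
# Hypothetical helper: strips the wrapping pair only when both ends match.
def unwrap(text, open = "(", close = ")")
  return text unless text.start_with?(open) && text.end_with?(close)

  text.delete_prefix(open).delete_suffix(close)
end

line = "(this is a data from the file)"

sliced    = line[1..-2]                    # unconditional: drops first and last character
unwrapped = unwrap(line)                   # conditional: same result for wrapped input
untouched = unwrap("no parentheses here")  # left as-is because nothing wraps it
```

For input that is known to be wrapped, the plain slice stays the shortest option.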

reference: String#[], String#slice

2. How to compare two Strings case-insensitively?

In an online shop, users can search for products by name:

products = [
  "Apple",
  # ...
]

search_string = "apple"

You don't want to force your users to be case sensitive (they aren’t), so you can do:

products.select { |product| product.casecmp(search_string) == 0 }
# => ["Apple"]

As pointed out by apeiros in the comments, casecmp only folds the case of ASCII letters, so it won't match non-ASCII characters that differ in case. A better version would be:

products.select { |product| product =~ /#{Regexp.escape(search_string)}/i }
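Note that the unanchored regexp matches substrings, not only whole names. The sketch below contrasts it with an anchored regexp and with `String#casecmp?`, which does Unicode case folding and assumes Ruby 2.4 or later. The product names are made up for illustration.

```ruby
products = ["Apple", "Äpfel", "Pineapple"] # hypothetical catalogue

# casecmp? folds case with Unicode rules, so non-ASCII letters compare equal.
unicode_exact = products.select { |product| product.casecmp?("äpfel") }

# casecmp only folds ASCII letters, so "Ä" and "ä" stay different.
ascii_only = products.select { |product| product.casecmp("äpfel") == 0 }

# The unanchored regexp also matches names that merely contain the search string.
substring = products.select { |product| product =~ /#{Regexp.escape("apple")}/i }

# Anchoring with \A and \z restricts the match to the whole name.
anchored = products.select { |product| product.match?(/\A#{Regexp.escape("apple")}\z/i) }
```

Which variant fits depends on whether the search should be an exact match or a "contains" match.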

reference: String#casecmp

3. How to clean a String by removing newline, carriage return, leading and trailing spaces?

In a little script, you read data from the console:

command = gets

You can clean up what you read by simply doing:

command = gets.strip

Thanks to apeiros for pointing out that strip also removes trailing newlines and carriage returns, so there is no need to use chomp.
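To make the difference concrete, here is a small sketch on a fixed string standing in for console input (the value of `raw` is invented for illustration). It also shows one way to guard against `gets` returning `nil` at end of input.

```ruby
raw = "  ls -la \r\n" # hypothetical line as it could come back from gets

chomped  = raw.chomp  # removes only the trailing line terminator
stripped = raw.strip  # removes leading and trailing whitespace, terminator included

# gets returns nil at end of input; to_s turns that into an empty String
# so that strip never raises NoMethodError.
at_eof = nil.to_s.strip
```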

reference: String#chomp, String#strip

4. How to convert a String to a Regexp?

You're developing a Gem that maintains a database of locations of files. In the configuration file, the user can blacklist specific pathnames with regular expressions. You retrieved one of them (you don't want to locate the files in your trash):

# single quotes keep the backslashes; inside double quotes "\." collapses to "."
blacklisted_pathname = '\.\/\.trash'

You can then use it to directly check:

file_paths.reject { |file_path| file_path =~ /#{blacklisted_pathname}/ }

or keep it for later:

r = Regexp.new blacklisted_pathname

Don't forget to escape special characters correctly in your String or, even better, use Regexp.escape as suggested by apeiros.
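The sketch below illustrates why escaping matters. It assumes the configuration gives you either a ready-made pattern or a literal path; the file paths and variable names are invented for the example.

```ruby
file_paths = ["./.trash/old.txt", "./docs/readme.md", "a/btrash/file"] # hypothetical

# Case 1: the user wrote a regular expression, backslashes included.
pattern_from_config = '\.\/\.trash'
blacklist = Regexp.new(pattern_from_config)
kept = file_paths.reject { |path| path.match?(blacklist) }

# Case 2: the user wrote a literal path. Unescaped, each "." matches any character.
literal   = "./.trash"
too_loose = Regexp.new(literal)
escaped   = Regexp.new(Regexp.escape(literal))

loose_hit   = too_loose.match?("a/btrash/file") # true: "." matched "a" and "b"
escaped_hit = escaped.match?("a/btrash/file")   # false: only a real "./.trash" matches
```

Regexp.escape is only appropriate when the String is meant literally; applying it to a user-supplied pattern would disable the pattern.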

reference: Regexp#new, Regular Expressions literal and interpolation

Side note: for that particular use case, you should probably use File.fnmatch.

5. How to convert a String to a Date with uncommon format?

In an old database, dates are stored as Strings like this:

date = "1979;4;2"

You can convert it to a Date as follows:

require "date"

Date.strptime(date, "%Y;%m;%d")
# => #<Date: 1979-04-02 ((2443966j,0s,0n),+0s,2299161j)>
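Old data is rarely clean, and Date.strptime raises when a value does not fit the format. The following is a minimal sketch of a tolerant wrapper; `parse_legacy_date` and `LEGACY_FORMAT` are hypothetical names introduced for the example.

```ruby
require "date"

LEGACY_FORMAT = "%Y;%m;%d" # format of the dates in the old database

# Hypothetical helper: returns a Date, or nil when the text does not fit the format.
def parse_legacy_date(text)
  Date.strptime(text, LEGACY_FORMAT)
rescue ArgumentError # Date::Error, raised by recent Rubies, is a subclass
  nil
end

valid   = parse_legacy_date("1979;4;2")
invalid = parse_legacy_date("1979;13;2") # there is no month 13

# The "-" flag drops the zero padding, which gives back the original text.
round_trip = valid.strftime("%Y;%-m;%-d")
```

Returning nil is only one possible policy; logging or re-raising with more context may suit a migration script better.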

reference: Date#strptime

6. How to remove accents?

You're building a web app and you'd like to allow a quick search among the members like

"Kurt Gödel"

Accents and diacritics are easily forgotten, so you want to remove them from the searched string. You can do it easily with ActiveSupport:

require "active_support/inflector" # not necessary if Rails

ActiveSupport::Inflector.transliterate("Kurt Gödel")
# => 'Kurt Godel'
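If pulling in ActiveSupport is not an option, a rough alternative with plain Ruby (2.2 or later) is to decompose the characters and drop the combining marks. This is a sketch of a different technique, not what transliterate does internally, and it leaves letters that have no decomposition, such as "ø", untouched.

```ruby
name = "Kurt Gödel"

# NFD splits "ö" into "o" plus a combining diaeresis; \p{Mn} matches such marks.
plain = name.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")

# "ø" is a letter of its own, not "o" plus a mark, so it survives unchanged.
unchanged = "Søren".unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
```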

reference: Inflector#transliterate

7. How to clean up invalid byte sequences?

You read a text File encoded in UTF-8. You manipulate each of its lines, but at some point you get the error

ArgumentError: invalid byte sequence in UTF-8

on the line

line = "This is an invalid byte sequence \xA4"
# => "This is an invalid byte sequence \xA4"

Well, you need to clean up your line before working on it by using scrub:

# if < 2.1, use the backport gem string-scrub
# by adding to your Gemfile
# gem "string-scrub"
# if >= 2.1, it's part of String (yeah \o/)

line.scrub!("")
# => 'This is an invalid byte sequence '

Old way:

line.encode!("UTF-8", "binary", invalid: :replace, undef: :replace, replace: "")
# => 'This is an invalid byte sequence '

But the old way also removes every valid non-ASCII character, because each byte above 0x7F is undefined when converting from binary. In that case, you should go for a solution like apeiros's.
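The difference between the two approaches shows up as soon as the line contains legitimate non-ASCII text. A small sketch, using an invented sample line and assuming Ruby 2.1 or later for scrub:

```ruby
# Valid UTF-8 text followed by one stray byte that is not valid UTF-8.
line = "Kurt Gödel " + "\xA4"

broken = !line.valid_encoding?

# scrub removes only the invalid byte and keeps "ö".
scrubbed = line.scrub("")

# Re-encoding from binary treats every byte above 0x7F as undefined,
# so both bytes of "ö" are dropped together with the stray byte.
old_way = line.encode("UTF-8", "binary", invalid: :replace, undef: :replace, replace: "")
```

Called without an argument, scrub substitutes the Unicode replacement character U+FFFD instead of deleting the byte, which keeps the damage visible.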

references: String#encode!, String#scrub! (2.1)

read also: Fight Back UTF-8 Invalid Byte Sequences

What are your daily use cases of Strings?

Compared with daily use cases of Array, Hash, or Regexp, some for String are very basic, but very common. I love how Ruby allows you to deal with slicing for instance. It's very elegant.

What about you? What are your typical daily use cases of String, and what elegant Ruby do you use to solve them?

If you have any comment, question, or feedback, please share them with me.
