Ruby Email Validation with Regex

Last Updated Aug 22, 2022


Regular expressions, or regex, have been around for decades, yet they remain one of the most widely used tools in programming today. A regex lets you define flexible matching rules in a compact string, which you can use to check conditions or extract matching strings from a large body of text. Regex is also an essential tool in Natural Language Processing (NLP), a branch of machine learning.

Machine learning engineers and data scientists use regex to clean large amounts of text before training an ML model, and data engineers often use it to transform text data because a well-written pattern can process large text datasets quickly.

Ruby supports regular expressions too. One of the most frequent use cases for regex is validating email addresses. In this article, you will learn several email validation methods and get familiar with Ruby regular expressions.


Ruby regex syntax

Let’s start by learning how to use regex in Ruby in general. Regular expressions are built into Ruby, so you do not need to install anything extra to apply regex validation.

Forward slashes with the =~ operator


# Capture the word, "regular"
# This prints out 10
puts "Let's use regular expression in Ruby." =~ /regular/

The expression returns 10, which is the index of the first occurrence that matches the regex. If the text does not contain a match, it returns nil.
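
Because =~ returns either an index or nil, its result can drive a condition directly. Here is a minimal sketch, where position_of is a hypothetical helper name used only for illustration:

```ruby
# Hypothetical helper: describes where (or whether) a pattern matches.
def position_of(text, pattern)
  index = text =~ pattern          # Integer index, or nil when nothing matches
  index.nil? ? "no match" : "match at index #{index}"
end

puts position_of("Let's use regular expression in Ruby.", /regular/)  # match at index 10
puts position_of("Let's use regular expression in Ruby.", /python/)   # no match
```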

Match method

If you want to use regex in an if statement, you can also use the match method.


if "Do you want to use regex in Ruby?".match(/regex/)
  puts "Matched"
end

This prints “Matched” because the text contains the word “regex”.
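
The match method returns a MatchData object (or nil), so you can also read the matched text from it. When you only need true or false, match? (available since Ruby 2.4) is enough. Here is a small sketch, with first_match as a hypothetical helper name:

```ruby
# Hypothetical helper: returns the matched text, or nil when nothing matches.
def first_match(text, pattern)
  result = text.match(pattern)   # MatchData or nil
  result && result[0]            # element 0 holds the whole match
end

puts first_match("Do you want to use regex in Ruby?", /regex/)   # regex

# match? returns only true or false
puts "Do you want to use regex in Ruby?".match?(/regex/)         # true
puts "Do you want to use regex in Ruby?".match?(/python/)        # false
```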

These two options are useful when you want to detect a word or words in a text. If you want to check characters, you can use a different regex syntax.

Character group

A group of characters can be included within square brackets ‘[]’. 


def contains_abc(str)
  str =~ /[abc]/
end

puts contains_abc("this is alphabet") # returns 8
puts contains_abc("regex")  # returns nil

The contains_abc method uses the =~ operator to return the index of the first match. The group matches a single character, so the text must contain at least one of the letters a, b, or c. The first call returns index 8, and the second returns nil.

Range

You can use a range in the regular expression and this can make regex very compact. 


# Returns 42
puts "Let's learn regular expression in Ruby in 5 minutes." =~ /[0-9]/

# Returns 21
puts "1 2 3 4 5 6 7 8 9 10 these are numbers." =~ /[a-z]/

The first regex, [0-9], matches any single digit between 0 and 9. In the first sentence, the first digit is 5, so the expression returns its index, 42. The second regex, [a-z], matches any lowercase letter between a and z, and the first one in the sentence is at index 21. Note that letter ranges are case-sensitive.
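
To see the case sensitivity in practice, the sketch below runs three patterns against the same text. The helper name first_index and the sample text are assumptions made for illustration.

```ruby
# Hypothetical helper: index of the first character matched by the pattern.
def first_index(text, pattern)
  text =~ pattern
end

text = "RUBY and regex"
puts first_index(text, /[a-z]/)   # 5, the "a" in "and"
puts first_index(text, /[A-Z]/)   # 0, the "R" in "RUBY"
puts first_index(text, /[a-z]/i)  # 0, the i flag ignores case
```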

Character class

A character class is a shorthand for a group of characters, so you do not have to list every letter or digit yourself. Among the many classes available, these three are used most often.

  • \w matches any word character: a letter, a digit, or an underscore.
  • \d matches any digit.
  • \s matches any whitespace character, such as a space or a tab.

You can use the character classes as shown below.


# Returns 5
puts "!@## find a word block." =~ /\w/

# Returns 22
puts "Let's find the number 7 in this text." =~ /\d/

# Returns 5
puts "Hello world!" =~ /\s/
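
To see exactly which characters each class covers, you can collect every match with scan. This is a small sketch in which matched_chars is a hypothetical helper and the sample string is made up:

```ruby
# Hypothetical helper: collects every character matched by the pattern.
def matched_chars(text, pattern)
  text.scan(pattern)
end

sample = "a_1 b\t2"              # contains a space and a tab
p matched_chars(sample, /\w/)    # ["a", "_", "1", "b", "2"]
p matched_chars(sample, /\d/)    # ["1", "2"]
p matched_chars(sample, /\s/)    # [" ", "\t"]
```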

Why do we validate emails using regex?

We learned how to use regular expressions in Ruby. Before we develop email regex patterns, let’s discuss why we want to validate email addresses.

Validating email format

When you sign up for an online service and create a new account, you almost always have to provide an email address. As you type it, the website checks whether it has the right format. The official standard (RFC 5322) is quite permissive, so many sites apply a simpler, stricter convention, which this article follows. Under that convention, the prefix must meet the following conditions.

  • It may contain letters, numbers, underscores, periods, and dashes.
  • An underscore, period, or dash must be followed by one or more letters or numbers.

Let's have a look at some of the valid emails.

  • sample@valid.com
  • sample.abc@valid.com
  • sample_abc@valid.com

Some of the invalid email addresses look like the following.

  • sample-@invalid.com
  • sample..abc@invalid.com (consecutive periods are not allowed)
  • .sample@invalid.com
  • sample#sample@invalid.com

If a company fails to validate the email format, it may lose its means of contacting the user. A malformed email that flows into the company's systems can also cause problems downstream. Thus, email validation is a basic but essential task.

Extracting email from unstructured text

Regex in Ruby is not only for validating email formats. When data is structured or semi-structured, it is easy to retrieve email data. However, when you have to collect email addresses from unstructured data such as plain text, Ruby regex can help. Regex is often used to extract data from text, and once you have captured the email addresses from free text, you can convert them into clean, structured data.

Replacing email from unstructured text

In addition, a company may want to replace email addresses (which are considered PI, or personal information) with a placeholder string to de-identify the text data, for example to comply with its security policy. To do so, you first match all emails and then replace them with the string you provide.

Checking prefix

An email address consists of a prefix and a domain. The prefix appears to the left of the @ symbol, and the domain appears to the right, so an email carries two pieces of information. Your company might want to use the prefix to check whether a person is trying to create duplicate accounts, for example, to take advantage of free services. Or you may just want to use the prefix as the person's username. Both become easy once you extract the email addresses from plain text and split each one by ‘@’.

Validating domain

When you have email addresses in a large amount of plain text and want to validate their domains, the first step is to extract the emails. An address in the text might not have a valid domain, as some people enter a random email to bypass validation. You can use regex to match the emails and then split each one by ‘@’ to get the domain. From there, you can perform follow-up checks, for example, to see whether the mail server exists.

Using email regex in Ruby

Let’s learn how to use regex in Ruby for each of these scenarios with sample code.

Validating email format

You can use the email regular expression in the sample code to validate an email address. Using an if-else statement, you can differentiate between valid and invalid email addresses.


# Email pattern (a regex literal between slashes, not a quoted string)
EMAIL_REGEX = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/

# This is a valid email
if "sample@valid.com".match(EMAIL_REGEX)
  puts "This is a valid email!"
else
  puts "This is not a valid email!"
end

The sample email above has a valid format, so the code prints “This is a valid email!”. Alternatively, you can check the index returned by =~: because the pattern is anchored to the start of the line, a valid email returns 0.


EMAIL_REGEX = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/

# They all return 0
puts "sample@valid.com" =~ EMAIL_REGEX
puts "sample-sample@valid.com" =~ EMAIL_REGEX
puts "sample.abc@valid.com" =~ EMAIL_REGEX
puts "sample.123@valid.com" =~ EMAIL_REGEX
puts "s1ample123@valid.com" =~ EMAIL_REGEX

How about invalid emails? Let’s test it.


EMAIL_REGEX = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/

# They all return nil
puts "sample-@invalid.com" =~ EMAIL_REGEX
puts "sample..abc@invalid.com" =~ EMAIL_REGEX
puts ".sample@invalid.com" =~ EMAIL_REGEX
puts "sample#sample@invalid.com" =~ EMAIL_REGEX

Since they are all invalid emails, all of them will return nil.
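
One caveat is worth knowing: in Ruby, ^ and $ match at the start and end of each line, not of the whole string, so a multi-line input can slip through. A common safeguard is to anchor the pattern with \A and \z instead. The sketch below assumes the same pattern with only the anchors changed; LINE_EMAIL_REGEX, STRICT_EMAIL_REGEX, and valid_email? are hypothetical names.

```ruby
# The article's pattern, anchored per line (^ and $)
LINE_EMAIL_REGEX   = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/
# Assumed variant: the same pattern anchored to the whole string (\A and \z)
STRICT_EMAIL_REGEX = /\A\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+\z/

def valid_email?(email)
  email.match?(STRICT_EMAIL_REGEX)
end

multiline = "not an email\nsample@valid.com"
puts multiline.match?(LINE_EMAIL_REGEX)  # true: the second line matches on its own
puts valid_email?(multiline)             # false: the whole string must match
puts valid_email?("sample@valid.com")    # true
```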

Extracting email in unstructured text

Validating and extracting are different tasks. In extracting, you may get one or more matched email addresses. 


# Regex to extract emails from large text
EMAIL_REGEX = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

# Sample text with emails
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim sample@email.com veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate sample2@email.com velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint sample3@email.com occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

# It returns a list of extracted emails
# sample@email.com
# sample2@email.com
# sample3@email.com
puts text.scan(EMAIL_REGEX)

There are three emails in the text, so text.scan(EMAIL_REGEX) returns an array containing the three extracted emails. You can access an individual element by its index, as follows.


# It gives you the first matched email
puts text.scan(EMAIL_REGEX)[0]

This gives you sample@email.com.
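
Free text often repeats the same address with different capitalization. The sketch below lowercases the matches and removes repeats, on the assumption that your application treats addresses case-insensitively; EXTRACT_EMAIL_REGEX and unique_emails are hypothetical names, and the sample sentence is made up.

```ruby
# Same extraction pattern as above, under a separate (hypothetical) name
EXTRACT_EMAIL_REGEX = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

# Hypothetical helper: extracts emails, lowercases them, and drops repeats.
def unique_emails(text)
  text.scan(EXTRACT_EMAIL_REGEX).map(&:downcase).uniq
end

p unique_emails("Mail Sample@Email.com, sample@email.com or sample2@email.com.")
# ["sample@email.com", "sample2@email.com"]
```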

Replacing email in unstructured text

Imagine that you have to de-identify PI in documents by removing or replacing email addresses. To find the email addresses, you can use a regular expression, and to replace the matched strings, you can use the gsub method.


# Regex to extract emails from large text
EMAIL_REGEX = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

# Sample text with emails
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim sample@email.com veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate sample2@email.com velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint sample3@email.com occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

replaced_text = text.gsub(EMAIL_REGEX, 'DEIDENTIFIED@EMAIL.COM')
puts replaced_text

The matched email addresses will be replaced with DEIDENTIFIED@EMAIL.COM. The outcome from the above code is the full text with the email addresses changed to the deidentified email address.
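
If you would rather hide only the prefix and keep the domain, for example for reporting, gsub can reuse a captured group in the replacement string. This is a sketch under that assumption; MASK_EMAIL_REGEX and mask_prefixes are hypothetical names, and the pattern is the extraction pattern with a capture group added around the domain.

```ruby
# Extraction pattern with the domain wrapped in a capture group
MASK_EMAIL_REGEX = /\b[A-Z0-9._%+-]+@([A-Z0-9.-]+\.[A-Z]{2,4})\b/i

# Hypothetical helper: hides the prefix but keeps the domain of each email.
def mask_prefixes(text)
  # \1 in the replacement refers to the first capture group (the domain)
  text.gsub(MASK_EMAIL_REGEX, '***@\1')
end

puts mask_prefixes("Contact sample@email.com for details.")
# Contact ***@email.com for details.
```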

Checking prefix

Let’s say that you have extracted email addresses from documents and want to use the prefix as the username. 


# Say there are the following emails in the list
# sample@email.com
# sample2@email.com
# sample3@email.com
email_list = text.scan(EMAIL_REGEX)

# This will print
# sample
# sample2
# sample3
for email in email_list do
  puts email.split('@')[0]
end

Using the same text, we assume that scan returns the three emails. The for loop visits each email, and split('@') returns an array with two elements: the first (index 0) is the prefix, and the second (index 1) is the domain.
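
The duplicate-account check mentioned earlier can build on the same split. The following sketch groups addresses by prefix and keeps only the prefixes that appear more than once; duplicate_prefixes and the sample list are assumptions made for illustration.

```ruby
# Hypothetical helper: groups emails by prefix and keeps prefixes used more than once.
def duplicate_prefixes(emails)
  emails.group_by { |email| email.split('@').first.downcase }
        .select { |_prefix, group| group.size > 1 }
end

emails = ["sample@email.com", "sample@other.org", "sample2@email.com"]
duplicate_prefixes(emails).each do |prefix, group|
  puts "#{prefix}: #{group.join(', ')}"
end
# sample: sample@email.com, sample@other.org
```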

Validating domain

As with checking the prefix, we can extract the domain from each email address.


# Say there are the following emails in the list
# sample@email.com
# sample2@email.com
# sample3@email.com
email_list = text.scan(EMAIL_REGEX)

# This will print
# email.com
# email.com
# email.com
for email in email_list do
  puts email.split('@')[1]
end

In each array, the second element (index 1) holds the domain.
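
Before running any follow-up check on the domains, it can help to reduce the list to its distinct domains so that each one is checked only once. This is a minimal sketch; domain_counts is a hypothetical helper and the sample list is made up.

```ruby
# Hypothetical helper: counts how many addresses use each domain.
def domain_counts(emails)
  emails.each_with_object(Hash.new(0)) do |email, counts|
    domain = email.split('@', 2).last.downcase
    counts[domain] += 1
  end
end

counts = domain_counts(["sample@email.com", "sample2@email.com", "sample3@other.org"])
counts.each { |domain, count| puts "#{domain}: #{count}" }
# email.com: 2
# other.org: 1
```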

Useful regex resources

Using regex in Ruby requires a combination of regex knowledge and Ruby coding skills. There are great online resources where you can learn and test your regular expressions. Also, you can download text editing tools that support regex. Some suggested tools include:

  • Regex101: This website provides a user interface where you can put text and regex. As you type your regex, it will match the text in real-time and run regex validation to check your regex syntax. It explains the syntax used and the steps on how it detects the patterns. You can try regex in different programming languages too.
  • Sublime Text: This text editor is flexible and supports many languages. You can use it not only for coding but also for testing regular expressions against large text. You can find, extract, and replace everything that your regex matches.
  • Notepad++: Notepad++ has been around for many years and is still one of the most popular text editing tools. It is compact and free. You can use it as a code editor and run your regex against the text.
  • HackerRank: Try to solve the regex quiz and get some credits for your profile. This provides simple regex problems and challenging ones. Using the website, you can develop patterns and apply them to solve quizzes. The problems are categorized by the difficulty level and success rate of other users.

Wrapping up

We learned various email validation techniques in Ruby and the reasons for validating email addresses. Using regex, we validated, extracted, and replaced email addresses, all with features built into Ruby. Knowing how to capture emails with regex is a handy skill for any Ruby programmer, and the ability to develop regex patterns is a sought-after programming skill in general.

To get familiar with regex in Ruby, try the regex resources and hone your skills!

FAQs

To boost your regex knowledge, we prepared three frequently asked questions regarding regex.

How can we express date format in regex?

Along with email validation, validating or capturing a date format is one of the most frequent uses of regex. Suppose the date format that you want to detect looks like this:


1990/12/16 or 1990.12.16

You can use the following regex to capture that pattern:


\b(?:19\d{2}|20[01][0-9]|2020)[-/.](?:0[1-9]|1[012])[-/.](?:0[1-9]|[12][0-9]|3[01])\b

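This checks not only the date format but also the ranges of the values: the year must fall between 1900 and 2020, the month between 01 and 12, and the day between 01 and 31. It does not check whether the day exists in the given month, so a string such as 1990/02/31 still matches.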
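
To use the pattern as a Ruby literal, the forward slash needs attention, because it would otherwise end a /.../ literal. The sketch below assumes a %r{} literal, turns the non-capturing groups into capturing ones, and adds a calendar check with Date.valid_date? from Ruby's standard date library. DATE_REGEX and real_date? are hypothetical names.

```ruby
require 'date'

# The FAQ pattern with capture groups; %r{} avoids escaping the forward slash
DATE_REGEX = %r{\b(19\d{2}|20[01][0-9]|2020)[-/.](0[1-9]|1[012])[-/.](0[1-9]|[12][0-9]|3[01])\b}

# Hypothetical helper: true only when the text holds a real calendar date.
def real_date?(text)
  match = text.match(DATE_REGEX)
  return false unless match

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day)
end

puts real_date?("1990/12/16")  # true
puts real_date?("1990.02.31")  # false: the pattern matches, but February 31 does not exist
puts real_date?("2021/01/01")  # false: the year is outside the 1900-2020 range
```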

What regex pattern selects the text between two strings?

Let's say you want to capture what is between '<' and '>'.


Your phone number is <123-456-789>.

You can use the regex below to select "123-456-789".


(?<=<)(.*)(?=>)

If you have different anchor strings, you can change '<' and '>' to whatever strings you have.
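
Note that .* is greedy: when a line contains more than one bracketed value, it runs to the last '>'. Adding ? makes the quantifier lazy so that each value is captured separately. The sketch below compares the two in Ruby; GREEDY_REGEX, LAZY_REGEX, and between_brackets are hypothetical names, and the sample line is made up.

```ruby
GREEDY_REGEX = /(?<=<)(.*)(?=>)/    # the pattern from the FAQ
LAZY_REGEX   = /(?<=<)(.*?)(?=>)/   # assumed variant with a lazy quantifier

# Hypothetical helper: returns every capture for the given pattern.
def between_brackets(text, pattern)
  text.scan(pattern).flatten
end

line = "Call <123-456-789> or <987-654-321>."
p between_brackets(line, GREEDY_REGEX)  # ["123-456-789> or <987-654-321"]
p between_brackets(line, LAZY_REGEX)    # ["123-456-789", "987-654-321"]
```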

What other languages support regex?

Of course, Ruby is not the only language that supports regex. Regular expressions are available in virtually every modern programming language, including Python, Java, C#, Go, Kotlin, PHP, and Swift, although the exact syntax varies slightly between them. In addition, you can use regex in SQL queries in many relational databases and data lakes.

Validate email instantly using Abstract's email verification API.
