Best regex for email address pattern validation

Last Updated Feb 28, 2022
Emma Jagger

Engineer, maker, Google alumna, CMU grad

If you’re here, chances are you’re aware of

  • the need to validate email addresses and catch invalid ones
  • the pivotal part regular expressions play in that validation

but are unfortunately stuck in the rabbit hole of finding the best regex to do the job.

So we thought we’d shed some light on what is possibly the best regex you could validate email addresses with, what an email address is in the first place, what pattern a regex needs in order to match email addresses, a regex that follows RFC 5322 (the standard that specifies the Internet Message Format, or IMF), and, last but definitely not least, the risks of putting your faith in a regular expression match for email validation.

So, before you add yet another fake email address to your mailing list, let’s get down to business and identify invalid email addresses!


The best all-around regex to find valid email addresses

Before we get to the bottom of that rabbit hole, let’s jump straight to the best regex for email validation, just to save you time:


([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

Source: https://stackoverflow.com/questions/201323/how-can-i-validate-an-email-address-using-a-regular-expression?page=2&tab=active#answer-14075810

You may simply copy and paste the above regular expression, but we strongly recommend that you keep reading. As you’ll find in the upcoming sections, there’s more to validating user-supplied email addresses with regular expressions than copying a snippet of code.


This somewhat complex regex validating email addresses

  • allows Latin characters ("a" - "z" or "A" - "Z") within the email address.
  • permits digits (0 - 9) in the email address.
  • enforces domain part restrictions.
  • allows hyphens (-) inside the domain as long as they don't lead or trail the domain.
  • allows IP address literals surrounded with square brackets ([]) for the domain names.
  • restricts sub-domains to a maximum length of 63 characters.
  • applies local part restrictions.
  • permits the set of special characters allowed by RFC 5322 ("!", "#", "$", "%", "&", "'", "*", "+", "-", "/", "=", "?", "^", "_", "`", "{", "|", "}", "~") in the local part.
  • lets the local part take the form of a quoted string: double quotes enclosing one or more ASCII characters.
  • doesn't allow leading, trailing, or consecutive periods in the email address, except inside a quoted local part.

While the above regular expression covers most of the email address-related rules and regulations, it does have some shortcomings:

  • doesn't allow Unicode characters.
  • doesn't check that the overall length of the email address stays within 254 characters, the ceiling that follows from RFC 5321's limit on address paths.
  • ignores obsolete syntax-related rules.
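Since the regex itself never looks at the overall length, that check has to live next to it. Here is a minimal Python sketch of such a companion check. The function name, the constant name, and the 254-character default are our own assumptions for illustration; adjust the limit if your policy differs.

```python
# Assumption: 254 characters, the ceiling commonly derived from RFC 5321.
MAX_ADDRESS_LENGTH = 254


def within_length_limit(address: str, max_length: int = MAX_ADDRESS_LENGTH) -> bool:
    """Return True if the address is no longer than max_length characters.

    Hypothetical helper: it only checks the length and is meant to run
    alongside the regex match, not to replace it.
    """
    return len(address) <= max_length


print(within_length_limit("jane.doe@example.com"))
print(within_length_limit("a" * 300 + "@example.com"))
```

Running the cheap length check first also means the regex never has to chew through an absurdly long input.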

Want to get a bit more practical? Refer to our email regex guide for a full list of code examples by language.

General user input email patterns and regular expressions

Before your company’s corporate mail gets swarmed with SQL injection attacks or your personal emails get sent to the wrong recipients, let’s get a brief idea of email addresses and regular expressions just so you’ll know exactly what to look out for.

A general email address looks like this

According to RFC 5322, the standard that defines the currently used Internet Message Format (IMF), a general email pattern takes this form: local-part@domain

The elements that make up this email pattern are:

  • Local-part – a locally interpreted string constrained by a collection of rules the currently active IMF enforces. RFC 5322 lets the local-part conform to a dot-atom or a quoted string.
  • “@” sign – symbol separating the local-part from the domain. An ASCII character of value 64 is used to represent this element.
  • Domain – a string naming the mail domain (host) to which the email should be delivered. RFC 5322 requires the domain of a valid email address to be either a dot-atom or a domain-literal enclosed in square brackets ([]).
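To make these three elements concrete, here is a minimal Python sketch that splits an address at its last “@” sign. The helper name split_address is made up for this illustration, and the sketch only separates the parts; it doesn’t validate either of them.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split an address into (local_part, domain) at the last "@" sign.

    Hypothetical helper for illustration only: it separates the parts but
    does not check that either one follows RFC 5322. It also assumes the
    domain itself contains no "@".
    """
    # A quoted local-part may contain "@", so split at the last one.
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign or not local_part or not domain:
        raise ValueError(f"not of the form local-part@domain: {address!r}")
    return local_part, domain


# The separator is the ASCII character with value 64.
assert ord("@") == 64

print(split_address("jane.doe@example.com"))
print(split_address('"odd@local"@example.com'))
print(split_address("postmaster@[192.168.0.1]"))
```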

What about regular expressions?

A regular expression, more commonly called a regex, is simply a search pattern that defines what a string must and must not contain in order to match it.

Use cases of regular expressions

Regular expressions are widely used for string searching and string replacing tasks such as

  • Validating email addresses.
  • Web scraping.
  • Validating the format of credit card numbers.
  • Validating a password against complexity requirements.
  • Removing unwanted characters from strings, e.g., punctuation, extra space.

For a detailed look into how different programming languages handle regular expressions, have a look at our email validation regex guide.

The basic format of a regular expression

Regular expressions are textual patterns built from two kinds of elements:

Metacharacters

A collection of ‘characters and sequences of characters’ reserved by regular expressions to represent specific patterns. 

For instance, a caret symbol (^) and a dollar symbol ($) mark the start and the end of a string, respectively. Similarly, a period (.) inside a regular expression means “any character” (by default, any character except a line break).

Hence, a regular expression such as ^.$ matches any string made up of exactly one character, whatever its case, like “D”, “g”, or “5”.

A quick search will turn up cheat sheets of these regular expression metacharacters that you can easily refer to.

Regular characters

Usual characters that’d be matched for their literal value. 

Building on the regex above, a regular expression like ^.ed$ will match strings such as “bed”, “fed”, or “Ted”.
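If you’d like to try these two patterns yourself, here is a small sketch using Python’s built-in re module. The variable names and the sample strings are our own picks for illustration.

```python
import re

# "^" anchors the start, "$" the end, and "." stands for any one character
# (except a line break, unless the re.DOTALL flag is set).
single_char = re.compile(r"^.$")
ends_in_ed = re.compile(r"^.ed$")

# Exactly one character matches; an empty or longer string does not.
for text in ["D", "g", "5", "", "Dg"]:
    print(repr(text), bool(single_char.search(text)))

# One arbitrary character followed by the literal characters "ed".
for text in ["bed", "fed", "Ted", "shed", "be"]:
    print(repr(text), bool(ends_in_ed.search(text)))
```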

RFC 5322 official standard regular expression to validate email addresses


([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*])

The above regular expression conforms to the RFC 5322 standard and matches with basic email addresses.

Let’s walk through each section of this regular expression, shall we?

The local-part matches either one of two subsections:

  • [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)* – match with a dot-atom-text.
  • "([]!#-[^-~ \t]|(\\[\t -~]))+" – match with a quoted-string within double-quotes. This regex subsection leaves out the whitespace-related rules the RFC enforces for a quoted-string, since they’re irrelevant when validating emails.

Similarly, the domain matches with either one of two subsections:

  • [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)* – match with a dot-atom-text.
  • \[[\t -Z^-~]*] – match a domain-literal; note that this regex subsection ignores the whitespace-related rules the RFC defines for domain-literals, which are irrelevant when you need to match email addresses.
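To see those subsections at work, here is a minimal Python sketch that compiles the RFC 5322 regular expression from this section and runs a few addresses through it. The constant name, the helper name, and the sample addresses are assumptions made for this illustration. Note the use of fullmatch, so that the whole string has to fit the pattern rather than just a part of it.

```python
import re

# The RFC 5322 regular expression from this section, copied as-is.
# A triple-quoted raw string is used because the pattern contains single
# quotes, double quotes, and backslashes.
RFC_5322_PATTERN = re.compile(
    r"""([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*])"""
)


def is_rfc5322_address(address: str) -> bool:
    """Return True if the entire string matches the pattern above."""
    return RFC_5322_PATTERN.fullmatch(address) is not None


samples = [
    "jane.doe@example.com",    # dot-atom local-part, dot-atom domain
    '"john doe"@example.com',  # quoted-string local-part
    "user@[192.168.0.1]",      # domain-literal in square brackets
    ".jane@example.com",       # leading period
    "jane..doe@example.com",   # consecutive periods
    "jane doe@example.com",    # unquoted space
]
for address in samples:
    print(address, is_rfc5322_address(address))
```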

As for further limitations, notice that 

  • its source notes that the regular expression isn’t “optimized for performance”.
  • this regular expression overlooks rules related to RFC’s obsolete syntax.

Supplemental additions

A few more changes to the previous regular expression could improve its accuracy:


([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

How about we explore the additions and modifications made to the above regular expression?

  • (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} – match with IPv4-address-literals.
  • IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::) – match with IPv6-address-literals.
  • [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ – match with general-address-literals.
  • [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])? – restrict each subdomain to a maximum length of 63 characters.
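The two shortest additions are easy to try in isolation. Below is a small Python sketch that compiles the IPv4 and subdomain sub-patterns from the list above on their own. The constant and function names are made up for this illustration, and each sub-pattern is tested against a whole string rather than as part of a full address.

```python
import re

# Sub-patterns copied from the list above.
IPV4_LITERAL = re.compile(
    r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
    r"(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}"
)
SUBDOMAIN = re.compile(r"[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?")


def is_ipv4(text: str) -> bool:
    """True if text is four dot-separated numbers, each from 0 to 255."""
    return IPV4_LITERAL.fullmatch(text) is not None


def is_subdomain(text: str) -> bool:
    """True if text has 1 to 63 characters and no leading or trailing hyphen."""
    return SUBDOMAIN.fullmatch(text) is not None


for candidate in ["192.168.0.1", "256.0.0.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))

for candidate in ["my-host", "-host", "host-", "a" * 63, "a" * 64]:
    print(len(candidate), is_subdomain(candidate))
```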

Why regex might not be your best friend for validating email addresses

Time for the plot twist! Up until this point, we’ve laid the groundwork for validating email addresses through regular expressions and explored the best regular expressions to do so. But what if we told you that using regular expressions to validate email addresses is actually more hazardous than you might’ve imagined?

A hit on performance

Given the complexity of the RFC 5322 standard, a regular expression honoring all its rules and regulations may end up as one large expression that is expensive for the CPU to process. Such complex regex instances could slow down your company’s servers. What’s worse, an attacker could exploit this and launch a ReDoS (regular expression denial of service) attack that brings your web service to a complete halt.
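To get a feel for how a ReDoS attack works, here is a minimal Python sketch. It uses a deliberately vulnerable toy pattern, not any of the email regexes in this article, and the names are our own for illustration. On a backtracking engine such as Python’s re module, each extra character roughly doubles the time needed to reject the input.

```python
import re
import time

# A deliberately vulnerable toy pattern (NOT one of the email regexes in
# this article): the nested quantifiers let the engine split a run of "a"s
# in exponentially many ways before it can conclude there is no match.
VULNERABLE_PATTERN = re.compile(r"(a+)+$")


def seconds_to_reject(run_length: int) -> float:
    """Time how long the engine takes to reject "aaa...a!"."""
    text = "a" * run_length + "!"
    start = time.perf_counter()
    result = VULNERABLE_PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "!" makes a match impossible
    return elapsed


# Keep the inputs small: the work grows exponentially with the run length.
for run_length in (10, 14, 18):
    print(f"{run_length} characters: {seconds_to_reject(run_length):.4f} s")
```

An attacker only needs to submit a slightly longer string to keep a server busy for minutes, which is why it pays to cap the input length before any regex runs.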

An inconvenience to maintain

Assume you’re currently using the most optimal and up-to-date regular expression to validate email addresses. A regular expression isn’t a program you can add modules to or remove them from, so the moment the IMF moves to a new standard you’d be back at square one, searching for the new “best regex” all over again.

What’s better?

Due to the aforementioned drawbacks among others, it could be more appropriate to resort to an API to find valid email addresses. Just to help you out, here are the best email validation and verification APIs that currently exist on the market.

Conclusion

Because of its precise search pattern matching and compact nature, a regular expression can be your best bet to validate user input email addresses in most everyday scenarios. That said, more often than not, using an API to validate email addresses is a good alternative for the same purpose.

If you want a deeper dive into how specific programming languages and frameworks find valid email addresses, check out how the Python, Ruby, and PHP programming languages and the jQuery framework handle email address verification.

We’ve covered the essentials of regular expressions and how they can help your email address verification efforts. But let us end this article with one rather important guideline:

Always test your chosen regular expression in the exact place you’ll be using it (website, app, server, etc.) instead of simply copy-pasting it. That will save you from accepting invalid addresses and from the heap of trouble that would follow.
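One way to follow this guideline is to keep a small table of addresses together with the verdict you expect for each, and run your chosen pattern against it. The Python sketch below does that with a deliberately naive stand-in pattern; the pattern, the names, and the test cases are all assumptions for illustration, so swap in the regex you actually plan to deploy.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical stand-in: replace with the pattern you plan to deploy.
CANDIDATE_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# (address, should it be accepted?) pairs based on the rules discussed above.
TEST_CASES = [
    ("jane.doe@example.com", True),
    ("jane..doe@example.com", False),  # consecutive periods
    (".jane@example.com", False),      # leading period
    ("jane.doe@", False),              # missing domain
    ("jane doe@example.com", False),   # unquoted space
]


def find_mismatches(
    pattern: Pattern[str], cases: List[Tuple[str, bool]]
) -> List[Tuple[str, bool, bool]]:
    """Return (address, expected, actual) for every case the pattern gets wrong."""
    mismatches = []
    for address, expected in cases:
        actual = pattern.fullmatch(address) is not None
        if actual != expected:
            mismatches.append((address, expected, actual))
    return mismatches


for address, expected, actual in find_mismatches(CANDIDATE_PATTERN, TEST_CASES):
    print(f"{address!r}: expected {expected}, got {actual}")
```

With the naive stand-in, the two period-related cases are reported as mismatches, which is exactly the kind of surprise you want to catch before the pattern goes live.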

Validate emails instantly using Abstract's email verification API.
