
How to Extract URLs from Text Using Regex?

Extract URLs from Text is a regex pattern that captures URLs starting with http:// or https://. Formula Genius generates and validates this formula automatically from a plain-English prompt.

Extracting URLs from text can be challenging, but with the right regex pattern, you can easily find and list them all.

The Formula

Prompt

"Extract all URLs from a block of text, including http and https links"

Regex
(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*)

This regex pattern captures all URLs starting with http or https.
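
As a minimal sketch of how the pattern might be applied, the snippet below uses Python's standard `re` module. The function name `extract_urls` and the sample text are illustrative assumptions, not part of the generated formula.

```python
import re

# The pattern from above, unchanged. The outer parentheses form a single
# capture group, so findall() returns the full URL for each match.
URL_PATTERN = re.compile(r"(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*)")


def extract_urls(text: str) -> list[str]:
    """Return every http/https URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


sample = "Check out https://www.example.com and http://test.com for more info."
print(extract_urls(sample))
# prints ['https://www.example.com', 'http://test.com']
```

Compiling the pattern once and reusing it keeps the call site short when many strings need to be scanned.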

Step-by-Step Breakdown

  1. The pattern starts with 'https?://' to match both 'http://' and 'https://'.
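  2. Next, '[\w.-]+' matches the first part of the host name, allowing letters, digits, underscores, dots, and hyphens.
  3. The '(?:\.[\w.-]+)+' part is a non-capturing group that requires at least one dot followed by more host characters, so the host must contain a dot and end in something like .com or .org.
  4. Finally, '[/\w.-]*' matches an optional path made of slashes, letters, digits, underscores, dots, and hyphens. It stops at the first character outside that set, such as '?', '=', '&', or '#'.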
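
The four steps above can be mirrored in code. The sketch below rewrites the same pattern with Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments inside the pattern so each part can be annotated. The name `URL_VERBOSE` and the sample sentence are assumptions made for illustration.

```python
import re

# Same pattern as above, split across lines so each step can carry a comment.
URL_VERBOSE = re.compile(
    r"""
    (                   # capture the whole URL
      https?://         # step 1: scheme, either http or https
      [\w.-]+           # step 2: first part of the host name
      (?:\.[\w.-]+)+    # step 3: one or more dot-prefixed segments, ending with the TLD
      [/\w.-]*          # step 4: optional path characters
    )
    """,
    re.VERBOSE,
)

match = URL_VERBOSE.search("Docs live at https://docs.example.org/guide/intro.html today")
if match:
    print(match.group(1))
    # prints https://docs.example.org/guide/intro.html
```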

Edge Cases & Warnings

  • URLs without a scheme (http/https) won't be captured.
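  • URLs containing characters outside letters, digits, underscores, dots, hyphens, and slashes (for example '?', '=', '&', '#', '%', or ':' before a port number) are cut off at the first such character.
  • A period that ends a sentence directly after a URL is included in the match, because '.' is one of the allowed characters.
  • Long URLs that span multiple lines are matched only up to the line break.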
  • URLs embedded in HTML tags may require additional handling.
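
Two of these cases are easy to reproduce. The sketch below shows that a scheme-less URL is skipped and that a sentence-ending period directly after a URL ends up inside the match, because '.' is an allowed character in the pattern. Stripping trailing periods afterwards is one possible workaround, not part of the original formula, and `extract_urls_trimmed` is a hypothetical helper name.

```python
import re

URL_PATTERN = re.compile(r"(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*)")


def extract_urls_trimmed(text: str) -> list[str]:
    """Find URLs, then drop any trailing periods picked up from the prose."""
    return [url.rstrip(".") for url in URL_PATTERN.findall(text)]


text = "Visit www.example.com or https://secure-site.net."

# www.example.com has no scheme, so it is skipped; the final period is kept.
print(URL_PATTERN.findall(text))
# prints ['https://secure-site.net.']

print(extract_urls_trimmed(text))
# prints ['https://secure-site.net']
```

Trimming after matching keeps the pattern itself unchanged, at the cost of also removing a period from the rare URL that legitimately ends in one.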

Examples

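Input

"Check out https://www.example.com and http://test.com for more info."

Matches
https://www.example.com, http://test.com

Input

"Visit our site at http://my-site.org or https://secure-site.net."

Matches
http://my-site.org, https://secure-site.net. (the sentence-ending period is part of the second match because '.' is an allowed character; strip it afterwards if it is not wanted)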

Frequently Asked Questions

Can this regex extract URLs from HTML?

No, this regex is designed for plain text and may not handle HTML tags.
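
For HTML input, one possible approach is to let a parser separate markup from text and apply the pattern only to attribute values and text nodes. The sketch below uses Python's standard `html.parser` module. The class name `LinkCollector`, the choice of the `href` and `src` attributes, and the sample markup are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

URL_PATTERN = re.compile(r"(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*)")


class LinkCollector(HTMLParser):
    """Collect http/https URLs from href/src attributes and from text nodes."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.extend(URL_PATTERN.findall(value))

    def handle_data(self, data):
        # Plain text between tags is scanned with the same pattern.
        self.urls.extend(URL_PATTERN.findall(data))


collector = LinkCollector()
collector.feed('<p>See <a href="https://www.example.com/docs">the docs</a> or http://test.com</p>')
collector.close()
print(collector.urls)
# prints ['https://www.example.com/docs', 'http://test.com']
```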

What if the URL contains query parameters?

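Only partly. The final character class, '[/\w.-]*', does not include '?', '=', or '&', so the match stops just before the query string. To capture query parameters, the pattern has to be extended to allow those characters.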
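
The sketch below compares the pattern as written with one possible extension that accepts an optional query string. The original stops at '?' because that character is not in its final character class. The `EXTENDED` pattern and the characters it allows in the query string are assumptions, not part of the generated formula.

```python
import re

ORIGINAL = re.compile(r"(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*)")

# Hypothetical extension: an optional '?' followed by common query characters.
EXTENDED = re.compile(r"(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*(?:\?[\w.%&=+-]*)?)")

text = "Search at https://example.com/search?q=regex&page=2 now"

print(ORIGINAL.findall(text))
# prints ['https://example.com/search']

print(EXTENDED.findall(text))
# prints ['https://example.com/search?q=regex&page=2']
```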

Is this regex case-sensitive?

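By default, regex matching is case-sensitive, so a scheme written as 'HTTP://' will not match. The rest of the pattern already accepts both cases because '\w' covers uppercase and lowercase letters. Adding a case-insensitive flag makes the scheme match in any case.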
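
In Python, for instance, the flag is `re.IGNORECASE`. The sketch below compiles the same pattern with and without it. The variable names and the sample text are illustrative assumptions.

```python
import re

PATTERN = r"(https?://[\w.-]+(?:\.[\w.-]+)+[/\w.-]*)"

CASE_SENSITIVE = re.compile(PATTERN)
CASE_INSENSITIVE = re.compile(PATTERN, re.IGNORECASE)

text = "Legacy link: HTTP://EXAMPLE.COM/Page"

# Without the flag, the literal 'http' does not match 'HTTP'.
print(CASE_SENSITIVE.findall(text))
# prints []

print(CASE_INSENSITIVE.findall(text))
# prints ['HTTP://EXAMPLE.COM/Page']
```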
