
Overview

Scala URL Detector is a Scala library that detects and extracts URLs from unstructured text, with support for multiple content formats. It is based on a fork of the LinkedIn Engineering team's open-source URL detection library.

Features

  • Multiple Detection Modes: Support for HTML, XML, JSON, JavaScript, and plain text
  • Smart URL Parsing: Handles URLs with or without schemes, protocol-relative URLs, and encoded characters
  • Host Filtering: Allow or deny specific hosts with intelligent subdomain matching
  • Format-Aware Extraction: Context-aware detection for different content types
  • IPv4 & IPv6 Support: Recognizes both IPv4 and IPv6 addresses
  • Type-Safe API: Uses scala-uri for strongly-typed URL representations
  • Thread-Safe: Immutable data structures safe for concurrent use
  • Cross-Platform: Published for Scala 2.12, 2.13, and 3.x
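
To make the idea of subdomain-aware host filtering more concrete, here is a minimal sketch in plain Scala. It is not the library's implementation, and the names `HostMatchSketch`, `labels`, and `matchesHost` are hypothetical. It assumes that a rule such as example.com should match the host itself and any of its subdomains, compared label by label rather than by raw string suffix; the library's actual matching rules may differ.

```scala
object HostMatchSketch {
  // Split a host into lower-cased labels, ignoring a trailing dot
  // and any empty labels.
  private def labels(host: String): List[String] =
    host.toLowerCase.stripSuffix(".").split('.').toList.filter(_.nonEmpty)

  /** True when `candidate` equals `rule` or is a subdomain of it.
    * Comparing whole labels keeps "notexample.com" from matching
    * the rule "example.com", which a plain string suffix check would allow.
    */
  def matchesHost(rule: String, candidate: String): Boolean = {
    val ruleLabels      = labels(rule)
    val candidateLabels = labels(candidate)
    ruleLabels.nonEmpty && candidateLabels.endsWith(ruleLabels)
  }
}
```

Under these assumptions, `HostMatchSketch.matchesHost("example.com", "www.example.com")` is true, while `"notexample.com"` and `"example.com.evil.org"` are both rejected.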

Installation

To use the latest release of Scala URL Detector in your project, add the following to your build.sbt file:

libraryDependencies += "io.lambdaworks" %% "scurl-detector" % "1.3.0"

Quick Start

import io.lambdaworks.detection.UrlDetector
import io.lemonlabs.uri.AbsoluteUrl

// Basic usage
val detector = UrlDetector.default
val urls: Set[AbsoluteUrl] = detector.extract("Visit https://example.com")

// With specific options
import io.lambdaworks.detection.UrlDetectorOptions
val htmlDetector = UrlDetector(UrlDetectorOptions.Html)
val htmlUrls = htmlDetector.extract("<a href='https://example.com'>Link</a>")

// With host filtering
import io.lemonlabs.uri.Host
val filtered = UrlDetector.default
  .withAllowed(Host.parse("example.com"))
  .extract("Visit example.com and other.com")
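
Since the feature list describes the detector as immutable and safe for concurrent use, one instance can presumably be shared across threads. The sketch below illustrates that under stated assumptions: it uses only `UrlDetector.default` and `extract` from the Quick Start, and the object name `ConcurrentExtractionSketch`, the helper `extractAll`, the sample inputs, and the 10-second timeout are all hypothetical choices made for illustration.

```scala
import io.lambdaworks.detection.UrlDetector
import io.lemonlabs.uri.AbsoluteUrl

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrentExtractionSketch {
  // One detector instance, shared by every task below.
  private val detector = UrlDetector.default

  // Hypothetical sample inputs, used only for illustration.
  val documents: List[String] = List(
    "Visit https://example.com",
    "Docs live at https://example.org/docs",
    "No links in this one"
  )

  /** Runs extraction for each text on the global execution context
    * and returns one result set per input, in input order.
    */
  def extractAll(texts: List[String]): List[Set[AbsoluteUrl]] = {
    val tasks = texts.map(text => Future(detector.extract(text)))
    // Blocking here keeps the sketch short; the timeout is arbitrary.
    Await.result(Future.sequence(tasks), 10.seconds)
  }
}
```

Calling `ConcurrentExtractionSketch.extractAll(ConcurrentExtractionSketch.documents)` returns a list with one `Set[AbsoluteUrl]` per input text. In application code, composing the futures rather than blocking with `Await` is usually preferable.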
