Selenium 4 Unveiled: A Detailed Breakdown of How It Differs from Selenium 3

Selenium, a highly popular open-source automation tool, has become the cornerstone of web browser automation. Whether you’re a seasoned tester or a developer, Selenium provides a robust suite of tools to automate interactions with web applications across different browsers. Its ability to support multiple programming languages like Java, Python, and C#, along with cross-browser testing, makes it a versatile choice for developers and testers alike.

The Origins of Selenium

The name “Selenium” has an interesting backstory. It stems from a humorous email in which Jason Huggins, one of Selenium’s creators, poked fun at a competing commercial testing product from Mercury Interactive by remarking that “you can cure mercury poisoning with selenium supplements.” The name stuck, and what began as a clever retort became the moniker for one of the most widely used testing frameworks.

Why Selenium?

Because Selenium is open source, a wide range of tools and libraries has grown up under its umbrella to enhance browser automation. It caters to different testing needs:

Cross-Browser Compatibility: Selenium is compatible with all major web browsers, including Firefox, Chrome, Safari, and Edge.
Multiple Language Support: Selenium offers flexibility by allowing testers to write scripts in their preferred programming languages, such as Java, Python, C#, and Ruby.
Platform Independence: Selenium is platform-agnostic, running on Windows, macOS, and Linux environments, making it ideal for teams working across different operating systems.

The Key Components of Selenium

Selenium comprises several components, each contributing to different aspects of browser automation:

1. Selenium IDE

Selenium Integrated Development Environment (IDE) simplifies the automation process for non-programmers by providing a record-and-playback feature. It’s available as a browser extension for Firefox and Chrome, enabling testers to record their actions on a web page, edit them, and export them as scripts.

The IDE allows testers to write scripts in Selenese, a special domain-specific language (DSL) designed for Selenium, offering commands to interact with web elements. This simplifies the automation process, especially for those unfamiliar with traditional programming.

2. Selenium WebDriver

WebDriver, the heart of Selenium, facilitates browser automation by providing a more reliable approach than its predecessor, Selenium RC. WebDriver controls the browser directly using its native automation capabilities, bypassing JavaScript-based control. This makes WebDriver faster, more stable, and less prone to browser-specific quirks.

Here’s an example of a basic Selenium WebDriver script written in Java to search for “cheese” on Google:

import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.WebDriverWait;
import static org.openqa.selenium.support.ui.ExpectedConditions.presenceOfElementLocated;
import java.time.Duration;

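public class HelloSelenium {
    public static void main(String[] args) {
        // Start a Firefox session; Selenium drives the browser through GeckoDriver.
        WebDriver driver = new FirefoxDriver();
        // Explicit wait that polls for up to 10 seconds before timing out.
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        try {
            driver.get("https://google.com/ncr");
            // Type the query into the search box and submit it with Enter.
            driver.findElement(By.name("q")).sendKeys("cheese" + Keys.ENTER);
            // Wait until the first result heading is present in the DOM.
            WebElement firstResult = wait.until(presenceOfElementLocated(By.cssSelector("h3")));
            System.out.println(firstResult.getAttribute("textContent"));
        } finally {
            // Always close the browser and end the session, even if a step fails.
            driver.quit();
        }
    }
}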

3. Selenium Grid

Selenium Grid allows parallel execution of tests across different browsers, operating systems, and machines. This feature is invaluable when running large test suites on multiple configurations simultaneously, significantly reducing test execution time.
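
As a minimal sketch of how a test targets a Grid instead of a local browser, the script can create a RemoteWebDriver that points at the Grid's address. The URL http://localhost:4444, the Linux platform, and the class and method names below are assumptions for illustration; substitute the details of your own Grid.

```java
import java.net.MalformedURLException;
import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class GridExample {
    // Assumed Grid address; replace it with the URL of your own Grid.
    static final String GRID_URL = "http://localhost:4444";

    // Describes the browser and platform the Grid should provide for this session.
    static ChromeOptions requestedBrowser() {
        ChromeOptions options = new ChromeOptions();
        options.setPlatformName("linux"); // assumption: a Linux node is registered
        return options;
    }

    public static void main(String[] args) throws MalformedURLException {
        // The Grid forwards the session to a node that matches the requested options.
        WebDriver driver = new RemoteWebDriver(new URL(GRID_URL), requestedBrowser());
        try {
            driver.get("https://www.selenium.dev");
            System.out.println(driver.getTitle());
        } finally {
            driver.quit();
        }
    }
}
```

Because only the driver construction changes, the same test steps can run locally or on the Grid, and several such sessions can run at the same time on different nodes.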

Transition from Selenium RC to WebDriver

Selenium Remote Control (RC) was the initial solution for browser automation, but it relied on a server-based approach to inject JavaScript commands into browsers. As web technologies advanced, Selenium RC’s limitations became apparent, leading to the development of WebDriver. With its direct browser communication, WebDriver emerged as a faster and more flexible solution.

The Evolution Towards Standardization

The rise of WebDriver also sparked efforts toward standardizing browser automation. In 2012, the World Wide Web Consortium (W3C) began drafting a specification to make WebDriver a formal standard for browser automation, culminating in its publication as a W3C Recommendation in 2018. Today, WebDriver forms the backbone of modern web testing.

Core Components of Selenium Architecture:

Selenium Client Libraries (Language Bindings): Selenium supports multiple programming languages, including Java, Python, C#, JavaScript, and Ruby. These client libraries allow developers to write automation scripts in their preferred language, abstracting the complexities of communicating directly with browsers.
JSON Wire Protocol: Selenium uses the JSON Wire Protocol (in versions before Selenium 4) to transfer data between the client (test script) and the server (browser drivers). This protocol encodes commands (like clicking a button or entering text in a field) in JSON format and sends them over HTTP to the WebDriver, which then interacts with the browser.
Browser Drivers: Each browser has its corresponding browser driver, responsible for accepting the commands from the Selenium WebDriver and translating them into actions on the browser. For instance:

GeckoDriver for Firefox
ChromeDriver for Google Chrome
EdgeDriver for Microsoft Edge

Browser drivers are crucial because they enable Selenium to work seamlessly across different browsers without needing to modify test scripts.
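
To illustrate the point about unchanged test scripts, here is a small sketch in which only the driver construction depends on the browser. DriverFactory and its create method are hypothetical names, not part of Selenium, and the example assumes the corresponding browsers are installed.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.edge.EdgeDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class DriverFactory {
    // Hypothetical helper: maps a browser name to the matching WebDriver implementation.
    static WebDriver create(String browser) {
        switch (browser.toLowerCase()) {
            case "firefox":
                return new FirefoxDriver(); // drives Firefox through GeckoDriver
            case "chrome":
                return new ChromeDriver();  // drives Chrome through ChromeDriver
            case "edge":
                return new EdgeDriver();    // drives Edge through EdgeDriver
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }

    public static void main(String[] args) {
        // The browser name comes from the command line; the test steps stay the same.
        WebDriver driver = create(args.length > 0 ? args[0] : "firefox");
        try {
            driver.get("https://www.selenium.dev");
            System.out.println(driver.getTitle());
        } finally {
            driver.quit();
        }
    }
}
```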

Selenium WebDriver: WebDriver is the interface between the test script and the browser driver. It sends commands (via JSON Wire Protocol) from the client code to the browser and retrieves results. WebDriver interacts directly with the browser, ensuring more accurate test results and better browser control than its predecessor, Selenium RC.

Selenium Grid: Selenium Grid is used to distribute test execution across different environments (browsers, operating systems, machines). It allows parallel testing, which speeds up the test suite execution by running multiple tests simultaneously. Selenium Grid consists of a central Hub (managing test distribution) and multiple Nodes (machines where tests are executed).

Working of Selenium:

The working of Selenium can be broken down into the following steps:

Test Script Execution: A test script, written using one of the supported languages (e.g., Java), sends commands to the Selenium Client Library.
JSON Wire Protocol: The client library communicates with the browser driver using the JSON Wire Protocol, which encodes the test commands in JSON format and sends them over HTTP.
Browser Driver Interaction: The browser driver receives the commands and translates them into the browser’s native actions (e.g., clicking a button, typing in a field). Each browser has a specific driver that handles these interactions.
Browser Actions: The browser executes the commands sent by the WebDriver, and the results of these actions (e.g., element found, page loaded) are returned to the WebDriver.
Response to the Test Script: WebDriver captures the browser’s response and sends it back to the test script, either confirming that the command was executed successfully or returning an error if the command failed.
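
The exchange in the steps above can be sketched with nothing but the JDK's HTTP client (Java 11 or later). The example builds, without sending, the kind of request a client library issues for a find-element command. The driver address http://localhost:4444, the session ID, and the class and method names are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class FindElementRequestSketch {
    // Builds the HTTP request for "find element" on an existing session.
    // Note: the selector is inserted as-is; a real client would JSON-escape it.
    static HttpRequest findElement(String driverUrl, String sessionId, String cssSelector) {
        String body = "{\"using\": \"css selector\", \"value\": \"" + cssSelector + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(driverUrl + "/session/" + sessionId + "/element"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = findElement("http://localhost:4444", "example-session-id", "h3");
        // prints: POST http://localhost:4444/session/example-session-id/element
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The browser driver answers such a request with a JSON body that either identifies the matched element or describes an error, which the client library turns into a return value or an exception in the test script.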

Selenium 4 Architecture: Key Improvements

Selenium 4 introduces some significant architectural improvements over Selenium 3, particularly in the way it communicates with browsers.

W3C WebDriver Protocol: The most notable change in Selenium 4 is its complete reliance on the W3C WebDriver Protocol. While Selenium 3 used the JSON Wire Protocol to communicate between the WebDriver and browser drivers, Selenium 4 has moved entirely to the W3C standard protocol. This change brings several advantages:

Fewer Compatibility Issues: The W3C protocol is a standard, ensuring that browser drivers and WebDriver behave consistently across different browsers.
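Faster Execution: Because the client libraries and browser drivers now speak the same protocol, commands no longer have to be encoded and decoded between the JSON Wire Protocol and the W3C format. This simplifies communication between WebDriver and browsers and can improve execution speed.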
Enhanced Stability: Direct communication with browsers makes the tests less prone to failures due to protocol translation errors, improving reliability.
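
One visible difference between the two protocols is the body of the request that starts a new session. The sketch below builds both shapes as plain strings: the legacy form places capabilities under desiredCapabilities, while the W3C form nests them under capabilities and alwaysMatch. The class and method names are assumptions for illustration, and real payloads usually carry more fields.

```java
public class NewSessionPayloads {
    // Legacy JSON Wire Protocol: capabilities sit under "desiredCapabilities".
    static String jsonWire(String browserName) {
        return "{\"desiredCapabilities\": {\"browserName\": \"" + browserName + "\"}}";
    }

    // W3C WebDriver Protocol: capabilities sit under "capabilities" -> "alwaysMatch".
    static String w3c(String browserName) {
        return "{\"capabilities\": {\"alwaysMatch\": {\"browserName\": \"" + browserName + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(jsonWire("firefox"));
        System.out.println(w3c("firefox"));
    }
}
```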

Native Support for Chrome DevTools Protocol: Selenium 4 provides native integration with the Chrome DevTools Protocol (CDP). This allows developers to leverage advanced browser debugging features directly in their automation scripts, such as:

Intercepting network requests and modifying them
Simulating geolocation and network conditions
Accessing performance metrics and logs
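
As a hedged sketch of one capability from this list, the example below overrides the browser's geolocation by sending a raw CDP command through ChromeDriver's executeCdpCommand method. The coordinates are arbitrary example values, and the class and helper method names are assumptions for illustration.

```java
import java.util.Map;
import org.openqa.selenium.chrome.ChromeDriver;

public class GeolocationOverride {
    // Parameters for the CDP command Emulation.setGeolocationOverride.
    static Map<String, Object> geolocation(double latitude, double longitude) {
        return Map.of("latitude", latitude, "longitude", longitude, "accuracy", 1);
    }

    public static void main(String[] args) {
        ChromeDriver driver = new ChromeDriver();
        try {
            // Pages loaded after this call see the overridden position.
            driver.executeCdpCommand("Emulation.setGeolocationOverride", geolocation(52.52, 13.405));
            driver.get("https://www.selenium.dev");
        } finally {
            driver.quit();
        }
    }
}
```

Raw CDP commands only work with Chromium-based browsers, so tests that rely on them are no longer portable across every browser.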

This integration gives more control over the browser and enables testers to validate advanced scenarios, like performance testing and network request monitoring.

Improved Selenium Grid: Selenium Grid has been revamped in Selenium 4, featuring:

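Distributed Mode: Selenium Grid can now run in a fully distributed mode, in which the responsibilities of the Hub are split into separate components (such as the Router, Distributor, Session Map, and Session Queue) that run alongside the Nodes and can be scaled independently, improving scalability and test distribution.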
Graphical User Interface (GUI): Selenium 4 Grid introduces a GUI, making it easier to manage test environments, monitor the status of nodes, and view test execution progress.
Support for Docker Containers: Selenium Grid now works seamlessly with Docker, enabling testers to set up and run Grid instances in containerized environments.

Relative Locators: Selenium 4 introduces Relative Locators, which allow testers to locate web elements based on their position relative to other elements. For instance, testers can now identify an element that is above, below, to the left of, to the right of, or near another element. This makes writing locators more intuitive and reduces the reliance on fragile XPath selectors.
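
A minimal sketch of a relative locator in Java follows. It assumes a hypothetical login page on which a password input sits below an element with the id "email"; the URL, the element id, and the class and method names are placeholders for illustration.

```java
import static org.openqa.selenium.support.locators.RelativeLocator.with;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class RelativeLocatorExample {
    // Matches an <input> rendered below the element with id "email" (hypothetical page).
    static By passwordBelowEmail() {
        return with(By.tagName("input")).below(By.id("email"));
    }

    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            // Placeholder URL; point this at a page that contains the assumed form.
            driver.get("https://example.com/login");
            WebElement password = driver.findElement(passwordBelowEmail());
            password.sendKeys("secret");
        } finally {
            driver.quit();
        }
    }
}
```

Relative locators depend on where elements are rendered, so they can match a different element if the page layout changes, for example at another window size.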

Better Documentation and Improved Selenium IDE: The Selenium IDE has also seen significant improvements. It now supports more browsers and offers better integration with CI tools like Jenkins. Selenium 4 also features improved documentation, making it easier for new users to get started and for experienced users to find detailed information on advanced features.

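Major Differences Between Selenium 3 and Selenium 4:

Protocol: Selenium 3 communicates with browser drivers over the JSON Wire Protocol, while Selenium 4 relies on the W3C WebDriver Protocol.
Chrome DevTools: Selenium 3 has no native CDP integration, while Selenium 4 integrates with the Chrome DevTools Protocol.
Selenium Grid: Selenium 3 uses a central Hub with Nodes, while Selenium 4 adds a distributed mode, a GUI, and Docker support.
Locators: Selenium 4 adds Relative Locators alongside the existing locator strategies.
Selenium IDE and Documentation: Selenium 4 brings an improved IDE with support for more browsers, better CI integration, and improved documentation.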

Conclusion

Selenium 4 marks a significant evolution in the world of browser automation, with its adherence to the W3C WebDriver protocol, enhanced support for DevTools, and improvements in Selenium Grid. Whether you’re performing simple tests or handling complex browser interactions, Selenium 4 provides the tools and flexibility to automate web applications effectively while delivering faster and more reliable results.

Selenium continues to evolve, and version 4 adds full support for the W3C WebDriver Protocol alongside enhanced IDE functionality. For teams managing large test suites across multiple browsers, Selenium remains the go-to framework for web automation.

For anyone starting with web automation, Selenium’s diverse tools — ranging from the simplicity of Selenium IDE to the power of WebDriver — make it an excellent choice to achieve scalable, reliable, and efficient test automation.

Selenium’s architecture is designed to support cross-browser automation by leveraging browser drivers and the WebDriver protocol to communicate between test scripts and browser instances. The architecture ensures seamless automation across different browsers, operating systems, and programming languages. To understand how Selenium works, it’s crucial to grasp its components and interactions.
