What happens when you type google.com in your browser and press Enter
People all around the world use the internet and visit different websites every day. But have you ever stopped to think about what actually happens when you type in the name of a website you want to visit and press that “enter” button? It might seem like magic, but in reality, it’s a fascinating series of steps that work together to make a web page appear on your screen in just a matter of seconds.
Today, we’re going to take a closer look at this process by explaining what occurs when you type “google.com” into your browser’s address bar and then hit that “Enter” key. It’s like uncovering the behind-the-scenes magic of the internet!
Computers and devices connected to a network can exchange information using their respective IP addresses. An IP address can be thought of as a digital equivalent of a physical address in the real world. Within a network, no two devices can share the same IP address; each must have a unique one.
For humans, it’s nearly impossible to remember the IP addresses of all the websites and applications we use. Instead, we rely on domain names to find and access them. For example, rather than having to remember “172.217.28.100,” we can simply type “google.com,” saving us the trouble of recalling IP addresses. However, this raises a critical question: How can we access a desired website by merely knowing its domain name when, as mentioned earlier, devices require IP addresses to communicate?
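In code, this name-to-address translation boils down to a single call: a program hands a domain name to the system's resolver and gets an IP address back. A minimal Python sketch (the domain is just an example, and the address returned will vary by region and over time):

```python
import socket

def resolve(domain: str) -> str:
    """Ask the system's resolver for an IPv4 address behind a domain name."""
    return socket.gethostbyname(domain)

# resolve("google.com") returns an address such as "172.217.28.100",
# though the exact value depends on where and when you ask.
```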
Before we delve into how to obtain the IP address, let’s closely examine what happens when you input “google.com” into your browser’s address bar:
When you type a web address like “google.com” into your browser, the browser first checks whether it’s a real website or just a search query. In our case, “google.com” is indeed a legitimate website.
Next, the browser determines the method for establishing a secure connection with the website. It opts for HTTPS, a secure communication method, and designates a specific secure entry point known as “port 443” for this purpose.
But there’s still a missing piece: the IP address. This is where DNS (the Domain Name System) comes into play.
In practice, when you visit a website, the IP address associated with that domain is typically stored in your browser’s cache. This caching serves a crucial purpose: it eliminates the need for a DNS (Domain Name System) request, ensuring that your desired webpage loads quickly. But what happens if the IP address isn’t found in the browser’s cache? In that case, the browser asks the operating system for the IP address of the entered domain.
Upon receiving the request, the operating system searches its own cache for the requested IP address. If it finds it, it promptly sends the IP address back to the browser, which stores it in its cache. If not, the operating system sends a request to the DNS Resolver, typically managed by your Internet Service Provider (ISP), asking for the IP address associated with the entered domain.
Once the DNS Resolver receives the request from the operating system, it searches its own cache for the requested IP address. If it finds it, it sends the IP address back to the operating system, which stores it and forwards it to the browser, where it is cached as well. But what happens if the DNS Resolver cannot find the desired IP address either?
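The lookup order described above is a chain of caches, tried nearest-first. The dicts below are hypothetical stand-ins for the real browser, OS, and resolver caches, just to make the fall-through order concrete:

```python
def find_ip(domain, browser_cache, os_cache, resolver_cache):
    """Try each cache layer in order; return (ip, layer) or (None, 'miss')."""
    layers = [("browser", browser_cache),
              ("os", os_cache),
              ("resolver", resolver_cache)]
    for name, cache in layers:
        if domain in cache:
            return cache[domain], name
    return None, "miss"  # fall through to the root and TLD servers

# Here the OS cache answers, so no DNS query ever leaves the machine:
ip, layer = find_ip("google.com", {}, {"google.com": "172.217.28.100"}, {})
```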
Now, in this phase, the DNS Resolver examines the top-level domain (TLD) of the domain requested by the client. In our example of “google.com,” the top-level domain is “com.” The DNS Resolver then tries to obtain the IP address of the “.com” TLD server. If it doesn’t already know the “.com” TLD server, it turns to the 13 root servers, from which it can retrieve the IP address of the “.com” TLD server.
Root servers form a global network of hundreds of servers dispersed across many countries. They are organized within the DNS root zone under 13 distinct named authorities, designated “a” through “m” and addressed as [letter].root-servers.net.
After reaching any of the 13 root servers, the DNS Resolver asks for the IP address of the “.com” Top-Level Domain (TLD) server, which knows where to find “google.com.” The root server responds with the IP address of the “.com” TLD server, and the DNS Resolver saves this address in its cache for future use.
With the IP address of the “.com” Top-Level Domain (TLD) server in hand, the DNS Resolver sends a request for the required IP address directly to the “.com” TLD server. If the requested domain exists, the “.com” TLD server responds with the authoritative name servers associated with that domain name. In our case, the authoritative name servers for Google are ns1–ns4.google.com.
Once the DNS Resolver has obtained the authoritative names for the domain name, it proceeds to send a final request to one of the authoritative name servers. Subsequently, the authoritative name server responds by providing the DNS Resolver with the IP address corresponding to the desired domain name, in our case, “google.com.” The DNS Resolver retains this IP address and then relays it to the Operating System, which likewise stores it, and finally, the Operating System forwards it to the browser, where it, too, is stored in the cache memory.
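The full root → TLD → authoritative walk can be simulated with plain dictionaries. All the data below is hypothetical and hard-coded; a real resolver sends network queries to actual servers at each of the three steps:

```python
# Hard-coded stand-ins for the three tiers of DNS servers.
ROOT = {"com": "com-tld-server"}                              # TLD -> TLD server
TLD = {"com-tld-server": {"google.com": ["ns1.google.com"]}}  # domain -> authoritative NS
AUTH = {"ns1.google.com": {"google.com": "172.217.28.100"}}   # domain -> IP

def resolve_iteratively(domain: str) -> str:
    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT[tld]                 # 1. a root server names the ".com" TLD server
    nameservers = TLD[tld_server][domain]  # 2. the TLD server names the authoritative servers
    return AUTH[nameservers[0]][domain]    # 3. an authoritative server returns the IP
```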
In small websites and applications, it’s common for the hosting to be handled by a single server that manages all of the traffic. In large-scale applications, however, delivering content smoothly under heavy traffic requires multiple servers. This prevents users from experiencing slow performance or long waits due to the sheer volume of traffic. For instance, Google reportedly operates around 2.5 million servers! This extensive infrastructure is essential because Google handles an extraordinarily high volume of traffic worldwide: millions of people use its services daily, and it must maintain top-notch speed and reliability to meet those demands.
When discussing small websites and applications, the IP address provided by the DNS Resolver will typically direct you straight to the server hosting the webpage. However, in the context of large-scale websites and applications, this IP address will instead lead you to a load balancer.
The primary objective of the load balancer is to distribute incoming traffic across the servers owned by the hosting entity. This ensures that the traffic isn’t concentrated on a single server, preventing the remaining servers from remaining idle.
You might be wondering how the load balancer determines which server should handle the traffic to maintain a balanced distribution among multiple servers. The answer lies in its utilization of a range of algorithms to make this decision. Some of these algorithms are static, while others operate dynamically.
Dynamic load balancing algorithms
• Least connection
• Weighted least connection
• Weighted response time
• Resource-based
Static load balancing algorithms
• Round robin
• Weighted round robin
• IP hash
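Two of the static algorithms and one dynamic one are simple enough to sketch in a few lines. The server addresses below are made up for illustration:

```python
import hashlib
import itertools

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool

_cycle = itertools.cycle(SERVERS)

def round_robin() -> str:
    """Static: hand out servers in a fixed rotation."""
    return next(_cycle)

def ip_hash(client_ip: str) -> str:
    """Static: the same client IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

def least_connection(active: dict) -> str:
    """Dynamic: pick the server currently holding the fewest open connections."""
    return min(active, key=active.get)
```

IP hash has the useful side effect of session stickiness: a returning client lands on the same server every time, while round robin spreads requests evenly regardless of who sent them.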
Deploying multiple servers has effectively mitigated the risk of a single point of failure and also alleviated the traffic load, as only a portion of the traffic is directed to each server. However, when it comes to the load balancer, a similar issue arises. Now, the load balancer itself becomes a single point of failure, as all traffic passes through it.
While managing traffic is typically not a major concern for load balancers since their role primarily involves directing requests to the right server, the real challenge lies in addressing the single point of failure issue. So, how can we go about resolving this?
Just as we’ve implemented multiple servers to address our needs, we can apply the same strategy to load balancers. Having multiple load balancers allows us to distribute traffic effectively and resolve the single point of failure problem.
We can classify our multiple load balancers using two approaches:
• Active/Passive Load Balancers: In this setup, one load balancer manages all the traffic for a particular IP address. If that load balancer becomes inactive, a passive node steps in to take over the IP address.
• Active/Active Load Balancers: With this configuration, the same IP address is configured on multiple load balancers. Incoming traffic reaches all of them, but an algorithm determines which load balancer should respond.
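An active/passive setup boils down to a priority-ordered failover check. The health flags here are hypothetical inputs that real deployments would obtain from heartbeat probes:

```python
def pick_active(balancers):
    """balancers: list of (name, is_healthy) pairs in priority order.
    The first healthy node owns the shared IP address."""
    for name, healthy in balancers:
        if healthy:
            return name
    raise RuntimeError("no healthy load balancer available")
```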
One of the distinguishing factors that sets your website or application apart is its reliability and security. Traditional HTTP requests transmit data without encryption, leaving it susceptible to interception by malicious parties, such as hackers or potential eavesdroppers. This vulnerability can expose sensitive information like passwords and credit card details. To enhance the security and reliability of your data transmission, it’s imperative to implement an additional layer known as TLS (Transport Layer Security).
TLS, often considered the modern and more secure successor to SSL (Secure Sockets Layer), serves the crucial role of encrypting data during transmission. This encryption ensures that even if an unauthorized entity intercepts the data, they won’t be able to decipher or misuse it. In standard HTTP requests without TLS, data travels over port 80 using the TCP (Transmission Control Protocol) transport. However, when you add the security layer, it transforms into HTTPS. This secure variant utilizes port 443 and maintains a TCP connection to provide a safe and protected channel for data exchange.
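From a client’s point of view, “adding the security layer” means wrapping the ordinary TCP socket in TLS before any HTTP bytes are sent. A minimal sketch using Python’s standard ssl module (the host name is only an example, and calling the function performs a real handshake over the network):

```python
import socket
import ssl

def open_https_socket(host: str = "google.com", port: int = 443):
    """Open a TCP connection on port 443 and perform the TLS handshake.
    The default context verifies the server's certificate and hostname."""
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)
```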
To enhance the security of your website, it’s essential to install an SSL certificate on your load balancer. By implementing an SSL certificate, you introduce an additional layer of security to your website, ensuring the safety of your traffic. Moreover, an SSL certificate enables you to automatically redirect HTTP requests to HTTPS, thereby guaranteeing a secure and encrypted traffic flow.
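The automatic redirect mentioned above is a simple rule at the load balancer: answer any plain-HTTP request with a 301 pointing at the HTTPS version of the same URL. A hypothetical sketch of that rule:

```python
def upgrade_to_https(url: str):
    """Return (status, location) for a redirect, or None if already secure."""
    prefix = "http://"
    if url.startswith(prefix):
        return 301, "https://" + url[len(prefix):]
    return None  # already HTTPS; pass the request through
```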
Once the load balancer is set up, it’s imperative for it to examine incoming traffic for potential threats. This scrutiny helps identify whether the incoming traffic is originating from a hacker or malicious software. Additionally, when the load balancer sends data back in response, it must similarly assess the outbound traffic to ensure it’s secure and devoid of any risks.
Now, let’s introduce the role of a firewall. A firewall plays a critical role in verifying the security of both incoming and outgoing traffic to ensure its safety. It’s crucial to configure a firewall not only on servers but also on load balancers to comprehensively secure the entire network.
Now, the HTTPS request that arrives at the load balancer needs to be forwarded to a server chosen by one of its algorithms. But what type of server will accept this request? The answer lies in web servers.
Web servers are a specialized server type engineered to receive web requests, process them, and then deliver the requested HTML and its assets in response. These web servers comprise both the HTTP server component and the website’s files.
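Python’s standard library ships exactly this kind of server: it accepts HTTP requests and serves files from a directory in response. A minimal sketch (port 8000 is an arbitrary choice):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

def make_static_server(port: int = 8000) -> HTTPServer:
    """A bare-bones web server that serves files from the current directory."""
    return HTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler)

# make_static_server().serve_forever()  # uncomment to actually serve requests
```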
In such scenarios, relying solely on a web server is insufficient for effective web page delivery. The primary function of a web server is limited to serving specific files. When we need functionality such as executing scripts, downloading programs, or updating website content dynamically, we require another server type known as an application server.
To enable the dynamic updating of website content through application servers, the integration of a database is a fundamental requirement. While application servers handle various tasks, including executing scripts and managing user interactions, they rely on a database to store, retrieve, and update the content that makes the website dynamic.
Here’s how it works: When a user interacts with a website, the application server processes their requests, generates dynamic content, and communicates with the database when necessary. The database stores data such as user profiles, product information, blog posts, or any other content that requires constant updates
By maintaining this separation of responsibilities, websites can efficiently manage data-driven operations and deliver a seamless user experience. The database acts as the data repository, ensuring that the content displayed on the website remains up-to-date and responsive to user interactions.
To ensure that servers operate optimally and reliably, server monitoring is implemented. Its primary purpose is to assess server health and performance. Various metrics, such as CPU usage, memory, and network traffic, are continually checked to identify any issues or anomalies in real-time. This helps maintain server efficiency and contributes to the uninterrupted functioning of digital services and websites.
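At its simplest, monitoring is a periodic comparison of sampled metrics against thresholds, alerting when a limit is crossed. The limits and sample values below are made up for illustration:

```python
# Hypothetical alert thresholds, as percentages.
THRESHOLDS = {"cpu_percent": 90, "memory_percent": 85, "disk_percent": 95}

def check_health(metrics: dict) -> list:
    """Return the names of any sampled metrics exceeding their thresholds."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]
```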
Now that we have gained a comprehensive understanding of how website content is presented, from acquiring the IP address of the target website to the process of retrieving and regenerating pages from the servers, it’s important to note that all these steps are integral components of the TCP/IP protocol. This protocol serves as the foundation for interconnecting various network devices. It comprises four distinct layers:
1. Physical (Link) Layer: This layer encompasses protocols that operate exclusively at the link level.
2. Network Layer: Responsible for handling packets and establishing connections between autonomous networks, ensuring the seamless transfer of packets across network boundaries.
3. Transport Layer: This layer maintains end-to-end communication throughout the network. TCP (Transmission Control Protocol) governs communication between hosts and provides flow control, while UDP (User Datagram Protocol) offers a connectionless, lightweight alternative.
4. Application Layer: The highest layer, it provides standardized data exchange for applications and manages the actual data transmission. Notable protocols within this layer include HTTP, HTTPS, and DNS, each tailored to specific application needs.
These four layers collectively form the TCP/IP protocol suite, which underpins the functioning of the internet and enables seamless communication between networked devices.
Summary:
The web page retrieval process involves several steps. Initially, the DNS Resolver is employed to obtain the IP address of the desired destination. Subsequently, a load balancer directs you to the suitable web server, where the application server reconstructs the web page. Finally, the completed web page is transmitted back through the network until it arrives at the client’s web browser.