Thursday, 22 February 2018

The SEO’s essential guide to web technology

As an SEO professional, your role will invariably lead you to interactions with people in a wide variety of roles including business owners, marketing managers, content creators, link builders, PR agencies, and developers.

That last one – developers – is a catch-all term that can encompass software engineers, coders, programmers, front- and back-end developers, and IT professionals of various types. These are the folks who write the code and/or generally manage the various underlying web technologies that comprise and power websites.

In your role as an SEO, it may or may not be practicable for you to completely master programming languages such as C++ and Java, or scripting languages such as PHP and JavaScript, or markup languages such as HTML, XML, or the stylesheet language CSS.

And there are many more programming, scripting, and markup languages out there – it would be a Herculean task to master every one of them, even if you were a full-time programmer rather than an SEO.

But it is essential for you, as an SEO professional, to understand the various languages, technologies, and technology stacks that make up the web. When you’re making SEO recommendations, which developers will most likely be executing, you need to understand their mindset, their pain points, what their job is like – and you need to be able to speak their language.

You don’t have to know everything developers know, but you should have a good grasp of what developers do so that you can ask better questions and frame your SEO recommendations in a way that resonates with them, which makes those recommendations more likely to be executed.

When you speak their language, and understand what their world is like, you’re contributing to a collaborative environment where everyone’s pulling on the same side of the rope for the same positive outcomes.

And of course, aside from building collaborative relationships, being a professional SEO involves a lot of technical detective work and problem detection and prevention, so understanding various aspects of web technology is not optional; it’s mandatory.

Web tech can be complex and intimidating, but hopefully this guide will help make things a little easier for you and fill in some blanks in your understanding.

Let’s jump right in!

The internet vs. the World Wide Web

Most people use these terms interchangeably, but technically the two terms do not mean the same thing, although they are related.

The Internet began as a decentralized network of independent interconnected computers.

The US Department of Defense was involved from early on, funding and awarding contracts for projects including ARPANET (Advanced Research Projects Agency Network), an early packet-switching network and one of the first networks to implement TCP/IP (Transmission Control Protocol/Internet Protocol).

The ARPANET project led to “internetworking” where various networks of computers could be joined into a larger “network of networks”.

The World Wide Web is credited to British computer scientist Sir Tim Berners-Lee, who in 1989 proposed a system of linked hypertext documents: an information-sharing model built “on top” of the Internet.

Documents (web pages) were specified to be formatted in a markup language called “HTML” (Hypertext Markup Language), and could be linked to each other using “hyperlinks” that users could click to navigate to other web pages.

Web hosting

Web hosting, or hosting for short, is a service that allows people and businesses to put a web page or a website on the internet. Hosting companies have banks of computers called “servers” that are not entirely dissimilar to the computers you’re already familiar with, though of course there are differences.

There are various types of web hosting companies that offer a range of services in addition to web hosting; such services may include domain name registration, website builders, email addresses, website security services, and more.

In short, a host is where websites are published.

Web servers

A web server is a computer that stores web documents and resources. Web servers receive requests from clients (browsers) for web pages, images, etc. When you visit a web page, your browser requests all the resources/files needed to render that web page in your browser. It goes something like this:

Client (browser) to server: “Hey, I want this web page, please provide all the text, images and other stuff you have for that page.”

Server to client: “Okay, here it is.”

Various factors impact how quickly the web page will display (render), including the speed of the server and the size(s) of the various files being requested.
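The request/response exchange above can be made concrete with a minimal sketch that uses only Python’s standard library. It starts a throwaway local web server and then plays the part of the browser by requesting a page from it. The page content and the `PageHandler` and `fetch_from_local_server` names are made up for illustration. Real web servers such as Apache, IIS, or NGINX do the same basic job with far more features.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny, made-up page for the demo server to return.
PAGE = b"<html><head><title>Hello</title></head><body><p>Hi!</p></body></html>"

class PageHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with the same HTML page."""

    def do_GET(self):
        self.send_response(200)  # 200 = "OK, here it is"
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

def fetch_from_local_server(path="/"):
    """Start the server on a free port, request `path` like a browser would, then stop."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    # An empty ProxyHandler makes sure requests to localhost bypass any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # Client to server: "I want this page, please."
        with opener.open(url) as resp:
            return resp.status, resp.headers["Content-Type"], resp.read()
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    status, content_type, body = fetch_from_local_server("/index.html")
    print(status, content_type)
    print(body.decode("utf-8"))
```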

There are three server types you’ll most often encounter:

  1. Apache is open-source, free software compatible with many operating systems such as Linux. An often-used acronym is “LAMP stack” referring to a bundling of Linux, Apache, MySQL (relational database) and PHP (a server-side scripting language).
  2. IIS stands for “Internet Information Services” and is proprietary software made by Microsoft. An IIS server is often referred to as a “Windows Server” because it runs on Windows NT operating systems.
  3. NGINX, pronounced “Engine X”, is billed as a high-performance server that can also handle load balancing, act as a reverse proxy, and more. Its stated goals include outperforming other types of servers.

Server log files

Often shortened to “log files”, these are records of server activity in response to requests made for web pages and associated resources such as images. Some servers may already be configured to record this activity; others will need to be configured to do so.

Log files are the “reality” of what’s happening with a website and will include information such as the page or file requested, the date and time stamp of the request, the user agent making the request, the response status code (such as 200 for found, 404 for not found, or 301 for a redirect), the referrer, and a few other items such as bytes served and the client IP address.

SEOs should get familiar with parsing log files. To go into this topic in more detail, read JafSoft’s explanation of a web server log file sample.
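As a starting point for parsing log files, here is a minimal Python sketch. It reads lines in the “combined” log format that Apache and NGINX commonly use, then counts which response codes were served to requests claiming to be Googlebot. Your server’s format may differ, so treat the pattern as an assumption to check against your own files. Keep in mind that user agents can be faked, so a request claiming to be Googlebot is not proof that Google made it. The function names and the sample line are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# "Combined" log format (assumed):
# ip ident user [time] "method path protocol" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Servers log "-" when no body was sent.
    entry["bytes"] = int(entry["bytes"]) if entry["bytes"].isdigit() else 0
    return entry

def status_codes_for_bot(lines, bot_token="Googlebot"):
    """Count response codes served to requests whose user agent mentions bot_token."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and bot_token in entry["user_agent"]:
            counts[entry["status"]] += 1
    return counts

if __name__ == "__main__":
    # A made-up example line in combined format.
    sample = ('66.249.66.1 - - [22/Feb/2018:10:15:32 +0000] "GET /old-page HTTP/1.1" '
              '404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
    print(parse_line(sample)["path"], status_codes_for_bot([sample]))
```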

FTP

FTP stands for File Transfer Protocol, and it’s a common way to upload resource files such as web pages, images, XML Sitemaps, robots.txt files, and PDF files to your web hosting account, making them available and viewable on the Web via browsers. There are free FTP software programs you can use for this purpose.

The interface of a typical FTP program is a familiar file-folder tree structure where you’ll see your local machine’s files on the left, and the remote server’s files on the right. You can drag and drop local files to the server to upload them. Voilà, you’ve put files onto the internet! For more detail, Wired has an excellent guide on FTP for beginners.
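If you ever need to script an upload rather than drag and drop, Python’s standard `ftplib` module can do it. The sketch below is a hypothetical helper: `upload_file`, and the host, credentials, and folder you pass to it, are placeholders. It uses `ftplib.FTP_TLS`, the encrypted variant, because plain FTP sends your password unencrypted. If your host only offers plain FTP, swap in `ftplib.FTP` and drop the `prot_p()` call.

```python
import ftplib
from pathlib import Path

def upload_file(host, user, password, local_path, remote_dir="/"):
    """Hypothetical helper: upload one local file into remote_dir on an FTP server."""
    local_path = Path(local_path)
    # FTP_TLS is FTP with encryption; plain ftplib.FTP sends the password in clear text.
    with ftplib.FTP_TLS(host) as ftp:
        ftp.login(user, password)  # FTP_TLS secures the control connection before logging in
        ftp.prot_p()               # also encrypt the data connection that carries the file
        ftp.cwd(remote_dir)        # change to the target folder, e.g. the site's web root
        with local_path.open("rb") as f:
            # STOR uploads the file under the given name.
            ftp.storbinary(f"STOR {local_path.name}", f)
```

A call might look like `upload_file("ftp.example.com", "username", "password", "robots.txt", "/public_html")`, with the placeholder values replaced by the details from your hosting account.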

Domain name

A domain name is a (usually) human-readable string of text that is used in a URL (Uniform Resource Locator). Keeping this simple, in the URL https://www.website.com, “website.com” is the domain name: “website” is the second-level domain and “.com” is the top-level domain (TLD). For more detail, check out the Wikipedia article on domain names.

Root domain & subdomain

A root domain is what we commonly think of as a domain name, such as “website.com” in the URL https://www.website.com. A subdomain is a prefix added in front of the root domain, such as the www. part of that URL. Other examples of subdomains would be news.website.com, products.website.com, support.website.com and so on.

For more information on the difference between a domain and a subdomain, check out this video from HowTech.
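To see how the pieces of a hostname fit together, here is a deliberately naive Python sketch that splits the host part of a URL into its subdomain, its registered domain, and its top-level domain. The `split_hostname` name is made up for this example. It assumes the top-level domain is a single label like “.com”. For suffixes such as “.co.uk”, real tools consult the Public Suffix List instead.

```python
from urllib.parse import urlsplit

def split_hostname(url):
    """Naively split a URL's hostname into (subdomain, registered domain, TLD).

    Assumes a single-label TLD such as "com"; multi-part suffixes such as
    "co.uk" need the Public Suffix List instead.
    """
    hostname = urlsplit(url).hostname or ""  # hostname is lowercased, port removed
    labels = hostname.split(".")
    if len(labels) < 2:
        return "", hostname, ""
    subdomain = ".".join(labels[:-2])          # everything left of the domain, e.g. "www"
    registered_domain = ".".join(labels[-2:])  # e.g. "website.com"
    tld = labels[-1]                           # e.g. "com"
    return subdomain, registered_domain, tld

if __name__ == "__main__":
    print(split_hostname("https://www.website.com/page"))
    print(split_hostname("https://support.website.com"))
```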

URL vs. URI

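URL stands for “Uniform Resource Locator” (such as http://ift.tt/2EZXxC8), and URI stands for “Uniform Resource Identifier”. URI is the broader term: every URL is a URI, because a URL identifies a resource by its location. In everyday SEO conversation, “URI” is often used loosely for just the path portion of a URL (such as /this-is-a-page.html), which is technically a relative reference rather than a full URL. More info here.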
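Python’s standard `urllib.parse` module can illustrate both ideas. It splits a full URL into its components. It also resolves a relative reference such as /this-is-a-page.html against the page it appears on, which is what a browser or crawler does when it follows a link. The example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# A hypothetical page URL, broken into its components.
page_url = "https://www.website.com/blog/this-is-a-page.html?ref=nav#top"
parts = urlsplit(page_url)
print(parts.scheme)    # https
print(parts.netloc)    # www.website.com
print(parts.path)      # /blog/this-is-a-page.html
print(parts.query)     # ref=nav
print(parts.fragment)  # top

# Relative references only become full URLs once resolved against the page they appear on.
print(urljoin(page_url, "/this-is-a-page.html"))  # https://www.website.com/this-is-a-page.html
print(urljoin(page_url, "other-page.html"))       # https://www.website.com/blog/other-page.html
```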

HTML, CSS, and JavaScript

I’ve grouped together HTML, CSS, and JavaScript here not because they don’t each deserve their own section, but because it’s good for SEOs to understand that these three languages make up much of how modern web pages are coded (with many exceptions of course, and some of those will be noted elsewhere here).

HTML stands for “Hypertext Markup Language”, and it’s the original and foundational language of web pages on the World Wide Web.

CSS stands for “Cascading Style Sheets” and is a style sheet language used to style and position HTML elements on a web page, enabling separation of presentation and content.

JavaScript (not to be confused with the programming language “Java”) is a client-side scripting language used to create interactive features on web pages.
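Here is a Python sketch, built on the standard `html.parser` module, of how the three layers sit together in one page. It also shows what a simple tool sees when it reads raw HTML. The `SEOElementParser` class and the sample page are made-up examples. The parser collects the title, meta description, and link targets. It reads the HTML as delivered and does not apply the CSS or run the JavaScript, which is what a browser (and a rendering search engine) would go on to do.

```python
from html.parser import HTMLParser

class SEOElementParser(HTMLParser):
    """Collect the <title>, meta description, and link targets from raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser passes attributes as (name, value) pairs
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# A made-up page: HTML for content and links, CSS for styling, JavaScript for behavior.
SAMPLE_PAGE = """<html>
<head>
  <title>Blue Widgets</title>
  <meta name="description" content="Sturdy blue widgets for every home.">
  <style>h1 { color: navy; }</style>
</head>
<body>
  <h1>Blue Widgets</h1>
  <a href="/products/blue-widget">See the widget</a>
  <script>document.querySelector("h1").textContent += " (in stock)";</script>
</body>
</html>"""

if __name__ == "__main__":
    parser = SEOElementParser()
    parser.feed(SAMPLE_PAGE)
    print(parser.title)             # Blue Widgets
    print(parser.meta_description)  # Sturdy blue widgets for every home.
    print(parser.links)             # ['/products/blue-widget']
```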

AJAX & XML

AJAX stands for “Asynchronous JavaScript And XML”. Asynchronous means the client/browser and the server can work and communicate independently, allowing the user to continue interacting with the web page regardless of what’s happening on the server. JavaScript is used to make the asynchronous server requests, and when the server responds, JavaScript modifies the page content displayed to the user. As the name suggests, data sent asynchronously from the server to the client was originally packaged in XML format so it could be easily processed by JavaScript, although today JSON is more commonly used. Because only the data needed to update part of the page is exchanged, rather than reloading the entire page, AJAX reduces traffic between the client and the server, which makes pages respond faster.

XML stands for “Extensible Markup Language” and, like HTML, uses tags, elements, and attributes. It was designed to both store and transport data, whereas HTML is used to display data. For the purposes of SEO, the most common usage of XML is in XML Sitemap files.
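XML Sitemaps are where SEOs meet XML most often. Here is a minimal Python sketch that builds one with the standard `xml.etree.ElementTree` module, using the sitemaps.org namespace. The `build_sitemap` helper and the URLs are hypothetical, and a real sitemap may include further optional tags. One benefit of using an XML library rather than string concatenation is that characters such as “&” in URLs are escaped automatically, as the format requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return XML Sitemap text for an iterable of (url, lastmod) pairs.

    lastmod is an optional W3C date string such as "2018-02-22"; pass None to omit it.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes "&" as "&amp;"
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    print(build_sitemap([
        ("https://www.website.com/", "2018-02-22"),
        ("https://www.website.com/products?color=blue&size=large", None),
    ]))
```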

Structured data (AKA, Schema.org)

Structured data is markup you can add to the HTML of a page to help search engines better understand the content of the page, or at least certain elements of that page. By using the approved standard formats, you provide additional information that makes it easier for search engines to parse the pertinent data on the page.

Common uses of structured data are to mark up certain aspects of recipes, literary works, products, places, events of various types, and much more.

Schema.org was launched on June 2, 2011, as a collaborative effort by Google, Bing and Yahoo (soon after joined by Yandex) to create a common, standardized set of schemas for structured data markup on web pages. Since then, the term “Schema.org” has become synonymous with the term “structured data”, and Schema.org structured data types are continually evolving, with new types being added relatively frequently.

One of the main takeaways about structured data is that it helps disambiguate data for search engines so they can more easily understand it. Certain marked-up elements may also result in additional information being displayed in search engine results pages (SERPs), such as review stars, recipe cooking times, and so on. Note that adding structured data is not a guarantee of such SERP features.

Structured data can be expressed in several formats. JSON-LD (JavaScript Object Notation for Linked Data) has emerged as Google’s preferred and recommended format for Schema.org markup, although other formats such as microdata and RDFa are also supported.

JSON-LD is easier to add to pages, easier to maintain and change, and less prone to errors than microdata, which must be wrapped around existing HTML elements. JSON-LD, by contrast, can be added as a single block, for example in the head section of a web page.
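Here is a minimal Python sketch of generating a JSON-LD block for a hypothetical recipe page. The `json_ld_script` helper and the recipe values are made up. `Recipe`, `name`, `cookTime`, and `recipeIngredient` are Schema.org type and property names. Building the markup from a data structure with `json.dumps` keeps the JSON valid, which is part of why JSON-LD is easier to maintain than hand-edited microdata. Check Schema.org and Google’s documentation for the properties a given rich result actually needs.

```python
import json

def json_ld_script(data):
    """Wrap a Schema.org object in a <script type="application/ld+json"> element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the <script> element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Hypothetical recipe data using Schema.org's Recipe type.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Grandma's Apple Pie",
    "cookTime": "PT1H",  # ISO 8601 duration: 1 hour
    "recipeIngredient": ["6 apples", "1 pie crust"],
}

if __name__ == "__main__":
    print(json_ld_script(recipe))
```

The printed block can be pasted into the page’s HTML as-is, and regenerated whenever the underlying data changes.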

Here is the Schema.org FAQ page for further investigation – and to get started using microdata, RDFa and JSON-LD, check out our complete beginner’s guide to Schema.org markup.

Front-end vs. back-end, client-side vs. server-side

You may have talked to a developer who said, “I’m a front-end developer” and wondered what that meant. Or you may have heard someone say “oh, that’s a back-end functionality”. It can seem confusing, but it’s easily clarified.

“Front-end” and “client-side” both mean the same thing: it happens (executes) in the browser. For example, JavaScript was originally developed to execute on a web page in the browser, without having to make a call to the server.

“Back-end” and “server-side” both mean the same thing: it happens (executes) on a server. For example, PHP is a server-side scripting language that executes on the server, not in the browser. Some Content Management Systems (CMS for short) like WordPress use PHP-based templates for web pages. The server combines the template with the page’s content and sends the resulting HTML to the browser.
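The difference matters for SEO because it determines what is in the HTML a crawler first receives. The Python sketch below is purely illustrative, and the template, product data, and function names are hypothetical. The server-side approach fills in a template before responding, much like a PHP template in a CMS. The client-side approach sends an almost empty shell that JavaScript fills in later, in the browser.

```python
import html
from string import Template

# Server-side rendering: the server fills in the template *before* responding,
# so the text is already in the HTML the browser (or crawler) receives.
PAGE_TEMPLATE = Template(
    "<html><head><title>$name</title></head>"
    "<body><h1>$name</h1><p>$description</p></body></html>"
)

def render_server_side(product):
    """Return finished HTML for a product, escaping values so they can't inject markup."""
    safe_values = {key: html.escape(value) for key, value in product.items()}
    return PAGE_TEMPLATE.substitute(safe_values)

# Client-side rendering: the server sends an almost empty shell; JavaScript
# (the hypothetical /app.js) would fetch the product data and build the page
# in the browser afterwards.
CLIENT_SIDE_SHELL = (
    "<html><head><title>Loading...</title></head>"
    '<body><div id="app"></div><script src="/app.js"></script></body></html>'
)

def phrase_in_initial_html(page_html, phrase):
    """True if the phrase is in the HTML as delivered, before any JavaScript runs."""
    return phrase in page_html

if __name__ == "__main__":
    product = {"name": "Blue Widget", "description": "A sturdy blue widget."}
    print(phrase_in_initial_html(render_server_side(product), "sturdy blue widget"))  # True
    print(phrase_in_initial_html(CLIENT_SIDE_SHELL, "sturdy blue widget"))            # False
```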

Programming vs. scripting languages

Engineers and developers do have differing explanations and definitions of terms. Some will say ultimately there’s no difference, or that the lines are blurry. The generally accepted difference between a programming language (like C or Pascal) and a scripting language (like JavaScript or PHP) is that a programming language requires an explicit compiling step. In that step, human-written, human-readable code is turned into a specific set of machine-language instructions understandable by a computer. A scripting language, by contrast, is typically run directly by an interpreter or engine at run time, without a separate compile step by the developer.

Content Management System (CMS)

A CMS is a software application or a set of related programs used to create and manage websites (or we can use the fancy term “digital content”). At the core, you can use a CMS to create, edit, publish, and archive web pages, blog posts, and articles and will typically have various built-in features.

Using a CMS to create a website means that there is no need to write code from scratch, which is one of the main reasons CMSs have broad appeal.

Another common feature of CMSs is plugins, which can be integrated with the core CMS to add functionality that is not part of the core CMS feature list.

Common CMSs include WordPress, Drupal, Joomla, ExpressionEngine, Magento, WooCommerce, Shopify, Squarespace, and many, many others.

Read more here about Content Management Systems.

Content Delivery Network (CDN)

Sometimes called a “Content Distribution Network”, CDNs are large networks of servers which are geographically dispersed with the goal of serving web content from a server location closer to the client making the request in order to reduce latency (transfer delay).

CDNs cache copies of your web content across these servers, and then the server nearest to each website visitor serves the requested web content. CDNs are used to provide high availability along with high performance. More info here.

HTTPS, SSL, and TLS

Web data is passed between computers in data packets. Clients (web browsers) serve as the user interface when we request a web page from a server. HTTP (hypertext transfer protocol) is the communication method a browser uses to “talk to” a server and make requests. HTTPS (hypertext transfer protocol secure) is the secure version of this.

Website owners can switch their website to HTTPS to make the connection with users more secure and less prone to “man-in-the-middle” attacks, where a third party intercepts or possibly alters the communication.

SSL refers to “secure sockets layer” and is a standard security protocol to establish communication encryption between the server and the browser. TLS (Transport Layer Security) is the successor to SSL. Although people still talk about “SSL certificates”, modern HTTPS connections use TLS.
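Here is a Python sketch, using the standard `ssl` and `socket` modules, of what “TLS rather than SSL” means in practice. Both function names are made up for this example. `make_tls_context` builds client-side connection settings that check the server’s certificate and refuse the old SSL protocols and early TLS versions. Requiring TLS 1.2 or later is our assumption of a sensible minimum, not a universal rule. `negotiated_tls_version` reports which version a site actually agreed to use.

```python
import socket
import ssl

def make_tls_context():
    """Client-side TLS settings that verify certificates and require TLS 1.2 or newer."""
    context = ssl.create_default_context()            # verifies certificate and hostname
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuses SSL, TLS 1.0 and TLS 1.1
    return context

def negotiated_tls_version(host, port=443, timeout=5.0):
    """Connect to host and return the protocol version both sides agreed on."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with make_tls_context().wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()  # e.g. "TLSv1.2" or "TLSv1.3"
```

`negotiated_tls_version("www.example.com")` needs network access. It would typically return a string such as "TLSv1.2" or "TLSv1.3", depending on what the server supports.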

HTTP/1.1 & HTTP/2

When Tim Berners-Lee invented the HTTP protocol in 1989, the computer he used did not have the processing power and memory of today’s computers. A client (browser) connecting to a server using HTTP/1.1 receives information through a sequence of network request-response transactions, each of which requires a “round trip” to the server. Before any requests can be sent, setting up the connection itself takes additional round trips, known as “handshakes”.

Each round trip takes time. HTTPS is an HTTP connection with SSL/TLS layered in, which requires yet another handshake with the server. All of this adds up to latency. What was fast enough then is not necessarily fast enough now.
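A back-of-the-envelope calculation shows why these extra round trips matter. The Python sketch below uses a deliberately simplified model and an assumed 100 ms round-trip time. The model is our assumption: one round trip for the TCP connection handshake, some number of round trips for the TLS handshake, and one for the request itself. It ignores DNS lookups, server processing, and bandwidth. A full TLS 1.2 handshake takes two round trips, while TLS 1.3 cut that to one.

```python
def time_to_first_byte_ms(rtt_ms, tls_round_trips):
    """Rough lower bound on the wait before the first response byte arrives.

    Simplified model: 1 round trip for the TCP handshake, `tls_round_trips`
    for the TLS handshake (0 for plain HTTP), and 1 for the request itself.
    Ignores DNS lookups, server processing time, and bandwidth.
    """
    tcp_handshake = 1
    request_response = 1
    return (tcp_handshake + tls_round_trips + request_response) * rtt_ms

rtt = 100  # assumed 100 ms round-trip time between browser and server
print(time_to_first_byte_ms(rtt, tls_round_trips=0))  # plain HTTP: 200
print(time_to_first_byte_ms(rtt, tls_round_trips=2))  # HTTPS, full TLS 1.2 handshake: 400
print(time_to_first_byte_ms(rtt, tls_round_trips=1))  # HTTPS, TLS 1.3 handshake: 300
```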

HTTP/2 is the first new version of HTTP since 1.1. Simply put, HTTP/2 allows the server to deliver more resources to the client/browser faster than HTTP/1.1. It does this through multiplexing (sending many requests and responses over one connection at the same time), header compression, request prioritization, and server push, which allows the server to send resources to the client that have not yet been requested.
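To see why multiplexing helps, here is an idealized Python model. It is a sketch under stated assumptions, not a benchmark. It assumes a browser opens up to six parallel HTTP/1.1 connections per host, a common browser limit, and that each connection carries one request at a time, so requests queue up in “rounds”. Under HTTP/2, all requests share one connection concurrently. The model ignores bandwidth, server processing time, HTTP/1.1 pipelining (which browsers generally don’t use), and dependencies between resources, such as a stylesheet that references images.

```python
import math

def request_rounds(num_resources, protocol, connections_per_host=6):
    """Idealized number of sequential request/response rounds to fetch resources.

    HTTP/1.1: each connection carries one request at a time, so resources queue
    behind the (assumed) 6 parallel connections a browser opens per host.
    HTTP/2: all requests are multiplexed over a single connection at once.
    """
    if protocol == "HTTP/1.1":
        return math.ceil(num_resources / connections_per_host)
    if protocol == "HTTP/2":
        return 1 if num_resources else 0
    raise ValueError(f"unknown protocol: {protocol}")

print(request_rounds(60, "HTTP/1.1"))  # 10
print(request_rounds(60, "HTTP/2"))    # 1
```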

Application Programming Interface (API)

Application is a general term that, simply put, refers to a type of software that can perform specific tasks. Examples of applications include web browsers and databases.

An API is an interface that lets other software interact with an application, such as a database. The API is like a messenger that takes requests, tells the system what you want, and returns the response back to you.

If you’re in a restaurant and want the kitchen to make you a certain dish, the waiter who takes your order is the messenger that communicates between you and the kitchen, which is analogous to using an API to request and retrieve information from a database. For more info, check out Wikipedia’s Application programming interface page.
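Here is a self-contained Python sketch that maps the restaurant analogy onto code, using only the standard library. A tiny local web API (the hypothetical `MenuAPI`) plays the waiter in front of a made-up `MENU` “kitchen”. The `order` function plays the customer, sending a request and decoding the JSON response. Real APIs define their own URLs, parameters, and authentication, so every name and path here is an illustrative assumption.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The "kitchen": a made-up data store that sits behind the API.
MENU = {"pie": {"name": "Apple pie", "price": 4.5}}

class MenuAPI(BaseHTTPRequestHandler):
    """The "waiter": answers GET /dishes?name=<dish> with JSON."""

    def do_GET(self):
        url = urllib.parse.urlsplit(self.path)
        name = urllib.parse.parse_qs(url.query).get("name", [""])[0]
        dish = MENU.get(name) if url.path == "/dishes" else None
        status = 200 if dish else 404
        body = json.dumps(dish or {"error": "not found"}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

def order(base_url, dish_name):
    """The customer: ask the API for one dish and decode the JSON reply."""
    query = urllib.parse.urlencode({"name": dish_name})
    # An empty ProxyHandler makes sure requests to localhost bypass any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/dishes?{query}") as resp:
        return json.loads(resp.read().decode("utf-8"))

def start_api():
    """Start MenuAPI on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MenuAPI)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"

if __name__ == "__main__":
    server, base_url = start_api()
    try:
        print(order(base_url, "pie"))  # {'name': 'Apple pie', 'price': 4.5}
    finally:
        server.shutdown()
        server.server_close()
```

Asking for a dish that isn’t on the menu makes the API answer with a 404 status, which `urllib` raises as an `HTTPError`.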

AMP, PWA, and SPA

If you want to build a website today, you have many choices.

You can build it from scratch using HTML for content delivery along with CSS for look and feel and JavaScript for interactive elements.

Or you could use a CMS (content management system) like WordPress, Magento, or Drupal.

Or you could build it with AMP, PWA, or SPA.

AMP stands for Accelerated Mobile Pages and is an open source Google initiative that specifies a set of HTML tags and various functionality components, which are ever-evolving. The upside of AMP is lightning-fast loading web pages when coded according to AMP specifications. The downsides are that some desired features may not be currently supported, and that proper analytics tracking can be problematic.

PWA stands for Progressive Web App, and it blends the best of both worlds between traditional websites and mobile phone apps. PWAs deliver a native app-like experience to users, with features such as push notifications, the ability to work offline, and the ability to add an icon to your mobile phone’s home screen.

PWAs use “service workers”, scripts that the browser runs in the background between the web page and the network to intercept requests and cache resources. Service workers let PWAs combine fast-loading web pages with the ability to act like a native mobile phone app. However, because PWAs rely heavily on JavaScript (and are often built with JavaScript frameworks), you may encounter a number of technical challenges.

SPAs – Single Page Applications – are different from traditional websites, which load each page a user requests in a session via a fresh request to the server. SPAs, by contrast, load a single page and then use JavaScript in the browser to update the content as the user navigates. As a result, new “pages” viewed in a user session don’t require a full page reload from the server.

The primary advantages of SPAs include streamlined and simplified development, and a very fast user experience. The primary disadvantages include potential problems with SEO, due to search engines’ inconsistent ability to parse content served by JavaScript. Debugging issues can also be more difficult and take up more developer time.

It’s worth noting that the future success of each of these web technologies ultimately depends on developer adoption.

Conclusion

Obviously, it would require a very long book to cover each and every bit of web technology, and in sufficient detail, but this guide should provide you, the professional SEO, with helpful info to fill in some of the blanks in your understanding of various key aspects of web technology.

I’ve provided many links in this article that serve as jumping off points for any topics you would like to explore further. There’s no doubt that there are many more topics SEOs need to be conversant with, such as robots.txt files, meta robots tags, rel canonical tags, XML Sitemaps, server response codes, and much more.

In closing, here’s a nice article on the Stanford website titled “How Does The Internet Work?” that you might find interesting reading; you can find that here.



from Search Engine Watch http://ift.tt/2Fkrwm0
