All you need to know about Uptime, Downtime, and Runtime Errors!


In the computer industry, we often come across terms such as Uptime, Downtime, Runtime, and Runtime Error. Here, we will look at what they mean and everything related to them.

 

Uptime and Downtime

The term Uptime refers to the time period during which a computer is running or operational. It is a standard of measurement and is expressed as a percentage. The widely cited standard is 99.999% uptime, also called “five 9s”.

Downtime is the time period during which the computer is unavailable or non-operational. It is also called outage duration. The system or network may become unavailable because of system failures (unplanned events), such as a software crash or a communication failure (network outage). Maintenance routines (planned events) can also make the system unavailable and thereby cause Downtime.

 

Uptime and Downtime are critical for determining how successful real-time services or systems are, because they let us quantify that success. Service level agreements (SLAs) and contracts generally include an uptime/downtime ratio that states how long the service will be operational and available to users.
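To make an SLA figure concrete, it helps to convert an uptime percentage into the downtime it allows. The Python sketch below does this over a 365-day year; the function name `allowed_downtime_seconds` and the 365-day default period are illustrative assumptions, not part of any standard. For the five 9s (99.999%) target, it works out to roughly 5.26 minutes of downtime per year.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent, period_days=365):
    """Return the maximum downtime, in seconds, that an uptime percentage allows over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_seconds = period_days * SECONDS_PER_DAY
    # The downtime share is whatever is left after the uptime share.
    return period_seconds * (100 - uptime_percent) / 100


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per year")
```

Each extra 9 cuts the allowed downtime by a factor of ten, which is why five 9s is so demanding in practice.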

In the IT sector, professionals also use these terms to refer to a total amount of continuous operational time (for example, if a system has been running constantly for three weeks, it is said to have a “three-week uptime”).

In general, uptime is referred to without qualification, whereas downtime often needs specifics, such as maintenance downtime or downtime due to malfunctions.

Calculating Server Uptime Percentage:

In order to calculate the Uptime percentage, we first calculate the Downtime percentage for a given timeframe and subtract it from 100%. We will look at the breakdown of the calculation with an example:

 

Calculation of Downtime Percentage:

 

To calculate Downtime %, we divide the total time the monitor was down in a given timeframe by the total time it was being monitored, i.e., the timeframe itself.

Suppose a website was monitored for 24 hours (86,400 seconds). This is our timeframe.

Within this timeframe, say the monitor was down for 10 minutes (600 seconds).

Thus, 600 / 86,400 ≈ 0.0069, or about 0.69%. This is our Downtime %.

 

Calculation of Uptime %:

 

Now we subtract the Downtime % from 100% to get the Uptime %.

Uptime % = 100% – Downtime % = 100% – 0.69% = 99.31%. This is your Uptime %.
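The two steps above can be written as a small Python sketch. The function names are hypothetical; the formulas are exactly the ones described, applied to the same 24-hour example.

```python
def downtime_percent(down_seconds, timeframe_seconds):
    """Downtime % = time the monitor was down / total monitored time, times 100."""
    if timeframe_seconds <= 0:
        raise ValueError("timeframe_seconds must be positive")
    return down_seconds / timeframe_seconds * 100


def uptime_percent(down_seconds, timeframe_seconds):
    """Uptime % = 100% - Downtime %."""
    return 100 - downtime_percent(down_seconds, timeframe_seconds)


timeframe = 24 * 60 * 60  # 24 hours = 86,400 seconds
down = 10 * 60            # 10 minutes = 600 seconds

print(f"Downtime: {downtime_percent(down, timeframe):.2f}%")  # Downtime: 0.69%
print(f"Uptime: {uptime_percent(down, timeframe):.2f}%")      # Uptime: 99.31%
```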

 

  • Monitor Status Transitions.

There are different monitor statuses: OK, Confirmed Error, and Unconfirmed Error. It can be confusing to decide whether the time between two such statuses counts as up or down, but the rule is quite simple. Here is an overview of these status transitions and what they mean.

A. OK to Unconfirmed Error:

This status transition is considered UP. The error is unconfirmed, so we are not sure whether an actual error has occurred; a second check (a double-check) is needed to get more information about it.

B. Unconfirmed Error to Confirmed Error

This transition is DOWN, as the error is now confirmed.

C. Confirmed Error to Unconfirmed Error:

This transition is also considered to be DOWN, as the monitor is still in an error state and will remain so until an OK indication is detected.

D. Confirmed Error to OK:

This is also DOWN. A monitor is considered UP again only once the OK status is detected, so the time leading up to it still counts as down.

E. Unconfirmed Error to OK:

Like the first transition, this one is considered UP. As before, we do not know whether an actual error has occurred, and a double-check is needed to find out.
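The five rules above can be encoded as a small state machine. The Python sketch below is an illustration under one stated assumption, taken from rule C: once an error is confirmed, the monitor stays DOWN until an OK status arrives, even if an Unconfirmed Error is seen in between. The `Status` enum, the `(timestamp, status)` sample format, and the function names are hypothetical; real monitoring services define their own data formats.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    UNCONFIRMED = "Unconfirmed Error"
    CONFIRMED = "Confirmed Error"


def classify_intervals(statuses):
    """Label each interval between consecutive statuses as "UP" or "DOWN" (rules A-E)."""
    labels = []
    in_confirmed_error = False
    for prev, nxt in zip(statuses, statuses[1:]):
        if prev is Status.CONFIRMED:
            in_confirmed_error = True   # a confirmed error keeps the monitor DOWN (rules C, D)
        elif prev is Status.OK:
            in_confirmed_error = False  # an OK status ends the error state
        # Rule B: an interval that ends in a confirmed error also counts as DOWN.
        if in_confirmed_error or nxt is Status.CONFIRMED:
            labels.append("DOWN")
        else:
            labels.append("UP")         # rules A and E
    return labels


def downtime_seconds(samples):
    """Sum the DOWN intervals of (timestamp_seconds, Status) samples sorted by time."""
    times = [t for t, _ in samples]
    labels = classify_intervals([s for _, s in samples])
    return sum(end - start for start, end, label in zip(times, times[1:], labels) if label == "DOWN")


samples = [(0, Status.OK), (60, Status.UNCONFIRMED), (120, Status.CONFIRMED), (720, Status.OK)]
print(classify_intervals([s for _, s in samples]))  # ['UP', 'DOWN', 'DOWN']
print(downtime_seconds(samples))                    # 660
```

The resulting downtime in seconds can then be fed into the Downtime % calculation described earlier.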

 

Factors affecting Server Uptime and Downtime.

 

Here, we look at what affects the Uptime and Downtime % of a web server rather than of a computer system in general. Keep in mind that the terms mean the same thing for a web server.

 

A. Factors affecting Uptime percentage

 

There are four major factors: hardware, software, configuration, and strategy.

1. Hardware:
Hardware has a major impact on speed, reliability, storage capacity, and overall performance. Some hardware areas affecting Uptime are:

  • The infrastructure: – A high-capacity infrastructure can effectively handle traffic spikes and future expansion in storage capacity.
  • Number of central processing units (CPUs): – With more CPUs, information processing within central applications and database servers becomes faster.
  • Communications provided by the ISP: – Although a good ISP connection is difficult to maintain, it can affect performance drastically and increase uptime.
  • Storage medium: – A better, faster storage medium accesses information more quickly and decreases the load on the servers, thereby increasing uptime.

Good hardware, therefore, can increase your server uptime drastically.

2. Software:

Software has similar effects on server performance. Some of these are:

  • The programming of the platform: – Well-written code has contingency mechanisms for potential issues and thus increases uptime.
  • Releases: – New releases help optimize the programs in the software.
  • Versions: – New versions can root out bugs from previous ones, improve performance, and decrease server load.

These are the basic areas of software affecting uptime.

3. Configuration:

Configuration also affects overall performance.

  1. Number of clients: – Uptime is greatly influenced by the number of people accessing the server at any given time. A denial-of-service (DoS) attack can overload a server and cause downtime: the server is flooded with so many requests that it cannot respond to legitimate traffic. A server capable of handling a large number of clients is better able to stay functional, although that capacity alone cannot prevent a DoS attack.
  2. Number of languages: – Installing a large number of languages in the software can sometimes cause overloading, but it can still increase uptime if the most popular languages are the ones used most often.

These are the basic configuration areas affecting server uptime.

 

4. Strategy:

This is the most critical factor affecting server uptime. Having a solid strategy means being prepared for worst-case scenarios.

  1. Upgrade strategy: – Upgrades should be planned to minimize downtime and reduce resource use.
  2. Administration ability: – The administrators controlling the systems affect uptime the most. They need to keep the systems healthy by constantly staying up to date with the new technologies and upgrades available in the market.

These four are the primary factors; focusing on them keeps the systems running smoothly and can improve uptime drastically.

 

B. Factors affecting Downtime percentage

Causes

The following are the most common causes that lead to server Downtime:

 

  • Hardware Failure –

Hardware means the equipment we use to run the servers; simply put, the computers themselves, and just like everything else, they degrade over time. Neglecting this equipment (storage drives, CPUs, memory, and motherboards) leads to faulty operation. Power outages also lead directly to hardware failures unless you have an auxiliary power source to back you up.

  • Software Failure –

There are various reasons for software failure on a server, from OS issues and logic bugs to system overloads and much more. Keep your software updated and secure.

  • Human Error –

Many interactions with servers are performed manually, so the risk of human error is always present. Incorrect input during critical operations, poor server configuration, and similar mistakes can all lead to downtime on a server.

 

  • Remedies –
    You can take the following measures to prevent server downtime:

  • Always back up data –
    Backing up data not only prevents data loss but can also help you avoid server downtime, for example by keeping the UI functional even if the backend experiences problems. One way to do this is through virtualization.

  • Update software and hardware frequently –
    Always keep the OS and the software you use to run the server updated to the newest versions to avoid downtime issues. Similarly, upgrading hardware reduces the workload placed on it and helps the server run more smoothly.

  • Double-check configurations –
    A single configuration error can have drastic effects. Always double-check configurations to prevent issues.

  • Have strong policies in place
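Part of double-checking configurations can be automated. The Python sketch below validates a server configuration before it is applied; the keys (`host`, `port`, `max_clients`) and the rules are hypothetical examples, so adapt them to whatever your server actually reads.

```python
def validate_config(config):
    """Return a list of problems found in a server config dict; an empty list means none were found."""
    problems = []
    # Hypothetical required keys for this example.
    for key in ("host", "port", "max_clients"):
        if key not in config:
            problems.append(f"missing required key: {key}")

    port = config.get("port")
    if port is not None and not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append(f"port out of range: {port!r}")

    max_clients = config.get("max_clients")
    if max_clients is not None and not (isinstance(max_clients, int) and max_clients > 0):
        problems.append(f"max_clients must be a positive integer: {max_clients!r}")

    return problems


print(validate_config({"host": "0.0.0.0", "port": 8080, "max_clients": 500}))  # []
print(validate_config({"host": "0.0.0.0", "port": 70000}))
```

Running a check like this before every deployment turns "double-check the configuration" from a manual habit into a repeatable step.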

Runtime Errors

 

The time taken to execute a program is called its Runtime. An error that occurs during this time is called a Runtime Error. Unlike compilation errors, which are detected while the program is being compiled, a runtime error occurs only while the program is executing.
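The difference is easy to demonstrate in Python, which compiles source code to bytecode before running it. In the sketch below (the `check` helper is hypothetical), the built-in `compile()` rejects the snippet with a syntax error, while the division by zero compiles without complaint and fails only when the code is executed.

```python
def check(source):
    """Classify a Python snippet as failing at compile time, failing at runtime, or running cleanly."""
    try:
        code = compile(source, "<snippet>", "exec")
    except SyntaxError:
        return "compile error"
    try:
        exec(code, {})  # run in a fresh, empty namespace
    except Exception:
        return "runtime error"
    return "ok"


print(check("x = ("))             # compile error
print(check("x = 0\ny = 1 / x"))  # runtime error (ZeroDivisionError)
print(check("x = 2\ny = 1 / x"))  # ok
```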

Runtime error notifications usually appear in a message box showing a specific error code that indicates the nature of the error. The system may also slow down considerably before a runtime error occurs.
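A program typically produces such a coded notification by catching the failure at its top level. The Python sketch below maps a few exception types to error codes and prints a message; the codes, the fallback code 255, and the `run` helper are hypothetical and only illustrate the pattern.

```python
import sys

# Hypothetical error codes; real applications define their own.
ERROR_CODES = {ZeroDivisionError: 200, MemoryError: 201, FileNotFoundError: 202}
UNKNOWN_ERROR = 255


def run(task):
    """Run task(); if it fails, print a coded error message and return a non-zero status."""
    try:
        task()
    except Exception as exc:
        code = ERROR_CODES.get(type(exc), UNKNOWN_ERROR)  # exact-type lookup
        print(f"Runtime Error {code}: {exc}", file=sys.stderr)
        return code
    return 0


status = run(lambda: 1 / 0)  # prints a "Runtime Error 200: ..." line to stderr
print(status)                # 200
```

In a real program, the returned status would usually be passed to `sys.exit()` so that the process ends with that code after the message is shown.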

A runtime error is generated when an issue occurs that the software is unable to resolve. Keep in mind that the error is generated by the specific software, not by the OS. The software first performs a self-diagnosis, and when it determines that it cannot proceed any further, it generates the error. After the error is displayed and closed, the software generally exits; in some cases the OS is rebooted as well. Various causes lead to such errors. Some of them are:

 

  • Poor Programming

This is a very common cause of runtime errors. A poorly written program is very likely to produce them. Such a program may well fail to compile in the first place, and even if the engineer manages to compile it, runtime errors are hard to avoid. The best solution is to write a robust, carefully tested program from the start.
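As a small illustration of robust code, consider averaging a list. The fragile Python version below raises a runtime error (`ZeroDivisionError`) on empty input, while the robust version checks for that case first. Returning `None` for "no data" is an assumption made for this sketch; raising a descriptive error would be an equally valid design.

```python
def average_fragile(values):
    # Raises ZeroDivisionError at runtime when values is empty.
    return sum(values) / len(values)


def average_robust(values):
    # Check the edge case first instead of letting the program crash.
    if not values:
        return None  # assumption: callers treat None as "no data"
    return sum(values) / len(values)


print(average_robust([2, 4, 6]))  # 4.0
print(average_robust([]))         # None
```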

 

  • Memory Issues

If an engineer ships software with memory leaks, or the system it runs on has bad memory, a runtime error is very likely. A memory leak cannot be worked around, because the software itself is flawed and must be fixed; for a system memory issue, the only way out is to increase the memory capacity (if possible).

 

  • Other Softwares

In a shared environment such as Windows, if a rogue application interferes with your software, a runtime error is highly likely. Furthermore, a poorly written program running in such an environment can affect other software as well. In such cases, you can shut down all other applications and check whether the software that generated the error still does so.

 

  • Damaged Hardware

The functionality of any software largely depends on the functionality of its associated hardware. If your hardware is damaged and you keep running it, expect runtime errors, because the hardware can no longer fully support the software. For example, using an older computer or outdated hard drives, or having recently had an electrical storm, can lead to runtime errors while executing your operations.

 

  • Malicious Viruses or Adware

Last but not least, computer viruses and malicious adware cause far more than just runtime errors. These programs run in the background and often remain undetected. Although they do not directly cause a runtime error, they can affect your software in ways that lead to one. The only solution is to get rid of them: scan your system daily with antivirus software, and whenever you find a threat, do not hesitate to remove it.

 

To Conclude –

 

To sum up, we have seen what uptime and downtime are, how they are calculated, which factors affect server uptime and downtime, and how to address issues related to them. We have also seen what a runtime error is, what causes it, and how to deal with it. Now you know what these terms mean and how to handle them.