Overview
In this course, we'll give you the tools to quickly identify and solve real-world problems that you might come across in your IT role. We'll look at a bunch of different strategies and approaches for tackling the most common pitfalls of your code and IT infrastructure. You'll learn strategies for approaching almost any technical problem and then see how those apply to solving different real-world scenarios.
We picked examples that include general system issues, issues with software that someone else wrote, and issues with programs that we wrote. We'll talk about problems that can affect any operating system, and we'll also look at challenges specific to certain platforms and scripting languages. We strongly recommend that you take the prior courses in this program first, or already know Python and Linux, so that you can follow along with our troubleshooting examples.
Syllabus
- Troubleshooting Concepts
- In this module, you’ll be introduced to the fundamentals of troubleshooting and learn different strategies and approaches for tackling problems you might encounter. You’ll learn about debugging and how it is one of the core parts of troubleshooting. You’ll be introduced to some tools that help in the debugging process, like tcpdump, ps, top, ltrace, and lots more. Next, you’ll explore how to “understand the problem.” This might sound like a no-brainer, but it's not as easy as you might think! Then, we’ll dive into the different approaches for troubleshooting reproducible errors versus intermittent errors. Finally, you’ll learn about “binary searching a problem.” We’ll explore the different types of searches, including linear and binary searches. Then, we’ll learn about bisecting and how it can be used in your troubleshooting approach, and finish up by finding invalid data in a CSV file.
- Slowness
- In this module, you’ll learn what factors can cause a machine or program to run slowly. You’ll dive into ways of addressing slowness by identifying the bottleneck that is causing it. You’ll learn about tools to identify which resources are being exhausted, including iotop, iftop, and Activity Monitor on macOS. Next, you’ll learn how computers use resources, and understand the differences between CPU, RAM, and cache, to help you find the possible causes of slowness in your machines or scripts. Next up, you’ll learn how to write efficient code, then explore profilers to help you identify where your code is spending most of its time. After that, you’ll dive into data structures and understand which ones are right for you to use, including lists, tuples, dictionaries, and sets, and you’ll see how to avoid expensive loops. Then, you’ll dive into complex slowness problems and how using concurrency and adding a caching service can improve the execution of your code. Finally, you’ll understand how using threads can make your code run much more quickly.
- Crashing Programs
- In this module, you’ll be introduced to the age-old question, “Why has my program crashed?” You’ll learn how to troubleshoot system crashes and application crashes, what tools can help identify the cause of a crash, and which log files to look at to find what might have gone wrong. Next, you’ll dive into investigating why code crashes, and what you can do to prevent that from happening. Then, you’ll explore what happens when an unhandled error occurs and an exception is thrown. You’ll learn about several debugging techniques that will help you identify these errors and exceptions. Finally, you’ll explore handling crashes and incidents at a much larger scale. You’ll delve into a scenario where a large e-commerce site throws an error 20% of the time. Once that issue has been fixed, you’ll understand the importance of communication and documentation during these incidents, and how writing a postmortem can prevent issues from happening again.
- Managing Resources
- In this module, you’ll learn how to manage your applications. You’ll dive into some common issues that may cause your application to crash. You’ll also understand what memory leaks are, and how to troubleshoot and prevent them. Up next, you’ll look at managing disk space; you’ll see some scenarios in which a disk fills up and how to identify which process or application is taking up all your disk space. Then, you'll learn what network saturation is, what can cause it, and some useful tools and techniques for solving a network saturation problem. Next, we’ll shift from managing applications to managing your time. You’ll get tips on how to prioritize tasks, estimate how long a particular task will take, and communicate expectations when dealing with important tasks. The final lesson delves into how to deal with hard and complex problems by breaking them down into small, digestible chunks while keeping your eyes on a clear goal. You’ll learn that proactive approaches, like continuous integration, can help you with future issues that might come up. You’ll also explore how to plan for future resource usage by making good use of monitoring.
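As a taste of the bisecting idea from the Troubleshooting Concepts module, here is a minimal sketch in Python, not taken from the course materials. It assumes a file with a single invalid row and a hypothetical validity check (`batch_is_valid`) that stands in for "run the failing program on this half of the data". The sample data and the three-column layout are also made up for illustration.

```python
import csv
import io


def batch_is_valid(rows):
    """Hypothetical check: each row needs 3 fields and a numeric third field."""
    for row in rows:
        if len(row) != 3:
            return False
        try:
            float(row[2])
        except ValueError:
            return False
    return True


def find_bad_row(rows):
    """Bisect the rows to locate an invalid row; return its index or None."""
    if batch_is_valid(rows):
        return None
    low, high = 0, len(rows)  # a bad row is somewhere in rows[low:high]
    while high - low > 1:
        mid = (low + high) // 2
        if batch_is_valid(rows[low:mid]):
            low = mid   # first half is clean, so look in the second half
        else:
            high = mid  # first half contains a bad row
    return low


# Made-up sample data; the third row has a non-numeric value.
sample = io.StringIO(
    "alice,admin,42\n"
    "bob,user,17\n"
    "carol,user,not-a-number\n"
    "dave,user,8\n"
)
rows = list(csv.reader(sample))
bad_index = find_bad_row(rows)
print(bad_index, rows[bad_index])
```

Each pass halves the range that still needs checking, so even a very large file needs only a handful of test runs, whereas a linear search would test rows one at a time.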
Reviews
5.0 rating, based on 1 Class Central review
4.6 rating at Coursera based on 2975 ratings
I hope this course is very helpful, and I need it to change my view. I am a mechanical engineer, so it's very good to know how to troubleshoot a problem and give a solution. This course is very important to everybody working in a technical area.