Web Scraping with HttpClient in C#
Retrieve and use data from the internet with HttpClient
This e-course will teach you how to use HttpClient to retrieve information from the internet and use it in your own application. We will look at what HttpClient is, how to perform POST and GET requests, how to parse HTML with the HTML Agility Pack, and how to store the information.
FREE
€ 5,99
This course includes:
8 Hours
Of self-paced video lessons
45 days access
Enjoy this e-course for 1 1/2 months!
(give or take)
45 days access to Discord
Talk with other participants about this e-course, ask questions, and help each other.
Skills you will learn
- Usage of HttpClient
- HTML Agility Pack
- Parallel Scraping
- Debugging and Troubleshooting
This e-course is for
- Software Developers
- Fullstack Developers
- Anyone working with data
Topics In This Course
Introduction to web scraping
Usage of HttpClient
Using the HttpClientFactory
HTML Agility Pack
XPath
Strategy Pattern
Concurrency throttling mechanism
HTTP request resilience
SQLite databases
Dependency injection
Chapters Of This Course
Introduction to this course.
Let’s set some goals for what we are going to learn.
What is web scraping and why do we use it? A small introduction to web scraping before we begin.
Before we start this course, we need to talk about legal and ethical issues. You can’t go scraping all the websites you want. There are rules, terms of service, and even laws.
We are going to make a simple class that holds all the information we will scrape from the websites. Each time we get information from an online source, we need to store it somewhere. For now, that is in-memory.
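A minimal sketch of what such a model class could look like (the class and property names here are illustrative assumptions, not necessarily the ones used in the course):

```csharp
using System;

// A simple in-memory model for a scraped blog post.
// Class and property names are illustrative assumptions.
public class BlogPost
{
    public string Title { get; set; } = string.Empty;
    public string Url { get; set; } = string.Empty;
    public DateTime? PublishedOn { get; set; }
}
```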
The very first thing we need to do is to grab the HTML from the online source we want to scrape. I will start with the page that shows all the blogs on my website.
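A minimal sketch of that first GET request (the URL is a placeholder; point it at the page you want to scrape):

```csharp
using System;
using System.Net.Http;

// Fetch the raw HTML of a page with HttpClient.
using var client = new HttpClient();
string html = await client.GetStringAsync("https://example.com/blogs");

Console.WriteLine($"Downloaded {html.Length} characters"); // quick sanity check
```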
In the previous chapter, we retrieved the HTML. Now we need to inspect that HTML and extract only the information we need. We do this with the HTML Agility Pack package, which is built to inspect HTML with XPath.
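A minimal sketch of that extraction step. The XPath expression and URL are assumptions; adjust them to the structure of the page you actually scrape:

```csharp
using System;
using System.Net.Http;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

using var client = new HttpClient();
string html = await client.GetStringAsync("https://example.com/blogs");

// Load the HTML and query it with XPath.
var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null when nothing matches, so check before looping.
var links = doc.DocumentNode.SelectNodes("//article//h2/a");
if (links != null)
{
    foreach (var link in links)
    {
        Console.WriteLine(link.InnerText.Trim());
    }
}
```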
We now have the data from the website. Let’s show this information on the screen.
Now that we know the basics of building one scraper, let’s add another one for a different website. The goal is to build only the new scraper, not change the whole application. Both scrapers (the previous one and the new one) should work alongside each other.
The strategy pattern lets us switch between different algorithms without changing the calling code. Ideal if you have multiple scrapers and don’t want huge if-statements.
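A minimal sketch of the pattern (the interface and class names are illustrative assumptions):

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// One shared interface; every website gets its own implementation.
public interface IScraper
{
    Task<IReadOnlyList<string>> ScrapeTitlesAsync(HttpClient client);
}

// A scraper for one specific website. Adding a new website means
// adding a new class, not touching the existing ones.
public class MyBlogScraper : IScraper
{
    public async Task<IReadOnlyList<string>> ScrapeTitlesAsync(HttpClient client)
    {
        string html = await client.GetStringAsync("https://example.com/blogs");
        // parse 'html' with the HTML Agility Pack here...
        return new List<string>();
    }
}
```

The caller then simply loops over a collection of IScraper instances instead of branching per website.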
Request headers carry vital information alongside the HTTP method (POST, GET, PUT, DELETE): authentication/authorization, the content type, and much more. In this chapter, we will add the User-Agent header, since many websites require it.
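A minimal sketch of setting that header (the header value is an example; pick one that identifies your scraper):

```csharp
using System.Net.Http;

using var client = new HttpClient();

// Set the User-Agent once on the client so every request carries it.
client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (compatible; MyScraper/1.0)");

string html = await client.GetStringAsync("https://example.com/blogs");
```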
If you scrape a website with multiple pages, you launch a lot of requests at that one website and/or server. At some point, it is going to block you. To avoid this, we are going to build a mechanism that restricts the number of requests within a specific time window: a throttling system that adds delays between the requests.
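One possible sketch of such a throttle, using SemaphoreSlim to cap concurrency and Task.Delay to space requests out. The limits here are assumptions; tune them per target site:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(2);        // at most 2 requests in flight
var spacing = TimeSpan.FromSeconds(1);  // minimum delay before each request

// Wrap every page request in this helper to stay under the limit.
async Task<string> FetchThrottledAsync(HttpClient client, string url)
{
    await gate.WaitAsync();
    try
    {
        await Task.Delay(spacing);              // spread the requests out
        return await client.GetStringAsync(url);
    }
    finally
    {
        gate.Release();
    }
}

using var client = new HttpClient();
string html = await FetchThrottledAsync(client, "https://example.com/blogs");
```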
We are going to add a retry mechanism to our code: a retry counter that checks the response and retries the request when the status code is 429 (Too Many Requests), repeating until the maximum number of retries is reached.
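A minimal sketch of that routine (the retry count and backoff are assumptions):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

async Task<string> GetWithRetryAsync(HttpClient client, string url, int maxRetries = 3)
{
    for (int attempt = 0; attempt <= maxRetries; attempt++)
    {
        var response = await client.GetAsync(url);

        // Anything other than 429 is either success or a different problem.
        if (response.StatusCode != HttpStatusCode.TooManyRequests)
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }

        // Rate limited: wait a bit longer on every attempt, then try again.
        await Task.Delay(TimeSpan.FromSeconds(2 * (attempt + 1)));
    }
    throw new HttpRequestException($"Still rate limited after {maxRetries} retries: {url}");
}

using var client = new HttpClient();
string html = await GetWithRetryAsync(client, "https://example.com/blogs");
```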
Grabbing the information from the web pages is cool, but storing the data somewhere would be convenient. Let’s store some information in a SQLite database and use that stored information in our application.
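A minimal sketch of writing a scraped record to SQLite with the Microsoft.Data.Sqlite package (the table and column names are illustrative assumptions):

```csharp
using Microsoft.Data.Sqlite; // NuGet package: Microsoft.Data.Sqlite

using var connection = new SqliteConnection("Data Source=scraper.db");
connection.Open();

// Create the table on first run.
var create = connection.CreateCommand();
create.CommandText = "CREATE TABLE IF NOT EXISTS Posts (Title TEXT, Url TEXT)";
create.ExecuteNonQuery();

// Insert a scraped post with parameters (never string-concatenate SQL).
var insert = connection.CreateCommand();
insert.CommandText = "INSERT INTO Posts (Title, Url) VALUES ($title, $url)";
insert.Parameters.AddWithValue("$title", "Example post");
insert.Parameters.AddWithValue("$url", "https://example.com/blogs/1");
insert.ExecuteNonQuery();
```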
Time to make some performance improvements. The code works, no worries there, but there are a few parts we can do better. In this chapter, we will look at socket exhaustion, pipelining, and the HttpClientFactory.
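A minimal console-app sketch of wiring up the factory, assuming the Microsoft.Extensions.Http NuGet package:

```csharp
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection; // NuGet: Microsoft.Extensions.Http

// The factory pools and reuses message handlers, which is what
// prevents socket exhaustion when you create many clients.
var services = new ServiceCollection();
services.AddHttpClient();

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

HttpClient client = factory.CreateClient();
string html = await client.GetStringAsync("https://example.com/blogs");
```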
A small recap and some small notes after the course.
Subscribe For Our Newsletter
Stay up to date with news, deals, new courses, and much more!
Frequently Asked Questions
An e-course is a digital course you can follow or take.
An e-course is usually a written course with information on the subject you want to learn. It contains examples (code, images, graphs) and explanations.
E-courses are not live and you can start, pause, continue, and stop whenever you want.
It’s not only text and examples: you also test your knowledge with a quiz at the end of a chapter*. This is done with Kens Learning Paths, a dedicated testing platform where you can check whether you have mastered the material.
To take the e-course, you first need to create an account. Don’t worry, not much information is needed. With your account, you get your dashboard.
Once you have registered for an e-course, the e-course is added to your dashboard.
Start or continue an e-course from your dashboard.
It depends on your speed. You can go through the e-course when and how you want.
But if you went berserk on it, you could finish it in two days.
Yes and no.
The yes: You will be added to the Discord server, where you can ask other participants questions, and sometimes a teacher will be online too.
The no: E-mails sent to us with questions about the subject are not answered. This is done to keep questions centralized on Discord.
But… if you have a problem with the e-course (a bug, access problems, stuff like that), we would like you to send an e-mail or post it in our support Discord channel.
Currently not, but it is planned for the future. If you finish an e-course and stick around, you will get a certificate when it’s available.
You are allowed 45 days for this e-course. It’s not possible to extend this.
* = Some chapters do not include Kens Learning Paths, either because the chapter doesn’t need one or because Kens Learning Paths is not ready for it.