How to Crawl Wikipedia for Expired Domains

It goes without saying that Wikipedia is one of the most trusted resources online.

It’s the 7th most popular website in the world according to Alexa, and it has a domain authority of 100, which is as high as that scale goes.

Most, if not all, Wikipedia articles must cite their sources, and link out to those sources.

Like any other sites on the net, these source sites are quite often abandoned, and their domains are left to expire.

Whatever the topic, you’ll be able to find expired domains specific to your niche – and they’ll come preloaded with backlinks from Wikipedia.

While the links are “nofollow”, they are great links to have all the same, because each Wikipedia article has been authored (or had its changes approved) by a real human being who has been given editing privileges on the site.

As a result, Wikipedia links are considered “trustworthy” in the eyes of Google.

Using PBN Lab, you can quickly crawl those pages and find the expired domain names.
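To give a rough idea of what a crawler looks for on each Wikipedia page, here is a minimal Python sketch that pulls the outbound hostnames from an article's HTML. It is not PBN Lab's implementation. The names `ExternalLinkParser` and `external_hosts` are made up for this example, and it assumes outbound citation links carry the `external` CSS class in the rendered page. It only collects candidate hostnames; checking whether a domain has actually expired is a separate step that is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkParser(HTMLParser):
    """Collect hostnames of outbound links in a Wikipedia article's HTML."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Assumption: outbound citation links are marked with class "external".
        if "external" not in classes:
            return
        href = attr.get("href") or ""
        if href.startswith("//"):
            # Protocol-relative link; give it a scheme so it parses cleanly.
            href = "https:" + href
        try:
            host = (urlparse(href).hostname or "").lower()
        except ValueError:
            return  # malformed URL, ignore it
        if host.startswith("www."):
            host = host[4:]
        # Skip empty hosts and links that stay on Wikipedia itself.
        if not host or host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            return
        self.hosts.add(host)


def external_hosts(article_html):
    """Return the sorted, de-duplicated outbound hostnames in article_html."""
    parser = ExternalLinkParser()
    parser.feed(article_html)
    return sorted(parser.hosts)
```

Each hostname returned by `external_hosts` is only a candidate. You would still need to check its registration status before treating it as an expired domain.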

In this video, I demonstrate how to perform a quick Google search, grab the source code directly from the search engine results page (SERP) and have the job wizard automatically load the Wikipedia pages as the seed URLs for your crawl.

Then I take a quick look at the results, which reveal a few strong expired domains with links directly from Wikipedia, or from other authority sites whose pages link to the expired domain.
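If you want to see roughly what happens to the pasted SERP source, the following Python sketch extracts the Wikipedia result URLs to use as seeds. It is only an illustration under a few assumptions, not the job wizard's actual code: `SerpLinkParser` and `wikipedia_seeds` are hypothetical names, and the sketch assumes the organic results appear as plain `https://...wikipedia.org/...` links in the copied source. It also skips any link whose hostname contains "google", which is the rule described in the comments below for avoiding ad and footer links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SerpLinkParser(HTMLParser):
    """Collect every href found in the pasted SERP source."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def wikipedia_seeds(serp_html):
    """Return Wikipedia URLs from serp_html, in page order, without duplicates."""
    parser = SerpLinkParser()
    parser.feed(serp_html)
    seeds = []
    for href in parser.hrefs:
        try:
            host = (urlparse(href).hostname or "").lower()
        except ValueError:
            continue  # malformed URL, ignore it
        # Skip ad redirects and footer links that point at a Google host.
        if "google" in host:
            continue
        # Keep only links that land on a Wikipedia host.
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            if href not in seeds:
                seeds.append(href)
    return seeds
```

Relative links have no hostname at all, so they fall through both checks and are dropped along with everything that is not a Wikipedia page.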

2 Comments

  1. How did you manage to not show Google Adverts?

    I would rather not crawl them, just the SERPs.

    • Hey Oliver,

      The HTML parser automatically skips any links, in any source you parse, where the site hostname includes the word Google.

      On the Google SERPs, for tracking purposes, all of their AdWords links actually point to a Google site, which then redirects your browser to the advertised URL.

      So that’s how I avoided the ads. It also avoids constantly trying to crawl the Google links found in the footer of the SERPs! =)

      Cheers,
      Scott
