Why are only a few of my website’s pages being crawled?

Question

If you’ve noticed that only 4-6 pages of your website are being crawled (your homepage, sitemap URLs, and robots.txt), this is most likely because our bot couldn’t find outgoing internal links on your homepage. Below you will find the possible reasons for this issue.

Problem with outgoing internal links

There might be no outgoing internal links on the main page, or they might be wrapped in JavaScript. Our bot cannot parse JavaScript content, so if your homepage’s links to the rest of your site are hidden in JavaScript elements, we will not be able to read them or crawl those pages.

Although we cannot crawl JavaScript content, we can crawl the HTML of a page with JS elements, and we can review the parameters of your JS and CSS files with our Performance checks.
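
To see what a crawler that does not execute JavaScript can discover, you can extract the anchor tags from your homepage’s raw HTML yourself. Below is a minimal Python sketch using only the standard library; the URL is a placeholder, and any links injected by JavaScript at runtime will not appear in its output.

import urllib.request
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    # Collects the href of every <a> tag present in the raw HTML.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = urllib.request.urlopen("https://www.example.com/").read().decode("utf-8", errors="replace")
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # Links added by JavaScript after page load will be missing here.

If this prints an empty list for your homepage, a crawler that reads only the delivered HTML has no internal links to follow.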

In both cases there is a way to ensure that our bot crawls your pages: change the crawl source from “Website” to “Sitemap” or “URLs from file” in your campaign settings.


“Website” is the default source. It means we will crawl your website using a breadth-first search algorithm, navigating through the links we see in your page’s code, starting from the homepage.

If you choose one of the other options, we will crawl links that are found in the sitemap or in the file you upload.
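
For illustration, a breadth-first crawl of the kind described above can be sketched in a few lines of Python. This is a simplified model rather than Semrush’s actual crawler: the fetch_links helper is a hypothetical placeholder for fetching a page and extracting its internal links, and a real crawler would also respect robots.txt rules and rate limits.

from collections import deque
from urllib.parse import urljoin, urlparse

def crawl_breadth_first(seed, fetch_links, max_pages=100):
    # fetch_links(url) is assumed to return the links found in that page's HTML.
    visited = set()
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        for link in fetch_links(url):
            absolute = urljoin(url, link)
            # Stay on the same host; external links are out of scope.
            if urlparse(absolute).netloc == urlparse(seed).netloc and absolute not in visited:
                queue.append(absolute)
    return visited

Because the queue is first-in, first-out, pages one click from the homepage are visited before pages two clicks away, which is what “breadth-first” means here. If the homepage yields no links, the crawl stops after the seed page, matching the symptom this article describes.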

The Site Audit crawler could have been blocked

Our crawler could have been blocked on some pages in the website’s robots.txt file or by noindex/nofollow tags. You can check whether this is the case in your Crawled Pages report.


You can inspect your robots.txt file for any Disallow directives that would prevent crawlers like ours from accessing your website.
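
To verify this programmatically, Python’s standard urllib.robotparser module can test whether a given user agent may fetch a URL. The sketch below uses a placeholder domain and assumes the Site Audit crawler identifies itself as “SiteAuditBot”; check Semrush’s documentation for the current user-agent string.

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()
# True means robots.txt permits this user agent to crawl the URL.
print(rp.can_fetch("SiteAuditBot", "https://www.example.com/some-page/"))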

If you see the following code on the main page of a website, it tells us that we are not allowed to index it or follow its links, and our access is blocked. Likewise, a page whose robots meta tag contains “nofollow” or “none” will lead to a crawling error:

<meta name="robots" content="noindex, nofollow">
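
A quick, if rough, way to check a page for such tags is to scan its HTML for robots meta tags and read their directives. A minimal sketch, with the URL as a placeholder:

import re
import urllib.request

html = urllib.request.urlopen("https://www.example.com/").read().decode("utf-8", errors="replace")
# Print every robots meta tag; directives like "nofollow" or "none" block our crawl.
for tag in re.finditer(r'<meta[^>]*name=["\']robots["\'][^>]*>', html, re.IGNORECASE):
    print(tag.group(0))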

You will find more information about these errors in our troubleshooting article.

Your homepage is larger than 2 MB

Site Audit can only parse pages up to 2 MB in size. If your homepage exceeds this limit, you will see the large HTML page size error.
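
You can check your homepage’s HTML size yourself by downloading it and comparing the byte count against the 2 MB limit. A minimal sketch, with the URL as a placeholder:

import urllib.request

LIMIT = 2 * 1024 * 1024  # 2 MB, the maximum page size Site Audit will parse

html_bytes = urllib.request.urlopen("https://www.example.com/").read()
print(f"HTML size: {len(html_bytes)} bytes")
if len(html_bytes) > LIMIT:
    print("Too large: Site Audit will report the large HTML page size error.")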
