Google quietly announced in an update to Googlebot’s help document that it will crawl only the first 15 MB of a webpage. Anything beyond that point is excluded from ranking calculations.
“Any resources referenced in the HTML such as images, videos, CSS and JavaScript are fetched separately. After the first 15 MB of the file, Googlebot stops crawling and only considers the first 15 MB of the file for indexing. The file size limit is applied on the uncompressed data,” Google said in the help document.
This implies that a website’s code should be structured so that SEO-relevant content appears within the first 15 MB of the HTML or other supported text-based file. It also means that, wherever possible, images and videos should be served as separate compressed files rather than encoded directly into the HTML.
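For site owners who want to gauge how close a page is to that limit, one simple check is to fetch the page and measure the size of the raw, uncompressed HTML. The sketch below is illustrative only, assuming the Python requests library and a hypothetical URL; it is not an official Google tool.

```python
import requests

# Googlebot's documented limit applies to the uncompressed HTML file itself.
LIMIT_BYTES = 15 * 1024 * 1024  # 15 MB

def check_html_size(url: str) -> None:
    """Fetch a page and report its uncompressed HTML size against the 15 MB limit."""
    response = requests.get(url, timeout=10)
    # response.content holds the decompressed body, matching how the limit is applied.
    size = len(response.content)
    print(f"{url}: {size / 1_048_576:.2f} MB of uncompressed HTML")
    if size > LIMIT_BYTES:
        print("Warning: content beyond the first 15 MB may not be considered for indexing.")

# Hypothetical usage:
check_html_size("https://example.com/")
```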
Some in the SEO community wondered whether this meant that text placed after images in an HTML file would be completely ignored by Googlebot if those images pushed it past the cutoff.
“It’s specific to the HTML file itself, like it’s written,” John Mueller, Google Search Advocate, clarified via Twitter. “Embedded resources/content pulled in with IMG tags is not a part of the HTML file.”
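In other words, images referenced with IMG tags are fetched separately and do not count toward the 15 MB; only bytes inside the HTML file itself (such as base64 data URIs) do. A hedged sketch of how one might confirm that critical text sits within the first 15 MB of the raw markup, again assuming the requests library and a hypothetical phrase:

```python
import requests

LIMIT_BYTES = 15 * 1024 * 1024  # first 15 MB of the uncompressed HTML

def text_within_limit(url: str, phrase: str) -> bool:
    """Return True if the phrase begins within the first 15 MB of the raw HTML.

    Linked resources (e.g. <img src="...">) are fetched separately and do not
    inflate the HTML file; inline-encoded media does.
    """
    html = requests.get(url, timeout=10).content
    offset = html.find(phrase.encode("utf-8"))
    return 0 <= offset < LIMIT_BYTES

# Hypothetical usage:
print(text_within_limit("https://example.com/", "important product description"))
```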