Do you keep your files under the Google file size limit?

Last updated by Tiago Araújo [SSW] on 09 Jun 2021 11:06 pm

Each crawler may enforce its own maximum file size, and content beyond that limit may be ignored. Google currently enforces a size limit of 500 KB.
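As a rough illustration, a page can be checked against this limit before it is published. The following is a minimal sketch, not a definitive implementation: the function name `bytes_over_limit` is hypothetical, and reading "500 KB" as 500 × 1024 bytes is an assumption, since the rule does not define the unit.

```python
# Assumption: "500 KB" is read as 500 * 1024 bytes; the rule does not define the unit.
GOOGLE_LIMIT_BYTES = 500 * 1024


def bytes_over_limit(content: bytes, limit: int = GOOGLE_LIMIT_BYTES) -> int:
    """Return how many bytes of content fall past the limit (0 if within it)."""
    return max(0, len(content) - limit)


if __name__ == "__main__":
    # Hypothetical page body, slightly larger than the assumed limit.
    page = b"x" * (GOOGLE_LIMIT_BYTES + 10)
    print(bytes_over_limit(page))
```

Any non-zero result is the amount of content at the end of the file that a crawler enforcing this limit may ignore.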

Regarding other files:

  • Files larger than 30MB are ignored completely.
  • For an HTML file, the search appliance indexes up to 2.5MB of the document, caches it, and discards the rest.
  • For a non-HTML file, the search appliance:
      • Downloads the non-HTML file.
      • Converts the non-HTML file to HTML.
      • If the converted content is less than 4,000,000 bytes, indexes the first 2MB of the HTML file. If the converted content exceeds 4,000,000 bytes, the document is not indexed, although the document and a link to it still appear in search results. (Note that 4MB = 4,194,304 bytes, so this threshold is slightly below 4MB.)
      • Caches the first 2MB of the HTML file.
      • Discards the rest of the HTML file and the non-HTML file.
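The limits listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the name `indexed_bytes` and its parameters are hypothetical, "MB" is read as 1,048,576 bytes, and converted content of exactly 4,000,000 bytes is treated as indexable because the rule does not say which way that boundary falls.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: "MB" is read as 1,048,576 bytes

MAX_FILE_BYTES = 30 * MB           # larger files are ignored completely
HTML_INDEX_BYTES = int(2.5 * MB)   # portion of an HTML file that is indexed
CONVERTED_MAX_BYTES = 4_000_000    # converted content above this is not indexed
CONVERTED_INDEX_BYTES = 2 * MB     # portion of converted HTML that is indexed


def indexed_bytes(size_bytes: int, is_html: bool,
                  converted_bytes: Optional[int] = None) -> int:
    """Estimate how many bytes of a file would be indexed under the listed limits."""
    if size_bytes > MAX_FILE_BYTES:
        return 0
    if is_html:
        return min(size_bytes, HTML_INDEX_BYTES)
    if converted_bytes is None:
        raise ValueError("converted_bytes is required for non-HTML files")
    if converted_bytes > CONVERTED_MAX_BYTES:
        # Not indexed, although a link may still appear in search results.
        return 0
    return min(converted_bytes, CONVERTED_INDEX_BYTES)
```

For example, `indexed_bytes(3 * MB, is_html=True)` returns the 2.5MB cap, while a non-HTML file whose converted HTML is 4,000,001 bytes returns 0.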
Adam Cogan
Tiago Araujo
Camilla Rosa Silva
