Last updated by Tiago Araújo [SSW] on 09 Jun 2021 11:06 pm
Do you keep your files under the Google file size limit?
Each crawler may enforce a maximum file size, and any content beyond that limit may be ignored. Google currently enforces a size limit of 500 KB.
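As a rough way to spot files that exceed this limit, here is a minimal Python sketch that walks a folder and lists anything larger than 500 KB. The function name `oversized_files` and the constant `GOOGLE_LIMIT_BYTES` are made up for illustration. It also assumes that "500 KB" means 500 * 1024 bytes and that the size on disk is a fair stand-in for what the crawler downloads.

```python
import os

# Assumption: "500 KB" is read as 500 * 1024 bytes; the exact unit is not specified.
GOOGLE_LIMIT_BYTES = 500 * 1024


def oversized_files(root, limit=GOOGLE_LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)  # size on disk, in bytes
            if size > limit:
                found.append((path, size))
    # Largest offenders first, so the worst pages are reviewed first.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Running `oversized_files("./site")` against a build output folder (a hypothetical path) would return the files worth trimming or splitting.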
For other files, the following limits apply:
- All files larger than 30MB will be completely ignored.
- For HTML files, the search appliance indexes up to 2.5MB of the document, caches it, and discards the rest.
- For non-HTML formats, the search appliance:
  - Downloads the non-HTML file.
  - Converts the non-HTML file to HTML.
  - If the converted content is less than 4,000,000 bytes, indexes the first 2MB of the HTML file. If the converted content exceeds 4,000,000 bytes, the document is not indexed, although the document and a link to it still appear in search results. (Note that 4MB = 4,194,304 bytes, so this cutoff is slightly below 4MB.)
  - Caches the first 2MB of the HTML file.
  - Discards the rest of the HTML file and the non-HTML file.
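The limits listed above can be sketched as a small Python function that estimates how many bytes of a file end up indexed. This is an illustration only, not the search appliance's actual logic. The name `indexed_bytes` and the constants are hypothetical, and megabytes are assumed to be binary (1MB = 1,048,576 bytes), in line with the 4MB = 4,194,304 note. The list does not say what happens at exactly 4,000,000 bytes, so the sketch treats that case as indexed.

```python
MB = 1024 * 1024  # assumption: binary megabytes

MAX_FILE_BYTES = 30 * MB            # larger files are ignored completely
HTML_INDEX_BYTES = int(2.5 * MB)    # portion of an HTML file that is indexed
CONVERTED_INDEX_BYTES = 2 * MB      # portion of converted HTML that is indexed
CONVERTED_CUTOFF_BYTES = 4_000_000  # above this, converted content is not indexed


def indexed_bytes(file_size, is_html, converted_size=None):
    """Estimate how many bytes would be indexed under the limits listed above."""
    if file_size > MAX_FILE_BYTES:
        return 0
    if is_html:
        return min(file_size, HTML_INDEX_BYTES)
    if converted_size is None:
        raise ValueError("converted_size is required for non-HTML files")
    # Assumption: exactly 4,000,000 bytes counts as indexable (not stated in the list).
    if converted_size > CONVERTED_CUTOFF_BYTES:
        return 0
    return min(converted_size, CONVERTED_INDEX_BYTES)
```

For example, `indexed_bytes(5 * MB, is_html=False, converted_size=3_000_000)` gives `2 * MB`: only the first 2MB of the converted HTML is indexed.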