How to Create a Sitemap?

A Sitemap is an easy way for webmasters to tell search engine robots about the pages that make up their website. It is particularly useful for sites whose pages are not reachable through ordinary HTML links, such as sites that rely on Flash or JavaScript for navigation. It is also useful for pages that can only be reached through a form.

It is an XML file that lists the URLs of a site, along with optional information about each URL: the date of its last update, how often it is likely to change, and its priority relative to the other URLs of the site.

A Sitemap must comply with the Sitemap protocol. You can use the Sitemap Generator to build the sitemap of your website.

Using the Sitemap protocol does not guarantee that your web pages will be included in search engines. It does, however, give crawlers guidance that helps them crawl your site more efficiently.

Sitemap XML

The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped, and the file itself must be encoded in UTF-8.
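As an illustration, here is a page address containing an ampersand. The URL is hypothetical. Inside `<loc>`, the `&` must be written as the entity `&amp;`. The five characters to escape, with their entities, are `&` (`&amp;`), `'` (`&apos;`), `"` (`&quot;`), `>` (`&gt;`) and `<` (`&lt;`).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <!-- Hypothetical page whose raw address is
           http://www.example.com/view.php?item=12&lang=en ;
           the ampersand is escaped as an entity in loc -->
      <loc>http://www.example.com/view.php?item=12&amp;lang=en</loc>
   </url>
</urlset>
```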

A Sitemap file may contain no more than 50 000 URLs and must not exceed 10 MB (10 485 760 bytes). You can compress your Sitemap files with gzip to reduce bandwidth requirements. However, the uncompressed file must still not exceed 10 MB.

Example Sitemap XML

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2008-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
   </url>
</urlset>

Tag Definitions Sitemap XML

Mandatory
Tag Description
<urlset> Encapsulates the file and references the current protocol standard through the xmlns attribute.
<url> Parent tag for each URL entry.
<loc> URL of the web page. This URL must begin with the protocol (http, for example) and must be shorter than 2 048 characters.

Optional
Tag Description
<lastmod> Date of the last modification of the page. This date should be in W3C Datetime format, which lets you omit the time if you wish and use the format YYYY-MM-DD.
<changefreq> How frequently the page is likely to change. This value gives search engines general information and does not necessarily reflect how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never
<priority> The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are ranked against pages on other sites. It only tells search engines which of your pages you consider most important for crawlers.

The default priority of a page is 0.5.
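The sketch below shows the optional tags in use. The page URLs and dates are hypothetical. The first entry contains only the mandatory `<loc>`, so it keeps the default priority of 0.5. The second uses the full W3C Datetime form, with a time and a time zone offset, instead of the short YYYY-MM-DD form.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <!-- Only loc is required; lastmod, changefreq and priority are omitted -->
      <loc>http://www.example.com/contact.html</loc>
   </url>
   <url>
      <loc>http://www.example.com/news/</loc>
      <!-- W3C Datetime with time and time zone offset (hypothetical date) -->
      <lastmod>2008-01-15T09:30:00+01:00</lastmod>
      <changefreq>daily</changefreq>
      <!-- Relative to the other URLs of this site only -->
      <priority>0.8</priority>
   </url>
</urlset>
```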

DTD Sitemap XML

<?xml version="1.0"?>
<!DOCTYPE urlset [
 <!ELEMENT urlset (url+)>
 <!ATTLIST urlset xmlns CDATA #FIXED "http://www.sitemaps.org/schemas/sitemap/0.9">
 <!ELEMENT url (loc, lastmod?, changefreq?, priority?)>
 <!ELEMENT loc (#PCDATA)>
 <!ELEMENT lastmod (#PCDATA)>
 <!ELEMENT changefreq (#PCDATA)>
 <!ELEMENT priority (#PCDATA)>
]>

Index Sitemap XML

If you want to list more than 50 000 URLs, you must create multiple Sitemap files and list each of them in a Sitemap index file. A Sitemap index file may list up to 1 000 Sitemaps, must not exceed 10 MB (10 485 760 bytes), and can be compressed. You can use multiple Sitemap index files.

Sample Index Sitemap XML

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml</loc>
      <lastmod>2008-01-01</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2007-12-01</lastmod>
   </sitemap>
</sitemapindex>

Tag Definitions Index Sitemap XML

Mandatory
Tag Description
<sitemapindex> Encapsulates information about all of the Sitemaps in the file.
<sitemap> Encapsulates information about an individual Sitemap.
<loc> Indicates the location of the Sitemap.

Optional
Tag Description
<lastmod> Indicates the time at which the corresponding Sitemap file was modified, not the time at which any of the pages listed in that Sitemap changed. The value of the lastmod tag should be in W3C Datetime format.
By providing the date and time of modification, you allow search engine crawlers to retrieve only a subset of the Sitemaps in the index, for example only those modified since a certain date. This incremental retrieval of Sitemaps enables rapid discovery of new URLs on very large sites.
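The following index is a sketch of how lastmod supports incremental retrieval. The file names and dates are hypothetical. How a particular search engine chooses which Sitemaps to fetch is up to that engine. The comments only illustrate the kind of date filtering described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <!-- Hypothetical Sitemap regenerated recently: its lastmod is the
           modification time of this Sitemap file, not of its pages -->
      <loc>http://www.example.com/sitemap-news.xml</loc>
      <lastmod>2008-01-15T06:00:00+00:00</lastmod>
   </sitemap>
   <sitemap>
      <!-- Unchanged since December: a crawler interested only in Sitemaps
           modified after 2008-01-01 could skip this file -->
      <loc>http://www.example.com/sitemap-archive.xml.gz</loc>
      <lastmod>2007-12-01</lastmod>
   </sitemap>
</sitemapindex>
```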

DTD Index Sitemap XML

<?xml version="1.0"?>
<!DOCTYPE sitemapindex [
 <!ELEMENT sitemapindex (sitemap+)>
 <!ATTLIST sitemapindex xmlns CDATA #FIXED "http://www.sitemaps.org/schemas/sitemap/0.9">
 <!ELEMENT sitemap (loc, lastmod?)>
 <!ELEMENT loc (#PCDATA)>
 <!ELEMENT lastmod (#PCDATA)>
]>

Sitemap file location

The location of the Sitemap file determines the set of URLs that can be included in that Sitemap.
A Sitemap file located at http://exemple.fr/catalog/sitemap.xml can include any URLs starting with http://exemple.fr/catalog/, but cannot include URLs starting with http://exemple.fr/images/.
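As a sketch, assume the file below is published at http://exemple.fr/catalog/sitemap.xml. The page paths under /catalog/ are hypothetical. Both listed URLs fall under the Sitemap's own directory. The images URL mentioned in the comment would have to go in a Sitemap located higher up, for example at the site root.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed location of this file: http://exemple.fr/catalog/sitemap.xml -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <!-- Allowed: starts with http://exemple.fr/catalog/ -->
      <loc>http://exemple.fr/catalog/shoes/</loc>
   </url>
   <url>
      <loc>http://exemple.fr/catalog/shoes/boots.html</loc>
   </url>
   <!-- Not allowed here: http://exemple.fr/images/logo.png lies outside
        /catalog/ and belongs in a Sitemap placed at http://exemple.fr/ -->
</urlset>
```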

For more information, visit www.sitemaps.org.