SharePoint: How to prevent your content from being crawled

on Friday, February 26, 2010

A client of mine did not want parts of their MOSS 2007 intranet to be searchable, which is quite easy to arrange with crawl rules. Because they wanted strict control over which content can be crawled, I trained them on how to control crawl behaviour for content within SharePoint.

There are three places within SharePoint where end users can change search visibility settings or exclude content from search.

  1. Change Visibility for a Web site: In the Site Settings page, site owners can click the Search Visibility link to go to the Search Visibility page. In the Indexing Site Content group, selecting the No option will exclude all content within the site. The crawler will in turn simply skip the site and not include any of its content in the index.
  2. Exclude Site Columns: In the Site Settings page, site owners can click the Searchable Columns link to go to the Search Settings for Fields page in SharePoint. This page will enumerate all the site columns defined at the current site level. Selecting the NoCrawl check box for the site columns will exclude them from future crawls.
  3. Exclude Lists: The last option for excluding content exists at the level of the SharePoint list settings. It enables list owners to exclude the list and all the content within it from crawls. In the settings page of the list, click the Advanced Settings link. In the Search section, selecting the No option tells the crawler to exclude that list from crawls.

Also, if there are certain pages, or an entire site's content, that site designers don't want crawled, putting the no-index directive in the head section of the master page or layout page's markup will stop the SharePoint crawler from indexing those pages:

<META NAME="ROBOTS" CONTENT="NOHTMLINDEX"/>
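After editing a master page or layout page, it can be handy to confirm that the rendered HTML really carries the directive. The following is a minimal sketch in Python, not part of SharePoint itself. The class `RobotsMetaFinder` and the function `has_nohtmlindex` are hypothetical names made up for this illustration, and the script assumes you have already saved the page source as a string.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # CONTENT may hold several comma-separated directives.
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.append(part.strip().upper())


def has_nohtmlindex(html_text):
    """Return True if the markup contains a robots meta tag with NOHTMLINDEX."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "NOHTMLINDEX" in finder.directives


if __name__ == "__main__":
    # Illustrative page source; in practice this would be the saved output of a real page.
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOHTMLINDEX"/></head><body></body></html>'
    print(has_nohtmlindex(page))
```

This only inspects the markup. It says nothing about how the crawler is configured, so treat it as a quick sanity check rather than proof that a page is excluded from the index.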

