SharePoint 2013 introduces a new, improved search platform that differs from the search in
previous versions of SharePoint. SharePoint search and FAST search have been combined
into a single search platform. Instead of the several flavors of search found in earlier
releases (WSS search, Foundation search, and so on), SharePoint 2013 has only Foundation
search and SharePoint Server search. Along with this come many new components and
topology changes in the search architecture.
The search architecture in SharePoint 2013 now includes
components for crawling and indexing content, for administration, and for executing
search queries.
The main components of SharePoint 2013 search are:
- Admin Component
- Crawl Component
- Content Process Component
- Analytics Processing Component
- Index Component
- Query Processing Component
Admin component
The admin component runs the system processes for search
and provisions the other search components within the topology. Its main
responsibilities are handling topology changes and search provisioning,
managing the search administration database, and scheduling crawling and
content processing.
Crawl component
Crawling is the process of gathering documents from various sources and
repositories, making sure they obey the applicable crawl rules, and sending
them to the content processing component for further processing. The crawl
component is responsible for crawling content sources in SharePoint 2013.
Content sources can be SharePoint sites, Microsoft Exchange Server public
folders, BCS external content sources, file shares, and so on.
During a crawl, the crawl component connects to the content sources by
invoking the appropriate indexing connector or protocol handler to retrieve
the content, and passes the crawled items to the content processing component.
SharePoint 2013 supports three different kinds of crawls:
- Full: A full crawl processes the entire content source, regardless of whether items have changed since the last crawl. In short, it crawls all content defined in the source every time it runs.
- Incremental: An incremental crawl processes only content that has been modified since the last crawl, based on either a timestamp or a change log.
- Continuous: Continuous crawling is an option that can be used instead of incremental crawls when content needs to be crawled continuously. It gives the freshest search index because continuous crawls can run in parallel: a new crawl does not wait for the prior one to complete before it is launched.
Some important points to consider about continuous crawling are:
- Continuous crawling can only be enabled on content sources of type SharePoint Sites
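- The default interval is 15 minutes and can only be changed using PowerShell, by setting the ContinuousCrawlInterval property on the Search service application. Continuous crawls themselves are enabled per content source with the Set-SPEnterpriseSearchCrawlContentSource cmdlet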
- Once started, it can't be stopped or paused; it can only be disabled.
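
The difference between a full and an incremental crawl can be illustrated with a small Python sketch. This is a toy model, not SharePoint code: the `Item` class, the `select_items` function, and the intranet URLs are all hypothetical, and it only shows the timestamp-based variant of change detection.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical crawled item: a URL and its last-modified time."""
    url: str
    modified: datetime


def select_items(items: List[Item], last_crawl: Optional[datetime], full: bool) -> List[Item]:
    """Return the items a crawl would pick up in this toy model.

    A full crawl takes everything; an incremental crawl takes only the
    items modified after the previous crawl.
    """
    if full or last_crawl is None:
        return list(items)
    return [item for item in items if item.modified > last_crawl]


# Made-up content source with two documents.
source = [
    Item("http://intranet/docs/a.docx", datetime(2013, 5, 1)),
    Item("http://intranet/docs/b.docx", datetime(2013, 5, 10)),
]
last_crawl = datetime(2013, 5, 5)

full_batch = select_items(source, last_crawl, full=True)
incremental_batch = select_items(source, last_crawl, full=False)
print(len(full_batch), len(incremental_batch))
```

In this model the full crawl returns both documents, while the incremental crawl returns only the one modified after the previous crawl.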
Content processing component
The content processing component receives crawled content from the
crawl component, analyzes and processes it to prepare it for indexing,
and sends it to the index component. It takes crawled properties as input
from the crawler and produces managed properties as output for the index
component to index. The content processing component uses parsers to
process the content for indexing. If it is unable to parse a file, the
search index will include only the basic file properties.
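
As a rough illustration of the crawled-property-to-managed-property step, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the property names in `PROPERTY_MAP`, the `parse_body` stand-in parser, and the `process` function are hypothetical and do not reflect how SharePoint implements content processing.

```python
from typing import Dict, Optional

# Hypothetical mapping from crawled property names to managed property names.
PROPERTY_MAP: Dict[str, str] = {
    "ows_Title": "Title",
    "ows_Author": "Author",
    "FileExtension": "FileType",
}


def parse_body(file_type: str, raw: bytes) -> Optional[str]:
    """Stand-in for a format parser: only plain text is 'supported' here."""
    if file_type == "txt":
        return raw.decode("utf-8")
    return None  # no parser available for this format


def process(crawled: Dict[str, str], raw: bytes) -> Dict[str, str]:
    """Turn crawled properties into managed properties for the index."""
    managed = {PROPERTY_MAP[k]: v for k, v in crawled.items() if k in PROPERTY_MAP}
    body = parse_body(crawled.get("FileExtension", ""), raw)
    if body is not None:
        managed["Body"] = body  # only parsed files contribute body text
    return managed


doc = process({"ows_Title": "Plan", "FileExtension": "txt"}, b"hello")
unparsed = process({"ows_Title": "Scan", "FileExtension": "xyz"}, b"\x00\x01")
print(doc)
print(unparsed)
```

The second call mimics a file that cannot be parsed: only its basic properties reach the index, and no body text is produced.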
Analytics processing component
The analytics processing component performs search analytics and usage
analytics to improve search relevance. Search analytics is the process of
extracting information such as links and anchor text from the crawled
content. Usage analytics covers user-initiated events, such as clicks per
item. The output of both kinds of analysis is used to create search reports
and to generate recommendations and deep links. The results of the analyses
are added to the items in the search index. Additionally, results from usage
analytics are stored in the analytics reporting database. It makes a lot of
sense to put analytics under the search umbrella: after analytics processing,
the analytic data is committed to the index and used in a variety of ways,
such as boosting the relevance of search results or showing the number of
clicks in the hover panel over a search result.
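
To make the idea of usage analytics feeding relevance more concrete, here is a small Python sketch. It is only an assumption-laden toy: the linear boost formula, the `weight` parameter, and the item names are invented for illustration and are not SharePoint's ranking model.

```python
from collections import Counter
from typing import Dict, List


def click_counts(events: List[str]) -> Counter:
    """Aggregate click events (one item id per click) into per-item counts."""
    return Counter(events)


def boosted_scores(base: Dict[str, float], clicks: Counter, weight: float = 0.1) -> Dict[str, float]:
    """Add a click-based boost to each base relevance score (hypothetical formula)."""
    return {item: score + weight * clicks[item] for item, score in base.items()}


events = ["a", "a", "b"]                    # made-up click log
base = {"a": 1.0, "b": 1.0, "c": 1.15}      # made-up base relevance scores

scores = boosted_scores(base, click_counts(events))
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Without the boost, item `c` would rank first; with two clicks counted, item `a` moves ahead of it.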
Index component
The index component is responsible for building the index
file. The index file contains crawled properties from content sources, along
with the ACLs that ensure search results are shown only to users who have the
rights to view the content. The index component stores both crawled items and
their associated properties. It uses update groups to allow partial updates:
when content changes, only the index of the associated update group is
updated instead of the entire content, which makes updates more efficient.
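
The two ideas in this section, ACL-based trimming and partial updates through update groups, can be sketched in a few lines of Python. This is a conceptual toy under stated assumptions: the `ToyIndex` class, the group names (`content`, `security`, `analytics`), and the security group names are hypothetical, and the real index structure is far more involved.

```python
from typing import Dict, List, Set


class ToyIndex:
    """Toy index: item id -> update group name -> properties in that group."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Dict[str, object]]] = {}

    def add(self, item_id: str, groups: Dict[str, Dict[str, object]]) -> None:
        """Index a whole item, with its properties split into update groups."""
        self.items[item_id] = {name: dict(props) for name, props in groups.items()}

    def partial_update(self, item_id: str, group: str, props: Dict[str, object]) -> None:
        """Rewrite a single update group and leave the other groups untouched."""
        self.items[item_id][group] = dict(props)

    def visible_to(self, user_groups: Set[str]) -> List[str]:
        """Return only the items whose ACL overlaps the user's groups."""
        return [
            item_id
            for item_id, groups in self.items.items()
            if set(groups["security"]["acl"]) & user_groups
        ]


index = ToyIndex()
index.add("doc1", {
    "content": {"title": "Budget", "body": "Quarterly budget figures"},
    "security": {"acl": ["finance"]},
    "analytics": {"clicks": 0},
})

# Only the hypothetical "analytics" group is rewritten; "content" is not re-indexed.
index.partial_update("doc1", "analytics", {"clicks": 5})

print(index.visible_to({"finance"}), index.visible_to({"hr"}))
```

A user in the `finance` group sees `doc1`, a user in `hr` sees nothing, and the click count changes without the content group being touched.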
Query processing component
The query processing component analyzes and processes
queries and results to optimize precision, recall, and relevance. It takes a
user query that comes from a search front end, performs linguistic processing
such as word breaking and stemming, and submits the processed query to the
index component. It routes incoming queries to index replicas, one from each
index partition. The index component returns a result set for the processed
query to the query processing component, which in turn processes the result
set before sending it back to the search front end.
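
The query flow described above (linguistic processing, then one replica per partition, then merging the results) can be sketched roughly as follows. This is a simplified model under stated assumptions: the `normalize` function is a naive stand-in for real word breaking and stemming, replicas are chosen at random, and all partition names, documents, and scores are made up.

```python
import random
from typing import Dict, List, Tuple

# A replica is modeled as: term -> list of (document id, score).
Replica = Dict[str, List[Tuple[str, float]]]


def normalize(query: str) -> List[str]:
    """Very rough word breaking plus a naive plural-stripping 'stemmer'."""
    words = query.lower().split()
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]


def run_query(query: str, partitions: Dict[str, List[Replica]], rng: random.Random) -> List[Tuple[str, float]]:
    """Send the query to one replica of every partition and merge the hits."""
    terms = normalize(query)
    hits: Dict[str, float] = {}
    for replicas in partitions.values():
        replica = rng.choice(replicas)  # one replica from each partition
        for term in terms:
            for doc, score in replica.get(term, []):
                hits[doc] = hits.get(doc, 0.0) + score
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)


# Two partitions, each with two identical replicas (made-up data).
p0: Replica = {"report": [("doc1", 0.9)]}
p1: Replica = {"report": [("doc7", 0.4)], "sale": [("doc7", 0.3)]}
partitions = {"p0": [p0, p0], "p1": [p1, p1]}

results = run_query("Sales Reports", partitions, random.Random(0))
print([doc for doc, _ in results])
```

Because every partition holds a different slice of the index, the query has to reach all partitions, but within a partition any single replica is enough.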
That's all for this introduction to the SharePoint 2013 search
architecture. In future posts we'll look in more detail at the search
infrastructure and at how to extend it.