Historical analytics data conversion to MongoDB for Sitecore 8.x – Batch Aggregation

Are you currently using Sitecore Analytics without MongoDB? Are you planning to upgrade Sitecore to 7.5 or upper version with MongoDB?

If yes, its time to decide whether you want to convert the historical data or want to go with a fresh MongoDB setup. Cause if you want the historical analytics data, you need to go though a tedious and lengthy Analytics conversion process. And the tediousness depends on the amount of analytics data you currently have. 🙂

Historical anaytics conversion is a four steps process:

  1. Data Optimization
  2. Data Conversion
  3. HashStoredIP Tool Execution
  4. Rebuilding the reporting database

1. Data Optimization

To perform data optimization you will need the Sitecore Analytics Data Optimization Tool (command line tool) to analyze and fix errors in the database that you want to convert.

  • Download the Data Optimization Tool.zip package from the same page as the latest version of Sitecore 7.5 on SDN.
  • Unzip the Sitecore Analytics Data Optimization Tool.zip package to a local folder on your computer.

Creating Indexes 

An important part of preparation for the conversion process is to create indexes in your Sitecore 7.2 SQL server database that is going to be converted to Sitecore 7.5. Creating indexes can help to speed up searching in the database and also increase data conversion speed. execute the following sql scripts in SQL Management Studio:

CREATE NONCLUSTERED INDEX [IDX_ExternalUser] ON [dbo].[Visitors] ([ExternalUser]); CREATE INDEX [IX_LocationId] ON [NotificationSubscriptions] ([LocationId]);   CREATE NONCLUSTERED INDEX [IDX_Language] ON [dbo].[Visits] ([Language]);

CREATE NONCLUSTERED INDEX [IX_Pages_VisitId] ON [dbo].[Pages] ([VisitId]) INCLUDE ([PageId].[VisitPageIndex],[DateTime]);

[Note: First three index are provided by Sitecore, we created the fourth one by analyzing SQL profiler cause the Optimization Process was taking too much time for us as we had 28 GB of analytics data in SQL to be optimized.]

Running the Data Optimization Tool

The Data Optimization Tool prepares, optimizes, and fixes analytics data stored in a Sitecore DMS SQL Server database. You need to run the Sitecore Analytics Data Optimization Tool before you upgrade Sitecore 7.2 analytics data to be compatible with the xDB collection database (MongoDB).

To run the data optimization tool:

  1. Open a command prompt window or Windows PowerShell.
  2. In the command prompt, enter a command that includes the SQL Server connection string parameter to the analytics database that you want to analyze. Required parameter format using SQL Server authentication:
    <PATH_TO_THE_TOOL>\DataOptimizationTool.exe “user id=<USER>; password=<PASSWORD>; Data Source=<SERVER_NAME>; Database=<DB_NAME>” Example:
    C:\dmsdot\DataOptimizationTool.exe “user id=sa; password=MyPassword; Data Source=.; Database=My_DMS20_DB”
  3. While the tool is running you can see the progress of the analysis.1
  4. If the Data Optimization Tool has finished without errors you can move on to the database conversion process. If you get errors, see the next section for some possible solutions.

Handling Data Optimization Errors

During the running of the Data Optimization Tool you may get errors. This section includes guidance on how to handle the following errors.

Note: Please refer the section 3.3 of the following PDF to handle the Data Optimization errors:

https://sdn.sitecore.net/upload/sitecore7/75/xdb_upgrade_and_conversion_guide_sc75-a4.pdf

 2. Database Conversion

Follow the steps in this chapter to convert historical data from an earlier version of Sitecore to be compatible with the Sitecore Experience Platform.

When you upgrade to the latest version of Sitecore you also need to use the Sitecore Analytics Conversion Tool (command line tool) to convert your historical analytics data. The conversion tool copies and merges historical analytics data from a Sitecore DMS SQL Server database to the MongoDB collection database.

Setting up the Sitecore Analytics Conversion Tools

  1. Download the Sitecore Analytics Conversion Tool zip file from the same page as the latest version of Sitecore 7.5 on SDN.
  2. Unzip the Sitecore Analytics Conversion Tool to a local folder on your computer.
  3. Copy your Sitecore license.xml file to the same folder as the conversion tool. If you forget to do this the tool will throw the following exception: Sitecore.SecurityModel.License.LicenseException: License is not valid: Not Present

Batch Aggregation8.0 Update-4 brings bulk aggregation. That should help getting experience analytics data in much faster. In the preliminary tests 15x boost in some cases. In App_Config\Include\Sitecore.Analytics.Processing.Aggregation.Services.config configure the historyWorker like:

  • Set MaximumBatchSize values bigger, for example 256 or 512 each.
  • Set MaxThreads according to possibilities of your servers, e.g. 40 as you had it.
  • Please carefully review the performance of your SQL and Sitecore servers while it is rebuilding. The more MaximumBatchSize and MaxThreads, the more load Sitecore makes on the SQL side.

Connection String Parameters

When you run the conversion tool you need to add the correct connection string parameters using the command line. This section contains guidance on the correct connection parameters to use for conversion.

The full command including connection parameters is written in the following format:

<PATH_TO_CONVERSION_TOOL>\ConversionTool.bat /sqlconnection “<DMS2.0 connection string>” /xdbconnection “xDB connection string” [optional_parameters]

Running the Conversion Tool

To run the conversion tool:

Note: If you are using ECM and WFFM Module, please make sure you run this tool for those tools as well.

  1. Open a command prompt window or Windows PowerShell.
  2. In the command prompt enter the following command. Ensure that you enter the correct connection string parameters:

<PATH_TO_CONVERSION_TOOL>\ConversionTool.bat /sqlconnection “<DMS2.0 connection string>” /xdbconnection “xDB connection string” [optional_parameters]

Example:
C:\ConversionTool.bat /sqlconnection “user id=sa; password=MyPassword; Data Source=.; Database=My_DMS20_DB” /xdbconnection “mongodb://localhost/My_xDB_DB” /threads 20

If the conversion tool is running normally from start to finish your output should look something like this:

1

Configuring the Conversion Tool Across Multiple Servers: If there are not enough resources on a single computer to complete the conversion process in a reasonable time or if you want to increase conversion speed, you can run the Sitecore Analytics Conversion Tool concurrently on multiple computers, provided they all connect to the same two databases. When new instances start, they reschedule the work dynamically.

If you follow this approach then we recommend you install SQL Server and MongoDB on separate server instances servers and run several instances of the Sitecore Analytics Conversion Tool. If you run multiple instances of the conversion tool you should ideally run them on different computers.

3. Execute the DMS75_MigrateDefinitions.sql script on your target Sitecore 7.5 primary reporting database. This script copies all the definition data from your source Sitecore 7.2 database to your target Sitecore 7.5 database. Definition data includes page event definitions, traffic types, visitor classifications and campaigns.

Note: Before you run the script update the @sourceDatabase reference at the top of the script, to specify the name of your source Sitecore 7.2 database.

3. Executing HashStoredIP.exe

As Sitecore 8.x doesn’t store the actual IP address but stores Hash IP, hence we need to execute this tool before rebuilding reporting database. This tool will Hash the IP addresses stored in MongoDB.

The standalone console application ‘HashStoredIPs.exe’ is used to hash existing IP addresses in the database. The tool connects directly to the database and can be executed from anywhere if it has access to the database.

Note: The tool can be downloaded from:

https://dev.sitecore.net/Downloads/Sitecore_Experience_Platform/Sitecore_81/Sitecore_Experience_Platform_81_Initial_Release.aspx

Example of usage:

HashStoredIPs.exe -update -status -connectionstring:”mongodb://user:password@localhost/analytics?authSource=admin” -salt:”f54RYb4RT4ylUn” > log.txt

– where

<ConnectionString> is the quoted connection string.
<Salt> is string which is added to the IP during hashing process. Please note that the same salt must be defined in configuration of geoIpManager. Due to security reasons it is recommended to have salt at least 6 characters.

Specify the -status argument to scan through the target database to detect the number of IP addresses that are not currently hashed.
Specify the -update argument to hash IP addresses in the target database.

You can specify both the status and update argument if you want to analyse and update the database.

Note: Remember to use double quotes for the connection string if it contains spaces.

It is safe to abort the tool using CTRL+C any time.

4. Rebuilding Reporting Database

Rebuild of the reporting database is the re-processing of interactions that have already been aggregated into the reporting database for use by Sitecore reporting applications. You need to perform this as the final step of the conversion process to merge historical data with live data making this data available to Sitecore reporting applications.

Reasons for rebuilding the reporting database:

  • After using the Sitecore Analytics Conversion Tool to populate the reporting database with analytics information from an earlier version of Sitecore.
  • After having amended information in the collection database, in order to reflect the amendments in reports when looking at older data. Such amendment can be for example assigning channels to referring sites.
  • If you have re-classified a search key word or traffic type, aggregated report data is not updated automatically.
  • If the reporting database has been lost or is irrecoverably out of sync with the collection database, for example, due to a disaster or if the details of two contacts have been merged.
  • In the Executive Insight Dashboard and other Sitecore reporting applications it is possible to reclassify data that has already been processed by the aggregation layer. This could cause the reporting database to become out of sync with the collection database.

Note: The process is semi-automated but also requires an administrator to attach, detach, or replace databases in SQL Server and also to modify some configuration files.
Note: To ensure that the reporting database contains the latest data changes, you need to rebuild it from time to time. Be aware that when you rebuild the reporting database, you overwrite its contents. For best results, we recommend that you use a clean copy of the Sitecore_Analytics database every time you perform a rebuild.

Please refer the URL mentioned below for rebuilding the reporting database:

https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/xdb/server_considerations/walkthrough_rebuilding_the_reporting_database

Hope this helps.

You can also refer the URLs mentioned below for further information and post conversion errors etc:

https://community.sitecore.net/developers/f/9/t/2956

https://kb.sitecore.net/articles/977445

https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/xdb/server_considerations/reasons_for_rebuilding_the_reporting_database

http://stackoverflow.com/questions/29155209/ rebuilding-reporting-database-sitecore-8

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Powered by WordPress.com.

Up ↑

%d bloggers like this: