SharePoint to AEM Migration

Migrating Sites from SharePoint to AEM

Introduction

Enterprise-scale content migrations are far more complex than simple file transfers — they require automation, governance, and intelligence.

This post shares how we migrated SharePoint-hosted agency/department websites into a unified Adobe Experience Manager (AEM) platform — a massive consolidation effort involving over hundreds of thousands of documents and tens of thousands of content pages.

The focus here is on the technical implementation, automation pipelines, and AI integration that made this possible.

Step 1: Discovery & Analysis

Technical Audit

  • Mapped a large number of SharePoint subdomains (e.g., department1.site.com, department2.site.com).
  • Extracted site metadata and content structures using SharePoint REST APIs.
  • Identified document libraries, embedded assets, and internal navigation dependencies.

Content Audit

  • Classified content types: services, policies, FAQs, announcements, and forms.
  • Flagged outdated or duplicated material.
  • Conducted stakeholder workshops across agencies to capture content priorities.

Step 2: Build Custom Migration Utilities

There’s no direct tool to migrate content from SharePoint to AEM — so we developed a custom ETL (Extract, Transform, Load) pipeline.

2.1 Extraction (Python + SharePoint REST API)

We extracted data using Python and the SharePoint REST endpoints:

import requests

def extract_content(site_url, list_name, access_token):
    endpoint = f"{site_url}/_api/web/lists/getbytitle('{list_name}')/items"
    headers = {"Authorization": f"Bearer {access_token}", "Accept": "application/json;odata=verbose"}
    response = requests.get(endpoint, headers=headers)
    response.raise_for_status()
    return response.json()['d']['results']

Features:

  • Incremental extraction using modified timestamps.
  • Recursive folder and subsite traversal.
  • JSON metadata output for transformation.

2.2 Transformation

All SharePoint content was normalized into AEM-compatible JSON schemas.

def transform_to_aem_format(item, agency):
    return {
        "jcr:primaryType": "cq:Page",
        "jcr:title": item["Title"],
        "body": sanitize_html(item.get("Body", "")),
        "cq:tags": [f"state:agency/{agency.lower()}"],
        "publishDate": item["Created"],
        "sourceSystem": "SharePoint",
        "migratedBy": "MigrationBot"
    }

Enhancements:

  • Sanitized HTML to remove legacy SharePoint markup.
  • Preserved author, creation date, and metadata.
  • Injected tag namespaces and structured taxonomy.

2.3 Upload to AEM

Used the Sling POST Servlet and AEM Assets HTTP API to import transformed content:

curl -u admin:admin \
  -F "file=@content.json" \
  -F "jcr:primaryType=nt:file" \
  https://state.com/api/assets/content/agency/health/

Pipeline Features:

  • Parallel uploads with retry logic.
  • Checksum validation for every upload.
  • Logging of source-target mappings.

Step 3: Restructure Site Architecture

Before: https://department.sharepoint.com
After:

  • https://sharepoint.com/department/<department-name>
  • https://sharepoint.com/services/<service-name>

Outcome:

  • Consistent design and content governance.
  • Better SEO performance.
  • Unified navigation experience.

Step 4: Metadata & Tagging

Technical Implementation

  • Tag namespaces: site:agency, site:services, topic, audience
  • Automated tag assignment during transformation
  • Embedding tags into AEM templates

Example Tag Matrix

TitleTags
Renew Driver’s Licensesite:departments/transport, site:services/driver-license
COVID-19 Resource Guidesite:agency/medical, topic:public-health, audience:residents

AI Tagging

  • Smart tagging via AEMaaCS
  • Contextual tags for unstructured content

Architecture Overview

The following diagram illustrates the end-to-end technical workflow implemented during the migration:

Architecture Layers

  • SharePoint Source Repositories: Unique content sites.
  • ETL Pipeline: Python-based utilities for extraction, transformation, and load.
  • Metadata & Tagging: AI-driven tagging with controlled vocabulary enforcement.
  • Quality Assurance: Validation for accessibility, SEO, and link integrity.
  • Unified AEM Platform: Consolidated site with reusable components and templates.

Step 5: Batch Migration & Automation

  • CLI pipeline: Extract → Transform → Tag → Upload → Validate
  • Batch size: 5–8 agencies per wave, ~2–3 weeks each
  • Automated QA: checksum validation, content diff reports, tag coverage dashboards

This automation pipeline handled each agency migration batch efficiently.

Automation Steps

  1. Extract – Pull metadata and assets from SharePoint.
  2. Transform – Convert content into AEM-compatible JSON.
  3. Tag – Assign metadata and topic tags.
  4. Upload – Push data to AEM using API endpoints.
  5. Validate – Perform QA and checksum verification.

Each batch processed 5–8 agencies, with full QA reports generated automatically.


Step 6: QA & Optimization

Technical QA

  • Link integrity checks across numerous sites
  • Lighthouse performance audits
  • Browser and device testing

Content QA

  • Accessibility (WCAG 2.1 AA)
  • Editorial review in staging
  • SEO optimization (titles, meta descriptions, canonical URLs)

Results

OutcomeMetric / Benefit
Unified AEM PlatformDozens of agencies, Tens of thousands of pages, 250k assets consolidated
Search & Discoverability30% improvement
Metadata GovernanceStandardized tagging, controlled vocabulary
Editorial EfficiencyReusable templates and components
Resident ExperienceConsistent UX and navigation

Key Takeaways

  1. Plan metadata and taxonomy before migration
  2. Batch agencies logically based on readiness
  3. Automate extraction, transformation, and QA
  4. Use AI for tagging, clustering, and summarization
  5. Governance and version control are essential


Discover more from The Modern Enterprise Insights by Sachin Magon

Subscribe to get the latest posts sent to your email.


Comments

Leave a comment