Migrating Sites from SharePoint to AEM

Introduction

Enterprise-scale content migrations are far more complex than simple file transfers — they require automation, governance, and intelligence.

This post shares how we migrated SharePoint-hosted agency/department websites into a unified Adobe Experience Manager (AEM) platform — a massive consolidation effort involving over hundreds of thousands of documents and tens of thousands of content pages.

The focus here is on the technical implementation, automation pipelines, and AI integration that made this possible.

Step 1: Discovery & Analysis

Technical Audit

Mapped a large number of SharePoint subdomains (e.g., department1.site.com, department2.site.com).
Extracted site metadata and content structures using SharePoint REST APIs.
Identified document libraries, embedded assets, and internal navigation dependencies.

Content Audit

Classified content types: services, policies, FAQs, announcements, and forms.
Flagged outdated or duplicated material.
Conducted stakeholder workshops across agencies to capture content priorities.

Step 2: Build Custom Migration Utilities

There’s no direct tool to migrate content from SharePoint to AEM — so we developed a custom ETL (Extract, Transform, Load) pipeline.

2.1 Extraction (Python + SharePoint REST API)

We extracted data using Python and the SharePoint REST endpoints:

import requests

def extract_content(site_url, list_name, access_token):
    endpoint = f"{site_url}/_api/web/lists/getbytitle('{list_name}')/items"
    headers = {"Authorization": f"Bearer {access_token}", "Accept": "application/json;odata=verbose"}
    response = requests.get(endpoint, headers=headers)
    response.raise_for_status()
    return response.json()['d']['results']

Features:

Incremental extraction using modified timestamps.
Recursive folder and subsite traversal.
JSON metadata output for transformation.

2.2 Transformation

All SharePoint content was normalized into AEM-compatible JSON schemas.

def transform_to_aem_format(item, agency):
    return {
        "jcr:primaryType": "cq:Page",
        "jcr:title": item["Title"],
        "body": sanitize_html(item.get("Body", "")),
        "cq:tags": [f"state:agency/{agency.lower()}"],
        "publishDate": item["Created"],
        "sourceSystem": "SharePoint",
        "migratedBy": "MigrationBot"
    }

Enhancements:

Sanitized HTML to remove legacy SharePoint markup.
Preserved author, creation date, and metadata.
Injected tag namespaces and structured taxonomy.

2.3 Upload to AEM

Used the Sling POST Servlet and AEM Assets HTTP API to import transformed content:

curl -u admin:admin \
  -F "file=@content.json" \
  -F "jcr:primaryType=nt:file" \
  https://state.com/api/assets/content/agency/health/

Pipeline Features:

Parallel uploads with retry logic.
Checksum validation for every upload.
Logging of source-target mappings.

Step 3: Restructure Site Architecture

Before: https://department.sharepoint.com
After:

https://sharepoint.com/department/<department-name>
https://sharepoint.com/services/<service-name>

Outcome:

Consistent design and content governance.
Better SEO performance.
Unified navigation experience.

Step 4: Metadata & Tagging

Technical Implementation

Tag namespaces: site:agency, site:services, topic, audience
Automated tag assignment during transformation
Embedding tags into AEM templates

Example Tag Matrix

Title	Tags
Renew Driver’s License	`site:departments/transport`, `site:services/driver-license`
COVID-19 Resource Guide	`site:agency/medical`, `topic:public-health`, `audience:residents`

AI Tagging

Smart tagging via AEMaaCS
Contextual tags for unstructured content

Architecture Overview

The following diagram illustrates the end-to-end technical workflow implemented during the migration:

Architecture Layers

SharePoint Source Repositories: Unique content sites.
ETL Pipeline: Python-based utilities for extraction, transformation, and load.
Metadata & Tagging: AI-driven tagging with controlled vocabulary enforcement.
Quality Assurance: Validation for accessibility, SEO, and link integrity.
Unified AEM Platform: Consolidated site with reusable components and templates.

Step 5: Batch Migration & Automation

CLI pipeline: Extract → Transform → Tag → Upload → Validate
Batch size: 5–8 agencies per wave, ~2–3 weeks each
Automated QA: checksum validation, content diff reports, tag coverage dashboards

This automation pipeline handled each agency migration batch efficiently.

Automation Steps

Extract – Pull metadata and assets from SharePoint.
Transform – Convert content into AEM-compatible JSON.
Tag – Assign metadata and topic tags.
Upload – Push data to AEM using API endpoints.
Validate – Perform QA and checksum verification.

Each batch processed 5–8 agencies, with full QA reports generated automatically.

Step 6: QA & Optimization

Technical QA

Link integrity checks across numerous sites
Lighthouse performance audits
Browser and device testing

Content QA

Accessibility (WCAG 2.1 AA)
Editorial review in staging
SEO optimization (titles, meta descriptions, canonical URLs)

Results

Outcome	Metric / Benefit
Unified AEM Platform	Dozens of agencies, Tens of thousands of pages, 250k assets consolidated
Search & Discoverability	30% improvement
Metadata Governance	Standardized tagging, controlled vocabulary
Editorial Efficiency	Reusable templates and components
Resident Experience	Consistent UX and navigation

Key Takeaways

Plan metadata and taxonomy before migration
Batch agencies logically based on readiness
Automate extraction, transformation, and QA
Use AI for tagging, clustering, and summarization
Governance and version control are essential

Migrating Sites from SharePoint to AEM

Introduction

Step 1: Discovery & Analysis

Technical Audit

Content Audit

Step 2: Build Custom Migration Utilities

2.1 Extraction (Python + SharePoint REST API)

2.2 Transformation

2.3 Upload to AEM

Step 3: Restructure Site Architecture

Step 4: Metadata & Tagging

Architecture Overview

Architecture Layers

Step 5: Batch Migration & Automation

Automation Steps

Step 6: QA & Optimization

Results

Key Takeaways

Discover more from The Modern Enterprise Insights by Sachin Magon

Comments

Leave a comment Cancel reply

Migrating Sites from SharePoint to AEM

Introduction

Step 1: Discovery & Analysis

Technical Audit

Content Audit

Step 2: Build Custom Migration Utilities

2.1 Extraction (Python + SharePoint REST API)

2.2 Transformation

2.3 Upload to AEM

Step 3: Restructure Site Architecture

Step 4: Metadata & Tagging

Architecture Overview

Architecture Layers

Step 5: Batch Migration & Automation

Automation Steps

Step 6: QA & Optimization

Results

Key Takeaways

Share this:

Discover more from The Modern Enterprise Insights by Sachin Magon

Comments

Leave a comment Cancel reply