Detecting and Removing Duplicate SharePoint Online Files using PnP PowerShell

Duplicate files in SharePoint Online can cause:
Storage bloat
Versioning confusion
Compliance issues

Using PnP PowerShell, we can detect and remove duplicate files efficiently.


Step 1: Connect to SharePoint Online

$siteUrl = "https://yourtenant.sharepoint.com/sites/YourSite"
# Recent PnP.PowerShell releases also need -ClientId with your own Entra ID app registration.
Connect-PnPOnline -Url $siteUrl -Interactive
Write-Host "Connected to SharePoint Online"

✔ Establishes a secure connection.


Step 2: Retrieve Files from a Library

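$libraryName = "Documents"
# -PageSize retrieves items in batches, which avoids list view threshold errors on large libraries.
$files = Get-PnPListItem -List $libraryName -PageSize 500 -Fields "FileLeafRef", "FileRef" |
    Where-Object { $_.FileSystemObjectType -eq "File" }   # skip folders

Write-Host "Retrieved $(@($files).Count) files from '$libraryName'"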

✔ Fetches all file names and paths.


Step 3: Identify Duplicate Files by Name

# Field values live behind the item's indexer, so group on a script block rather than a property name.
$duplicates = $files | Group-Object -Property { $_["FileLeafRef"] } | Where-Object { $_.Count -gt 1 }

If ($duplicates) {
    Write-Host "Duplicate files found:"
    $duplicates | ForEach-Object { Write-Host $_.Name }
} Else {
    Write-Host "No duplicates found."
}

✔ Detects files that share a name across folders (a single folder cannot hold two files with the same name).
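
Printing only the shared name does not show where the copies live. The grouping above could be extended into a small report along these lines. This is a minimal sketch: the helper name `Get-DuplicateNameReport` and the sample hashtables are hypothetical stand-ins for the real `$files` collection, and it assumes each item exposes `FileLeafRef` and `FileRef` through an indexer, as PnP list items do.

```powershell
# Hypothetical helper: groups items by file name and reports every path that shares it.
function Get-DuplicateNameReport {
    param([object[]]$Items)

    $Items |
        Group-Object -Property { $_["FileLeafRef"] } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                Name  = $_.Name
                Count = $_.Count
                Paths = @($_.Group | ForEach-Object { $_["FileRef"] })
            }
        }
}

# Sample data standing in for the $files collection from Step 2.
$sampleItems = @(
    @{ FileLeafRef = "Budget.xlsx"; FileRef = "/sites/YourSite/Shared Documents/Budget.xlsx" },
    @{ FileLeafRef = "Budget.xlsx"; FileRef = "/sites/YourSite/Shared Documents/Archive/Budget.xlsx" },
    @{ FileLeafRef = "Notes.docx";  FileRef = "/sites/YourSite/Shared Documents/Notes.docx" }
)

$report = @(Get-DuplicateNameReport -Items $sampleItems)
$report | ForEach-Object { Write-Host "$($_.Name) ($($_.Count) copies): $($_.Paths -join '; ')" }
```

With real data, pass `$files` instead of `$sampleItems`.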


Step 4: Detect Exact Duplicates by File Hash

# Returns the SHA-256 hash of a file, given its server-relative URL (the FileRef value).
Function Get-FileHashFromUrl($fileUrl) {
    # Get-PnPFile -AsMemoryStream downloads the file content into memory.
    $stream = Get-PnPFile -Url $fileUrl -AsMemoryStream
    try {
        $stream.Position = 0   # make sure hashing starts at the first byte
        return (Get-FileHash -InputStream $stream -Algorithm SHA256).Hash
    } finally {
        $stream.Dispose()
    }
}

$hashTable = @{}         # hash -> first file seen with that content
$duplicateHashes = @()   # later files whose content matches an earlier file

foreach ($file in $files) {
    $fileHash = Get-FileHashFromUrl $file["FileRef"]

    If ($hashTable.ContainsKey($fileHash)) {
        $duplicateHashes += $file
    } Else {
        $hashTable[$fileHash] = $file
    }
}

If ($duplicateHashes) {
    Write-Host "Exact duplicate files found:"
    $duplicateHashes | ForEach-Object { Write-Host $_["FileRef"] }
} Else {
    Write-Host "No exact duplicates found."
}

✔ Compares file contents using SHA-256 hashes.
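
Hashing requires downloading every file, which can be slow on large libraries. Two files of different sizes cannot have identical content, so one possible optimization is to hash only files whose size matches at least one other file. The sketch below rests on assumptions: the helper name `Select-HashCandidates` is hypothetical, the sample hashtables stand in for real list items, and the `File_x0020_Size` field would need to be added to the `-Fields` list in Step 2.

```powershell
# Hypothetical helper: keeps only items whose size is shared by at least one other item.
# Assumes each item exposes File_x0020_Size (size in bytes) through an indexer.
function Select-HashCandidates {
    param([object[]]$Items)

    $Items |
        Group-Object -Property { [long]$_["File_x0020_Size"] } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group }
}

# Sample data standing in for list items that include the size field.
$sampleItems = @(
    @{ FileRef = "/sites/YourSite/Shared Documents/A.pdf"; File_x0020_Size = "2048" },
    @{ FileRef = "/sites/YourSite/Shared Documents/B.pdf"; File_x0020_Size = "2048" },
    @{ FileRef = "/sites/YourSite/Shared Documents/C.pdf"; File_x0020_Size = "4096" }
)

$candidates = @(Select-HashCandidates -Items $sampleItems)
Write-Host "$($candidates.Count) of $($sampleItems.Count) files need hashing"
```

Only the files returned by the helper would then be passed to the hashing loop. Files with a unique size are skipped without being downloaded.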


Step 5: Remove Duplicate Files

foreach ($duplicate in $duplicateHashes) {
    # -Recycle sends the item to the Recycle Bin; -Force suppresses the confirmation prompt.
    Remove-PnPListItem -List $libraryName -Identity $duplicate.Id -Recycle -Force
    Write-Host "Recycled: $($duplicate['FileRef'])"
}
Write-Host "Duplicate files moved to the Recycle Bin."

✔ Moves duplicate files to the Recycle Bin.
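
The hashing loop in Step 4 keeps whichever copy it happens to encounter first, which may not be the copy you want to retain. One alternative is an explicit rule, such as keeping the oldest copy in each group of identical files. This is a sketch only: the helper name `Select-CopiesToRecycle` and the record shape (`Hash`, `Created`, `FileRef`) are hypothetical, and the sample values are invented for illustration.

```powershell
# Hypothetical helper: for each group of identical hashes, keep the oldest copy
# and return the remaining copies as candidates for recycling.
function Select-CopiesToRecycle {
    param([object[]]$Records)

    $Records |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            $_.Group | Sort-Object -Property Created | Select-Object -Skip 1
        }
}

# Sample records; in practice Hash would come from Step 4 and Created from the list item.
$records = @(
    [pscustomobject]@{ Hash = "AAA"; Created = [datetime]"2023-01-10"; FileRef = "/sites/YourSite/Shared Documents/Report.docx" },
    [pscustomobject]@{ Hash = "AAA"; Created = [datetime]"2024-06-01"; FileRef = "/sites/YourSite/Shared Documents/Copy/Report.docx" },
    [pscustomobject]@{ Hash = "BBB"; Created = [datetime]"2024-02-02"; FileRef = "/sites/YourSite/Shared Documents/Plan.docx" }
)

$toRecycle = @(Select-CopiesToRecycle -Records $records)
$toRecycle | ForEach-Object { Write-Host "Would recycle: $($_.FileRef)" }
```

Printing the candidates before deleting anything also serves as a dry run, so the list can be reviewed before the removal loop is executed.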


Step 6: Automate Duplicate File Cleanup

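Schedule the script to run periodically. A task running as SYSTEM cannot complete the -Interactive sign-in from Step 1, so the scheduled script must connect non-interactively, for example with certificate-based app-only authentication. Current PnP.PowerShell releases also require PowerShell 7; if you use one, replace PowerShell.exe with pwsh.exe. The task itself can be registered as follows: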

$taskName = "SharePoint Duplicate Cleanup"
$scriptPath = "C:\Scripts\RemoveDuplicateFiles.ps1"

$action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "-File $scriptPath"
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2AM
Register-ScheduledTask -TaskName $taskName -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest

Write-Host "Automated duplicate file cleanup scheduled."

✔ Ensures ongoing cleanup.
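
Script paths that contain spaces need to be quoted inside the `-Argument` string, otherwise the path is split at the first space. A small sketch of building that string safely; the helper name `New-ScriptArgumentString` and the sample path are hypothetical, and the `-NoProfile` switch is an optional addition rather than part of the original command.

```powershell
# Hypothetical helper: builds the -Argument value for New-ScheduledTaskAction,
# wrapping the script path in double quotes so paths with spaces survive.
function New-ScriptArgumentString {
    param([Parameter(Mandatory)][string]$ScriptPath)

    # -NoProfile skips profile scripts, which keeps unattended runs predictable.
    return "-NoProfile -File `"$ScriptPath`""
}

$argument = New-ScriptArgumentString -ScriptPath "C:\Scripts\My Cleanup\RemoveDuplicateFiles.ps1"
Write-Host $argument
```

The result can be passed as `-Argument $argument` when creating the scheduled task action.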
