Duplicate files in SharePoint Online can cause:
Storage bloat
Versioning confusion
Compliance issues
Using PnP PowerShell, we can detect and remove duplicate files efficiently.
Step 1: Connect to SharePoint Online
$siteUrl = "https://yourtenant.sharepoint.com/sites/YourSite"
Connect-PnPOnline -Url $siteUrl -Interactive
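Write-Host "Connected to SharePoint Online"
✔ Opens an authenticated session to the site through a browser sign-in. Recent PnP.PowerShell releases also require -ClientId with your own Entra ID app registration.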
Step 2: Retrieve Files from a Library
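$libraryName = "Documents"
# -PageSize pages through large libraries; FSObjType 0 marks a file (1 marks a folder)
$files = Get-PnPListItem -List $libraryName -Fields "FileLeafRef", "FileRef", "FSObjType" -PageSize 2000 |
    Where-Object { $_["FSObjType"] -eq 0 }
Write-Host "Retrieved files from '$libraryName'"
✔ Fetches the name and server-relative path of every file, skipping folders.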
Step 3: Identify Duplicate Files by Name
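# Field values live behind the item's indexer, so group on a script block rather than a property name
$duplicates = $files | Group-Object -Property { $_["FileLeafRef"] } | Where-Object { $_.Count -gt 1 }
If ($duplicates) {
    Write-Host "Duplicate files found:"
    $duplicates | ForEach-Object { Write-Host $_.Name }
} Else {
    Write-Host "No duplicates found."
}
✔ Detects files that share a name. Files in different folders can share a name without sharing content, so treat these as candidates only.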
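To judge name matches, it helps to see where each copy lives. The sketch below prints every path under each duplicated name. It runs against stand-in hashtables rather than a live library: `$sampleFiles` and its paths are hypothetical, and with real data you would pipe `$files` in instead.

```powershell
# Stand-in for the items returned by Get-PnPListItem (hypothetical sample paths)
$sampleFiles = @(
    @{ FileLeafRef = "Report.docx"; FileRef = "/sites/YourSite/Shared Documents/Report.docx" },
    @{ FileLeafRef = "Report.docx"; FileRef = "/sites/YourSite/Shared Documents/Archive/Report.docx" },
    @{ FileLeafRef = "Budget.xlsx"; FileRef = "/sites/YourSite/Shared Documents/Budget.xlsx" }
)

# Group on the field value, then keep only names that occur more than once
$nameGroups = $sampleFiles |
    Group-Object -Property { $_["FileLeafRef"] } |
    Where-Object { $_.Count -gt 1 }

# Print each duplicated name followed by every path that uses it
foreach ($group in $nameGroups) {
    Write-Host "$($group.Name) ($($group.Count) copies)"
    foreach ($item in $group.Group) {
        Write-Host "  $($item['FileRef'])"
    }
}
```

Here only Report.docx is reported, with its two paths listed beneath it.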
Step 4: Detect Exact Duplicates by File Hash
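# Download a file by its server-relative URL and return its SHA-256 hash
Function Get-FileHashFromUrl($fileUrl) {
    $stream = Get-PnPFile -Url $fileUrl -AsMemoryStream
    try {
        # Rewind in case the stream was left at its end after the download
        $stream.Position = 0
        return (Get-FileHash -InputStream $stream -Algorithm SHA256).Hash
    } finally {
        $stream.Dispose()
    }
}
$hashTable = @{}
$duplicateHashes = @()
foreach ($file in $files) {
    # FileRef is already server-relative, which Get-PnPFile accepts directly
    $fileHash = Get-FileHashFromUrl $file["FileRef"]
    If ($hashTable.ContainsKey($fileHash)) {
        # Same content as a file seen earlier, so flag this copy
        $duplicateHashes += $file
    } Else {
        $hashTable[$fileHash] = $file
    }
}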
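If ($duplicateHashes) {
    Write-Host "Exact duplicate files found:"
    $duplicateHashes | ForEach-Object { Write-Host $_["FileRef"] }
} Else {
    Write-Host "No exact duplicates found."
}
✔ Compares file contents using SHA-256 hashes. The first file seen with a given hash is kept, and later matches are flagged as duplicates. Every file is downloaded to be hashed, so large libraries take time.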
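The reason a content hash catches copies that a name check misses can be shown without touching SharePoint. This minimal sketch hashes in-memory byte arrays with the same `Get-FileHash -InputStream` call. The helper `Get-Sha256OfBytes` and the sample strings are illustrative assumptions, not part of the script above.

```powershell
# Hash an in-memory byte array with SHA-256 (hypothetical helper for illustration)
function Get-Sha256OfBytes([byte[]]$bytes) {
    $stream = [System.IO.MemoryStream]::new($bytes)
    try {
        return (Get-FileHash -InputStream $stream -Algorithm SHA256).Hash
    } finally {
        $stream.Dispose()
    }
}

$utf8 = [System.Text.Encoding]::UTF8
# Two "files" with identical content, as if saved under different names
$hashA = Get-Sha256OfBytes ($utf8.GetBytes("quarterly report"))
$hashB = Get-Sha256OfBytes ($utf8.GetBytes("quarterly report"))
# A third "file" whose content differs slightly
$hashC = Get-Sha256OfBytes ($utf8.GetBytes("quarterly report v2"))

Write-Host "A equals B: $($hashA -eq $hashB)"   # True: identical bytes, identical hash
Write-Host "A equals C: $($hashA -eq $hashC)"   # False: any change alters the hash
```

File names never enter the calculation, which is why a renamed copy is still detected.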
Step 5: Remove Duplicate Files
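foreach ($duplicate in $duplicateHashes) {
    # -Recycle sends the item to the Recycle Bin; -Force skips the confirmation prompt
    Remove-PnPListItem -List $libraryName -Identity $duplicate.Id -Recycle -Force
    Write-Host "Recycled: $($duplicate['FileRef'])"
}
Write-Host "Duplicate files moved to the Recycle Bin."
✔ Moves duplicate files to the Recycle Bin, where they can still be restored until the retention period ends.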
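Before removing anything, it can be worth saving a record of what was flagged. The sketch below writes one CSV row per duplicate. `$sampleDuplicates`, the column names, and the temp-folder report path are assumptions for illustration. With real data, pipe `$duplicateHashes` in instead.

```powershell
# Stand-in for $duplicateHashes (hypothetical sample entries)
$sampleDuplicates = @(
    @{ Id = 12; FileLeafRef = "Report.docx"; FileRef = "/sites/YourSite/Shared Documents/Archive/Report.docx" },
    @{ Id = 27; FileLeafRef = "Copy of Budget.xlsx"; FileRef = "/sites/YourSite/Shared Documents/Copy of Budget.xlsx" }
)

# Hypothetical report location; point this wherever your audit files belong
$reportPath = Join-Path ([System.IO.Path]::GetTempPath()) "DuplicateFilesReport.csv"

# Flatten each item into a row with the ID, file name, and server-relative path
$sampleDuplicates | ForEach-Object {
    [pscustomobject]@{
        Id   = $_.Id
        Name = $_["FileLeafRef"]
        Path = $_["FileRef"]
    }
} | Export-Csv -Path $reportPath -NoTypeInformation

Write-Host "Report written to $reportPath"
```

Reviewing the report first also gives you a list to restore from if a file turns out to be needed.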
Step 6: Automate Duplicate File Cleanup
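Schedule the cleanup script to run periodically with Windows Task Scheduler:
$taskName = "SharePoint Duplicate Cleanup"
$scriptPath = "C:\Scripts\RemoveDuplicateFiles.ps1"
# Quote the script path so it survives spaces; -NoProfile keeps the run independent of user profiles.
# If your PnP.PowerShell version requires PowerShell 7, use pwsh.exe in place of PowerShell.exe.
$action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "-NoProfile -File `"$scriptPath`""
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2AM
Register-ScheduledTask -TaskName $taskName -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest
Write-Host "Automated duplicate file cleanup scheduled."
✔ Ensures ongoing cleanup. The scheduled script must sign in without a prompt (for example, with app-only certificate authentication), because -Interactive cannot complete in an unattended task.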
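A scheduled task runs with no user at the keyboard, so the script it launches cannot rely on the browser sign-in from Step 1. One common alternative is certificate-based app-only sign-in. This is a minimal sketch: every value is a placeholder, and it assumes an Entra ID app registration with SharePoint permissions whose certificate is installed where the task's account can read it.

```powershell
# All values are placeholders for your own app registration and certificate
$connectParams = @{
    Url        = "https://yourtenant.sharepoint.com/sites/YourSite"
    ClientId   = "00000000-0000-0000-0000-000000000000"   # application (client) ID
    Tenant     = "yourtenant.onmicrosoft.com"
    Thumbprint = "<certificate thumbprint>"               # certificate in the local certificate store
}

# Sign in with the app's identity; no browser prompt is involved
Connect-PnPOnline @connectParams
```

Placing this at the top of the scheduled script, in place of the -Interactive call, lets the remaining steps run unchanged.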