How to delete files/blobs from #AzureSearch index?


On one of my #AzureSearch projects, I was using Azure Storage as the data source. The purpose of the project was to search within the contents of files held in Azure Storage. I expected the #AzureSearch indexing service to track deleted files; however, that was not the case. I had to investigate the root cause and find a solution. Here are my findings:

Problem

The #AzureSearch index returns results from "deleted files" even though the files have been deleted from Azure Storage and the indexer has been reset.

Solution 1: Enable Delete Tracking in Data Source

This Stack Overflow post gives the following explanation:

"Only deletion policy currently supported by Azure search is Soft Delete.

To enable this for BLOB storage you have to create a metadata value on each BLOB (e.g. IsDeleted) and update this value to enable it to be captured by the Deletion policy."

PUT https://[service name].search.windows.net/datasources/blob-datasource?api-version=2016-09-01
Content-Type: application/json
api-key: [admin key]

{
    "name" : "blob-datasource",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "" },
    "container" : { "name" : "my-container", "query" : "my-folder" },
    "dataDeletionDetectionPolicy" : {
        "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName" : "IsDeleted",
        "softDeleteMarkerValue" : "true"
    }
}
More details are in this documentation page.
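The policy only tells the indexer which metadata value to look for; each blob still has to carry that value before the indexer runs. As a minimal sketch, assuming you have a SAS URL with write permission for the blob, the request for the Blob Storage "Set Blob Metadata" operation could be prepared as follows. The function name and the sample URL are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlsplit, urlunsplit


def build_soft_delete_request(blob_sas_url, column="IsDeleted", marker="true"):
    """Return (method, url, headers) for a Set Blob Metadata call.

    Assumption: blob_sas_url is a full blob URL that already contains
    a SAS token allowing writes. column and marker must match
    softDeleteColumnName and softDeleteMarkerValue in the data source.
    """
    parts = urlsplit(blob_sas_url)
    # Set Blob Metadata is selected with the comp=metadata query parameter.
    query = (parts.query + "&comp=metadata") if parts.query else "comp=metadata"
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
    # Each metadata pair travels as an x-ms-meta-<name> header.
    # Note: this operation replaces the blob's existing metadata, so add
    # any values you want to keep to this dictionary as well.
    headers = {"x-ms-meta-" + column: marker}
    return "PUT", url, headers


if __name__ == "__main__":
    # Hypothetical blob URL, used only for illustration.
    sample = "https://myaccount.blob.core.windows.net/my-container/my-folder/report.pdf?sv=x&sig=y"
    method, url, headers = build_soft_delete_request(sample)
    print(method, url)
    print(headers)
```

You can send the resulting request with any HTTP client, including Postman. Once the indexer has run and removed the document from the index, the blob itself can be deleted from storage.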

Solution 2: Delete blobs from Index Using API/SDK

The next solution is to delete the files/blobs from the index directly. It is easy: all you need to know is which API to call and which parameters to pass to it. I personally use Postman to execute the requests.
As mentioned in the documentation, you can upload, merge or delete documents from a specified index using HTTP POST. For large numbers of updates, batching of documents (up to 1000 documents per batch, or about 16 MB per batch) is recommended and will significantly improve indexing performance.
{
    "value": [
        {
            "@search.action": "upload (default) | merge | mergeOrUpload | delete",
            "key_field_name": "unique_key_of_document", (key/value pair for key field from index schema)
            "field_name": field_value (key/value pairs matching index schema)
            ...
        },
        ...
    ]
}
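For a delete, each entry only needs the action and the document key. As a rough sketch, the request bodies could be generated in Python and split at the 1000-document limit quoted above. The function name and the key field name "id" are assumptions for illustration; use the key field defined in your own index schema.

```python
import json
from typing import Dict, Iterable, List

MAX_BATCH_SIZE = 1000  # per-batch document limit quoted from the documentation


def build_delete_batches(keys: Iterable[str], key_field: str,
                         batch_size: int = MAX_BATCH_SIZE) -> List[Dict]:
    """Build one request body per batch, each deleting up to batch_size documents."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and %d" % MAX_BATCH_SIZE)
    key_list = list(keys)
    batches = []
    for start in range(0, len(key_list), batch_size):
        chunk = key_list[start:start + batch_size]
        # A delete action only needs the action name and the key of the document.
        batches.append({
            "value": [{"@search.action": "delete", key_field: key} for key in chunk]
        })
    return batches


if __name__ == "__main__":
    # "id" is a hypothetical key field name; the keys are made-up sample values.
    bodies = build_delete_batches(["doc-1", "doc-2"], key_field="id")
    print(json.dumps(bodies[0], indent=2))
```

Each generated body can then be pasted into Postman, or sent by a script, as an HTTP POST to https://[service name].search.windows.net/indexes/[index name]/docs/index with the same api-version parameter, Content-Type and api-key headers as the data source request.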
Running Multiple APIs in one go
There was a situation in which I wanted to execute more than one API call in Postman. You can watch this YouTube video to learn how to execute multiple API calls in one go.


