azure solutions for analysing xml data

You can use Azure Blob Storage to upload that data to Azure. If that XML data is complex to analyze it you may have to transform it to a much simpler format like CSV format.

You can use Azure Data Factory to transform XML files to CSV files, then you can load all generated CSV files into corresponding tables in Azure SQL Data Warehouse via Polybase and a developed Stored Procedure activity on Azure Data Factory.

If you want to have a folder on BLOB storage for XML files not processed, another folder for XML files processed and another folder for the CSV files created and you want Azure Data Factory to move files from one folder to another during all the process, then you need to use HDInsight and create a map reduce activity for that purpose.

PolyBase allows to use T-SQL statements to access data stored in Hadoop or Azure Blob Storage and query it in an adhoc fashion. It also can let you query semi-structured data and join the results with relational data sets stored in SQL Data Warehouse. You should be able to create external tables pointing to data in Blob Storage using PolyBase. Then using a simple stored procedure you can select all the data in the external table and load them into the corresponding tables in SQL Data Warehouse.

Once all the transformed data is loaded into the corresponding tables, you can write stored procedures to generate the data for aggregation tables which you can then use to feed PowerBI dashboards.

Hope this helps.


Official documentation for Azure Data Lake's U-SQL say:

Parse unstructured data such as Web pages and emails, or semi-unstructured data such as XML/JSON.

Then you can combine that with Azure Stream Analytics to create alerts or connect to other Azure services like EventHub, etc..

Expensive, but quite reliable solution would be incorporate also Logic App with some connector to create workflow-like experience.


At the moment (2020-06), XML is still not supported by Azure Synapse (Azure DW).

A feasible approach is using Azure Functions to either parse the XML, or translate it to JSON (and then use SQL json_query/value). The second approach does mean losing Xquery functionality.

Classic Azure SQL Server (or it's hyperscale version) does support regular XML queries.