However, because the current example uses OAuth2, there is one prerequisite to fulfill - a bearer token has to be provided at design time. This token is needed to authenticate during the schema import, because Azure Data Factory calls the API to retrieve sample data from which it parses and extracts the schema. Therefore, the following window pops up and expects a token to be entered:
To obtain the access token, run a debug execution (1) and copy it from the output window (2) of the Login activity.
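As an aside, the Login activity essentially performs an OAuth2 token request. Below is a minimal Python sketch of the same call, assuming a client-credentials flow; the token endpoint, client id, and client secret are placeholders rather than values from this example.

```python
# Minimal sketch of what the "Login" step does, assuming an OAuth2
# client-credentials flow. Endpoint, client id and secret are placeholders.
import requests

TOKEN_URL = "https://login.example.com/oauth2/token"  # hypothetical endpoint

response = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "client_id": "<client-id>",
        "client_secret": "<client-secret>",
    },
    timeout=30,
)
response.raise_for_status()

# The bearer token that would be pasted into the "Import schemas" popup.
access_token = response.json()["access_token"]
print(access_token)
```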
Once the token is copied to the clipboard, click “Import schemas” again and paste the token into the popup window. As a result, the schema of the hierarchical format is imported. In the current example it contains:
__next (4), which holds the URL of the next page. Later this value will be used for pagination; the assumed shape of such a response is sketched below.
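For orientation, the response payload behind this schema is assumed to look roughly like the following sketch: a results array with the actual records and a __next property carrying the URL of the following page. All field names other than results and __next are purely illustrative.

```python
# Assumed shape of a single API response page (illustrative values only).
sample_page = {
    "results": [
        {"id": 1, "name": "First record", "modified": "2021-01-01T00:00:00Z"},
        {"id": 2, "name": "Second record", "modified": "2021-01-02T00:00:00Z"},
    ],
    # URL of the next page; absent or empty on the last page.
    "__next": "https://api.example.com/items?page=2",
}
```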
As soon as the schema is imported, a few more things have to be finalized:
Set the collection reference to results and remove columns that do not have to be imported (3).
Remove __next from the mapping; it should not be included in the final export (4). A rough equivalent of this flattening step is sketched below.
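Conceptually, this mapping flattens each page of the hierarchical response into tabular rows, keeping only the selected columns and ignoring __next. A rough Python equivalent, assuming the payload shape sketched earlier and purely illustrative column names, could look like this:

```python
# Rough equivalent of the Copy activity mapping: keep only the columns of
# interest from the "results" array and ignore "__next".
def flatten_page(page: dict) -> list[dict]:
    keep = ("id", "name", "modified")  # columns selected in the mapping
    return [
        {column: record.get(column) for column in keep}
        for record in page.get("results", [])
    ]

# Example usage with a single illustrative page:
page = {
    "results": [{"id": 1, "name": "First record", "extra": "dropped"}],
    "__next": None,
}
print(flatten_page(page))
```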
This time the Copy activity loads 22 KB of data, or 500 rows, which is a step in the right direction:
However, the data source contains far more than 500 rows, and the details page shows that only one object was read while fetching from the API. This means that only a single page was processed:
Normally, a REST API keeps the response size of a single request within a reasonable limit; to return large amounts of data, it splits the result into multiple pages and requires callers to send consecutive requests to retrieve the next page. Therefore, to get all rows, pagination has to be configured for the source data store.
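In the Copy activity this is handled by the REST source's pagination rules, typically by pointing a rule (for example, an AbsoluteUrl rule) at the __next property so that the connector keeps requesting the URL it finds there until no further page is returned. The Python sketch below shows the equivalent manual loop, again assuming the payload shape above; the start URL and token are placeholders.

```python
# Manual equivalent of the pagination performed by the Copy activity:
# keep following the "__next" URL until the API stops returning one.
import requests

def fetch_all_rows(start_url: str, token: str) -> list[dict]:
    rows: list[dict] = []
    url = start_url
    while url:
        response = requests.get(
            url,
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        response.raise_for_status()
        page = response.json()
        rows.extend(page.get("results", []))  # records of the current page
        url = page.get("__next")              # None/empty on the last page
    return rows

# Example usage (placeholders):
# all_rows = fetch_all_rows("https://api.example.com/items", access_token)
# print(len(all_rows))
```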
A second execution shows more realistic stats - 26 REST API calls (or pages) resulting in 12,751 rows:
The second piece of the pipeline - the Copy activity - now finally looks complete. It does not just establish a connection with the REST data source; it also fetches all expected rows and transforms them from a hierarchical into a tabular format.
The next post of the series - Azure Data Factory and REST APIs - Managing Pipeline Secrets by a Key Vault - touches on another important aspect: security and the management of sensitive data such as client secrets and passwords.
Many thanks for reading.