Implementation Guide

This series describes the technical steps for generating and retrieving a dataset. Generation is a sequence of calls to Data API endpoints. Once generated, the dataset can be downloaded or, depending on the dataset type, used with Reports API to render a completed report.

A dataset is a set of data files (sometimes many thousands of files) containing aggregated results and analysis in JSON and CSV formats. The exact contents and format of a dataset will vary depending on the dataset type. The dataset is generated asynchronously using Data API. Once completed, the dataset can be retrieved via Data API or rendered in-browser using Reports API, if the dataset type supports it.

Dataset generation is performed asynchronously as follows:

  1. Initialize a new dataset using Data API: Create and configure a new dataset. The endpoint returns a dataset_id and one or more URLs to which input files need to be uploaded.

    If the dataset type does not require input files, the dataset_type and a job_reference are returned instead, and you can proceed directly to step 3 below.

  2. Upload input files and start the job:

    1. Upload input files: Upload the input files required for the dataset to the URL or URLs returned in step 1.
    2. Commence dataset generation job using Data API: Notify Data API that the input files have been uploaded, which starts the dataset generation job.

  3. Poll for job completion using Data API: Obtain the status of the dataset generation job by polling the /jobs endpoint until the job finishes.
  4. Retrieve results via Data API or Reports API (if applicable): On completion, your server application can retrieve the raw report data using Data API. If the dataset type has a corresponding report type in Reports API, that report can render the new dataset, and the raw data can also be accessed through Reports API's public methods.
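
The flow above could be sketched roughly as follows. This is only an illustration of the control flow, not the actual Data API client: the `client` object, its method names (`initialize_dataset`, `upload_file`, `start_job`, `job_status`), the `upload_urls` response key, and the `"completed"` and `"failed"` status values are all hypothetical stand-ins, and the pairing of one input file per upload URL is an assumption. Only `dataset_id`, `dataset_type`, `job_reference`, and the idea of polling for job status come from the steps above.

```python
import time
from typing import Any, Callable, Dict, List


def generate_dataset(
    client: Any,
    config: Dict[str, Any],
    input_files: List[str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run steps 1-3 and return the job reference once the job completes.

    `client` is a hypothetical wrapper around Data API; every method called
    on it here is an assumed name, not a documented one.
    """
    # Step 1: initialize the dataset.
    init = client.initialize_dataset(config)
    upload_urls = init.get("upload_urls", [])  # assumed response key

    if upload_urls:
        # Step 2.1: upload input files (assumes one file per returned URL).
        if len(upload_urls) != len(input_files):
            raise ValueError("expected one input file per upload URL")
        for url, path in zip(upload_urls, input_files):
            client.upload_file(url, path)
        # Step 2.2: tell Data API the uploads are done and start the job.
        job_reference = client.start_job(init["dataset_id"])
    else:
        # No uploads required: the job reference is returned by step 1.
        job_reference = init["job_reference"]

    # Step 3: poll for completion, giving up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while True:
        status = client.job_status(job_reference)  # e.g. wraps the /jobs endpoint
        if status == "completed":  # assumed status value
            return job_reference
        if status == "failed":  # assumed status value
            raise RuntimeError(f"job {job_reference} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_reference} did not finish in time")
        sleep(poll_interval)
```

Step 4 (retrieval) is left out because it depends on the dataset type. Passing `sleep` as a parameter keeps the polling loop easy to exercise in tests without real delays.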