Restie from SopherApps

Details

  • Name Restie
  • Summary A CLI app that extracts data from any REST API serving XML, JSON or CSV, transforms it into client-style JSON and sends it to a PubSubQ instance, which publishes it to its subscribers
  • Price (Monthly) 1,000,000 UGX
  • Free Trial 7 days
  • Category ETL
  • Tags etl rest pubsub
  • Download

Description

This is a simple app that queries a given REST API at a given interval (including once at the start), transforms the data returned into client-style JSON and pushes it over to a PubSubQ instance. It handles REST APIs that return XML, JSON, or CSV.

What is Client-style JSON?

Client-style JSON is the kind of JSON where a given dataset is represented by an object instead of an array. The object's keys are composite keys constructed from a list of primary key fields. The object's values are the records. This spares client apps any extra processing when they display the data in list-like components such as charts, tables and lists.

Client-style JSON can also hold multiple datasets. In that case, the datasets sit in an object whose keys are the names of the datasets and whose values are the client-style JSON for those datasets.

Client-style JSON has three main keys:

  • "data" - where the data is found
  • "meta" - where any extra information like the source, the primary keys used etc. is found
  • "isMultiple" - to distinguish between data that has multiple datasets and data that has a single dataset

Here are two examples of client-style JSON: the first holds a single dataset, the second holds multiple datasets.

{
  "isMultiple": false,
  "meta": {
    "source": "https://example.com/volume",
    "primaryFields": ["date", "time"],
    "separator": "--",
    "cronPattern": "@every 1h30m"
  },
  "data": {
    "2021-07-23--11:00:00 GMT": {
      "amount": 90.8,
      "date": "2021-07-23",
      "time": "11:00:00 GMT"
    },
    "2021-07-23--12:00:00 GMT": {
      "amount": 900.8,
      "date": "2021-07-23",
      "time": "12:00:00 GMT"
    },
    "2021-07-23--13:00:00 GMT": {
      "amount": 0.6,
      "date": "2021-07-23",
      "time": "13:00:00 GMT"
    },
    "2021-07-23--14:00:00 GMT": {
      "amount": 76.4,
      "date": "2021-07-23",
      "time": "14:00:00 GMT"
    }
  }
}

{
  "isMultiple": true,
  "meta": {
    "source": "https://example.com/purchases",
    "primaryFields": ["date", "time"],
    "groupBy": "data_type",
    "separator": "&^",
    "cronPattern": "@every 0h30m"
  },
  "data": {
    "volume": {
      "2021-07-23&^11:00:00 GMT": {
        "amount": 90.8,
        "date": "2021-07-23",
        "time": "11:00:00 GMT",
        "data_type": "volume"
      },
      "2021-07-23&^12:00:00 GMT": {
        "amount": 900.8,
        "date": "2021-07-23",
        "time": "12:00:00 GMT",
        "data_type": "volume"
      }
    },
    "price": {
      "2021-07-23&^11:00:00 GMT": {
        "price": 3000,
        "date": "2021-07-23",
        "time": "11:00:00 GMT",
        "data_type": "price"
      },
      "2021-07-23&^12:00:00 GMT": {
        "price": 4500,
        "date": "2021-07-23",
        "time": "12:00:00 GMT",
        "data_type": "price"
      }
    }
  }
}
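
The structure shown in the two examples can be sketched in a few lines of Python. This is only an illustration of the idea, not Restie's actual implementation: the function name to_client_style and its parameters are hypothetical, and the default separator of "--" is an assumption, since the page does not state a default.

```python
def to_client_style(records, primary_fields, separator="--", group_by=None, source=None):
    """Turn a list of records into client-style JSON (illustrative sketch only)."""

    def composite_key(record):
        # Join the primary field values into one unique key per record.
        return separator.join(str(record[field]) for field in primary_fields)

    meta = {"primaryFields": list(primary_fields), "separator": separator}
    if source is not None:
        meta["source"] = source

    if group_by is None:
        # Single dataset: one object keyed by the composite keys.
        data = {composite_key(record): record for record in records}
        return {"isMultiple": False, "meta": meta, "data": data}

    # Multiple datasets: one keyed object per distinct value of the group_by field.
    meta["groupBy"] = group_by
    data = {}
    for record in records:
        data.setdefault(record[group_by], {})[composite_key(record)] = record
    return {"isMultiple": True, "meta": meta, "data": data}


records = [
    {"amount": 90.8, "date": "2021-07-23", "time": "11:00:00 GMT"},
    {"amount": 900.8, "date": "2021-07-23", "time": "12:00:00 GMT"},
]
result = to_client_style(records, ["date", "time"], separator="--")
print(list(result["data"].keys()))
# ['2021-07-23--11:00:00 GMT', '2021-07-23--12:00:00 GMT']
```

Because each record is addressed by a key, a client can update or look up a single record without scanning an array.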



How Are Data Pipelines Set Up?

We use a JSON configuration file to define the data pipelines. By default, that configuration file is a JSON file called "restieConfig.json" expected to be found in the same folder as the running app.

However, a JSON configuration file with any name and path can be used. Pass its full path to the app with the --config (or -c) flag, e.g.

./restie run --config /home/admin/configs/etl.json

Here is a sample configuration file for Restie.

{
  "pipelines": [
    {
      "name": "volume_pipeline",
      "source": "https://example.com/volume",
      "sourceType": "XML",
      "httpMethod": "GET",
      "isMultiple": false,
      "cronPattern": "@every 1h30m",
      "timestampPattern": "YYYY-MM-DDThh:mm:ssZ",
      "datePattern": "YYYY-MM-DD",
      "timezone": "Africa/Cairo",
      "dataPath": ["response", "responseBody", "responseList"],
      "recordTag": "record",
      "primaryFields": ["date", "time"],
      "separator": "--",
      "queryParams": {
        "startDatetime": "NOW - 0Y0M0D0h0m0s0ms",
        "endDatetime": "NOW + 0Y0M0D1h0m0s0ms",
        "APIKey": "uyfoafayreyruhererhjkahjkhs"
      },
      "headers": {
        "Authorization": "Bearer uyfoafayreyruhererhjkahjkhs",
        "Host": "example.com"
      },
      "postData": {},
      "pubSubQUrl": "ws://localhost:8005",
      "pubSubQAuthData": {
        "username": "johndoe",
        "password": "badguyi"
      }
    },
    {
      "name": "purchases_pipeline",
      "source": "https://example.com/purchases",
      "sourceType": "JSON",
      "httpMethod": "POST",
      "isMultiple": true,
      "groupBy": "data_type",
      "dataPath": ["response", "responseBody", "responseList"],
      "recordTag": "",
      "cronPattern": "@every 0h30m",
      "timestampPattern": "YYYY-MM-DDThh:mm:ss",
      "datePattern": "YYYY-MM-DD",
      "timezone": "Africa/Kampala",
      "primaryFields": ["date", "time"],
      "separator": "&^",
      "queryParams": {
        "startDatetime": "NOW",
        "endDatetime": "NOW + 0Y0M0D0h30m0s0ms"
      },
      "headers": {
        "Authorization": "Bearer uyfoafayreyruhererhjkahjkhs",
        "Content-Type": "application/json"
      },
      "postData": {
        "endDate": "TODAY + 0Y0M3D",
        "endMonth": "CURRENT_MONTH + 0Y2M",
        "endYear": "CURRENT_YEAR + 5Y"
      },
      "pubSubQUrl": "ws://localhost:8006"
    }
  ]
}

The possible configuration properties can be broken down as follows:

| Config Property | Type | Required | Description |
| --- | --- | --- | --- |
| name | text (no spaces, single line) | Yes | The name of the pipeline. It is also the message type that the data is published under in the PubSubQ |
| source | URL | Yes | The REST API endpoint from which to get the data. No query parameters should be part of it |
| sourceType | text: any of "XML", "CSV", or "JSON" | Yes | The type of response to be expected from the REST API endpoint |
| httpMethod | text: any of "GET", "POST" | Yes | The HTTP method by which the REST API endpoint is to be accessed |
| isMultiple | boolean (true or false) | Yes | Whether the data returned by the endpoint has multiple datasets in it, from the perspective of the client. For example, a client might need data grouped by author, so the returned data will have a dataset for each author |
| groupBy | text (no spaces, single line) | Yes if isMultiple is true, No otherwise | The field on each data record to use to group the data into multiple datasets. For the example above, it might be "authorId" |
| cronPattern | text of "@every {}h{}m" or "* * * * * * *" form. See below for more details | Yes | The cron setting that controls when this data pipeline is to be run, repetitively. The app has an internal cron runner |
| timestampPattern | text with YYYY, MM, DD, hh, ss, mm, zz, z, or Z | Yes | The timestamp pattern to use when constructing any time-based query params, post data or headers on each cron run. There is a special config syntax for such time-based data. See below |
| datePattern | text with YYYY, MM, DD | Yes | The date pattern to use when constructing any date-based query params, post data or headers on each cron run. There is a special config syntax for such time-based data. See below |
| timezone | text (valid timezone names) | Yes | The timezone in which the cron is to be run |
| dataPath | list of text | Yes if sourceType is XML, No otherwise | The list of fields to follow along XML or JSON REST responses so as to get to the actual data |
| recordTag | text | Yes if sourceType is XML, No otherwise | The name of the XML tag that holds each individual record |
| primaryFields | list of text | Yes | The list of fields that together uniquely identify a given record in a dataset |
| separator | text | No | The string used to join the primary key values of a given record into a single unique key for the client-style JSON |
| headers | map of text keys and text values | No | The headers to be sent along during the REST API request. Some headers may be time-dependent or date-dependent. To cater for those, there is a special syntax to use in the configuration. See below |
| queryParams | map of text keys and text values | No | The query parameters to be sent along during the REST API request. Some query parameters may be time-dependent or date-dependent. To cater for those, there is a special syntax to use in the configuration. See below |
| postData | map of text keys and any primitive type values e.g. integers, strings | No | The data to be sent in the POST body if httpMethod is "POST". Some properties in this data may be time-dependent or date-dependent. To cater for those, there is a special syntax to use in the configuration. See below |
| pubSubQUrl | Websocket URL | Yes | The websocket base URL of the PubSubQ instance to which this Restie app will publish its data so that the data is available in real time to all other apps that need it |
| pubSubQAuthData | map of text keys and text values | No | The auth data to be passed to a secured PubSubQ instance to authenticate with it before publishing to it |
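
The "Required" column above can be checked before running the app. The following Python sketch is a hypothetical pre-flight check written for illustration only. It is not part of Restie, which reports configuration errors itself, and the helper name missing_properties is an assumption.

```python
import json

# Properties marked "Yes" in the Required column.
ALWAYS_REQUIRED = [
    "name", "source", "sourceType", "httpMethod", "isMultiple", "cronPattern",
    "timestampPattern", "datePattern", "timezone", "primaryFields", "pubSubQUrl",
]


def missing_properties(pipeline):
    """Return the required properties that a pipeline definition lacks."""
    missing = [key for key in ALWAYS_REQUIRED if key not in pipeline]
    # groupBy is required only when isMultiple is true.
    if pipeline.get("isMultiple") is True and "groupBy" not in pipeline:
        missing.append("groupBy")
    # dataPath and recordTag are required only for XML sources.
    if pipeline.get("sourceType") == "XML":
        missing += [key for key in ("dataPath", "recordTag") if key not in pipeline]
    return missing


# A deliberately incomplete configuration, used only to exercise the check.
config = json.loads("""
{
  "pipelines": [
    {
      "name": "purchases_pipeline",
      "source": "https://example.com/purchases",
      "sourceType": "XML",
      "httpMethod": "POST",
      "isMultiple": true
    }
  ]
}
""")

for pipeline in config["pipelines"]:
    print(pipeline.get("name"), missing_properties(pipeline))
```

Parsing with json.loads also rejects trailing commas, which is a quick way to catch that common mistake in a hand-edited config file.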



How Do We Configure Time-Dependent and Date-Dependent Headers, Query Parameters or Post Data?

We understand that certain REST API requests might have headers, POST data or query parameters that depend on time. We have provided an option to have such values in the form below (the square brackets mean the things inside are optional)

KEYWORD [OPERATOR OPERAND]

The KEYWORD can be any of: NOW, CURRENT_QUARTER_HOUR, CURRENT_HALF_HOUR, CURRENT_HOUR, TODAY, CURRENT_MONTH, CURRENT_YEAR

The optional OPERATOR can be any of: +, -

The optional OPERAND can be in the form (the square brackets mean the things inside are optional):

[<years>Y][<months>M][<days>D][<hours>h][<minutes>m][<seconds>s][<milliseconds>ms]

e.g.

For: 20 years, 5 months, 2 days, 60 hours, 45 minutes, 13 seconds and 67 milliseconds

20Y5M2D60h45m13s67ms

For: 20 years and 5 months

20Y5M
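
To make the OPERAND form concrete, here is a minimal Python sketch that splits an operand into its units. It assumes the units always appear in the order shown above and are case-sensitive (M for months, m for minutes). The name parse_operand is hypothetical and is not part of Restie.

```python
import re

# One optional group per unit, in the documented order.
OPERAND_RE = re.compile(
    r"^(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?(?:(?P<seconds>\d+)s)?"
    r"(?:(?P<milliseconds>\d+)ms)?$"
)


def parse_operand(operand):
    """Split an operand such as '20Y5M' into a dict of unit -> integer amount."""
    match = OPERAND_RE.match(operand)
    if not operand or not match:
        raise ValueError(f"invalid operand: {operand!r}")
    # Keep only the units that were actually present in the operand.
    return {unit: int(value) for unit, value in match.groupdict().items() if value is not None}


print(parse_operand("20Y5M2D60h45m13s67ms"))
print(parse_operand("20Y5M"))
```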

A sample configuration of a REST API endpoint that always sends the current half hour (current_half_hour), the start date (start_date) as the previous day's date and the end date (end_date) as the date of the day five days from now via query parameters might look something like:

Some required properties have been removed for brevity

{
  "pipelines": [
    {
      "source": "http://example.com",
      "timezone": "Africa/Kampala",
      "httpMethod": "GET",
      "queryParams": {
        "current_half_hour": "CURRENT_HALF_HOUR",
        "start_date": "TODAY - 1D",
        "end_date": "TODAY + 5D"
      }
    }
  ]
}

If today were 2021-07-26 and the time now were 11:00:00 EAT, the resulting URL would be:

http://example.com?current_half_hour=23&start_date=2021-07-25&end_date=2021-07-31
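
The date arithmetic in this example can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not Restie's code: resolve_today is a hypothetical helper that handles only the TODAY keyword with a day offset, and the datePattern YYYY-MM-DD is assumed to correspond to Python's %Y-%m-%d. The current_half_hour parameter is left out because the page does not define how its value is numbered.

```python
import re
from datetime import date, timedelta
from urllib.parse import urlencode

# Matches "TODAY", "TODAY - 1D", "TODAY + 5D" (day offsets only, for brevity).
DAY_EXPR_RE = re.compile(r"^TODAY(?:\s*([+-])\s*(\d+)D)?$")


def resolve_today(expression, today):
    """Resolve a TODAY-based expression to a concrete date."""
    match = DAY_EXPR_RE.match(expression)
    if not match:
        raise ValueError(f"unsupported expression: {expression!r}")
    sign, days = match.groups()
    if sign is None:
        return today
    offset = timedelta(days=int(days))
    return today + offset if sign == "+" else today - offset


today = date(2021, 7, 26)
params = {"start_date": "TODAY - 1D", "end_date": "TODAY + 5D"}
resolved = {key: resolve_today(value, today).strftime("%Y-%m-%d") for key, value in params.items()}
print("http://example.com?" + urlencode(resolved))
# http://example.com?start_date=2021-07-25&end_date=2021-07-31
```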



How Do We Configure the Timing of the Data Pipelines?

We use the cronPattern property to control the time intervals when a given REST API endpoint is to be queried. It deals with repetitive REST calls such as: "every Wednesday at 11:00 am EAT (*)", "every minute"

The cronPattern has two possible formats:
The ******* Pattern:

This is the usual "*****" pattern but with two extra "*" fields, one for second and one for year.

Note: It does not support the */number syntax, e.g. */7

Note: Day of week runs from 0 (Sunday) to 6 (Saturday)

The pattern is of the form:

<second>|* <minute>|* <hour>|* <day>|* <month>|* <dayOfWeek>|* <year>|*

e.g.

For: every Wednesday at 11:00:05

5 0 11 * * 3 *

A sample configuration can look like:

Some required properties have been left out for brevity

{
  "pipelines": [
    {
      "name": "sample_pipeline",
      "source": "http://example.com/api/v2",
      "cronPattern": "5 0 11 * * 3 *"
    }
  ]
}
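
To show how the seven fields line up against a point in time, here is a minimal Python sketch. It assumes each field is either "*" or a single number, since the page does not say whether lists or ranges are accepted. The function name matches is hypothetical and this is not Restie's internal cron runner.

```python
from datetime import datetime


def matches(pattern, moment):
    """Check whether a datetime satisfies a 7-field pattern of numbers and '*'."""
    fields = pattern.split()
    if len(fields) != 7:
        raise ValueError("expected 7 space-separated fields")
    actual = [
        moment.second,
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        moment.isoweekday() % 7,  # converts Monday=1..Sunday=7 to Sunday=0..Saturday=6
        moment.year,
    ]
    return all(field == "*" or int(field) == value for field, value in zip(fields, actual))


# 2021-07-28 was a Wednesday.
print(matches("5 0 11 * * 3 *", datetime(2021, 7, 28, 11, 0, 5)))  # True
print(matches("5 0 11 * * 3 *", datetime(2021, 7, 29, 11, 0, 5)))  # False
```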



The @every Pattern:

This is the simpler version of creating jobs that run say every hour or every minute or every second or every millisecond etc. Note: The shortest interval is a millisecond

The pattern is of the form:

@every [<hours>h][<minutes>m][<seconds>s][<milliseconds>MS]

e.g.

For: every 2 and a half hours and 30 seconds and 3 milliseconds

@every 2h30m30s3MS

For: every 30 seconds and 45 milliseconds

@every 30s45MS

For: every hour

@every 1h

A sample configuration can look like:

Some required properties have been left out for brevity

{
  "pipelines": [
    {
      "name": "sample_pipeline_2",
      "source": "http://example.com/api",
      "cronPattern": "@every 30m"
    }
  ]
}
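
The @every form maps naturally onto a time interval. The Python sketch below converts a pattern into a timedelta. It assumes the units appear in the documented order with an uppercase MS suffix for milliseconds. The name parse_every is hypothetical and not part of Restie.

```python
import re
from datetime import timedelta

# Hours, minutes, seconds and milliseconds are all optional, in that order.
EVERY_RE = re.compile(r"^@every\s+(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)MS)?$")


def parse_every(pattern):
    """Convert an '@every ...' pattern into a timedelta interval."""
    match = EVERY_RE.match(pattern)
    if not match or all(group is None for group in match.groups()):
        raise ValueError(f"invalid @every pattern: {pattern!r}")
    hours, minutes, seconds, millis = (int(group or 0) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis)


print(parse_every("@every 2h30m30s3MS").total_seconds())
print(parse_every("@every 30m").total_seconds())
```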

Features

  • Pipelines Created in JSON file: Using a file called "restieConfig.json" in the same folder as this app, you can create many data pipelines connecting to any number of REST APIs
  • Handles XML, CSV, JSON APIs: Restie is able to output clean JSON data with unique IDs for JavaScript clients to use, all coming from any XML, JSON or CSV REST APIs
  • Realtime Data: Since it is wired to connect to a Pub/Sub (PubSubQ), any number of subscribers can receive the clean client-style JSON as soon as it is received by Restie
  • Automated Cronjobs: Restie allows you to create REST API data pipelines that repeat automatically based on your configuration in the JSON config file under the key "cronPattern"
  • Helpful Error Messages: You don't have to worry about making errors in the JSON config file. The application will show you where your error is when you attempt to run it.
  • CLI: Restie is a fully-fledged CLI app with a help section. You can run 'restie help' to see all options
  • Light-weight: It is a very small app of ~10 MB
  • Run in the Background: Since "v0.0.1-alpha.3", it is possible to run the CLI in the background, while logging output to a file.

Quick Start

  • Download the Chrome Websockets browser extension
  • Download the PubSubQ app for your operating system and architecture (note that most computers have AMD64 architecture. If it does not work, you can always download the ARM version).

| Operating System | Architecture | Download |
| --- | --- | --- |
| Windows | AMD64 | Download |
| Windows | ARM | Download |
| MacOS | AMD64 | Download |
| MacOS | ARM64 | Download |
| Linux | AMD64 | Download |
| Linux | ARM64 | Download |

  • Start the PubSubQ instance on port 8080 (You can choose whatever port you want)

For Linux and Mac:

./pubsubq run -p 8080

For Windows:

pubsubq.exe run -p 8080

  • Add ws://localhost:8080/receive/ to the Server URL input of the Chrome Websockets browser extension

  • Click Connect button

  • Download the Restie app for your operating system and computer architecture.

| Operating System | Architecture | Download |
| --- | --- | --- |
| Windows | AMD64 | Download |
| Windows | ARM | Download |
| MacOS | AMD64 | Download |
| MacOS | ARM64 | Download |
| Linux | AMD64 | Download |
| Linux | ARM64 | Download |

  • Create a file restieConfig.json in the same directory that your restie app was downloaded to.
touch restieConfig.json
  • Copy, paste and save the following JSON into the restieConfig.json file
{
  "pipelines": [
    {
      "name": "cet_time",
      "source": "http://worldtimeapi.org/api/timezone/CET",
      "sourceType": "JSON",
      "httpMethod": "GET",
      "isMultiple": false,
      "cronPattern": "@every 0h1m",
      "timestampPattern": "YYYY-MM-DDThh:mm:sszz",
      "datePattern": "YYYY-MM-DD",
      "dataPath": [],
      "timezone": "GMT",
      "primaryFields": ["timezone", "datetime"],
      "separator": "--",
      "pubSubQUrl": "ws://localhost:8080"
    }
  ]
}
  • Run the restie app

For Linux and Mac

./restie run

For Windows:

restie.exe run
  • Go back to the Chrome websockets browser extension window and view the received messages text area. You should see new messages printed in the area every minute.

Messages Coming in from the PubSubQ

Since v0.0.1-alpha.3: You can also run restie as a daemon by passing it the --daemon (or -d) flag. By default, it will log to a restie.log file in the same folder as the running app. To change the log path, pass the --log (or -l) flag with an absolute file path.

To see more details of how this app is used, pass the --help (or -h) flag to run e.g.

For Linux and Mac:

./restie run --help

For Windows:

restie.exe run --help

Downloads