SopherApps

Details

Name Restie
Summary A CLI app that extracts data from any REST API serving XML, JSON or CSV, transforms it into client-style JSON and sends it to a PubSubQ instance, which publishes it to its subscribers
Price (Monthly) 1,000,000 UGX
Free Trial 7 days
Category ETL
Tags etl rest pubsub

Description

This is a simple app that queries a given REST API at a given interval (including once at the start), transforms the data returned into client-style JSON and pushes it over to a PubSubQ instance. It handles REST APIs that return XML, JSON, or CSV.

What is Client-style JSON?

Client-style JSON is the kind of JSON where a given dataset is represented by an object instead of an array. The object's keys are composite keys constructed from a list of primary key fields. The object's values are the records. This spares client apps any extra processing when they display the data in list-like components such as charts, tables and lists.

Client-style JSON can also hold multiple datasets. In that case the datasets sit in an object whose keys are the names of the datasets and whose values are the client-style JSON for those datasets.

Client-style JSON has three main keys:

- "data" - where the data is found
- "meta" - where any extra information like the source, the primary keys used etc. is found
- "isMultiple" - to distinguish between data that has multiple datasets and data that has a single dataset

Here are examples of client-style JSON, first for a single dataset and then for multiple datasets grouped by "data_type":

{
  "isMultiple": false,
  "meta": {
    "source": "https://example.com/volume",
    "primaryFields": ["date", "time"],
    "separator": "--",
    "cron": "@every 1h30m"
  },
  "data": {
    "2021-07-23--11:00:00 GMT": {
      "amount": 90.8,
      "date": "2021-07-23",
      "time": "11:00:00 GMT"
    },
    "2021-07-23--12:00:00 GMT": {
      "amount": 900.8,
      "date": "2021-07-23",
      "time": "12:00:00 GMT"
    },
    "2021-07-23--13:00:00 GMT": {
      "amount": 0.6,
      "date": "2021-07-23",
      "time": "13:00:00 GMT"
    },
    "2021-07-23--14:00:00 GMT": {
      "amount": 76.4,
      "date": "2021-07-23",
      "time": "14:00:00 GMT"
    }
  }
}

{
  "isMultiple": true,
  "meta": {
    "source": "https://example.com/purchases",
    "primaryFields": ["date", "time"],
    "groupBy": "data_type",
    "separator": "&^",
    "cronPattern": "@every 0h30m"
  },
  "data": {
    "volume": {
      "2021-07-23&^11:00:00 GMT": {
        "amount": 90.8,
        "date": "2021-07-23",
        "time": "11:00:00 GMT",
        "data_type": "volume"
      },
      "2021-07-23&^12:00:00 GMT": {
        "amount": 900.8,
        "date": "2021-07-23",
        "time": "12:00:00 GMT",
        "data_type": "volume"
      }
    },
    "price": {
      "2021-07-23&^11:00:00 GMT": {
        "price": 3000,
        "date": "2021-07-23",
        "time": "11:00:00 GMT",
        "data_type": "price"
      },
      "2021-07-23&^12:00:00 GMT": {
        "price": 4500,
        "date": "2021-07-23",
        "time": "12:00:00 GMT",
        "data_type": "price"
      }
    }
  }
}
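The structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Restie's actual implementation: the function names (composite_key, to_client_style) are hypothetical, and the "meta" key is left out for brevity.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def composite_key(record: Record, primary_fields: List[str], separator: str) -> str:
    """Join the primary-field values of a record into one string key."""
    return separator.join(str(record[field]) for field in primary_fields)


def to_client_style(
    records: List[Record],
    primary_fields: List[str],
    separator: str,
    group_by: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the "isMultiple" and "data" parts of a client-style document."""
    if group_by is None:
        # Single dataset: one object keyed by the composite key.
        data = {composite_key(r, primary_fields, separator): r for r in records}
        return {"isMultiple": False, "data": data}

    # Multiple datasets: one keyed object per value of the group_by field.
    grouped: Dict[str, Dict[str, Record]] = defaultdict(dict)
    for r in records:
        grouped[str(r[group_by])][composite_key(r, primary_fields, separator)] = r
    return {"isMultiple": True, "data": dict(grouped)}


records = [
    {"amount": 90.8, "date": "2021-07-23", "time": "11:00:00 GMT", "data_type": "volume"},
    {"price": 3000, "date": "2021-07-23", "time": "11:00:00 GMT", "data_type": "price"},
]
single = to_client_style(records[:1], ["date", "time"], "--")
multiple = to_client_style(records, ["date", "time"], "&^", group_by="data_type")
print(list(single["data"]))
print(list(multiple["data"]))
```

Because every record is reachable by its composite key, a client can update or look up one record without scanning a list.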

How Are Data Pipelines Set Up?

We use a JSON configuration file to define the data pipelines. By default, that configuration file is a JSON file called "restieConfig.json" expected to be found in the same folder as the running app.

However, a JSON configuration file with any name and path can be used. Pass its full path to the app with the --config (or -c) flag, e.g.

./restie run --config /home/admin/configs/etl.json

Here is a sample configuration file for Restie.

{
  "pipelines": [
    {
      "name": "volume_pipeline",
      "source": "https://example.com/volume",
      "sourceType": "XML",
      "httpMethod": "GET",
      "isMultiple": false,
      "cronPattern": "@every 1h30m",
      "timestampPattern": "YYYY-MM-DDThh:mm:ssZ",
      "datePattern": "YYYY-MM-DD",
      "timezone": "Africa/Cairo",
      "dataPath": ["response", "responseBody", "responseList"],
      "recordTag": "record",
      "primaryFields": ["date", "time"],
      "separator": "--",
      "queryParams": {
        "startDatetime": "NOW - 0Y0M0D0h0m0s0ms",
        "endDatetime": "NOW + 0Y0M0D1h0m0s0ms",
        "APIKey": "uyfoafayreyruhererhjkahjkhs"
      },
      "headers": {
        "Authorization": "Bearer uyfoafayreyruhererhjkahjkhs",
        "Host": "example.com"
      },
      "postData": {},
      "pubSubQUrl": "ws://localhost:8005",
      "pubSubQAuthData": {
        "username": "johndoe",
        "password": "badguyi"
      }
    },
    {
      "name": "purchases_pipeline",
      "source": "https://example.com/purchases",
      "sourceType": "JSON",
      "httpMethod": "POST",
      "isMultiple": true,
      "groupBy": "data_type",
      "dataPath": ["response", "responseBody", "responseList"],
      "recordTag": "",
      "cronPattern": "@every 0h30m",
      "timestampPattern": "YYYY-MM-DDThh:mm:ss",
      "datePattern": "YYYY-MM-DD",
      "timezone": "Africa/Kampala",
      "primaryFields": ["date", "time"],
      "separator": "&^",
      "queryParams": {
        "startDatetime": "NOW",
        "endDatetime": "NOW + 0Y0M0D0h30m0s0ms"
      },
      "headers": {
        "Authorization": "Bearer uyfoafayreyruhererhjkahjkhs",
        "Content-Type": "application/json"
      },
      "postData": {
        "endDate": "TODAY + 0Y0M3D",
        "endMonth": "CURRENT_MONTH + 0Y2M",
        "endYear": "CURRENT_YEAR + 5Y"
      },
      "pubSubQUrl": "ws://localhost:8006"
    }
  ]
}
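The "dataPath" list in the sample above names the keys to follow from the top of the response down to the records. A minimal sketch of that walk for a JSON response, assuming nested objects and a hypothetical helper name (follow_data_path); how Restie itself handles missing keys or an empty path is not documented here, so the behaviour below is an assumption:

```python
from typing import Any, List


def follow_data_path(document: Any, data_path: List[str]) -> Any:
    """Descend through nested objects, one key per dataPath entry."""
    current = document
    for key in data_path:
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"dataPath entry {key!r} not found")
        current = current[key]
    # Assumption: an empty dataPath means the whole document is the data.
    return current


response = {"response": {"responseBody": {"responseList": [{"amount": 90.8}]}}}
records = follow_data_path(response, ["response", "responseBody", "responseList"])
print(records)
```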

The possible configuration properties can be broken down as follows:

Config Property	Type	Required	Description
name	text (no spaces, single line)	Yes	The name of the pipeline. It doubles as the message type under which the data is published in the PubSubQ
source	URL	Yes	The REST API endpoint from which to get the data. No query parameters should be part of it
sourceType	text: any of "XML", "CSV", or "JSON"	Yes	The type of response to be expected from the REST API endpoint.
httpMethod	text: any of "GET", "POST"	Yes	The HTTP method the REST API endpoint is to be accessed by
isMultiple	boolean (true or false)	Yes	Whether the data returned by the endpoint has multiple datasets in it from the perspective of the client. For example, a client might need data grouped by author, so the returned data will have a dataset for each author.
groupBy	text (no spaces, single line)	Yes if isMultiple is true, No otherwise	The field on each data record to use to group the data into multiple datasets. For the example above, it might be "authorId"
cronPattern	text of "@every {}h{}m" or "* * * * * * *" form. See below for more details	Yes	The cron setting that controls when this data pipeline is run, repetitively. The app has an internal cron runner.
timestampPattern	text with YYYY, MM, DD, hh, mm, ss, zz, z, or Z	Yes	The timestamp pattern to use when constructing any time-based query params, post data or headers on each cron run. There is a special config syntax for such time-based data. See below.
datePattern	text with YYYY, MM, DD	Yes	The date pattern to use when constructing any date-based query params, post data or headers on each cron run. There is a special config syntax for such date-based data. See below.
timezone	text (valid timezone names)	Yes	The timezone in which the cron is to be run
dataPath	list of text	Yes if sourceType is XML, No otherwise	The list of fields to follow along XML or JSON REST responses so as to get to the actual data.
recordTag	text	Yes if sourceType is XML, No otherwise	The name of the XML tag that holds each individual record.
primaryFields	list of text	Yes	The list of fields that would be used to collectively uniquely identify a given record in a dataset.
separator	text	No	The string to be used when creating a single key out of the primary key values of a given record so as to create a unique key for the Client-style JSON
headers	map of text keys and text values	No	The headers to be sent along during the REST API request. Some headers may be time-dependent or date-dependent. To cater for those, there is a special syntax to use in the configuration. See below.
queryParams	map of text keys and text values	No	The query parameters to be sent along during the REST API request. Some query parameters may be time-dependent or date-dependent. To cater for those, there is a special syntax to use in the configuration. See below.
postData	map of text keys and any primitive type values e.g. integers, strings	No	The data to be sent in the POST body if httpMethod is "POST". Some properties in this data may be time-dependent or date-dependent. To cater for those, there is a special syntax to use in the configuration. See below.
pubSubQUrl	Websocket URL	Yes	The websocket base URL of the PubSubQ instance to which this Restie app will publish its data, so that the data is available in real time to all other apps that need it.
pubSubQAuthData	map of text keys and text values	No	The auth data to be passed to a secured PubSubQ instance to authenticate with it before publishing to it.
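The "Required" column of the table can be read as a small set of rules. The sketch below checks one pipeline entry against those rules before a run; it is an illustration only, with a hypothetical function name (validate_pipeline), and it does not reproduce Restie's own error messages.

```python
from typing import Any, Dict, List

# Properties the table marks as always required.
ALWAYS_REQUIRED = [
    "name", "source", "sourceType", "httpMethod", "isMultiple", "cronPattern",
    "timestampPattern", "datePattern", "timezone", "primaryFields", "pubSubQUrl",
]


def validate_pipeline(pipeline: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = [
        f"missing required property: {key}"
        for key in ALWAYS_REQUIRED
        if key not in pipeline
    ]
    if pipeline.get("sourceType") not in ("XML", "CSV", "JSON"):
        problems.append('sourceType must be "XML", "CSV" or "JSON"')
    if pipeline.get("httpMethod") not in ("GET", "POST"):
        problems.append('httpMethod must be "GET" or "POST"')
    # Conditional rules from the table.
    if pipeline.get("isMultiple") is True and not pipeline.get("groupBy"):
        problems.append("groupBy is required when isMultiple is true")
    if pipeline.get("sourceType") == "XML":
        for key in ("dataPath", "recordTag"):
            if key not in pipeline:
                problems.append(f"{key} is required when sourceType is XML")
    return problems


quick_start = {
    "name": "cet_time",
    "source": "http://worldtimeapi.org/api/timezone/CET",
    "sourceType": "JSON",
    "httpMethod": "GET",
    "isMultiple": False,
    "cronPattern": "@every 0h1m",
    "timestampPattern": "YYYY-MM-DDThh:mm:sszz",
    "datePattern": "YYYY-MM-DD",
    "dataPath": [],
    "timezone": "GMT",
    "primaryFields": ["timezone", "datetime"],
    "separator": "--",
    "pubSubQUrl": "ws://localhost:8080",
}
print(validate_pipeline(quick_start))
print(validate_pipeline(dict(quick_start, isMultiple=True, sourceType="XML")))
```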

How Do We Configure Time-Dependent and Date-Dependent Headers, Query Parameters or Post Data

We understand that certain REST API requests might have headers, POST data or query parameters that depend on time. We have provided an option to have such values in the form below (the square brackets mean the things inside are optional)

KEYWORD [OPERATOR OPERAND]

The KEYWORD can be any of: NOW, CURRENT_QUARTER_HOUR, CURRENT_HALF_HOUR, CURRENT_HOUR, TODAY, CURRENT_MONTH, CURRENT_YEAR

The optional OPERATOR can be any of: +, -

The optional OPERAND can be in the form (the square brackets mean the things inside are optional):

[<years>Y][<months>M][<days>D][<hours>h][<minutes>m][<seconds>s][<milliseconds>ms]

e.g.

For: 20 years, 5 months, 2 days, 60 hours, 45 minutes, 13 seconds and 67 milliseconds

20Y5M2D60h45m13s67ms

For: 20 years and 5 months

20Y5M
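To make the KEYWORD [OPERATOR OPERAND] syntax concrete, here is a rough Python sketch of a parser for it. The function names (parse_operand, parse_expression) are hypothetical, and the sketch assumes the three parts are separated by whitespace, as in the sample configuration; Restie's real parser may differ.

```python
import re
from typing import Dict, Tuple

KEYWORDS = {
    "NOW", "CURRENT_QUARTER_HOUR", "CURRENT_HALF_HOUR", "CURRENT_HOUR",
    "TODAY", "CURRENT_MONTH", "CURRENT_YEAR",
}

# One optional group per unit, in the documented order.
# "(?!s)" keeps the minutes group from swallowing the "m" of "ms".
OPERAND_RE = re.compile(
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m(?!s))?"
    r"(?:(?P<seconds>\d+)s)?(?:(?P<milliseconds>\d+)ms)?"
)


def parse_operand(text: str) -> Dict[str, int]:
    """Turn e.g. "20Y5M" into a dict of amounts per unit (missing units are 0)."""
    match = OPERAND_RE.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"not a valid operand: {text!r}")
    return {unit: int(value or 0) for unit, value in match.groupdict().items()}


def parse_expression(text: str) -> Tuple[str, int, Dict[str, int]]:
    """Split "KEYWORD [OPERATOR OPERAND]" into (keyword, sign, amounts)."""
    parts = text.split()
    if len(parts) not in (1, 3) or parts[0] not in KEYWORDS:
        raise ValueError(f"not a valid expression: {text!r}")
    if len(parts) == 1:
        # No operator or operand: an offset of zero.
        return parts[0], 1, {unit: 0 for unit in OPERAND_RE.groupindex}
    if parts[1] not in ("+", "-"):
        raise ValueError(f"unknown operator: {parts[1]!r}")
    sign = 1 if parts[1] == "+" else -1
    return parts[0], sign, parse_operand(parts[2])


print(parse_operand("20Y5M2D60h45m13s67ms"))
print(parse_expression("TODAY - 1D"))
```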

A sample configuration of a REST API endpoint that always sends the current half hour (current_half_hour), the start date (start_date) as the previous day's date and the end date (end_date) as the date of the day five days from now via query parameters might look something like:

Some required properties have been removed for brevity

{
  "pipelines": [
    {
      "source": "http://example.com",
      "timezone": "Africa/Kampala",
      "httpMethod": "GET",
      "queryParams": {
        "current_half_hour": "CURRENT_HALF_HOUR",
        "start_date": "TODAY - 1D",
        "end_date": "TODAY + 5D"
      }
    }
  ]
}

If today were 2021-07-26 and the time now were 11:00:00 EAT, the resulting URL would be:

http://example.com?current_half_hour=23&start_date=2021-07-25&end_date=2021-07-31
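The arithmetic behind that URL can be reproduced with a short Python sketch. Two things here are assumptions rather than documented behaviour: half hours are counted from 1 starting at midnight (which is what makes 11:00 come out as 23), and the "YYYY-MM-DD" date pattern is mapped to Python's "%Y-%m-%d". The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def current_half_hour(now: datetime) -> int:
    # Assumption: half hours are numbered from 1 at midnight, so 11:00 falls
    # in half hour 23, matching the example URL in the text.
    return now.hour * 2 + (1 if now.minute >= 30 else 0) + 1


def today_plus_days(now: datetime, days: int, date_pattern: str = "%Y-%m-%d") -> str:
    """Resolve "TODAY + <days>D" (or minus, for negative days) to a date string."""
    return (now.date() + timedelta(days=days)).strftime(date_pattern)


def build_url(source: str, now: datetime) -> str:
    """Build the request URL for the sample queryParams above."""
    params = {
        "current_half_hour": current_half_hour(now),  # CURRENT_HALF_HOUR
        "start_date": today_plus_days(now, -1),       # TODAY - 1D
        "end_date": today_plus_days(now, 5),          # TODAY + 5D
    }
    return f"{source}?{urlencode(params)}"


# "now" is passed in as local wall-clock time in the pipeline's timezone.
url = build_url("http://example.com", datetime(2021, 7, 26, 11, 0, 0))
print(url)
```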

How Do We Configure the Timing of the Data Pipelines

We use the cronPattern property to control when a given REST API endpoint is queried. It handles repetitive REST calls such as "every Wednesday at 11:00 am EAT" or "every minute".

The `cronPattern` has two possible formats:

The ******* Pattern:

This is the usual "*****" pattern but with two extra "*" for second and year.

Note: It does not support the */number syntax e.g. */7

Note: day of week is from 0 (Sunday) to 6 (Saturday)

The pattern is of the form:

<second>|* <minute>|* <hour>|* <day>|* <month>|* <dayOfWeek>|* <year>|*

e.g.

For: every Wednesday at 11:00:05

5 0 11 * * 3 *

A sample configuration can look like:

Some required properties have been left out for brevity

{
  "pipelines": [
    {
      "name": "sample_pipeline",
      "source": "http://example.com/api/v2",
      "cronPattern": "5 0 11 * * 3 *"
    }
  ]
}
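One way to picture how a seven-field pattern is evaluated is to compare each field against the matching part of a timestamp. The sketch below does that in Python. It assumes each field is either "*" or a single integer, which is all the form above shows; the function names are hypothetical and this is not Restie's internal cron runner.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """A field is either "*" (anything) or a single integer."""
    return field == "*" or int(field) == value


def cron_matches(pattern: str, moment: datetime) -> bool:
    """Check a <second> <minute> <hour> <day> <month> <dayOfWeek> <year> pattern."""
    fields = pattern.split()
    if len(fields) != 7:
        raise ValueError("expected 7 space-separated fields")
    # Python counts Monday as 0; the pattern counts Sunday as 0.
    day_of_week = (moment.weekday() + 1) % 7
    values = [
        moment.second, moment.minute, moment.hour, moment.day,
        moment.month, day_of_week, moment.year,
    ]
    return all(field_matches(f, v) for f, v in zip(fields, values))


# 2021-07-28 was a Wednesday, 2021-07-29 a Thursday.
print(cron_matches("5 0 11 * * 3 *", datetime(2021, 7, 28, 11, 0, 5)))
print(cron_matches("5 0 11 * * 3 *", datetime(2021, 7, 29, 11, 0, 5)))
```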

The @every Pattern:

This is a simpler way to create jobs that run, say, every hour, every minute, every second or every millisecond.

Note: The shortest interval is a millisecond

The pattern is of the form:

@every [<hours>h][<minutes>m][<seconds>s][<milliseconds>MS]

e.g.

For: every 2 hours, 30 minutes, 30 seconds and 3 milliseconds

@every 2h30m30s3MS

For: every 30 seconds and 45 milliseconds

@every 30s45MS

For: every hour

@every 1h

A sample configuration can look like:

Some required properties have been left out for brevity

{
  "pipelines": [
    {
      "name": "sample_pipeline_2",
      "source": "http://example.com/api",
      "cronPattern": "@every 30m"
    }
  ]
}
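An @every pattern is simply an interval, so it maps naturally onto a duration. A minimal Python sketch of that conversion, with a hypothetical function name (parse_every) and the upper-case "MS" suffix taken from the form shown above:

```python
import re
from datetime import timedelta

EVERY_RE = re.compile(
    r"@every (?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?"
    r"(?:(?P<seconds>\d+)s)?(?:(?P<milliseconds>\d+)MS)?"
)


def parse_every(pattern: str) -> timedelta:
    """Convert e.g. "@every 1h30m" into the interval between runs."""
    match = EVERY_RE.fullmatch(pattern)
    if match is None:
        raise ValueError(f"not a valid @every pattern: {pattern!r}")
    parts = {unit: int(value or 0) for unit, value in match.groupdict().items()}
    interval = timedelta(**parts)
    # The shortest allowed interval is one millisecond.
    if interval < timedelta(milliseconds=1):
        raise ValueError("interval must be at least 1 millisecond")
    return interval


print(parse_every("@every 2h30m30s3MS"))
print(parse_every("@every 30m").total_seconds())
```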

Features

Pipelines Created in JSON file: Using a file called "restieConfig.json" in the same folder as this app, you can create many data pipelines connecting to any number of REST APIs
Handles XML, CSV, JSON APIs: Restie is able to output clean JSON data with unique IDs for JavaScript clients to use, all coming from any XML, JSON or CSV REST APIs
Realtime Data: Since it is wired to connect to a Pub/Sub (PubSubQ), any number of subscribers can receive the clean client-style JSON as soon as it is received by Restie
Automated Cronjobs: Restie allows you to create REST API data pipelines that repeat automatically based on your configuration in the JSON config file under the key "cronPattern"
Helpful Error Messages: You don't have to worry about making errors in the JSON config file. The application will show you where your error is when you attempt to run it.
CLI: Restie is a fully-fledged CLI app with a help section. You can run 'restie help' to see all options
Light-weight: It is a very small app of ~10 MB
Run in the Background: Since "v0.0.1-alpha.3", it is possible to run the CLI in the background, while logging output to a file.

Quick Start

Download the Chrome Websockets browser extension
Download the PubSubQ app for your operating system and architecture. Most computers have the AMD64 architecture; if that build does not work on your machine, try the ARM version.

Operating System	Architecture	Download
Windows	AMD64	Download
Windows	ARM	Download
MacOS	AMD64	Download
MacOS	ARM64	Download
Linux	AMD64	Download
Linux	ARM64	Download

Start the PubSubQ instance on port 8080 (You can choose whatever port you want)

For Linux and Mac:

./pubsubq run -p 8080

For Windows:

pubsubq.exe run -p 8080

Open the Chrome Websocket browser extension
In the Server URL input of the Chrome Websockets browser extension window, type in ws://localhost:8080/receive/

Click Connect button
Download the Restie app for your operating system and computer architecture.

Operating System	Architecture	Download
Windows	AMD64	Download
Windows	ARM	Download
MacOS	AMD64	Download
MacOS	ARM64	Download
Linux	AMD64	Download
Linux	ARM64	Download

Create a file restieConfig.json in the same directory that your restie app was downloaded to.

touch restieConfig.json

Copy, paste and save the following JSON into the restieConfig.json file

{
  "pipelines": [
    {
      "name": "cet_time",
      "source": "http://worldtimeapi.org/api/timezone/CET",
      "sourceType": "JSON",
      "httpMethod": "GET",
      "isMultiple": false,
      "cronPattern": "@every 0h1m",
      "timestampPattern": "YYYY-MM-DDThh:mm:sszz",
      "datePattern": "YYYY-MM-DD",
      "dataPath": [],
      "timezone": "GMT",
      "primaryFields": ["timezone", "datetime"],
      "separator": "--",
      "pubSubQUrl": "ws://localhost:8080"
    }
  ]
}

Run the restie app

For Linux and Mac

./restie run

For Windows:

restie.exe run

Go back to the Chrome websockets browser extension window and view the received messages text area. You should see new messages printed in the area every minute.

Since v0.0.1-alpha.3: You can also run restie as a daemon by passing it the --daemon (or -d) flag. By default, it logs to a restie.log file in the same folder as the running app. To change the log path, pass the --log (or -l) flag with an absolute file path.

To see more details of how this app is used, pass the --help (or -h) flag to run e.g.

For Linux and Mac:

./restie run --help

For Windows:

restie.exe run --help

Downloads

Darwin ARM64 (v0.0.1-alpha.3) Download
Linux ARM64 (v0.0.1-alpha.3) Download
Windows ARM (v0.0.1-alpha.3) Download
Linux AMD64 (v0.0.1-alpha.3) Download
Windows AMD64 (v0.0.1-alpha.3) Download
Darwin AMD64 (v0.0.1-alpha.3) Download

Linux AMD64 (v0.0.1-alpha.2) Download
Linux ARM64 (v0.0.1-alpha.2) Download
Darwin ARM64 (v0.0.1-alpha.2) Download
Windows ARM (v0.0.1-alpha.2) Download
Darwin AMD64 (v0.0.1-alpha.2) Download
Windows AMD64 (v0.0.1-alpha.2) Download

Linux ARM64 (v0.0.1-alpha.1) Download
Darwin ARM64 (v0.0.1-alpha.1) Download
Linux AMD64 (v0.0.1-alpha.1) Download
Windows ARM (v0.0.1-alpha.1) Download
Darwin AMD64 (v0.0.1-alpha.1) Download
Windows AMD64 (v0.0.1-alpha.1) Download
