How to import data from secure cloud object storage (GCP, AWS, Azure)

Overview

With the S3 connector, Birdie can import data in Parquet format from AWS S3 or from a storage service that implements the S3 API, such as Google Cloud Storage. Once a day, the connector checks for new objects (files) and, if it finds any, imports the records they contain.
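
To make the daily check concrete, here is a minimal sketch of how new objects could be selected from a bucket listing. It is an illustration only: the `ObjectInfo` type and the `select_new_objects` function are hypothetical names, and the connector's actual selection logic is not documented here. The sketch assumes objects are matched by key prefix and by last-modified time, as the Prefix and Start Date parameters suggest.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObjectInfo:
    """Minimal stand-in for one entry of a bucket listing (hypothetical type)."""
    key: str
    last_modified: datetime


def select_new_objects(objects, prefix, modified_after):
    """Return the objects under `prefix` that were modified after `modified_after`."""
    return [
        obj
        for obj in objects
        if obj.key.startswith(prefix) and obj.last_modified > modified_after
    ]


# Example listing with made-up keys and timestamps.
listing = [
    ObjectInfo("birdie/nps/2024-01-01.parquet", datetime(2024, 1, 1, 6, tzinfo=timezone.utc)),
    ObjectInfo("birdie/nps/2024-01-02.parquet", datetime(2024, 1, 2, 6, tzinfo=timezone.utc)),
    ObjectInfo("birdie/tickets/2024-01-02.parquet", datetime(2024, 1, 2, 6, tzinfo=timezone.utc)),
]
last_check = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)

new_objects = select_new_objects(listing, "birdie/nps/", last_check)
print([obj.key for obj in new_objects])  # ['birdie/nps/2024-01-02.parquet']
```

In practice this means a file only needs to be uploaded once under the configured prefix; it should be picked up by a later daily run.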

Requirements

Dedicated bucket for Birdie Integration
Birdie integration requires a service account with read-only access. Write access is optional; granting it lets support teams perform initial or manual dataset uploads.
- AWS Docs
  - See docs on creating a user and generating an access key for the user.
  - If you do not wish to provide a service account, see the docs on creating a role, then create an IAM policy that allows the Birdie account to assume that role (see the docs). Reach out and we'll provide the Birdie account ID.
  - Create an IAM policy that gives the user or role S3 read or S3 read/write access to the specific bucket. See the docs on how to do this.
- GCP Docs
  - See the docs for how to enable HMAC access to a bucket.
  - For more information on how this works, read about the GCS XML API, which works with S3-compatible tools.
  - For more information about HMAC keys, read about HMAC keys for GCS.
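
For the AWS role-based option, the policy that lets another account assume a role is the role's trust policy. The sketch below builds one as a Python dictionary and prints it as JSON. The account ID and external ID are placeholders, not real Birdie values: Birdie provides its account ID on request, and the external ID should be the one you enter in the connector parameters.

```python
import json

# Placeholders (assumptions): replace with the account ID Birdie gives you
# and with the External ID configured in the connector.
BIRDIE_ACCOUNT_ID = "111122223333"
EXTERNAL_ID = "example-external-id"

# Trust policy attached to the role that Birdie will assume.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{BIRDIE_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            # Only allow the role to be assumed when the expected external ID is sent.
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The printed JSON can be pasted into the role's trust relationship in the IAM console. If you do not use an external ID, the `Condition` block can be left out.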
IAM Policy example with read-write access:
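
A minimal sketch of such a policy, generated from Python so the bucket name is set in one place. The bucket name is hypothetical, and the exact set of actions Birdie needs is an assumption based on the read and write access described above; confirm it with Birdie before applying it.

```python
import json

BUCKET = "example-birdie-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing is granted on the bucket itself.
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Reading and writing are granted on the objects inside the bucket.
            "Sid": "ReadWriteObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

For read-only access, remove `s3:PutObject` from the second statement.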

Parameters

Region: The region, e.g. "us-west-2" (AWS) or "us-central1" (GCP).
Bucket: The bucket name.
Prefix: A prefix for the object keys. We suggest organizing it based on the kind of data (e.g. `birdie/tickets`, `birdie/nps`).
Format: The data format to use. Currently only supports `parquet`.
Kind: The kind of data you're trying to import. This defines what schema Birdie expects when reading rows from your file. Supported values are:
- `review`
- `nps`
- `csat`
- `support_ticket`
- `social_media_post`
- `issue`
Credentials for S3:
- Access Key ID / HMAC Access ID
- Secret Access Key / HMAC Secret
- External ID (optional, AWS specific)
- Role ARN (optional, AWS specific)
- Endpoint: The S3 endpoint to use. Only needed if not using AWS S3.
Start Date: Date used to filter objects by their last-modified time.
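
As an illustration, the parameters above can be collected and sanity-checked before they are entered in Birdie. The `ConnectorParams` class and its field names are hypothetical and only mirror the list above; they are not a Birdie API. The credential values are placeholders. The Google Cloud Storage example uses `https://storage.googleapis.com`, the endpoint of the GCS XML API.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_KINDS = {
    "review", "nps", "csat", "support_ticket", "social_media_post", "issue",
}
SUPPORTED_FORMATS = {"parquet"}


@dataclass
class ConnectorParams:
    """Hypothetical container mirroring the connector parameters."""
    region: str
    bucket: str
    prefix: str
    kind: str
    access_key_id: str
    secret_key: str
    format: str = "parquet"
    external_id: Optional[str] = None  # AWS specific
    role_arn: Optional[str] = None     # AWS specific
    endpoint: Optional[str] = None     # only needed when not using AWS S3
    start_date: Optional[str] = None

    def __post_init__(self):
        if self.kind not in SUPPORTED_KINDS:
            raise ValueError(f"unsupported kind: {self.kind!r}")
        if self.format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.format!r}")


# AWS S3: no endpoint needed.
aws_params = ConnectorParams(
    region="us-west-2",
    bucket="example-birdie-bucket",
    prefix="birdie/tickets",
    kind="support_ticket",
    access_key_id="<ACCESS_KEY_ID>",
    secret_key="<SECRET_ACCESS_KEY>",
)

# Google Cloud Storage: HMAC credentials plus the XML API endpoint.
gcs_params = ConnectorParams(
    region="us-central1",
    bucket="example-birdie-bucket",
    prefix="birdie/nps",
    kind="nps",
    access_key_id="<HMAC_ACCESS_ID>",
    secret_key="<HMAC_SECRET>",
    endpoint="https://storage.googleapis.com",
)
```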

S3 Schemas

Each row of the file must fit within one of the following schemas. The schema must match the kind selected when configuring the parameters.

See the official Parquet specification for more information on the supported types and logical types.
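
Before exporting a file, it can help to check each row against the required columns of the selected kind. The sketch below is a hypothetical pre-flight check, not Birdie's own validation: the required columns are taken from the schema tables in this article, and the timestamp test is only an approximation of RFC 3339 built on the standard library.

```python
from datetime import datetime

CONVERSATION_REQUIRED = ["conversation_id", "message_id", "text", "posted_at"]

# Required columns per kind, as listed in the schema tables.
REQUIRED_COLUMNS = {
    "review": ["feedback_id", "text", "posted_at"],
    "nps": ["feedback_id", "posted_at"],
    "csat": ["feedback_id", "posted_at"],
    "support_ticket": CONVERSATION_REQUIRED,
    "social_media_post": CONVERSATION_REQUIRED,
    "issue": CONVERSATION_REQUIRED,
}


def looks_like_rfc3339(value):
    """Approximate check: an ISO 8601 date-time that carries a UTC offset."""
    if not isinstance(value, str):
        return False
    if value.endswith("Z"):
        # Older Python versions do not parse the "Z" suffix directly.
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    return parsed.tzinfo is not None


def find_problems(row, kind):
    """Return a list of human-readable problems found in one row."""
    problems = []
    for column in REQUIRED_COLUMNS[kind]:
        if row.get(column) is None:
            problems.append(f"missing required column: {column}")
    posted_at = row.get("posted_at")
    if posted_at is not None and not looks_like_rfc3339(posted_at):
        problems.append("posted_at is not an RFC 3339 timestamp")
    return problems


good_row = {"feedback_id": "r-1", "text": "Great app", "posted_at": "2024-01-15T10:30:00Z"}
bad_row = {"feedback_id": "n-1", "posted_at": "15/01/2024"}

print(find_problems(good_row, "review"))  # []
print(find_problems(bad_row, "nps"))      # ['posted_at is not an RFC 3339 timestamp']
```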

Feedbacks // Review

Column Name	Type	Required	Description
`feedback_id`	`STRING`	Required	Unique identifier for each review.
`text`	`STRING`	Required	Text posted by the user.
`posted_at`	`STRING`	Required	When the feedback was posted (RFC 3339 timestamp).
`author_id`	`STRING`	Optional	Identifier for the author of the record.
`account_id`	`STRING`	Optional	Identifier for the account the record belongs to.
`language`	`STRING`	Optional	Language of the record as a BCP 47 code.
`title`	`STRING`	Optional	The title of the feedback given by the author.
`rating`	`FLOAT`	Optional	A rating or score of the feedback.
`category`	`STRING`	Optional	The category the review belongs to.
`owner`	`STRING`	Optional	One of `Owner` or `Competitor`.
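
As an example, a review file matching the table above could be produced with the `pyarrow` package. This is a sketch under assumptions: the sample row is made up, `write_reviews` and `to_rfc3339` are hypothetical helper names, and mapping `STRING` to `pa.string()` and `FLOAT` to `pa.float32()` is our reading of the table, not a confirmed requirement. `pyarrow` is imported inside the function so the rest of the snippet runs without it.

```python
from datetime import datetime, timezone


def to_rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 string."""
    if moment.tzinfo is None:
        raise ValueError("posted_at needs a timezone")
    return moment.isoformat()


# One made-up review; optional columns that are left out are written as nulls.
reviews = [
    {
        "feedback_id": "review-001",
        "text": "Checkout keeps failing on mobile.",
        "posted_at": to_rfc3339(datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)),
        "language": "en-US",
        "rating": 2.0,
        "owner": "Owner",
    },
]


def write_reviews(rows, path):
    """Write review rows to a Parquet file (requires the pyarrow package)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([
        pa.field("feedback_id", pa.string(), nullable=False),
        pa.field("text", pa.string(), nullable=False),
        pa.field("posted_at", pa.string(), nullable=False),
        pa.field("author_id", pa.string()),
        pa.field("account_id", pa.string()),
        pa.field("language", pa.string()),
        pa.field("title", pa.string()),
        pa.field("rating", pa.float32()),
        pa.field("category", pa.string()),
        pa.field("owner", pa.string()),
    ])
    table = pa.Table.from_pylist(rows, schema=schema)
    pq.write_table(table, path)


# Usage (uncomment once pyarrow is installed):
# write_reviews(reviews, "reviews.parquet")
```

The resulting file can then be uploaded to the bucket under the configured prefix.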

Feedbacks // NPS and CSAT

Column Name	Type	Required	Description
`feedback_id`	`STRING`	Required	Unique identifier for each answer.
`text`	`STRING`	Optional	Text posted by the user.
`posted_at`	`STRING`	Required	When the feedback was posted (RFC 3339 timestamp).
`author_id`	`STRING`	Optional	Identifier for the author of the record.
`account_id`	`STRING`	Optional	Identifier for the account the record belongs to.
`language`	`STRING`	Optional	Language of the record as a BCP 47 code.
`title`	`STRING`	Optional	The title of the survey.
`rating`	`FLOAT`	Optional	A rating or score of the feedback.
`author_name`	`STRING`	Optional	The name of the author.

Conversations // Support tickets

Column Name	Type	Required	Description
`conversation_id`	`STRING`	Required	Unique identifier for each conversation.
`message_id`	`STRING`	Required	Unique identifier for each message.
`author_id`	`STRING`	Optional	Identifier for the author of the message.
`account_id`	`STRING`	Optional	Identifier for the account the message belongs to.
`text`	`STRING`	Required	Text of the message.
`posted_at`	`STRING`	Required	When the message was posted (RFC 3339 timestamp).
`language`	`STRING`	Optional	Language of the message as a BCP 47 code.
`subject`	`STRING`	Optional	Subject of the ticket.
`status`	`STRING`	Optional	Status of the ticket, e.g. `open`.
`priority`	`STRING`	Optional	Priority assigned to the ticket.
`channel`	`STRING`	Optional	Source channel of the ticket, e.g. `web`.
`tags`	`REPEATED STRING`	Optional	Array of tags applied to the ticket.
`author_type`	`STRING`	Optional	One of `Internal Person`, `User`, or `Bot`.
`author_name`	`STRING`	Optional	The name of the author of the message.
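
Because the schema has both a `conversation_id` and a `message_id`, it suggests one row per message, with the ticket-level columns repeated on every row. That reading is an assumption, and the sketch below illustrates it with a made-up ticket; the input structure and the `ticket_to_rows` helper are hypothetical.

```python
def ticket_to_rows(ticket):
    """Flatten one ticket with nested messages into one row per message."""
    rows = []
    for message in ticket["messages"]:
        rows.append({
            # Ticket-level columns are repeated on every message row.
            "conversation_id": ticket["id"],
            "subject": ticket["subject"],
            "status": ticket["status"],
            "channel": ticket["channel"],
            "tags": ticket["tags"],
            # Message-level columns.
            "message_id": message["id"],
            "text": message["text"],
            "posted_at": message["posted_at"],
            "author_type": message["author_type"],
        })
    return rows


ticket = {
    "id": "ticket-42",
    "subject": "Cannot reset password",
    "status": "open",
    "channel": "web",
    "tags": ["login", "password"],
    "messages": [
        {"id": "msg-1", "text": "The reset email never arrives.",
         "posted_at": "2024-01-15T10:30:00Z", "author_type": "User"},
        {"id": "msg-2", "text": "Thanks, we are looking into it.",
         "posted_at": "2024-01-15T10:42:00Z", "author_type": "Internal Person"},
    ],
}

rows = ticket_to_rows(ticket)
print(len(rows))  # 2
```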

Conversation // Social Media Post

Column Name	Type	Required	Description
`conversation_id`	`STRING`	Required	Unique identifier for each conversation.
`message_id`	`STRING`	Required	Unique identifier for each message.
`author_id`	`STRING`	Optional	Identifier for the author of the message.
`account_id`	`STRING`	Optional	Identifier for the account the message belongs to.
`text`	`STRING`	Required	Text of the message.
`posted_at`	`STRING`	Required	When the message was posted (RFC 3339 timestamp).
`language`	`STRING`	Optional	Language of the message as a BCP 47 code.
`title`	`STRING`	Optional	Title of the post.
`owner`	`STRING`	Optional	One of `Owner` or `Competitor`.
`category`	`STRING`	Optional	The category the post was under, e.g. a subreddit name.
`url`	`STRING`	Optional	URL of the post.
`channel`	`STRING`	Optional	Source channel of the post, e.g. `facebook`.
`tags`	`REPEATED STRING`	Optional	Array of tags applied to the post.
`author_type`	`STRING`	Optional	One of `Internal Person`, `User`, or `Bot`.
`author_name`	`STRING`	Optional	The name of the author of the message.
`upvotes`	`INTEGER`	Optional	The number of upvotes the message has.

Conversation // Issue

Column Name	Type	Required	Description
`conversation_id`	`STRING`	Required	Unique identifier for each conversation.
`message_id`	`STRING`	Required	Unique identifier for each message.
`author_id`	`STRING`	Optional	Identifier for the author of the message.
`account_id`	`STRING`	Optional	Identifier for the account the message belongs to.
`text`	`STRING`	Required	Text of the message.
`posted_at`	`STRING`	Required	When the message was posted (RFC 3339 timestamp).
`language`	`STRING`	Optional	Language of the message as a BCP 47 code.
`project_id`	`STRING`	Optional	Project identifier.
`project_name`	`STRING`	Optional	Project name.
`title`	`STRING`	Optional	Issue title.
`status`	`STRING`	Optional	Issue status.
`author_name`	`STRING`	Optional	The name of the author of the message.

Custom Fields

Any columns that don't fit under the previously listed schemas may become custom fields.

The name of the column in the Parquet Schema must be configured as the key/source of the custom field inside the Birdie App.
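
To see which columns of a file would need a custom field, you can compare the file's column names with the schema of the selected kind. The sketch below does this for the review schema; `plan_tier` and `store_region` are made-up column names, and `custom_field_candidates` is a hypothetical helper.

```python
# Columns of the review schema, as listed in the table above.
REVIEW_COLUMNS = {
    "feedback_id", "text", "posted_at", "author_id", "account_id",
    "language", "title", "rating", "category", "owner",
}


def custom_field_candidates(file_columns, schema_columns):
    """Return the columns that are not part of the kind's schema, in file order."""
    return [name for name in file_columns if name not in schema_columns]


file_columns = ["feedback_id", "text", "posted_at", "rating", "plan_tier", "store_region"]
candidates = custom_field_candidates(file_columns, REVIEW_COLUMNS)
print(candidates)  # ['plan_tier', 'store_region']
```

Each name printed here is what would be entered as the key/source of the corresponding custom field in the Birdie App.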