How to import data from a secure cloud object storage (GCP, AWS, Azure)

Import data objects from your GCS or S3 buckets into your Birdie account

Written by Product Team
Updated yesterday

Overview

With the S3 connector, Birdie can import data in Parquet format from AWS S3 or from any storage service that implements the S3 API, such as Google Cloud Storage. Once a day, the connector checks for new objects (files) and, if it finds any, imports the records they contain.
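The daily check can be pictured with a minimal Python sketch. It is not Birdie's implementation: it assumes that "new" means an object key not seen by an earlier run, and that the Start Date parameter (described under Parameters) keeps objects modified on or after that date. `StoredObject` and `select_new_objects` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical, simplified view of one object returned by a bucket listing."""
    key: str
    last_modified: datetime


def select_new_objects(
    objects: Iterable[StoredObject],
    already_imported: Set[str],
    start: Optional[datetime] = None,
) -> List[str]:
    """Return keys not imported by a previous run, oldest first.

    Assumptions: "new" means the key was not seen before, and `start`
    keeps only objects modified on or after that moment.
    """
    candidates = [
        o for o in objects
        if o.key not in already_imported
        and (start is None or o.last_modified >= start)
    ]
    candidates.sort(key=lambda o: o.last_modified)
    return [o.key for o in candidates]


# Example: the May 1 file was imported by an earlier run.
listing = [
    StoredObject("birdie/tickets/2024-05-03.parquet", datetime(2024, 5, 3, tzinfo=timezone.utc)),
    StoredObject("birdie/tickets/2024-05-01.parquet", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    StoredObject("birdie/tickets/2024-05-02.parquet", datetime(2024, 5, 2, tzinfo=timezone.utc)),
]
imported = {"birdie/tickets/2024-05-01.parquet"}
print(select_new_objects(listing, imported))
# ['birdie/tickets/2024-05-02.parquet', 'birdie/tickets/2024-05-03.parquet']
```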

Requirements

  • A dedicated bucket for the Birdie integration.

  • The Birdie integration requires a service account with read-only access. Optionally, write access can also be granted to support teams during initial or manual dataset uploads.

    • AWS Docs

      • See the docs on creating a user and generating an access key for that user.

      • If you prefer not to provide a service account, create a role instead, with a trust policy that allows the Birdie AWS account to assume the role (see the docs). Reach out and we'll provide the Birdie account ID.

      • Create an IAM policy that gives the user or role S3 read or read/write access to the specific bucket. See the docs on how to do this.

    • GCP Docs

      • See the docs on how to enable HMAC access to a bucket.

      • For more information on how this works, read about the GCS XML API, which works with S3-compatible tools.

      • For more information about HMAC keys, read about HMAC keys for GCS.

  • IAM Policy example with read-write access:
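A bucket-scoped IAM policy of this kind can be generated with a small Python sketch. It uses standard AWS policy elements (`s3:ListBucket` on the bucket ARN, object actions on `bucket/*`), but the exact set of actions Birdie needs is an assumption; confirm it with the Birdie team. `birdie_bucket_policy` and the bucket name are hypothetical. For the role-based option, the trust policy that lets Birdie assume the role is a separate document and is not shown here.

```python
import json


def birdie_bucket_policy(bucket: str, read_write: bool = False) -> dict:
    """Build a minimal IAM policy scoped to a single bucket.

    Read-only: list the bucket and read objects.
    Read/write: additionally upload objects (s3:PutObject).
    The action list is an assumption, not Birdie's official policy.
    """
    object_actions = ["s3:GetObject"]
    if read_write:
        object_actions.append("s3:PutObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBirdieBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "BirdieObjectAccess",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-birdie-bucket" is a placeholder bucket name.
    print(json.dumps(birdie_bucket_policy("my-birdie-bucket", read_write=True), indent=2))
```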

Parameters

  • Region: The region, e.g. `us-west-2` (AWS) or `us-central1` (GCP).

  • Bucket: The bucket name.

  • Prefix: A prefix for the object keys. We suggest organizing prefixes by kind of data (e.g. `birdie/tickets`, `birdie/nps`).

  • Format: The data format to use. Currently, only `parquet` is supported.

  • Kind: The kind of data you're trying to import. This defines what schema Birdie expects when reading rows from your file. Supported values are:

    • `review`

    • `nps`

    • `csat`

    • `support_ticket`

    • `social_media_post`

    • `issue`

  • Credentials for S3

    • Access Key ID / HMAC Access ID

    • Secret Access Key / HMAC Secret

    • External ID (optional, AWS-specific)

    • Role ARN (optional, AWS-specific)

    • S3 endpoint: only needed if you are not using AWS S3 (for Google Cloud Storage, `https://storage.googleapis.com`).

  • Start Date: Date used to filter objects by their last-modified time.
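Before entering the parameters in Birdie, it can help to collect them in one place and sanity-check them. The Python sketch below is a hypothetical local checklist, not a Birdie API: `S3ConnectorParams` and its field names are illustrative, and the rule that either an access key pair or a role ARN must be supplied is an assumption based on the credential options listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

SUPPORTED_KINDS = {"review", "nps", "csat", "support_ticket", "social_media_post", "issue"}
SUPPORTED_FORMATS = {"parquet"}


@dataclass
class S3ConnectorParams:
    """Hypothetical local checklist of the connector parameters (not a Birdie API)."""
    region: str
    bucket: str
    prefix: str
    kind: str
    start_date: date
    data_format: str = "parquet"
    access_key_id: Optional[str] = None      # Access Key ID / HMAC Access ID
    secret_access_key: Optional[str] = None  # Secret Access Key / HMAC Secret
    external_id: Optional[str] = None        # optional, AWS-specific
    role_arn: Optional[str] = None           # optional, AWS-specific
    endpoint: Optional[str] = None           # only needed when not using AWS S3

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the values look consistent."""
        found = []
        if self.data_format not in SUPPORTED_FORMATS:
            found.append(f"unsupported format: {self.data_format!r}")
        if self.kind not in SUPPORTED_KINDS:
            found.append(f"unsupported kind: {self.kind!r}")
        if not self.bucket:
            found.append("bucket is required")
        # Assumption: either a key pair or a role ARN must be supplied.
        has_keys = bool(self.access_key_id and self.secret_access_key)
        if not has_keys and not self.role_arn:
            found.append("provide an access key pair or a role ARN")
        return found


# GCS through its S3-compatible XML API: HMAC credentials plus an explicit endpoint.
gcs_params = S3ConnectorParams(
    region="us-central1",
    bucket="my-birdie-bucket",        # placeholder
    prefix="birdie/tickets",
    kind="support_ticket",
    start_date=date(2024, 1, 1),
    access_key_id="HMAC_ACCESS_ID",   # placeholder
    secret_access_key="HMAC_SECRET",  # placeholder
    endpoint="https://storage.googleapis.com",
)
print(gcs_params.problems())  # []
```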

S3 Schemas

Each row of a file must match the schema for the Kind selected when configuring the parameters. The supported schemas are listed below.

See the official Parquet specification for more information on the supported types and logical types.

Feedbacks // Review

| Column Name | Type | Required/Optional | Description |
|---|---|---|---|
| feedback_id | STRING | Required | Unique identifier for each review. |
| text | STRING | Required | Text posted by the user. |
| posted_at | STRING | Required | When the feedback was posted (RFC 3339 timestamp). |
| author_id | STRING | Optional | Identifier for the author of the record. |
| account_id | STRING | Optional | Identifier for the account the record belongs to. |
| language | STRING | Optional | Language of the record as a BCP 47 code. |
| title | STRING | Optional | The title of the feedback given by the author. |
| rating | FLOAT | Optional | A rating or score for the feedback. |
| category | STRING | Optional | The category the review belongs to. |
| owner | STRING | Optional | Possible values: Owner, Competitor. |
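Because `posted_at` is a STRING that must hold an RFC 3339 timestamp, a quick pre-upload check of each row can catch formatting mistakes early. The Python sketch below validates one `review` row against the table above. It is a hypothetical helper, not part of Birdie; it treats "Required" as "present and a string", which is an assumption, and it only checks shapes, not whether Birdie will accept every value.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

# RFC 3339 date-time: full-date "T" full-time, with "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)

REQUIRED = ("feedback_id", "text", "posted_at")
OPTIONAL_STRINGS = ("author_id", "account_id", "language", "title", "category", "owner")


def is_rfc3339(value: str) -> bool:
    """Check the RFC 3339 shape, then reject impossible dates such as 2024-02-30."""
    match = _RFC3339.fullmatch(value)
    if match is None:
        return False
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    try:
        # RFC 3339 allows a leap second (60); clamp it so datetime accepts the value.
        datetime(year, month, day, hour, minute, min(second, 59))
    except ValueError:
        return False
    return True


def review_row_errors(row: Dict[str, Any]) -> List[str]:
    """List the problems found in one `review` row (empty list = looks valid)."""
    errors = []
    for column in REQUIRED:
        if not isinstance(row.get(column), str):
            errors.append(f"{column}: required STRING is missing or not a string")
    posted_at = row.get("posted_at")
    if isinstance(posted_at, str) and not is_rfc3339(posted_at):
        errors.append("posted_at: not an RFC 3339 timestamp")
    rating = row.get("rating")
    if rating is not None and (isinstance(rating, bool) or not isinstance(rating, (int, float))):
        errors.append("rating: must be a number")
    for column in OPTIONAL_STRINGS:
        if row.get(column) is not None and not isinstance(row[column], str):
            errors.append(f"{column}: must be a STRING when present")
    return errors


good = {
    "feedback_id": "rev-001",
    "text": "Great app, but sync is slow.",
    "posted_at": "2024-05-01T14:30:00Z",
    "rating": 4.0,
    "language": "en",
    "owner": "Owner",
}
bad = {"feedback_id": "rev-002", "text": "ok", "posted_at": "01/05/2024"}
print(review_row_errors(good))  # []
print(review_row_errors(bad))   # ['posted_at: not an RFC 3339 timestamp']
```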

Feedbacks // NPS and CSAT

| Column Name | Type | Required/Optional | Description |
|---|---|---|---|
| feedback_id | STRING | Required | Unique identifier for each answer. |
| text | STRING | Optional | Text posted by the user. |
| posted_at | STRING | Required | When the feedback was posted (RFC 3339 timestamp). |
| author_id | STRING | Optional | Identifier for the author of the record. |
| account_id | STRING | Optional | Identifier for the account the record belongs to. |
| language | STRING | Optional | Language of the record as a BCP 47 code. |
| title | STRING | Optional | The title of the survey. |
| rating | FLOAT | Optional | A rating or score for the feedback. |
| author_name | STRING | Optional | The name of the author. |

Conversations // Support tickets

| Column Name | Type | Required/Optional | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message. |
| author_id | STRING | Optional | Identifier for the author of the message. |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| subject | STRING | Optional | Subject of the ticket. |
| status | STRING | Optional | Status of the ticket, e.g. `open`. |
| priority | STRING | Optional | Priority assigned to the ticket. |
| channel | STRING | Optional | Source channel of the ticket, e.g. `web`. |
| tags | REPEATED STRING | Optional | List of tags applied to the ticket. |
| author_type | STRING | Optional | Possible values: Internal Person, User, Bot. |
| author_name | STRING | Optional | The name of the author of the message. |
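Since each row of this schema is one message identified by `conversation_id` plus `message_id`, a ticket exported with nested messages has to be flattened into one row per message before it is written to Parquet. The Python sketch below shows one way to do that. The input shape (`id`, `messages`, `body`, `created_at`) and the function name are hypothetical, and copying the ticket-level columns onto every message row is an assumption based on those columns appearing in each row of the schema.

```python
from typing import Any, Dict, List


def ticket_to_rows(ticket: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one ticket with nested messages into one support_ticket row per message."""
    # Ticket-level columns are copied onto every message row (an assumption;
    # the schema simply has these columns on each row).
    ticket_columns = {
        "conversation_id": ticket["id"],
        "subject": ticket.get("subject"),
        "status": ticket.get("status"),
        "priority": ticket.get("priority"),
        "channel": ticket.get("channel"),
    }
    rows = []
    for message in ticket["messages"]:
        row = dict(ticket_columns)
        row.update(
            message_id=message["id"],
            text=message["body"],
            posted_at=message["created_at"],  # must already be an RFC 3339 string
            author_id=message.get("author_id"),
            author_type=message.get("author_type"),
            author_name=message.get("author_name"),
            tags=list(ticket.get("tags", [])),  # REPEATED STRING: a fresh list per row
        )
        rows.append(row)
    return rows


ticket = {
    "id": "T-1",
    "subject": "Cannot log in",
    "status": "open",
    "channel": "web",
    "tags": ["login", "urgent"],
    "messages": [
        {"id": "M-1", "body": "I can't log in since yesterday.",
         "created_at": "2024-05-01T09:00:00Z", "author_type": "User"},
        {"id": "M-2", "body": "Thanks, we're looking into it.",
         "created_at": "2024-05-01T09:15:00Z", "author_type": "Internal Person"},
    ],
}
rows = ticket_to_rows(ticket)
print([(r["conversation_id"], r["message_id"]) for r in rows])
# [('T-1', 'M-1'), ('T-1', 'M-2')]
```

The same flattening applies to the social media post and issue schemas, which share the `conversation_id` / `message_id` structure.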

Conversations // Social Media Post

| Column Name | Type | Required/Optional | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message. |
| author_id | STRING | Optional | Identifier for the author of the message. |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| title | STRING | Optional | Title of the post. |
| owner | STRING | Optional | Possible values: Owner, Competitor. |
| category | STRING | Optional | The category the post was under, e.g. a subreddit name. |
| url | STRING | Optional | URL of the post. |
| channel | STRING | Optional | Source channel of the post, e.g. `facebook`. |
| tags | REPEATED STRING | Optional | List of tags applied to the post. |
| author_type | STRING | Optional | Possible values: Internal Person, User, Bot. |
| author_name | STRING | Optional | The name of the author of the message. |
| upvotes | INTEGER | Optional | The number of upvotes the message has. |

Conversations // Issue

| Column Name | Type | Required/Optional | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message. |
| author_id | STRING | Optional | Identifier for the author of the message. |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| project_id | STRING | Optional | Project identifier. |
| project_name | STRING | Optional | Project name. |
| title | STRING | Optional | Issue title. |
| status | STRING | Optional | Issue status. |
| author_name | STRING | Optional | The name of the author of the message. |

Custom Fields

Any columns that are not part of the schemas listed above can be imported as custom fields.

To do so, configure the column name from the Parquet schema as the key (source) of the custom field in the Birdie app.
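To find which columns of a file would need a custom field, you can compare its column names with the schema for its kind. The Python sketch below does this for the `support_ticket` schema; `split_custom_fields` and the `plan_tier` column are hypothetical examples. The keys of the returned `custom` dictionary are the column names you would enter as custom field keys in the Birdie app.

```python
from typing import Any, Dict, Iterable, Tuple

# Columns of the support_ticket schema listed above.
SUPPORT_TICKET_COLUMNS = frozenset({
    "conversation_id", "message_id", "author_id", "account_id", "text",
    "posted_at", "language", "subject", "status", "priority", "channel",
    "tags", "author_type", "author_name",
})


def split_custom_fields(
    row: Dict[str, Any], schema_columns: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate schema columns from extra columns that could become custom fields."""
    known = set(schema_columns)
    standard = {k: v for k, v in row.items() if k in known}
    custom = {k: v for k, v in row.items() if k not in known}
    return standard, custom


row = {
    "conversation_id": "T-1",
    "message_id": "M-1",
    "text": "I can't log in.",
    "posted_at": "2024-05-01T09:00:00Z",
    "plan_tier": "enterprise",  # hypothetical extra column
}
standard, custom = split_custom_fields(row, SUPPORT_TICKET_COLUMNS)
print(custom)  # {'plan_tier': 'enterprise'}
```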
