What is Amazon Redshift used for?

Amazon Redshift is a cloud data warehouse used for the storing and analysis of information. It forms a part of the Amazon Web Services platform. Amazon Redshift is a column-oriented, parallel processing database designed for large scale data processing.

Is Redshift an ETL tool?

No, Redshift is a data warehouse— it’s used as a final storage location for most ETL tools. Data stored in Redshift can be programmatically accessed via SQL for analysis.

What is difference between SQL and Redshift?

SQL, Structured Query Language, is a programming language used for the retrieval of information from databases. Redshift is a parallelized, cloud-hosted database. SQL is used to interface with data stored in Redshift.

What type of database is Redshift?

Redshift is a column-oriented cloud-native database. It is a fully hosted database, meaning the infrastructure and management is handled entirely via AWS. Serverless options exist, too— these almost fully eliminate the need for any database maintenance or scaling.

What is the difference between Redshift and AWS?

Redshift is a component of AWS. Specifically, it is a cloud data warehouse used for the storing and analysis of information. Redshift integrates nicely with the AWS platform, making it a good choice for current customers of the service.

Is Redshift SQL or MySql?

Amazon Redshift syntax is based loosely on PostgreSQL, but the dialect is unique to Redshift. Redshift utilizes SQL but has no relation to MySQL.

What is Redshift and how it works?

Redshift is a column-oriented cloud-native database. It is a fully hosted database, meaning the infrastructure and management is handled entirely via AWS. Serverless options exist, too— these almost fully eliminate the need for any database maintenance or scaling.

Is Redshift a SQL database?

Yes, Redshift is a column-oriented cloud-native database. It is a fully hosted database, meaning the infrastructure and management is handled entirely via AWS. Serverless options exist, too— these almost fully eliminate the need for any database maintenance or scaling.

Which language is used in Redshift?

SQL is used to interface with Amazon Redshift. It accepts a SQL variant similar to Postgres. Amazon Redshift also allows users to define functions in Python.

What type of data is stored in Redshift?

Structured data in a wide variety of formats can be stored in Redshift. Redshift also has some support for semi-structured data in JSON format.

Is Redshift relational or Nosql?

Redshift is a relational database.

What are the benefits of Redshift?

Redshift has a number of benefits not limited to: efficient storage and high-performance query processing, support for streaming ingestion, support for semi-structured data, AWS services integration, auto-copy from S3 functionality, and extensive integration with 3rd party tooling.

Database/Cloud

Amazon Redshift Cheat Sheet

Questions about Redshift? We're providing answers to the questions we often get from Redshift users and admins.

Matt Palmer

Jan 4, 2022 • 6 min read

In this article we've documented answers to the questions that Zuar frequently gets from the Amazon Redshift users and account admins that we work with on a daily basis.

Cheat Sheet for Amazon's Redshift Cloud Data Warehouse

What storage types does Redshift use?

Amazon Redshift stores data in columnar format. This means data is accessed by column rather than by row. In a typical relational database, records are stored by row:

records are stored by row in a relational database — *Source: Amazon Redshift docs*

With row-wise storage, data is written sequentially by row. If a row is large enough, it may take up multiple blocks. If too small, the row will take up less than one block, resulting in an inefficient use of disk space. This type of storage is optimal for solutions frequently reading and writing all values of a record, typically one or a small number of records at a time. As such, row-wise storage is ideal for online transaction processing (OLTP) application data-stores.

Columnar storage, by contrast, stores values by each column into disk blocks.

Columnar storage stores values by each column into disk blocks — *Source: Amazon Redshift docs*

Using column-wise storage, each data block can be entirely filled. In the example above, one column-wise block holds the same amount of data as three row-wise blocks! This translates directly to I/O operation efficiency. Storage efficiency is even greater in larger datasets. Columnar datastores are extremely efficient for read workloads requiring small sets of columns, like BI or analytics. These datastores are frequently called online analytical processing (OLAP).

For Amazon Redshift, this means efficiency in storage, I/O, and memory.

How do I load data to Redshift from external sources?

Syntax:

COPY table-name
[ column-list ]
FROM data_source
authorization
[ [ FORMAT ] [ AS ] data_format ]
[ parameter [ argument ] [, ... ] ]

This can be used to load files from Amazon EMR, S3, Dynamo or SSH.

How do I insert rows into a table?

Syntax:

INSERT INTO table_name [ ( column [, ...] ) ]
{DEFAULT VALUES |
VALUES ( { expression | DEFAULT } [, ...] )
[, ( { expression | DEFAULT } [, ...] )
[, ...] ] |
query }

For example, to insert an individual row into an existing table, use comma-delimited values corresponding to the column names. Take the following example for a trails table with columns name, city, state, length, difficulty, type:

INSERT INTO hiking_trails VALUES (‘Craggy Gardens’, ‘Barnardsville’, ‘NC’, ‘1.9 miles’, ‘easy’, ‘hiking’);

You can also INSERT from another table:

INSERT INTO trails (SELECT * FROM trails WHERE type=’hiking’)

How do I export to CSV?

Syntax:

UNLOAD ('SELECT * FROM VENUE')
TO [path-to-s3-bucket]
IAMROLE [redshift-iam-role]
CSV;

With the appropriate bucket path and IAM role. UNLOAD accepts a number of arguments, discussed in further detail here.

How do I create a new user?

Syntax:

CREATE USER name [ WITH ]
PASSWORD { 'password' | 'md5hash' | DISABLE }
[ option [ ... ] ]

where 'option' can be:

CREATEDB | NOCREATEDB

CREATEUSER | NOCREATEUSER

SYSLOG ACCESS { RESTRICTED | UNRESTRICTED }

IN GROUP groupname [, ... ]

VALID UNTIL 'abstime'

CONNECTION LIMIT { limit | UNLIMITED }

SESSION TIMEOUT { limit }

A simple example of a user with a password:

CREATE USER newman WITH PASSWORD '@AbC4321!';

Parse JSON in Redshift

How do I check for a valid JSON string or array?

To check for valid, simple JSON strings use:

IS_VALID_JSON(‘string’)

For JSON arrays:

IS_VALID_JSON_ARRAY(‘string’)

How do I extract elements from a JSON array?

To extract a JSON element by position in a JSON Array:

JSON_EXTRACT_ARRAY_ELEMENT_TEXT('json string', pos [, null_if_invalid ])

Example usage:

SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT('[111,112,113]', 2);

This selects element two, 113.

To extract a JSON element by key:

JSON_EXTRACT_PATH_TEXT('json_string', 'path_elem' [,'path_elem'[, …] ] [, null_if_invalid ])

Example usage with nested JSON:

SELECT JSON_EXTRACT_PATH_TEXT('{"f2":{"f3":1},"f4":{"f5":99,"f6":"star"}}','f4', 'f6');

This selects the keys f4, then f6 (sequentially), returning “star.”

How do I generate Redshift DDL for existing table?

There are times when it is necessary to recreate a table (e.g. copying a table across clusters). To do that, you can query Redshift to get the DDL for any table with:

What functions get my redshift system info?

SELECT CURRENT_AWS_ACCOUNT();
SELECT CURRENT_DATABASE();
SELECT CURRENT_NAMESPACE();
SELECT CURRENT_SCHEMA();
SELECT CURRENT_USER();
SELECT CURRENT_USER_ID();

See more Redshift system information functions.

How can I list all users, groups, databases?

SELECT * FROM pg_user;

SELECT * FROM pg_group;

SELECT * FROM pg_database;

How can I troubleshoot loading errors?

Selecting from stl_load_errors provides information about errors during loading, and can be helpful for troubleshooting problematic loads.

SELECT *
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 100;

How do I get SQL from query_id?

SELECT LISTAGG(text) WITHIN GROUP (ORDER BY sequence) AS sql

FROM STL_QUERYTEXT

WHERE query = …;

What is the best way to assess tables that need to be vacuumed or analyzed?

This query returns tables where greater than 20% of rows are unsorted or statistics are 20% stale.

SELECT "database", "schema", "table", unsorted, stats_off
FROM svv_table_info
WHERE unsorted > 20
OR stats_off > 20

Running VACUUM or ANALYZE on these tables may improve performance by keeping rows sorted and providing the query planner with accurate information.

How do I manipulate Dates in Redshift?

Add dates: this example returns ‘2021-01-31’— 30 days after Jan. 1, 2021.

SELECT DATEADD(day, 30, ‘2021-01-01’)

Subtract dates: this example returns 52— the number of weeks in a calendar year.

SELECT DATEDIFF(week,'2021-01-01','2021-12-31')

Get Date Parts

The following examples extract various elements of dates. Multiple are provided for clarity.

SELECT DATE_PART(minute, '2021-01-01 08:28:01');

This will return ‘28,’ the minute component of the timestamp.

SELECT DATE_PART(w, '2021-06-17 08:28:01');

This will return ‘24,’ the week number of the year.

Alternatively, you could also use EXTRACT, which takes different arguments, but works identically:

SELECT EXTRACT(week from ‘2021-06-17’)

Truncate Timestamp

DATE_PART truncates a timestamp based on the specified part. The example will return ‘2021-09-01,’ the pertinent month.

DATE_TRUNC(‘month’, ‘2021-09-05’)

For more information see Redshift’s date part and timestamp functions.

Check out some of our other database cheat sheets:

How do I combine my Redshift data with other data sources, prepped and ready for analysis?

Analyzing data across your business solutions is simplified with Zuar Runner. You can automate your ELT/ETL processes and have data flowing from hundreds of potential sources, like Redshift, into a single destination. Transport, warehouse, transform, model, report, and monitor: it's all managed by Runner. You can learn more about Runner here.

Amazon Redshift Cheat Sheet

Matt Palmer

What storage types does Redshift use?

How do I load data to Redshift from external sources?

How do I insert rows into a table?

How do I export to CSV?

How do I create a new user?

Parse JSON in Redshift

How do I check for a valid JSON string or array?

How do I extract elements from a JSON array?

How do I generate Redshift DDL for existing table?

What functions get my redshift system info?

How can I list all users, groups, databases?

How can I troubleshoot loading errors?

How do I get SQL from query_id?

What is the best way to assess tables that need to be vacuumed or analyzed?

How do I manipulate Dates in Redshift?

Get Date Parts

Truncate Timestamp

How do I combine my Redshift data with other data sources, prepped and ready for analysis?

Other relevant articles:

Sign up for more like this. Our Guarantee: we email infrequently and a single click will unsubscribe you.