JDBC
The dbt Semantic Layer Java Database Connectivity (JDBC) API enables users to query metrics and dimensions using the JDBC protocol, while also providing standard metadata functionality.
A JDBC driver is a software component enabling a Java application to interact with a data platform. Here's some more information about our JDBC API:
- The Semantic Layer JDBC API utilizes the open-source JDBC driver with ArrowFlight SQL protocol.
- You can download the JDBC driver from Maven.
- The dbt Semantic Layer supports ArrowFlight SQL driver version 12.0.0 and higher.
- You can embed the driver into your application stack as needed, and you can use dbt Labs' example project for reference.
- If you’re a partner or user building a homegrown application, you’ll need to install an AWS root CA to the Java Trust documentation (specific to Java and JDBC call).
dbt Labs partners can use the JDBC API to build integrations in their tools with the dbt Semantic Layer
Using the JDBC API
If you are a dbt user or partner with access to dbt Cloud and the dbt Semantic Layer, you can setup and test this API with data from your own instance by configuring the Semantic Layer and obtaining the right JDBC connection parameters described in this document.
You may be able to use our JDBC API with tools that do not have an official integration with the dbt Semantic Layer. If the tool you use allows you to write SQL and either supports a generic JDBC driver option (such as DataGrip) or supports Dremio and uses ArrowFlightSQL driver version 12.0.0 or higher, you can access the Semantic Layer API.
Refer to Get started with the dbt Semantic Layer for more info.
Authentication
dbt Cloud authorizes requests to the dbt Semantic Layer API. You need to provide an environment ID, host, and service account tokens.
Connection parameters
The JDBC connection requires a few different connection parameters.
This is an example of a URL connection string and the individual components:
jdbc:arrow-flight-sql://semantic-layer.cloud.getdbt.com:443?&environmentId=202339&token=SERVICE_TOKEN
JDBC parameter | Description | Example |
---|---|---|
jdbc:arrow-flight-sql:// | The protocol for the JDBC driver. | jdbc:arrow-flight-sql:// |
semantic-layer.cloud.getdbt.com | The access URL for your account's dbt Cloud region. You must always add the semantic-layer prefix before the access URL. | For dbt Cloud deployment hosted in North America, use semantic-layer.cloud.getdbt.com |
environmentId | The unique identifier for the dbt production environment, you can retrieve this from the dbt Cloud URL when you navigate to Environments under Deploy. | If your URL ends with .../environments/222222 , your environmentId is 222222 |
SERVICE_TOKEN | dbt Cloud service token with “Semantic Layer Only” and "Metadata Only" permissions. Create a new service token on the Account Settings page. | token=SERVICE_TOKEN |
*Note — If you're testing locally on a tool like DataGrip, you may also have to provide the following variable at the end or beginning of the JDBC URL &disableCertificateVerification=true
.
Querying the API for metric metadata
The Semantic Layer JDBC API has built-in metadata calls which can provide a user with information about their metrics and dimensions. Here are some metadata commands and examples:
- Fetch all defined metrics
- Fetch all dimensions for a metric
- Fetch dimension values metrics
- Fetch queryable primary time granularities for metrics
- Fetch available metrics given dimensions
- Fetch queryable granularities for all time dimensions
- Determine what time dimension(s) make up metric_time for your metric(s)
Use this query to fetch all defined metrics in your dbt project:
select * from {{
semantic_layer.metrics()
}}
Use this query to fetch all dimensions for a metric.
Note, metrics
is a required argument that lists one or multiple metrics in it.
select * from {{
semantic_layer.dimensions(metrics=['food_order_amount'])}}
Use this query to fetch dimension values for one or multiple metrics and single dimension.
Note, metrics
is a required argument that lists one or multiple metrics in it, and a single dimension.
select * from {{
semantic_layer.dimension_values(metrics=['food_order_amount'], group_by=['customer__customer_name'])}}
Use this query to fetch queryable granularities for a list of metrics. This API request allows you to only show the time granularities that make sense for the primary time dimension of the metrics (such as metric_time
), but if you want queryable granularities for other time dimensions, you can use the dimensions()
call, and find the column queryable_granularities.
Note, metrics
is a required argument that lists one or multiple metrics in it.
select * from {{
semantic_layer.queryable_granularities(metrics=['food_order_amount', 'order_gross_profit'])}}
Use this query to fetch available metrics given dimensions. This command is essentially the opposite of getting dimensions given a list of metrics.
Note, group_by
is a required argument that lists one or multiple dimensions in it.
select * from {{
semantic_layer.metrics_for_dimensions(group_by=['customer__customer_type'])
}}
Use this example query to fetch available granularities for all time dimesensions (the similar queryable granularities API call only returns granularities for the primary time dimensions for metrics). The following call is a derivative of the dimensions()
call and specifically selects the granularities field.
select NAME, QUERYABLE_GRANULARITIES from {{
semantic_layer.dimensions(
metrics=["order_total"]
)
}}
It may be useful in your application to expose the names of the time dimensions that represent metric_time
or the common thread across all metrics.
You can first query the metrics()
argument to fetch a list of measures, then use the measures()
call which will return the name(s) of the time dimensions that make up metric time.
select * from {{
semantic_layer.measures(metrics=['orders'])
}}
Querying the API for metric values
To query metric values, here are the following parameters that are available:
Parameter | Description | Example | Type |
---|---|---|---|
metrics | The metric name as defined in your dbt metric configuration | metrics=['revenue'] | Required |
group_by | Dimension names or entities to group by. We require a reference to the entity of the dimension (other than for the primary time dimension), which is pre-appended to the front of the dimension name with a double underscore. | group_by=['user__country', 'metric_time'] | Optional |
grain | A parameter specific to any time dimension and changes the grain of the data from the default for the metric. | group_by=[Dimension('metric_time') grain('week\|day\|month\|quarter\|year')] | Optional |
where | A where clause that allows you to filter on dimensions and entities using parameters - comes with TimeDimension , Dimension , and Entity objects. Granularity is required with TimeDimension | "{{ where=Dimension('customer__country') }} = 'US')" | Optional |
limit | Limit the data returned | limit=10 | Optional |
order | Order the data returned | order_by=['-order_gross_profit'] (remove - for ascending order) | Optional |
compile | If true, returns generated SQL for the data platform but does not execute | compile=True | Optional |
Note on time dimensions and metric_time
You will notice that in the list of dimensions for all metrics, there is a dimension called metric_time
. Metric_time
is a reserved keyword for the measure-specific aggregation time dimensions. For any time-series metric, the metric_time
keyword should always be available for use in queries. This is a common dimension across all metrics in a semantic graph.
You can look at a single metric or hundreds of metrics, and if you group by metric_time
, it will always give you the correct time series.
Additionally, when performing granularity calculations that are global (not specific to a particular time dimension), we recommend you always operate on metric_time
and you will get the correct answer.
Note that metric_time
should be available in addition to any other time dimensions that are available for the metric(s). In the case where you are looking at one metric (or multiple metrics from the same data source), the values in the series for the primary time dimension and metric_time
are equivalent.
Examples
Refer to the following examples to help you get started with the JDBC API.
Fetch metadata for metrics
You can filter/add any SQL outside of the templating syntax. For example, you can use the following query to fetch the name and dimensions for a metric:
select name, dimensions from {{
semantic_layer.metrics()
}}
WHERE name='food_order_amount'
Query common dimensions
You can select common dimensions for multiple metrics. Use the following query to fetch the name and dimensions for multiple metrics:
select * from {{
semantic_layer.dimensions(metrics=['food_order_amount', 'order_gross_profit'])
}}
Query grouped by time
The following example query uses the shorthand method to fetch revenue and new customers grouped by time:
select * from {{
semantic_layer.query(metrics=['food_order_amount','order_gross_profit'],
group_by=['metric_time'])
}}
Query with a time grain
Use the following example query to fetch multiple metrics with a change in time dimension granularities:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month')])
}}
Group by categorical dimension
Use the following query to group by a categorical dimension:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'), 'customer__customer_type'])
}}
Query with where filters
Where filters in API allow for a filter list or string. We recommend using the filter list for production applications as this format will realize all benefits from the Predicate pushdown where possible.
Where filters have the following components that you can use:
Dimension()
- This is used for any categorical or time dimensions. If used for a time dimension, granularity is required -Dimension('metric_time').grain('week')
orDimension('customer__country')
TimeDimension()
- This is used for all time dimensions and requires a granularity argument -TimeDimension('metric_time', 'MONTH)
Entity()
- This is used for entities like primary and foreign keys -Entity('order_id')
Use the following example to query using a where
filter with the string format:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
where="{{ TimeDimension('metric_time', 'MONTH') }} >= '2017-03-09' AND {{ Dimension('customer__customer_type' }} in ('new') AND {{ Entity('order_id') }} = 10")
}}
Use the following example to query using a where
filter with a filter list format:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
where=[{{ TimeDimension('metric_time', 'MONTH')}} >= '2017-03-09', {{ Dimension('customer__customer_type' }} in ('new'), {{ Entity('order_id') }} = 10])
}}
Query with a limit and order by
Use the following example to query using a limit
or order_by
clauses:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10,
order_by=['order_gross_profit'])
}}
Query with compile keyword
Use the following example to query using a compile
keyword:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
compile=True)
}}
FAQs
Why do some dimensions use different syntax, like
metric_time
versus[Dimension('metric_time')
?
When you select a dimension on its own, such asmetric_time
you can use the shorthand method which doesn't need the “Dimension” syntax. However, when you perform operations on the dimension, such as adding granularity, the object syntax[Dimension('metric_time')
is required.What does the double underscore
"__"
syntax in dimensions mean?
The double underscore"__"
syntax indicates a mapping from an entity to a dimension, as well as where the dimension is located. For example,user__country
means someone is looking at thecountry
dimension from theuser
table.What is the default output when adding granularity?
The default output follows the format{time_dimension_name}__{granularity_level}
. So for example, if the time dimension name isds
and the granularity level is yearly, the output isds__year
.