Title: | DBI Package for the DuckDB Database Management System |
---|---|
Description: | The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector. |
Authors: | Hannes Mühleisen [aut] , Mark Raasveldt [aut] , Kirill Müller [cre] , Stichting DuckDB Foundation [cph], Apache Software Foundation [cph], PostgreSQL Global Development Group [cph], The Regents of the University of California [cph], Cameron Desrochers [cph], Victor Zverovich [cph], RAD Game Tools [cph], Valve Software [cph], Rich Geldreich [cph], Tenacious Software LLC [cph], The RE2 Authors [cph], Google Inc. [cph], Facebook Inc. [cph], Steven G. Johnson [cph], Jiahao Chen [cph], Tony Kelman [cph], Jonas Fonseca [cph], Lukas Fittl [cph], Salvatore Sanfilippo [cph], Art.sy, Inc. [cph], Oran Agra [cph], Redis Labs, Inc. [cph], Melissa O'Neill [cph], PCG Project contributors [cph] |
Maintainer: | Kirill Müller <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.3.9038 |
Built: | 2025-01-19 21:24:38 UTC |
Source: | https://github.com/meztez/duckdb-r |
This is a SQL backend for dbplyr tailored to take into account DuckDB's possibilities. This mainly follows the backend for PostgreSQL, but contains more mapped functions.
tbl_file()
is an experimental variant of dplyr::tbl()
to directly access files on disk.
It is safer than dplyr::tbl()
because there is no risk of misinterpreting the request,
and paths with special characters are supported.
tbl_function()
is an experimental variant of dplyr::tbl()
to create a lazy table from a table-generating function,
useful for reading nonstandard CSV files or other data sources.
It is safer than dplyr::tbl()
because there is no risk of misinterpreting the query.
See https://duckdb.org/docs/data/overview for details on data importing functions.
As an alternative, use dplyr::tbl(src, dplyr::sql("SELECT ... FROM ..."))
for custom SQL queries.
tbl_query()
is deprecated in favor of tbl_function()
.
Use simulate_duckdb()
with lazy_frame()
to see simulated SQL without opening a DuckDB connection.
tbl_file(src, path, ..., cache = FALSE) tbl_function(src, query, ..., cache = FALSE) tbl_query(src, query, ...) simulate_duckdb(...)
tbl_file(src, path, ..., cache = FALSE) tbl_function(src, query, ..., cache = FALSE) tbl_query(src, query, ...) simulate_duckdb(...)
src |
A duckdb connection object |
path |
Path to existing Parquet, CSV or JSON file |
... |
Any parameters to be forwarded |
cache |
Enable object cache for Parquet files |
query |
SQL code, omitting the |
library(dplyr, warn.conflicts = FALSE) con <- DBI::dbConnect(duckdb(), path = ":memory:") db <- copy_to(con, data.frame(a = 1:3, b = letters[2:4])) db %>% filter(a > 1) %>% select(b) path <- tempfile(fileext = ".csv") write.csv(data.frame(a = 1:3, b = letters[2:4])) db_csv <- tbl_file(con, path) db_csv %>% summarize(sum_a = sum(a)) db_csv_fun <- tbl_function(con, paste0("read_csv_auto('", path, "')")) db_csv %>% count() DBI::dbDisconnect(con, shutdown = TRUE)
library(dplyr, warn.conflicts = FALSE) con <- DBI::dbConnect(duckdb(), path = ":memory:") db <- copy_to(con, data.frame(a = 1:3, b = letters[2:4])) db %>% filter(a > 1) %>% select(b) path <- tempfile(fileext = ".csv") write.csv(data.frame(a = 1:3, b = letters[2:4])) db_csv <- tbl_file(con, path) db_csv %>% summarize(sum_a = sum(a)) db_csv_fun <- tbl_function(con, paste0("read_csv_auto('", path, "')")) db_csv %>% count() DBI::dbDisconnect(con, shutdown = TRUE)
duckdb()
creates or reuses a database instance.
duckdb_shutdown()
shuts down a database instance.
Return an adbcdrivermanager::adbc_driver()
for use with Arrow Database
Connectivity via the adbcdrivermanager package.
dbConnect()
connects to a database instance.
dbDisconnect()
closes a DuckDB database connection.
The associated DuckDB database instance is shut down automatically,
it is no longer necessary to set shutdown = TRUE
or to call duckdb_shutdown()
.
duckdb( dbdir = DBDIR_MEMORY, read_only = FALSE, bigint = "numeric", config = list(), ..., environment_scan = FALSE ) duckdb_shutdown(drv) duckdb_adbc() ## S4 method for signature 'duckdb_driver' dbConnect( drv, dbdir = DBDIR_MEMORY, ..., debug = getOption("duckdb.debug", FALSE), read_only = FALSE, timezone_out = "UTC", tz_out_convert = c("with", "force"), config = list(), bigint = "numeric" ) ## S4 method for signature 'duckdb_connection' dbDisconnect(conn, ..., shutdown = TRUE)
duckdb( dbdir = DBDIR_MEMORY, read_only = FALSE, bigint = "numeric", config = list(), ..., environment_scan = FALSE ) duckdb_shutdown(drv) duckdb_adbc() ## S4 method for signature 'duckdb_driver' dbConnect( drv, dbdir = DBDIR_MEMORY, ..., debug = getOption("duckdb.debug", FALSE), read_only = FALSE, timezone_out = "UTC", tz_out_convert = c("with", "force"), config = list(), bigint = "numeric" ) ## S4 method for signature 'duckdb_connection' dbDisconnect(conn, ..., shutdown = TRUE)
dbdir |
Location for database files. Should be a path to an existing
directory in the file system. With the default (or |
read_only |
Set to |
bigint |
How 64-bit integers should be returned. There are two options: |
config |
Named list with DuckDB configuration flags, see https://duckdb.org/docs/configuration/overview#configuration-reference for the possible options. These flags are only applied when the database object is instantiated. Subsequent connections will silently ignore these flags. |
... |
Reserved for future extensions, must be empty. |
environment_scan |
Set to |
drv |
Object returned by |
debug |
Print additional debug information, such as queries. |
timezone_out |
The time zone returned to R, defaults to |
tz_out_convert |
How to convert timestamp columns to the timezone specified
in |
conn |
A |
shutdown |
Unused. The database instance is shut down automatically. |
duckdb()
returns an object of class duckdb_driver.
dbDisconnect()
and duckdb_shutdown()
are called for their
side effect.
An object of class "adbc_driver"
dbConnect()
returns an object of class duckdb_connection.
library(adbcdrivermanager) with_adbc(db <- adbc_database_init(duckdb_adbc()), { as.data.frame(read_adbc(db, "SELECT 1 as one;")) }) drv <- duckdb() con <- dbConnect(drv) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con) duckdb_shutdown(drv) # Shorter: con <- dbConnect(duckdb()) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con, shutdown = TRUE)
library(adbcdrivermanager) with_adbc(db <- adbc_database_init(duckdb_adbc()), { as.data.frame(read_adbc(db, "SELECT 1 as one;")) }) drv <- duckdb() con <- dbConnect(drv) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con) duckdb_shutdown(drv) # Shorter: con <- dbConnect(duckdb()) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con, shutdown = TRUE)
Get the Substrait plan for a SQL query Transforms a SQL query into a raw vector containing the serialized Substrait query blob
duckdb_get_substrait(conn, query, enable_optimizer = TRUE)
duckdb_get_substrait(conn, query, enable_optimizer = TRUE)
conn |
A DuckDB connection, created by |
query |
The query string in SQL |
enable_optimizer |
Optional parameter to enable/disable query-optimizer. By default optimizer is enabled. |
A raw vector containing the substrait protobuf blob
Get the Substrait plan for a SQL query in the JSON format Transforms a SQL query into a vector containing the serialized Substrait query JSON
duckdb_get_substrait_json(conn, query, enable_optimizer = TRUE)
duckdb_get_substrait_json(conn, query, enable_optimizer = TRUE)
conn |
A DuckDB connection, created by |
query |
The query string in SQL |
enable_optimizer |
Optional parameter to enable/disable query-optimizer. By default optimizer is enabled. |
A vector containing the substrait protobuf JSON
Query DuckDB using Substrait Method for interpreting a Substrait BLOB plan as a DuckDB Query Plan It interprets and executes the query.
duckdb_prepare_substrait(conn, query, arrow = FALSE)
duckdb_prepare_substrait(conn, query, arrow = FALSE)
conn |
A DuckDB connection, created by |
query |
The Protobuf-encoded Substrait Query Plan. Qack! |
arrow |
Whether the result should be in Arrow format |
A DuckDB Query Result
Query DuckDB using Substrait Method for interpreting a Substrait JSON plan as a DuckDB Query Plan It interprets and executes the query.
duckdb_prepare_substrait_json(conn, json, arrow = FALSE)
duckdb_prepare_substrait_json(conn, json, arrow = FALSE)
conn |
A DuckDB connection, created by |
json |
The Json Query Plan. Qack! |
arrow |
Whether the result should be in Arrow format |
A DuckDB Query Result
Directly reads a CSV file into DuckDB, tries to detect and create the correct schema for it. This usually is much faster than reading the data into R and writing it to DuckDB.
duckdb_read_csv( conn, name, files, ..., header = TRUE, na.strings = "", nrow.check = 500, delim = ",", quote = "\"", col.names = NULL, col.types = NULL, lower.case.names = FALSE, sep = delim, transaction = TRUE, temporary = FALSE )
duckdb_read_csv( conn, name, files, ..., header = TRUE, na.strings = "", nrow.check = 500, delim = ",", quote = "\"", col.names = NULL, col.types = NULL, lower.case.names = FALSE, sep = delim, transaction = TRUE, temporary = FALSE )
conn |
A DuckDB connection, created by |
name |
The name for the virtual table that is registered or unregistered |
files |
One or more CSV file names, should all have the same structure though |
... |
Reserved for future extensions, must be empty. |
header |
Whether or not the CSV files have a separate header in the first line |
na.strings |
Which strings in the CSV files should be considered to be NULL |
nrow.check |
How many rows should be read from the CSV file to figure out data types |
delim |
Which field separator should be used |
quote |
Which quote character is used for columns in the CSV file |
col.names |
Override the detected or generated column names |
col.types |
Character vector of column types in the same order as col.names, or a named character vector where names are column names and types pairs. Valid types are DuckDB data types, e.g. VARCHAR, DOUBLE, DATE, BIGINT, BOOLEAN, etc. |
lower.case.names |
Transform column names to lower case |
sep |
Alias for delim for compatibility |
transaction |
Should a transaction be used for the entire operation |
temporary |
Set to |
If the table already exists in the database, the csv is appended to it. Otherwise the table is created.
The number of rows in the resulted table, invisibly.
con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) path <- tempfile(fileext = ".csv") write.csv(data, path, row.names = FALSE) duckdb_read_csv(con, "data", path) dbReadTable(con, "data") dbDisconnect(con) # Providing data types for columns path <- tempfile(fileext = ".csv") write.csv(iris, path, row.names = FALSE) con <- dbConnect(duckdb()) duckdb_read_csv(con, "iris", path, col.types = c( Sepal.Length = "DOUBLE", Sepal.Width = "DOUBLE", Petal.Length = "DOUBLE", Petal.Width = "DOUBLE", Species = "VARCHAR" ) ) dbReadTable(con, "iris") dbDisconnect(con)
con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) path <- tempfile(fileext = ".csv") write.csv(data, path, row.names = FALSE) duckdb_read_csv(con, "data", path) dbReadTable(con, "data") dbDisconnect(con) # Providing data types for columns path <- tempfile(fileext = ".csv") write.csv(iris, path, row.names = FALSE) con <- dbConnect(duckdb()) duckdb_read_csv(con, "iris", path, col.types = c( Sepal.Length = "DOUBLE", Sepal.Width = "DOUBLE", Petal.Length = "DOUBLE", Petal.Width = "DOUBLE", Species = "VARCHAR" ) ) dbReadTable(con, "iris") dbDisconnect(con)
duckdb_register()
registers a data frame as a virtual table (view)
in a DuckDB connection.
No data is copied.
duckdb_register(conn, name, df, overwrite = FALSE, experimental = FALSE) duckdb_unregister(conn, name)
duckdb_register(conn, name, df, overwrite = FALSE, experimental = FALSE) duckdb_unregister(conn, name)
conn |
A DuckDB connection, created by |
name |
The name for the virtual table that is registered or unregistered |
df |
A |
overwrite |
Should an existing registration be overwritten? |
experimental |
Enable experimental optimizations |
duckdb_unregister()
unregisters a previously registered data frame.
These functions are called for their side effect.
con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) duckdb_register(con, "data", data) dbReadTable(con, "data") duckdb_unregister(con, "data") dbDisconnect(con)
con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) duckdb_register(con, "data", data) dbReadTable(con, "data") duckdb_unregister(con, "data") dbDisconnect(con)
duckdb_register_arrow()
registers an Arrow data source as a virtual table (view)
in a DuckDB connection.
No data is copied.
duckdb_register_arrow(conn, name, arrow_scannable, use_async = NULL) duckdb_unregister_arrow(conn, name) duckdb_list_arrow(conn)
duckdb_register_arrow(conn, name, arrow_scannable, use_async = NULL) duckdb_unregister_arrow(conn, name) duckdb_list_arrow(conn)
conn |
A DuckDB connection, created by |
name |
The name for the virtual table that is registered or unregistered |
arrow_scannable |
A scannable Arrow-object |
use_async |
Switched to the asynchronous scanner. (deprecated) |
duckdb_unregister_arrow()
unregisters a previously registered data frame.
These functions are called for their side effect.