| Title: | Scalable Spatial Data Analysis Using 'SedonaDB' |
|---|---|
| Description: | Provides scalable spatial operations on vector and raster data using 'Apache SedonaDB' ('DataFusion'-based spatial engine) as backend. Enables efficient processing of large spatial datasets without loading all data into 'R' memory, leveraging 'DuckDB' and 'Arrow' for high-performance I/O. |
| Authors: | Egor Kotov [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-6690-5345>) |
| Maintainer: | Egor Kotov <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-05-25 06:23:38 UTC |
| Source: | https://github.com/e-kotov/sx |
Registers an sf object, sedonadb_dataframe, or dbplyr query as a named view in SedonaDB.
This operation is lazy; it does not move data until a query is executed.
sx_as_view(x, name = NULL, overwrite = TRUE, verbosity = NULL)
x |
A |
name |
Character. The name to register the view as. If |
overwrite |
Logical. If |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A sedonadb_dataframe pointing to the registered view (invisibly).
See also: sx_create_table() to materialize data into memory.
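The lifecycle of a view can be sketched with functions documented in this manual (sx_as_view(), sx_list(), sx_collect(), sx_drop_view()). The view name "nc_demo" is an illustrative choice, and the guard is an addition so the block only runs when both sx and sf are installed.

```r
# Guard: run the sketch only when both packages are available.
has_sx <- requireNamespace("sx", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE)

if (has_sx) {
  library(sf)
  library(sx)

  nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

  # Register the sf object under an illustrative name; no query runs yet.
  nc_view <- sx_as_view(nc, name = "nc_demo")

  # The view should now be listed in the SedonaDB catalog.
  print(sx_list())

  # Executing a query (here: collecting to sf) is what moves the data.
  nc_back <- sx_collect("nc_demo")

  # Remove the view once it is no longer needed.
  sx_drop_view("nc_demo")
}
```

Because registration is lazy, the cost is paid at sx_collect(), not at sx_as_view().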
Compute a buffer around this geometry.
sx_buffer(x, dist, output = NULL, view_name = NULL, verbosity = NULL, use_s2 = NULL, ...)
x |
Input object (sf, sedonadb_dataframe, or character view name). |
dist |
A single numeric value representing the buffer distance. Note: SedonaDB currently does not support column-based distances for buffering. |
output |
Character or NULL. Output type: Output types:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
use_s2 |
Logical or NULL. Controls spherical geometry (S2) for this operation.
|
... |
Ignored. Used to catch and warn about unsupported sf arguments. |
Result (type depends on output)
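A minimal usage sketch for a fixed-distance buffer, assuming dist is interpreted in the units of the input CRS (an assumption; this manual does not state the unit). The data are projected to EPSG:5070, a metre-based CRS also used in the interpolation example, and the guard is an addition so the block only runs when sx and sf are installed.

```r
# Guard: run the sketch only when both packages are available.
has_sx <- requireNamespace("sx", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE)

if (has_sx) {
  library(sf)
  library(sx)

  nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

  # Project to a metre-based CRS so the distance is easier to reason about.
  nc_proj <- st_transform(nc, 5070)

  # dist must be a single number; column-based distances are not supported.
  buffered <- sx_buffer(nc_proj, dist = 1000, output = "sf")
}
```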
Compute the centroid of this geometry.
sx_centroid(x, of_largest_polygon = FALSE, output = NULL, view_name = NULL, verbosity = NULL, ...)
x |
Input object (sf, sedonadb_dataframe, or character view name). |
of_largest_polygon |
Logical. If |
output |
Character or NULL. Output type: Output types:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
... |
Ignored. Used to catch and warn about unsupported sf arguments. |
Result (type depends on output)
Materializes a lazy sedonadb_dataframe into an R sf object. Checks row count against a safety threshold before downloading to prevent crashes from massive datasets.
sx_collect(x, force = FALSE, verbosity = NULL)
x |
A |
force |
Logical. If |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
An sf object
    library(sf)

    # Load sample data
    nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    box <- st_bbox(nc[1:10, ]) |> st_as_sfc() |> st_as_sf()

    # -------------------------------------------------------------------
    # Example 1: Collect from sedonadb_dataframe (lazy result)
    # -------------------------------------------------------------------
    lazy_result <- sx_filter(nc, box, output = "lazy")
    sf_result <- sx_collect(lazy_result)
    class(sf_result) # "sf"

    # -------------------------------------------------------------------
    # Example 2: Collect from a named view
    # -------------------------------------------------------------------
    sx_as_view(nc, "nc_data")
    sf_from_view <- sx_collect("nc_data")

    # ---
    # Note: sx_collect has a safety limit (default 1M rows).
    # Use force = TRUE to bypass if you're sure about memory capacity:
    # sx_collect(large_lazy_result, force = TRUE)
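The row-count safety check described for sx_collect() can be illustrated with a small base R sketch. The helper check_collect_limit() is hypothetical and is not part of sx; only the 1M-row default and the force argument come from this manual, and the exact comparison used by the package is an assumption.

```r
# Hypothetical helper (not part of sx): decide whether a download may proceed.
check_collect_limit <- function(n_rows, max_rows = 1e6, force = FALSE) {
  stopifnot(is.numeric(n_rows), length(n_rows) == 1L, n_rows >= 0)
  if (!force && n_rows > max_rows) {
    stop(
      sprintf(
        "Refusing to collect %s rows (limit %s). Use force = TRUE to override.",
        format(n_rows, big.mark = ",", scientific = FALSE),
        format(max_rows, big.mark = ",", scientific = FALSE)
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A small result passes the check.
check_collect_limit(100)

# An oversized result passes only when forced.
check_collect_limit(5e6, force = TRUE)

# Without force, the oversized result is rejected with an error.
blocked <- tryCatch(
  check_collect_limit(5e6),
  error = function(e) conditionMessage(e)
)
```

Failing before the transfer starts is the point of the check: the error costs nothing, whereas an oversized download can exhaust R's memory.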
Forces computation of a lazy view or dataframe and stores the result as a
materialized table in SedonaDB memory (Apache Arrow MemTable).
This is useful for caching intermediate results to speed up subsequent queries.
sx_create_table(x, name = NULL, overwrite = TRUE, verbosity = NULL)
x |
A |
name |
Character. The name to register the table as. If |
overwrite |
Logical. If |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A sedonadb_dataframe pointing to the materialized table (invisibly).
    ## Not run:
    # 1. Define a lazy operation
    lazy_df <- sx_buffer("input_view", 10)

    # 2. Materialize it into memory as 'buffered_table'
    sx_create_table(lazy_df, "buffered_table")
    ## End(Not run)
Extracts the Coordinate Reference System (CRS) from a SedonaDB view name
or sedonadb_dataframe by reading the Arrow schema metadata.
This function returns a standard sf::crs object, ensuring full compatibility
with the sf ecosystem.
sx_crs(x, verbosity = NULL)
x |
A |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
An object of class crs (from the sf package).
    x <- sx_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    sx_crs(x)
sx implements these dplyr verbs for sedonadb_dataframe objects.
See the documentation of the original generic functions for usage details.
These methods translate R expressions to SQL queries executed by SedonaDB (DataFusion).
dplyr::select(): Choose columns.
dplyr::rename(): Rename columns.
dplyr::mutate(): Add or modify columns.
dplyr::filter(): Filter rows.
dplyr::arrange(): Sort rows.
dplyr::distinct(): Keep unique rows (.keep_all = TRUE is not supported).
dplyr::pull(): Extract a single column.
dplyr::collect(): Force computation and return R object.
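The verbs listed above can be chained as in the following sketch. It assumes the nc sample data shipped with sf (columns NAME and BIR74) and that sx_read() returns a sedonadb_dataframe by default; the guard is an addition so the block only runs when sx, sf, and dplyr are installed.

```r
# Guard: run the sketch only when all three packages are available.
has_sx <- requireNamespace("sx", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE) &&
  requireNamespace("dplyr", quietly = TRUE)

if (has_sx) {
  library(sx)
  library(dplyr)

  # Read the sample shapefile into SedonaDB.
  counties <- sx_read(system.file("shape/nc.shp", package = "sf"))

  # Each verb is translated to SQL; nothing is computed until collect().
  result <- counties |>
    filter(BIR74 > 10000) |>
    select(NAME, BIR74) |>
    arrange(BIR74) |>
    collect()
}
```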
sx implements dplyr joins for sedonadb_dataframe objects.
See dplyr::left_join() and dplyr::inner_join() for details.
Returns a data frame of spatial drivers supported by the current backend. For the DuckDB reader, this includes all GDAL-supported formats.
sx_drivers(verbosity = NULL)
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A data frame with columns:
short_name: Driver short name (e.g., "GPKG")
long_name: Descriptive name (e.g., "GeoPackage")
can_read: Logical. Whether the driver supports reading.
can_write: Logical. Whether the driver supports writing.
can_create: Logical. Whether the driver supports creating new files.
    # List all supported drivers
    drivers <- sx_drivers()
    head(drivers)

    # Check for specific driver
    "GPKG" %in% drivers$short_name
Drop a registered view or table
sx_drop_view(name)
name |
Character. Name of the view/table to drop. |
Invisibly returns the name.
Ingests a DuckDB table, view, or dbplyr query into SedonaDB.
This operation uses zero-copy Arrow streaming to transfer data efficiently
between DuckDB and SedonaDB.
sx_duckdb_to_sedona(data, conn = NULL, name = NULL, materialize = TRUE, verbosity = NULL)
data |
A |
conn |
A |
name |
Character string. Optional name for the registered view in SedonaDB.
If |
materialize |
Logical. If |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A sedonadb_dataframe.
    ## Not run:
    library(sx)
    library(dplyr)
    library(dbplyr)
    library(duckdb)

    con <- dbConnect(duckdb())
    dbExecute(con, "INSTALL spatial; LOAD spatial;")

    # Create a table in DuckDB
    dbExecute(con, "CREATE TABLE points (id INTEGER, geom GEOMETRY)")
    dbExecute(con, "INSERT INTO points VALUES (1, ST_Point(0,0))")

    # Option 1: Ingest from dbplyr object (connection auto-detected)
    tbl_points <- tbl(con, "points")
    sdf <- sx_duckdb_to_sedona(tbl_points)

    # Option 2: Ingest from table name (requires connection)
    sdf_2 <- sx_duckdb_to_sedona("points", conn = con)

    # Option 3: Ingest and register as named view
    sx_duckdb_to_sedona("points", conn = con, name = "sedona_points_view")

    dbDisconnect(con)
    ## End(Not run)
Compute the envelope (bounding box) of this geometry.
sx_envelope(x, output = NULL, view_name = NULL, verbosity = NULL, ...)
x |
Input object (sf, sedonadb_dataframe, or character view name). |
output |
Character or NULL. Output type: Output types:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
... |
Ignored. Used to catch and warn about unsupported sf arguments. |
Result (type depends on output)
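A minimal sketch chaining two geometry operations, assuming both work row by row and that output = "lazy" (used in the sx_collect examples) is also accepted by sx_envelope(). The guard is an addition so the block only runs when sx and sf are installed.

```r
# Guard: run the sketch only when both packages are available.
has_sx <- requireNamespace("sx", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE)

if (has_sx) {
  library(sf)
  library(sx)

  nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
  nc_proj <- st_transform(nc, 5070)

  # Step 1: bounding boxes, kept as a lazy sedonadb_dataframe.
  boxes <- sx_envelope(nc_proj, output = "lazy")

  # Step 2: centroids of those boxes, materialized as an sf object.
  box_centers <- sx_centroid(boxes, output = "sf")
}
```

Keeping the intermediate result lazy avoids pulling the bounding boxes into R when only the final centroids are needed.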
Filters an sf object spatially based on its relationship with another sf object
or geometry. All calculations are performed in SedonaDB for efficiency.
Supports lazy evaluation returning sedonadb_dataframe objects.
sx_filter(x, y, predicate = "intersects", distance = NULL, output = NULL, view_name = NULL, verbosity = NULL, use_s2 = NULL, ...)
x |
A |
y |
A |
predicate |
Character. Spatial predicate to use. One of: |
distance |
Numeric (optional). Distance threshold for the |
output |
Character or NULL. Output type: Output types:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
use_s2 |
Logical or NULL. Controls spherical geometry (S2) for this operation.
|
... |
Additional arguments passed to methods. |
An sf object or sedonadb_dataframe (depending on output).
    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    box <- st_bbox(nc[1:10, ]) |> st_as_sfc() |> st_as_sf()

    # -------------------------------------------------------------------
    # Example 1: Using sf objects (most common)
    # -------------------------------------------------------------------
    # Default lazy return
    lazy_result <- sx_filter(nc, box, predicate = "intersects")

    # Materialize to sf
    sf_result <- sx_collect(lazy_result)

    # Or request sf directly
    sf_result <- sx_filter(nc, box, output = "sf")

    # -------------------------------------------------------------------
    # Example 2: Using sedonadb_dataframe (chain lazy operations)
    # -------------------------------------------------------------------
    # First filter returns lazy result
    step1 <- sx_filter(nc, box, predicate = "intersects", output = "lazy")

    # Chain with another operation (filter by a different geometry)
    box2 <- st_bbox(nc[20:25, ]) |> st_as_sfc() |> st_as_sf()
    step2 <- sx_filter(step1, box2, predicate = "intersects", output = "lazy")

    # Collect final result
    final <- sx_collect(step2)

    # -------------------------------------------------------------------
    # Example 3: Using pre-registered view names
    # -------------------------------------------------------------------
    sx_as_view(nc, "nc_counties")
    sx_as_view(box, "filter_box")

    # Filter using view names
    result <- sx_filter("nc_counties", "filter_box", output = "sf")

    # -------------------------------------------------------------------
    # Example 4: Different predicates
    # -------------------------------------------------------------------
    # Filter with "within" predicate
    within_result <- sx_filter(nc, box, predicate = "within", output = "sf")

    # -------------------------------------------------------------------
    # Example 5: Interoperability Outputs
    # -------------------------------------------------------------------
    # Export as geoarrow for Arrow/Parquet workflows
    geoarrow_result <- sx_filter(nc, box, output = "geoarrow")

    # Export as raw WKB for DuckDB/PostGIS import
    raw_result <- sx_filter(nc, box, output = "raw")
Retrieves the name of the geometry column from an sf object,
sedonadb_dataframe, or a registered SedonaDB view.
sx_geometry_column(x)
x |
A |
Character. The name of the geometry column.
    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    sx_geometry_column(nc) # "geometry"

    sdf <- sx_as_view(nc, "nc_view")
    sx_geometry_column("nc_view") # "geometry"
Transfers attribute data from a source spatial layer to a target spatial layer based
on the area of overlap between their geometries. All calculations are performed in
SedonaDB for efficiency.
Supports lazy evaluation returning sedonadb_dataframe objects.
sx_interpolate_aw(target, source, tid, sid, extensive = NULL, intensive = NULL, weight = "sum", output = NULL, view_name = NULL, keep_NA = TRUE, na.rm = FALSE, join_crs = NULL, verbosity = NULL, use_s2 = NULL, ...)
target |
A |
source |
A |
tid |
Character. Unique ID column name in |
sid |
Character. Unique ID column name in |
extensive |
Character vector. Columns in |
intensive |
Character vector. Columns in |
weight |
Character. Denominator for extensive variables: "sum" (default) or "total". |
output |
Character or NULL. Output type: Output types:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
keep_NA |
Logical. If TRUE, output includes all target features (LEFT JOIN). |
na.rm |
Logical. If TRUE, source features with NA values are ignored. |
join_crs |
Numeric or Character (optional). EPSG code or WKT string of the CRS to transform to during the calculation. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
use_s2 |
Logical or NULL. Controls spherical geometry (S2) for this operation.
|
... |
Ignored. Used to catch and warn about unsupported sf arguments. |
Areal-weighted interpolation assumes uniform distribution of values within source polygons.
Coordinate Systems:
Area calculations are sensitive to CRS. It is strongly recommended to use a projected CRS.
Use the join_crs argument to project data on-the-fly during the interpolation.
Extensive vs. Intensive Variables:
Extensive (counts, sums): Value is divided proportionally to area.
Use weight="sum" (relative to target coverage) or weight="total" (relative to source area).
Intensive (rates, densities): Value is averaged based on partial areas. Always uses intersection area weighting.
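The difference between the weighting schemes can be checked with a small worked example in base R. The areas and values are made up for illustration, and the denominators follow the conventions of the areal package cited as the reference implementation (an assumption about how sx computes them): weight = "total" divides by the full source area, weight = "sum" divides by the summed overlap area.

```r
# One source polygon: total area 10, extensive value 100 (e.g. a count).
source_area  <- 10
source_value <- 100

# It overlaps two target cells by 4 area units each;
# the remaining 2 units are not covered by any target.
overlap <- c(A = 4, B = 4)

# weight = "total": share of the full source area (40 per cell, 80 overall).
ext_total <- source_value * overlap / source_area

# weight = "sum": share of the covered area only (50 per cell, 100 overall).
ext_sum <- source_value * overlap / sum(overlap)

# Intensive variable: one target cell overlapped by two sources with
# densities 10 and 20 and overlap areas 3 and 1. The result is the
# area-weighted mean: (10 * 3 + 20 * 1) / (3 + 1) = 12.5.
density   <- c(S1 = 10, S2 = 20)
int_area  <- c(S1 = 3, S2 = 1)
intensive <- sum(density * int_area) / sum(int_area)
```

With "total", the 20 units that fall outside the targets are dropped, so the extensive total is preserved only when the targets fully cover the source. With "sum", the full value is always redistributed over the covered part.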
An sf object, sedonadb_dataframe, or tibble.
See also: areal::aw_interpolate() for a reference implementation.
    library(sf)

    # 1. Prepare Data
    # Load NC counties (source) and project to Albers (EPSG:5070)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    nc <- st_transform(nc, 5070)
    nc$sid <- seq_len(nrow(nc))

    # Create a target grid
    grid <- st_make_grid(nc, n = c(10, 5)) |> st_as_sf()
    grid$tid <- seq_len(nrow(grid))

    # -------------------------------------------------------------------
    # Example 1: Using sf objects directly (most common use case)
    # -------------------------------------------------------------------
    # Extensive interpolation (total counts, e.g., births)
    result_ext <- sx_interpolate_aw(
      target = grid, source = nc, tid = "tid", sid = "sid",
      extensive = "BIR74", weight = "total", output = "sf"
    )

    # Check mass preservation (should be ~1.0)
    sum(result_ext$BIR74, na.rm = TRUE) / sum(nc$BIR74)

    # Intensive interpolation (rates/densities)
    result_int <- sx_interpolate_aw(
      target = grid, source = nc, tid = "tid", sid = "sid",
      intensive = "BIR74", output = "sf"
    )

    # -------------------------------------------------------------------
    # Example 2: Using sedonadb_dataframe (lazy evaluation)
    # -------------------------------------------------------------------
    # First operation returns lazy result
    lazy_result <- sx_interpolate_aw(
      target = grid, source = nc, tid = "tid", sid = "sid",
      extensive = c("BIR74", "BIR79"), output = "sedonadb_dataframe"
    )

    # Materialize when ready
    final_sf <- sx_collect(lazy_result)

    # -------------------------------------------------------------------
    # Example 3: Using pre-registered SedonaDB view names
    # -------------------------------------------------------------------
    # Register data as views
    sx_as_view(nc, "nc_counties")
    sx_as_view(grid, "target_grid")

    # Use view names as input
    result_from_views <- sx_interpolate_aw(
      target = "target_grid", source = "nc_counties", tid = "tid", sid = "sid",
      extensive = "BIR74", output = "sf"
    )

    # Quick visualization
    plot(result_ext["BIR74"], main = "Interpolated Births (1974)", border = NA)

    # -------------------------------------------------------------------
    # Example 4: Arrow ecosystem
    # -------------------------------------------------------------------
    # Export as geoarrow for zero-copy Parquet writing
    geo_result <- sx_interpolate_aw(grid, nc, "tid", "sid", extensive = "BIR74", output = "geoarrow")
Performs a spatial join between two sf objects or a spatial object and a registered SedonaDB view.
sx_join(x, y, join = "intersects", distance = NULL, left = TRUE, largest = FALSE, output = NULL, view_name = NULL, overwrite = FALSE, verbosity = NULL, use_s2 = NULL, ...)
x |
A |
y |
A |
join |
Character. Spatial predicate to use. One of: |
distance |
Numeric (optional). Distance threshold for the |
left |
Logical. If |
largest |
Logical. If |
output |
Character or NULL. Output type: Output types:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
overwrite |
Logical. If |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
use_s2 |
Logical or NULL. Controls spherical geometry (S2) for this operation.
|
... |
Ignored. Used to catch and warn about unsupported sf arguments. |
Depends on output. Defaults to sedonadb_dataframe.
    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

    # Create points to join against counties
    set.seed(42)
    pts <- st_sample(st_union(nc), 50) |> st_as_sf()
    pts$point_id <- seq_len(nrow(pts))

    # -------------------------------------------------------------------
    # Example 1: Using sf objects (most common)
    # -------------------------------------------------------------------
    # Join point attributes to the county that contains them
    joined <- sx_join(pts, nc, join = "within")

    # -------------------------------------------------------------------
    # Example 2: Using pre-registered view names
    # -------------------------------------------------------------------
    sx_as_view(nc, "nc_counties")
    sx_as_view(pts, "sample_points")

    # Join using view names
    joined_from_views <- sx_join("sample_points", "nc_counties", join = "within")

    # -------------------------------------------------------------------
    # Example 3: Create a persistent view (named output)
    # -------------------------------------------------------------------
    # Instead of returning sf, create a named view for later use
    sx_join(pts, nc, join = "intersects", view_name = "points_with_county")

    # The view can now be used in other operations
    result <- sx_collect("points_with_county")

    # -------------------------------------------------------------------
    # Example 4: Interoperability
    # -------------------------------------------------------------------
    # Export as raw WKB for database import
    wkb_result <- sx_join(pts, nc, output = "raw")
Lists all layers available in a spatial file (e.g., for GeoPackage, FileGDB, or PostGIS).
sx_layers(path, verbosity = NULL)
path |
Character string. Path to the file (local, HTTP, or S3) to inspect. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A data frame with layer information, typically including layer_name,
geometry_type, and feature_count.
    # Path to a sample spatial file (a GeoJSON here; multi-layer formats
    # such as GPKG are inspected the same way)
    file_path <- system.file("spatial/countries.geojson", package = "sx")

    # Example
    # sx_layers(file_path)
Manually applies sx resource limits (threads, memory) to a given DuckDB connection.
sx_limit_duckdb_conn(conn, threads = NULL, memory_limit_gb = NULL)
conn |
A DuckDB connection object (from |
threads |
Integer. Number of threads to use. If |
memory_limit_gb |
Numeric or character. Memory limit to use. If |
When threads or memory_limit_gb are explicitly provided, the connection is
"pinned" in an internal registry. Pinned connections are immune to global
policy enforcement (see sx_options(duckdb_enforcement_mode = "all")), ensuring
that manual engineering decisions are respected.
To unpin a connection and return it to global policy management, call this function without any limit arguments.
Invisibly returns the connection.
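Pinning and unpinning, as described above, can be sketched as follows. The limit values are illustrative, current_setting() is DuckDB's own SQL function for inspecting a setting, and the guard is an addition so the block only runs when sx, duckdb, and DBI are installed.

```r
# Guard: run the sketch only when all three packages are available.
has_sx <- requireNamespace("sx", quietly = TRUE) &&
  requireNamespace("duckdb", quietly = TRUE) &&
  requireNamespace("DBI", quietly = TRUE)

if (has_sx) {
  library(sx)

  con <- DBI::dbConnect(duckdb::duckdb())

  # Explicit limits pin the connection, shielding it from global policy.
  sx_limit_duckdb_conn(con, threads = 2, memory_limit_gb = 4)

  # Inspect the thread setting that DuckDB itself reports.
  print(DBI::dbGetQuery(con, "SELECT current_setting('threads') AS threads"))

  # Calling without limit arguments unpins the connection again.
  sx_limit_duckdb_conn(con)

  DBI::dbDisconnect(con, shutdown = TRUE)
}
```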
Returns a list of all tables and views currently registered in the SedonaDB context.
sx_list(verbosity = NULL)
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A tibble containing catalog information (table_name, table_type, etc.).
Get or set global sx options
sx_options(output_type = NULL, verbosity = NULL, threads = NULL, memory_limit_gb = NULL, duckdb_threads = NULL, duckdb_memory_limit_gb = NULL, duckdb_enforcement_mode = NULL)
output_type |
Character string. Controls the default return type for spatial operations. Must be one of:
If |
verbosity |
Character or NULL. Controls message output level:
|
threads |
Positive integer. Number of threads to use for SedonaDB parallel operations. |
memory_limit_gb |
Positive numeric. Memory limit in GB for SedonaDB backend operations.
Note: For SedonaDB/DataFusion, this setting is applied at the runtime level
and may not be visible via SQL |
duckdb_threads |
Positive integer or "Auto". Threads for DuckDB operations. |
duckdb_memory_limit_gb |
Positive numeric or "Auto". Memory limit for DuckDB (e.g., 4 or "4GB"). |
duckdb_enforcement_mode |
Character. Controls resource limit enforcement scope:
Note on Policy Immunity: Connections manually configured via
If |
Invisibly returns a list containing the currently set options when setting. Returns the list visibly when called without arguments (getter mode).
## Not run:
# --- Output Type Options ---
# Set default output to sf (materialized)
sx_options(output_type = "sf")

# --- Independent Resource Config ---
sx_options(
  threads = 8,                # SedonaDB (Heavy compute)
  duckdb_threads = 2,         # DuckDB (I/O only)
  duckdb_memory_limit_gb = 4  # DuckDB specific limit
)

# --- View Current Configuration ---
sx_options()
## End(Not run)
This function reads spatial files and transfers them to SedonaDB. Parquet files (local, HTTP, S3) are read natively by SedonaDB. Other formats (Shapefile, GeoJSON, GPKG) are read via DuckDB and streamed via Arrow.
sx_read( path, data_reader = "auto", query = NULL, view_name = NULL, target = "sedonadb", options = NULL, shp_encoding = NULL, layer = NULL, spatial_filter = NULL, open_options = NULL, allowed_drivers = NULL, hive_partitioning = NULL, union_by_name = NULL, max_batch_size = NULL, verbosity = NULL, ... )
path |
Character string. Path to the file (local, HTTP, or S3) to read.
Can also be a table name if using the DuckDB reader with a custom |
data_reader |
Character string specifying which data reader to use. Options:
|
query |
Optional SQL query (for DuckDB reader only). If NULL, reads all data.
Use |
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
target |
Target engine for loading. Currently only |
options |
Named list of options for SedonaDB parquet reader (e.g., S3 credentials).
Example: |
shp_encoding |
Character encoding for Shapefile attribute data (e.g., "UTF-8", "CP1252"). Only used when reading Shapefiles via DuckDB. Useful for non-ASCII characters. |
layer |
Layer name to read (for multi-layer formats like GPKG). |
spatial_filter |
WKT string or |
open_options |
Character vector of driver-specific open options for GDAL (e.g.,
|
allowed_drivers |
Character vector of GDAL driver names to restrict reading to. |
hive_partitioning |
Logical. For partitioned Parquet directories, if |
union_by_name |
Logical. For multi-file Parquet reads, if |
max_batch_size |
Integer. Maximum batch size for GDAL reads via |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
... |
Additional arguments passed to the data reader (e.g., |
A sedonadb_dataframe.
## Not run:
# Auto-detect: parquet uses SedonaDB, shp uses DuckDB
sdf <- sx_read("path/to/file.parquet")
sdf <- sx_read("path/to/file.shp")

# S3 parquet with options
sdf <- sx_read(
  "s3://bucket/path/file.parquet",
  options = list("aws.region" = "us-west-2")
)

# HTTP parquet
sdf <- sx_read("https://example.com/data.parquet")

# Read shapefile with explicit encoding
df <- sx_read("data.shp", shp_encoding = "CP1252")

# Read specific layer from GPKG
df <- sx_read("data.gpkg", layer = "counties")

# Read with driver-specific open options
df <- sx_read("data.csv", open_options = c("HEADERS=FORCE"))

# Read partitioned parquet with Hive partitioning
df <- sx_read("data_partitioned/", hive_partitioning = TRUE)

# Read from existing DuckDB table
df <- sx_read("my_table", conn = my_duckdb_conn)
## End(Not run)
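The auto-detection rule stated in the description (Parquet goes to SedonaDB, everything else goes through DuckDB) can be sketched as a small extension check. This is an illustration under assumptions, not the package's dispatch code: `pick_reader()` is a hypothetical helper, its return labels are not necessarily valid `data_reader` values, and directories or URLs with query strings are not handled.

```r
# Hypothetical helper: choose an engine label from the file extension.
pick_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))
  # Parquet is read natively; other formats are assumed to go via DuckDB + Arrow
  ifelse(ext == "parquet", "sedonadb", "duckdb")
}

pick_reader("path/to/file.parquet")
pick_reader("s3://bucket/path/file.parquet")
pick_reader("path/to/file.shp")
pick_reader("data.gpkg")
```

Setting `data_reader` explicitly bypasses any such detection, which is useful for files with non-standard extensions.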
Simplify the geometry by removing points (vertices) that do not significantly contribute to the shape. Uses the Ramer-Douglas-Peucker algorithm.
sx_simplify( x, dTolerance, preserveTopology = FALSE, output = NULL, view_name = NULL, verbosity = NULL, ... )
x |
Input object (sf, sedonadb_dataframe, or character view name). |
dTolerance |
Numeric. The tolerance distance for simplification. Vertices closer than this distance to the simplified line are removed. |
preserveTopology |
Logical. If |
output |
Character or NULL. Output type:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
... |
Ignored. Used to catch and warn about unsupported sf arguments. |
Result (type depends on output)
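To make the role of the tolerance concrete, here is a minimal base R sketch of the Ramer-Douglas-Peucker algorithm on a single line. It is not the code SedonaDB runs: `perp_dist()` and `rdp()` are hypothetical helpers, the `tolerance` argument plays the part of `dTolerance`, and `preserveTopology` is not modeled.

```r
# Perpendicular distance from each row of `pts` to the line through a and b.
perp_dist <- function(pts, a, b) {
  d <- as.numeric(b - a)
  len <- sqrt(sum(d^2))
  rel <- sweep(pts, 2, a)  # coordinates relative to a
  if (len == 0) return(sqrt(rowSums(rel^2)))  # a == b: plain point distance
  abs(rel[, 1] * d[2] - rel[, 2] * d[1]) / len
}

# Recursive simplification of a two-column coordinate matrix.
rdp <- function(coords, tolerance) {
  n <- nrow(coords)
  if (n <= 2) return(coords)
  inner <- coords[2:(n - 1), , drop = FALSE]
  dists <- perp_dist(inner, coords[1, ], coords[n, ])
  far <- which.max(dists)
  # Every interior vertex is within tolerance: keep only the endpoints
  if (dists[far] <= tolerance) return(coords[c(1, n), , drop = FALSE])
  # Otherwise keep the farthest vertex and simplify both halves
  split <- far + 1
  left <- rdp(coords[1:split, , drop = FALSE], tolerance)
  right <- rdp(coords[split:n, , drop = FALSE], tolerance)
  rbind(left, right[-1, , drop = FALSE])
}

line <- cbind(x = 0:4, y = c(0, 0.1, 3, 0.1, 0))
rdp(line, tolerance = 1)     # keeps (0, 0), (2, 3), (4, 0)
rdp(line, tolerance = 5)     # keeps only the two endpoints
rdp(line, tolerance = 0.05)  # keeps all five vertices
```

A larger tolerance removes more vertices. The tolerance is expressed in the units of the coordinates, so degrees for geographic data and metres for most projected CRSs.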
Displays useful information about the current configuration, including global options and the status of the SedonaDB context.
sx_sitrep(verbosity = NULL)
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
Invisibly returns a list with the current status configuration.
sx_sitrep()
Executes a SQL query against SedonaDB, allowing R sedonadb_dataframe objects
or sf objects to be used directly in the SQL string via {} interpolation.
Supports piping: df |> sx_sql("SELECT * FROM {.}").
sx_sql(..., .envir = parent.frame(), verbosity = NULL)
... |
SQL string parts (passed to |
.envir |
Environment to evaluate expressions in. Defaults to |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
A sedonadb_dataframe.
## Not run:
nc <- sx_read("shape/nc.shp")

# Reference object by name
sx_sql("SELECT * FROM {nc} WHERE AREA > 0.1")

# Reference via pipe
nc |> sx_sql("SELECT * FROM {.} WHERE AREA > 0.1")
## End(Not run)
Transforms the coordinates of the geometry column to a new Coordinate Reference System (CRS).
Calculations are performed in SedonaDB using ST_Transform.
sx_transform( x, crs, src_crs = NULL, output = NULL, view_name = NULL, verbosity = NULL, ... )
x |
A |
crs |
Target Coordinate Reference System. Can be:
|
src_crs |
Source Coordinate Reference System (optional).
If |
output |
Character or NULL. Output type:
|
view_name |
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view. Not all backends support named views. Check backend-specific documentation for availability. |
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
... |
Ignored. Used to catch sf-specific arguments. |
Result (type depends on output)
library(sf)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Transform to EPSG:5070 (Albers)
res <- sx_transform(nc, 5070, output = "sf")

# Register as view and transform
sx_as_view(nc, "nc_raw")
sx_transform("nc_raw", "EPSG:3857")
Explicitly controls whether sx functions should use spherical geometry (S2) logic
by default.
sx_use_s2(use)
use |
Logical. |
This setting acts as a global default for sx functions. The effective value is resolved from three levels, listed in increasing order of precedence:
Global Default: Unset (NULL). sx defaults to TRUE (geometric safety) if the backend supports it.
User Override: Set via this function.
Per-Call Override: The use_s2 argument of sx_* functions takes highest precedence.
Logical. The current setting (invisibly if use is provided).
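The precedence described above can be written down as a small resolver. This is a sketch of the documented rule, not the package's code: `resolve_use_s2()` and its argument names are hypothetical, and what happens when S2 is requested on a backend that lacks it is not modeled.

```r
# Hypothetical resolver: per-call argument > user setting > global default.
resolve_use_s2 <- function(per_call = NULL,
                           user_setting = NULL,
                           backend_supports_s2 = TRUE) {
  if (!is.null(per_call)) return(per_call)          # use_s2 passed to an sx_* call
  if (!is.null(user_setting)) return(user_setting)  # value set via sx_use_s2()
  backend_supports_s2                               # unset: TRUE if supported
}

resolve_use_s2()                                       # TRUE
resolve_use_s2(user_setting = FALSE)                   # FALSE
resolve_use_s2(per_call = TRUE, user_setting = FALSE)  # TRUE
resolve_use_s2(backend_supports_s2 = FALSE)            # FALSE
```

In practice, `sx_use_s2(FALSE)` changes the default for later calls, while a `use_s2` argument on an individual call such as `sx_buffer()` overrides it for that call only.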
Writes spatial data to disk. Parquet files use SedonaDB's native GeoParquet writer. Other formats (GeoJSON, Shapefile, GPKG) use DuckDB's GDAL-based COPY.
sx_write( data, path, gdal_driver = NULL, overwrite = FALSE, crs = NULL, compression = NULL, options = list(), verbosity = NULL, ... )
data |
A |
path |
Character string. Path to the file (local, HTTP, or S3) to write to. |
gdal_driver |
GDAL driver name for writing spatial formats. If
For non-standard extensions, specify the driver explicitly.
Use |
overwrite |
Logical. If |
crs |
Output CRS (e.g., "EPSG:4326"). Passed to GDAL as |
compression |
Compression codec for Parquet files. Common options:
|
options |
Named list of additional options:
|
verbosity |
Character or NULL. Controls message output for this function call.
If NULL (the default), uses the global |
... |
Additional arguments (reserved for future use). |
The path, invisibly.
## Not run:
# Read nc data
sdf <- sx_read(system.file("shape/nc.shp", package = "sf"))

# Write to GeoJSON (auto-detected)
sx_write(sdf, "nc.geojson")

# Write to GeoParquet (native SedonaDB writer)
sx_write(sdf, "nc.parquet")

# Write to compressed parquet (via DuckDB)
sx_write(sdf, "nc_compressed.parquet", compression = "zstd")

# Write to Shapefile with explicit driver
sx_write(sdf, "nc.shp", gdal_driver = "ESRI Shapefile")

# Overwrite existing file
sx_write(sdf, "nc.gpkg", overwrite = TRUE)

# CRS override
sx_write(sdf, "nc_3857.geojson", crs = "EPSG:3857")
## End(Not run)