Among the most common data sources our users connect to are relational databases such as Oracle, MSSQL, and PostgreSQL. For those sources, we usually recommend connecting via JDBC drivers and using the JDBC Reader and JDBC Writer steps to work with the data.
For big data sources such as Databricks, however, Ataccama ONE Desktop offers several different steps. Ever wondered which one you should choose?
Hive Reader and Writer:
The Hive Reader and Writer are the simplest to configure, but they also offer the fewest options. You need to specify the exact database, table, and column names you want to read from or write to. For the Reader step, you can additionally provide a simple WHERE clause in the Filter property.
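To make that concrete, here is a minimal PySpark sketch of what a Hive Reader configuration boils down to: one named table, an explicit column list, and an optional simple filter. The database, table, column, and filter values are hypothetical examples, not anything built into the step.

```python
from pyspark.sql import SparkSession

# Hive support is needed to resolve database.table names from the metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = (
    spark.table("sales_db.orders")      # exact database and table name
         .select("order_id", "amount")  # exact column names
         .filter("region = 'EMEA'")     # simple WHERE-style Filter
)
df.show()
```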
Spark SQL Reader:
For more complex SQL queries, you can use the Spark SQL Reader. It is not limited to WHERE clauses; it also supports joins, aggregates, and other SQL constructs.
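As an illustration, the kind of query a Spark SQL Reader can handle looks like the sketch below; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# A join plus an aggregate, i.e. more than a plain WHERE clause.
result = spark.sql("""
    SELECT c.country,
           COUNT(*)      AS order_count,
           SUM(o.amount) AS total_amount
    FROM sales_db.orders o
    JOIN sales_db.customers c ON o.customer_id = c.customer_id
    WHERE o.order_date >= '2023-01-01'
    GROUP BY c.country
""")
result.show()
```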
Spark Reader and Writer:
For even more complex operations, you can use the Spark Reader and Writer steps. They are more involved to configure, as they expose the Spark DataFrame reader and writer APIs directly.
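For a sense of what that configuration maps to, here is a short sketch of the Spark DataFrame reader and writer APIs. The formats, options, and paths are hypothetical examples chosen for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reader API: pick a format and pass source-specific options.
df = (
    spark.read.format("parquet")
         .option("mergeSchema", "true")
         .load("/data/landing/orders")
)

# Writer API: control output format, save mode, and partitioning.
(
    df.write.format("parquet")
      .mode("overwrite")
      .partitionBy("order_date")
      .save("/data/curated/orders")
)
```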
JDBC Reader and Writer:
The JDBC Reader and Writer can be used as a last resort, on low volumes of data. JDBC connectors typically perform poorly against big data environments, so this approach is generally not recommended.
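To see why, here is roughly what a JDBC read amounts to, sketched using Spark's JDBC source as an analogy: rows are pulled through database connections rather than read in parallel from the cluster's storage, which is why performance degrades on large tables. The URL, table, and credentials are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Requires the matching JDBC driver on the classpath.
df = (
    spark.read.format("jdbc")
         .option("url", "jdbc:postgresql://db-host:5432/sales_db")
         .option("dbtable", "orders")
         .option("user", "reader")
         .option("password", "secret")
         .option("fetchsize", "10000")  # a larger fetch size helps only a little
         .load()
)
print(df.count())
```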