For a use case, we need to run DQ on CSVs that are loaded into HDFS only, so no Hive externally managed tables are available. Our use case is much like the desktop Hadoop tutorial (01.0.0 Read whole folder - MAP.plan), but we'd like to create VCIs and update them once a day for profiling, DQE, and sample data.
Two questions:
- How do we connect to HDFS? (The simple JDBC connection we use in DBeaver doesn't work.)
- How do we update the VCIs?