Organizations big and small are leveraging cloud-driven data lakes and Delta Lake. With Delta Lake we get ACID compliance and time travel, along with other benefits. But with the introduction of time travel, every DML operation becomes a metadata operation, and none of the older files is deleted until a VACUUM command is issued manually. This, combined with the conviction that we can and should dump all our data into the lake and never worry about it because storage is cheap, has resulted in petabyte-scale lakes even for smaller organizations.
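As a concrete illustration, old files can be reclaimed with Delta Lake's VACUUM command. The sketch below is a minimal example run through the spark-sql CLI; the table path is hypothetical, and it assumes a session already configured with the Delta Lake extensions. Note that Delta's default retention threshold is 7 days (168 hours), and it will refuse shorter windows unless the safety check is explicitly disabled:

```shell
# Hypothetical table path. VACUUM removes data files that are no longer
# referenced by the Delta transaction log and are older than the
# retention window given in the RETAIN clause.
spark-sql -e "VACUUM delta.\`/mnt/lake/events\` RETAIN 168 HOURS"
```

Running VACUUM on a schedule keeps storage costs bounded, at the price of losing time travel beyond the retention window.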

For effective management of such a huge data store, we need to…

In the world of Spark, IoT data, and other streaming sources, I know it's a little late to talk about ingesting data from an RDBMS using Sqoop. But having worked with both Spark and Sqoop, I believe Sqoop still has a role to play if you are still on Hadoop rather than on a cloud platform with its own connectors.

We will now talk a little about The Normal way of using Sqoop and The Smart way, and look at some performance metrics around them.

The Normal: We specify the connection parameters and the number of mappers, and specify…
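A minimal sketch of this "normal" invocation, with a hypothetical MySQL source, credentials, table, and target directory (all of these names are placeholders, not from the original article):

```shell
# Plain Sqoop import: connection parameters plus a mapper count.
# Sqoop splits the source table across the mappers (by default on
# the primary key) and writes the results to HDFS in parallel.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --num-mappers 8 \
  --target-dir /data/raw/orders
```

With this approach the parallelism and split column are left largely to Sqoop's defaults, which is exactly where the performance differences discussed next come from.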

Jerin Scaria

A Tech Enthusiast working on Big Data & Analytics.
