Hello there,
I have a dataframe with the following columns:
+-----------------------------+-------------------------------------------------------------------+-------------------------------+
|entity_id                    |file_path                                                          |other_useful_id                |
+-----------------------------+-------------------------------------------------------------------+-------------------------------+
|id-01f7pqqbxddb3b1an6ntyqx6mg|gs://bucket1/path/to/id-01g4he5cb4xqn6s1999k6y1vbd/file_result.json|id-2-01g4he5cb4xqn6s1999k6y1vbd|
|id-01f7pqgbwms4ajmdtdedtwa3mf|gs://bucket1/path/to/id-01g4he5cbh52che104rwy603sr/file_result.json|id-2-01g4he5cbh52che104rwy603sr|
|id-01f7pqqbxejt3ef4ap9qcs78m5|gs://bucket1/path/to/id-01g4he5cbqmdv7dnx46sebs0gt/file_result.json|id-2-01g4he5cbqmdv7dnx46sebs0gt|
|id-01f7pqqbynh895ptpjjfxvk6dc|gs://bucket1/path/to/id-01g4he5cbx1kwhgvdme1s560dw/file_result.json|id-2-01g4he5cbx1kwhgvdme1s560dw|
+-----------------------------+-------------------------------------------------------------------+-------------------------------+
For each row, I would like to read the file that `file_path` points to and
write the result to another dataframe containing `entity_id`,
`other_useful_id`, `json_content`, and `file_path`.
Assume that I already have the required HDFS URL libraries on my classpath.
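To make the goal concrete, here is a minimal sketch of what I am after, assuming PySpark. The function name `attach_json_content` and its `spark` and `df` arguments are placeholders of mine. It also assumes that the list of paths is small enough to collect to the driver, and that `input_file_name()` returns exactly the same URI string as the one stored in `file_path`. I have not confirmed that this is the right approach.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints; the function just calls methods on its arguments.
    from pyspark.sql import DataFrame, SparkSession


def attach_json_content(spark: "SparkSession", df: "DataFrame") -> "DataFrame":
    """Return df with an extra json_content column read from each file_path."""
    # Assumption: the set of paths is small enough to collect to the driver.
    paths = [row.file_path for row in df.select("file_path").distinct().collect()]

    # wholetext=True gives one row per file, with the whole file in column "value".
    contents = spark.read.text(paths, wholetext=True).selectExpr(
        "value AS json_content",
        "input_file_name() AS file_path",
    )

    # Assumption: input_file_name() matches the string stored in file_path exactly.
    # A left join keeps rows whose file could not be matched (json_content is null).
    return df.join(contents, on="file_path", how="left").select(
        "entity_id", "other_useful_id", "json_content", "file_path"
    )
```

I would then call it as `result = attach_json_content(spark, df)`. Here `json_content` stays a plain string; parsing it is a separate step.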
Please advise,
Muthu