Hi Chhavi,

Currently there is no way to handle backticks (`) in Spark's StructType. Hence the field names a.b and `a.b` are treated as completely different within a StructType.
To handle that, I have added a custom implementation that fixes StringIndexer#validateAndTransformSchema. You can refer to the code on my GitHub:
<https://github.com/skale1990/LearnSpark/blob/main/src/main/java/com/som/learnspark/TestCustomStringIndexer.scala>

*Regards,*
*Someshwar Kale*

On Sat, Jun 8, 2024 at 12:00 PM Chhavi Bansal <meetchhavi1...@gmail.com> wrote:

> Hi Someshwar,
> Thanks for the response. I have added my comments to the ticket
> <https://issues.apache.org/jira/browse/SPARK-48463>.
>
> Thanks,
> Chhavi Bansal
>
> On Thu, 6 Jun 2024 at 17:28, Someshwar Kale <skale1...@gmail.com> wrote:
>
>> As a fix, you may consider adding a transformer to rename columns
>> (perhaps replacing every dot in a column name with an underscore) and
>> using the renamed columns in your pipeline, as below:
>>
>> val renameColumn = new RenameColumn().setInputCol("location.longitude").setOutputCol("location_longitude")
>> val si = new StringIndexer().setInputCol("location_longitude").setOutputCol("longitutdee")
>> val pipeline = new Pipeline().setStages(Array(renameColumn, si))
>> pipeline.fit(flattenedDf).transform(flattenedDf).show()
>>
>> Refer to my comment
>> <https://issues.apache.org/jira/browse/SPARK-48463?focusedCommentId=17852751&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17852751>
>> for elaboration.
>> Thanks!!
>>
>> *Regards,*
>> *Someshwar Kale*
>>
>> On Thu, Jun 6, 2024 at 3:24 AM Chhavi Bansal <meetchhavi1...@gmail.com> wrote:
>>
>>> Hello team,
>>> I was exploring feature transformation exposed via MLlib on a nested
>>> dataset, and encountered an error while applying any transformer to a
>>> column with dot notation naming. I thought of raising a ticket on Spark,
>>> https://issues.apache.org/jira/browse/SPARK-48463, where I have
>>> described the entire scenario.
>>>
>>> I wanted to get suggestions on the best way to solve the problem
>>> while using the dot notation. One workaround is to use `_` while
>>> flattening the dataframe, but that would mean additional overhead
>>> to convert back to `.` (dot notation), since that’s the convention for our
>>> other flattened data.
>>>
>>> I would be happy to contribute to the code if someone can shed
>>> some light on how this could be solved.
>>>
>>> --
>>> Thanks and Regards,
>>> Chhavi Bansal
>
> --
> Thanks and Regards,
> Chhavi Bansal
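
On the underscore workaround mentioned in the original message: converting back to dot notation cannot be done by blindly replacing `_` with `.`, because a name such as `a_b` may have been `a_b` or `a.b` originally. One possible approach is to record the mapping at rename time and invert it afterwards. The sketch below is plain Scala and illustrative only; `ColumnNameMapping`, `sanitize`, `forward` and `backward` are hypothetical names, not part of Spark or of the code linked above.

```scala
// Hypothetical helper (not a Spark API): maps dotted column names to
// underscore names and keeps enough information to restore the originals.
object ColumnNameMapping {

  /** Replace every dot with an underscore,
    * e.g. "location.longitude" becomes "location_longitude". */
  def sanitize(name: String): String = name.replace('.', '_')

  /** Build original -> sanitized pairs. Fails if two different columns
    * would end up with the same sanitized name (e.g. "a.b" and "a_b"). */
  def forward(columns: Seq[String]): Map[String, String] = {
    val pairs = columns.distinct.map(c => c -> sanitize(c))
    val collisions = pairs
      .groupBy { case (_, renamed) => renamed }
      .collect { case (renamed, group) if group.size > 1 => renamed }
    require(
      collisions.isEmpty,
      s"Ambiguous names after renaming: ${collisions.mkString(", ")}"
    )
    pairs.toMap
  }

  /** Invert the mapping so the dotted names can be restored after the pipeline runs. */
  def backward(fwd: Map[String, String]): Map[String, String] = fwd.map(_.swap)
}
```

With Spark, the pairs from `forward` could be applied to the flattened dataframe before the pipeline (for example by folding over them with `withColumnRenamed`), and the pairs from `backward` applied to the transformed output in the same way. The collision check matters because the reverse mapping is only well defined when every renamed column is unique.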