Many thanks, Mich.
Is « foreach » the best construct to look up items in a dataset such as the
« telephonedirectory » dataset below?
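import org.apache.spark.sql.functions.col

// The telephone sequence was read from a CSV file
val tels = Seq("tel1", "tel2", "tel3" /* … */)
val ds = spark.read.parquet("/path/to/telephonedirectory")
// A Dataset cannot be used inside an RDD's foreach (that closure runs on the
// executors), so this loop runs on the driver and launches one job per number.
// Assumes longAcc is the column holding the numbers; rlike matches anywhere in
// the string, so a prefix such as "+" does not prevent a match.
tels.foreach { tel =>
  val n = ds.filter(col("longAcc").rlike(tel)).count()
  println(s"$tel: $n match(es)")
}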
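A per-number loop runs tens of thousands of separate Spark jobs. One commonly used alternative, sketched here under assumptions, is a single join: broadcast the (small) list of numbers and keep the directory rows whose longAcc value contains one of them, which is the same substring test as like("%tel%"). The column names longAcc and tel, the object and function names, and the toy rows are hypothetical placeholders, not the poster's actual schema or data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, col}

object PhoneJoin {
  // Keeps the directory rows whose longAcc value contains one of the numbers,
  // i.e. the like("%tel%") test evaluated for all numbers in a single join.
  // broadcast() ships the small number list to every executor so the large
  // directory is not shuffled; a non-equi join like this is planned as a
  // broadcast nested-loop join.
  def findMatches(directory: DataFrame, tels: DataFrame): DataFrame =
    directory.join(broadcast(tels), col("longAcc").contains(col("tel")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PhoneJoin")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy rows; in practice these would come from
    // spark.read.parquet(...) and spark.read.option("header", "true").csv(...)
    val directory = Seq("+3312224444", "0612345678").toDF("longAcc")
    val tels = Seq("12224444").toDF("tel")

    findMatches(directory, tels).show(false)
    spark.stop()
  }
}
```

Every directory row is still compared with every number, but within one parallel job rather than one job per number; whether that is fast enough for the real data volumes would need to be measured.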
> On 1 Apr 2023 at 22:36, Mich Talebzadeh <[email protected]> wrote:
>
> This may help
>
> Spark rlike() Working with Regex Matching Example
> <https://sparkbyexamples.com/spark/spark-rlike-regex-matching-examples/>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
> On Sat, 1 Apr 2023 at 19:32, Philippe de Rochambeau <[email protected]> wrote:
>> Hello,
>> I’m looking for an efficient way in Spark to search for a series of
>> telephone numbers, contained in a CSV file, in a data set column.
>>
>> In pseudo code,
>>
>> for tel in [tel1, tel2, …, tel40,000]
>> search for tel in dataset using .like(« %tel% »)
>> end for
>>
>> I’m using the like function because the telephone numbers in the data set
>> may contain prefixes, such as « + »; e.g., « +3312224444 ».
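If the prefixes are only formatting (a leading "+", spaces), another option is to normalize both sides to digits and compare for equality, so each lookup becomes a hash probe instead of a substring scan. The plain-Scala sketch below shows that idea; in Spark the same normalization could be expressed with regexp_replace(col, "[^0-9]", "") followed by an ordinary equi-join. It assumes the CSV numbers are written with the same country code as the directory entries; the object and function names and the sample numbers are hypothetical.

```scala
object PhoneNormalize {
  // Keep digits only, so "+33 6 12 34 56 78" becomes "33612345678"
  def normalize(raw: String): String = raw.filter(_.isDigit)

  // Exact lookup on normalized numbers: one set probe per directory entry
  // instead of a substring test against every wanted number.
  def lookup(directory: Seq[String], tels: Seq[String]): Seq[String] = {
    val wanted = tels.map(normalize).toSet
    directory.filter(entry => wanted.contains(normalize(entry)))
  }

  def main(args: Array[String]): Unit = {
    val directory = Seq("+3312224444", "0612345678", "+33 6 12 34 56 78")
    val tels = Seq("3312224444", "33612345678")
    println(lookup(directory, tels)) // List(+3312224444, +33 6 12 34 56 78)
  }
}
```

Note that the national form "0612345678" is not matched: equality only helps once both sides use the same numbering format, whereas the substring (like) approach tolerates arbitrary prefixes.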
>>
>> Any suggestions would be welcome.
>>
>> Many thanks.
>>
>> Philippe