Re: Looping through a series of telephone numbers

Sean Owen Sun, 02 Apr 2023 07:18:50 -0700

That won't work: you can't use Spark within Spark like that. The function
passed to foreach runs on the executors, where you can't query another Dataset.
If it were exact matches, the best solution would be to load both datasets
and join on telephone number.
For this case, I think your best bet is a UDF that contains the telephone
numbers as a list and decides whether a given number matches something in
the set. Then use that to filter, then work with the data set.
There are probably clever, fast ways in Python to determine whether a string
is a prefix of any string in a group, which you could use too.
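
As a rough illustration of the matching logic such a UDF might contain, here
is a minimal sketch in plain Python. It assumes the extra characters only ever
appear in front of the number (a "+" or similar), so a value matches when its
digits end with one of the listed numbers. The names build_matcher and matches
and the sample numbers are made up for the example and do not come from the
thread.

```python
import re


def build_matcher(telephones):
    """Return a predicate: does a value end with one of the listed numbers?"""
    # Keep digits only, so "+33 1 22..." and "33122..." compare equal.
    wanted = frozenset(re.sub(r"\D", "", t) for t in telephones) - {""}
    # Only suffixes with these lengths can possibly be in the set.
    lengths = sorted({len(t) for t in wanted})

    def matches(value):
        if value is None:
            return False
        digits = re.sub(r"\D", "", value)
        # One set lookup per distinct length, instead of one scan per number.
        return any(len(digits) >= n and digits[-n:] in wanted for n in lengths)

    return matches


# Hypothetical sample data, for illustration only.
matches = build_matcher(["3312224444", "0611223344"])
print(matches("+3312224444"))  # True
print(matches("+3399999999"))  # False
```

Because each value costs only a handful of set lookups, the 40,000 numbers are
never looped over per row. In PySpark, a predicate like this could presumably
be wrapped with pyspark.sql.functions.udf and passed to DataFrame.filter, but
that wiring is left out here so the sketch runs without a cluster.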


On Sun, Apr 2, 2023 at 3:17 AM Philippe de Rochambeau <[email protected]>
wrote:

> Many thanks, Mich.
> Is "foreach" the best construct to look up items in a dataset such as
> the "telephonedirectory" data set below?
>
> // the telephone sequence, which was read from a CSV file
> val telrdd = spark.sparkContext.parallelize(Seq("tel1", "tel2", "tel3", …))
>
> val ds = spark.read.parquet("/path/to/telephonedirectory")
>
> rdd.foreach(tel => {
>   longAcc.select("*").rlike("+" + tel)
> })
>
>
>
>
> On 1 Apr 2023, at 22:36, Mich Talebzadeh <[email protected]> wrote:
>
> This may help
>
> Spark rlike() Working with Regex Matching Example
> <https://sparkbyexamples.com/spark/spark-rlike-regex-matching-examples/>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
> On Sat, 1 Apr 2023 at 19:32, Philippe de Rochambeau <[email protected]>
> wrote:
>
>> Hello,
>> I’m looking for an efficient way in Spark to search for a series of
>> telephone numbers, contained in a CSV file, in a data set column.
>>
>> In pseudocode:
>>
>> for tel in [tel1, tel2, …, tel40,000]
>>         search for tel in dataset using .like("%tel%")
>> end for
>>
>> I’m using the like function because the telephone numbers in the data set
>> may contain prefixes, such as "+"; e.g., "+3312224444".
>>
>> Any suggestions would be welcome.
>>
>> Many thanks.
>>
>> Philippe
>>
>
