Hi Meena,
It's not impossible, but it's unlikely that there's a bug in Spark SQL
randomly duplicating rows. The most likely explanation is that there are
more records in the item table matching your sys/custumer_id/scode
criteria than you expect.
In your original query, try changing select rev.* to
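One quick way to check is to count matches per join key. A minimal sketch,
assuming the item table uses the column names from your criteria:

-- keys that match more than one item row will multiply rows in the join
SELECT sys, custumer_id, scode, COUNT(*) AS match_count
FROM item
GROUP BY sys, custumer_id, scode
HAVING COUNT(*) > 1;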
Multiple applications can run at once, but you need to configure either
Spark or your applications to allow that. In standalone mode, each
application attempts to claim all available resources by default. This
section of the documentation has more details:
https://spark.apache.org/docs/latest/spark-standalone.html
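For example, a minimal sketch of capping each application's footprint in
spark-defaults.conf (the values are illustrative):

# leave room for other applications on the cluster
spark.cores.max        8
spark.executor.memory  16g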
I use Spark in standalone mode. It works well, and the instructions on the
site are accurate for the most part. The only thing that didn't work for me
was the start-all.sh script. Instead, I use a simple script that starts the
master node, then uses SSH to connect to the worker machines and start the
workers.
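A minimal sketch of such a script, assuming passwordless SSH, the same
SPARK_HOME on every machine, and illustrative hostnames:

#!/bin/bash
# start the master locally, then start a worker on each host over SSH
"$SPARK_HOME/sbin/start-master.sh"
for host in worker1 worker2; do
  ssh "$host" "$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077"
done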
> On Thu, 17 Aug 2023 at 21:01, Patrick Tucci wrote:
>
>> Hi Mich,
>>
>> Here are my config values from spark-defaults.conf:
>>
>> spark.eventLog.enabled true
>> spark.eventLog.dir hdfs:/
> On Fri, 11 Aug 2023 at 11:26, Patrick Tucci wrote:
>
>> Thanks for the reply Step
>> has been because another job was still running and had already claimed some
>> of the resources (cores and memory).
>>
>> I think this can also happen if your configuration tries to claim
>> resources that will never be available.
>>
>> Cheers,
Hello,
I'm attempting to run a query on Spark 3.4.0 through the Spark Thrift
Server. The cluster has 64 cores, 250GB of RAM, and operates in
standalone mode using HDFS for storage.
The query is as follows:
SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
ON ME.ID = MB.Enroll
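When a query like this misbehaves, checking the plan before running it is
cheap. A sketch using EXPLAIN, with the join key name (EnrollmentID)
assumed for illustration:

-- print the physical plan without executing the query
EXPLAIN
SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
  ON ME.ID = MB.EnrollmentID; -- column name assumed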
Hello,
I'm building an application on Spark SQL. The cluster is set up in
standalone mode with HDFS as storage. The only Spark application running is
the Spark Thrift Server using FAIR scheduling mode. Queries are submitted
to the Thrift Server using Beeline.
I have multiple queries that insert rows
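For reference, a minimal sketch of one such submission (host, port, and
file name are illustrative; 10000 is the Thrift Server's default port):

# run a file of SQL statements against the Thrift Server
beeline -u jdbc:hive2://localhost:10000 -f insert_queries.sql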
Hello,
I'm using Spark 3.4.0 in standalone mode with Hadoop 3.3.5. The master node
has 2 cores and 8GB of RAM. There is a single worker node with 8 cores and
64GB of RAM.
I'm trying to process a large pipe-delimited file that has been compressed
with gzip (9.2GB zipped, ~58GB unzipped, ~241m records).
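One thing worth noting is that gzip isn't a splittable format, so a single
.gz file is read by a single task no matter how many cores are available.
A minimal sketch of exposing the file to Spark SQL (the path and table
name are illustrative):

-- Spark decompresses gzip transparently when reading CSV
CREATE TABLE staging_raw
USING CSV
OPTIONS (path 'hdfs:///data/input.gz', delimiter '|', header 'false');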
Window functions don't work like traditional GROUP BYs. They allow you to
partition data and pull any relevant column, whether it's used in the
partition or not.
I'm not sure what the syntax is for PySpark, but the standard SQL would be
something like this:
WITH InputData AS
(
  SELECT 'USA' Country, 'NYC' City UNION ALL SELECT 'USA', 'LA'
)
SELECT Country, City, COUNT(*) OVER (PARTITION BY Country) CityCount
FROM InputData;
Thanks. How would I go about formally submitting a feature request for this?
On 2022/11/21 23:47:16 Andrew Melo wrote:
> I think this is the right place, just a hard question :) As far as I
> know, there's no "case insensitive flag", so YMMV
>
Is this the wrong list for this type of question?
Hello,
Is there a way to set string comparisons to be case-insensitive globally? I
understand LOWER() can be used, but my codebase contains 27k lines of SQL
and many string comparisons. I would need to apply LOWER() to each string
literal in the codebase. I'd also need to change all the ETL/import jobs.
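Until something like that exists, the usual workaround is normalizing both
sides of every comparison. A minimal sketch (table and column names are
hypothetical):

-- case-insensitive comparison by lowering both sides
SELECT *
FROM orders
WHERE LOWER(status) = LOWER('Shipped');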