Hello,
when I set spark.executor.cores to e.g. 8 cores and spark.executor.memory
to 8GB, the cluster can allocate more executors with fewer cores each for my
app, but every executor still gets 8GB of RAM.
This is a problem because more memory can be allocated across the cluster
than expected; the worst case is 8 executors with 1 core each.
Hi,
I understand it just as them providing some lower-latency interface,
probably using JDBC, so that 3rd-party BI tools can integrate and query
streams as if they were static datasets. If the BI tool repeats the query,
the result will be updated. I don't know if BI tools are already heading
I have to ask my colleague if there is any specific error, but I think it
just doesn't see the files.
Petr
On Thu, Apr 21, 2016 at 11:54 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> Hello,
> Impala (v2.1.0, Spark 1.6.0) can't read partitioned Parquet files saved
> from DF.partitionB
Hello,
Impala (v2.1.0, Spark 1.6.0) can't read partitioned Parquet files saved
from DF.partitionBy (using Python). Is there any known reason, some config?
Or should it generally work, meaning it is likely something wrong solely
on our side?
Many thanks,
Petr
Hi all,
I believe the documentation used to say that Standalone mode is not for
production. Either I'm wrong or it was already removed.
For a small cluster of 5-10 nodes, is Standalone recommended for
production? I would like to go with Mesos, but the question is whether there
is real
How does Arrow collide with Tungsten and its binary in-memory format? Spark
will still have to convert between them. I assume they use similar
concepts/layouts, hence the conversion can likely be quite efficient.
Or is there a chance that the current Tungsten in-memory format would be
replaced by
Hi all,
based on the documentation:
"Spark 1.6.0 is designed for use with Mesos 0.21.0 and does not require any
special patches of Mesos."
We are considering Mesos for our use case, but this concerns me a lot. Mesos
is currently at v0.27, which we need for its Volumes feature, but Spark
locks us to 0.21
Either setting it programmatically doesn't work:
sparkConf.setIfMissing("class", "...Main")
In my current setup, moving main to another package requires propagating the
change to the deploy scripts. It doesn't matter, I will find some other way.
Petr
On Fri, Sep 25, 2015 at 4:40 PM,
Otherwise it seems it tries to load from a checkpoint which I have deleted
and cannot be found. Or it should work and something else is wrong on my side.
The documentation doesn't mention the option with a jar manifest, so I
assume it doesn't work this way.
Many thanks,
Petr
I'm sorry. Both approaches actually work. It was something else wrong with
my cluster.
Petr
On Fri, Sep 25, 2015 at 4:53 PM, Petr Novak <oss.mli...@gmail.com> wrote:
> Either setting it programmatically doesn't work:
> sparkConf.setIfMissing("class", "...Main")
You can have offsetRanges on workers, e.g.:

object Something {
  var offsetRanges = Array[OffsetRange]()

  def create[F : ClassTag](stream: InputDStream[Array[Byte]])
                          (implicit codec: Codec[F]): DStream[F] = {
    stream transform { rdd =>
      offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
PM, Petr Novak <oss.mli...@gmail.com> wrote:
> Many thanks Cody, it explains quite a bit.
>
> I had couple of problems with checkpointing and graceful shutdown moving
> from working code in Spark 1.3.0 to 1.5.0. Having InterruptedExceptions,
> KafkaDirectStream couldn't initia
You can implement your own case class supporting more than 22 fields. It is
something like:
class MyRecord(val val1: String, val val2: String, ... more than 22,
in this case e.g. 26)
extends Product with Serializable {
def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]
def
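The truncated sketch above can be filled in along these lines. This is a hedged completion shown with only 3 fields for brevity (the real class would enumerate all 26 in the same way); the class name and field names are just the placeholders from the email:

```scala
// Hand-rolled Product so the record can have more fields than the
// 22-field case-class limit in Scala 2.10 (shown with 3 for brevity).
class MyRecord(val val1: String, val val2: String, val val3: String)
    extends Product with Serializable {

  def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]

  def productArity: Int = 3

  def productElement(n: Int): Any = n match {
    case 0 => val1
    case 1 => val2
    case 2 => val3
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  // equals/hashCode are not generated for a hand-rolled Product,
  // so define them via productIterator.
  override def equals(that: Any): Boolean = that match {
    case r: MyRecord =>
      r.canEqual(this) && productIterator.sameElements(r.productIterator)
    case _ => false
  }

  override def hashCode: Int = productIterator.toSeq.hashCode
}
```

Unlike a real case class, this gives you no generated apply/unapply or copy; you only get what you implement.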
If you need to understand what the magic Product is, google Algebraic Data
Types and learn them together with what a Sum type is. One option is
http://www.stephanboyer.com/post/18/algebraic-data-types
Enjoy,
Petr
On Wed, Sep 23, 2015 at 9:07 AM, Petr Novak <oss.mli...@gmail.com> wrote:
Hi,
I have 2 streams and checkpointing, with code based on the documentation.
One stream transforms data from Kafka and saves it to a Parquet file. The
other uses the same stream and does updateStateByKey to compute some
aggregations. There is no gracefulShutdown.
Both use roughly this code
And probably the original source code
https://gist.github.com/koen-dejonghe/39c10357607c698c0b04
On Tue, Sep 22, 2015 at 10:37 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> To complete design pattern:
>
> http://stackoverflow.com/questions/30450763/spark-streaming-and-
AM, Petr Novak <oss.mli...@gmail.com> wrote:
> Ahh the problem probably is async ingestion to Spark receiver buffers,
> hence WAL is required I would say.
>
> Petr
>
> On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak <oss.mli...@gmail.com> wrote:
>
>> If MQTT can
>>>> Padma Ch
>>>>
>>>> On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> You can use broadcast variable for passing connection information.
>>>>>
>>>>> Cheers
>>>
to restart the Spark job.
On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> Ahh the problem probably is async ingestion to Spark receiver buffers,
> hence WAL is required I would say.
>
> On Tue, Sep 22, 2015 at 10:52 AM, Petr Novak <oss.mli...@gmail.com>
Ahh the problem probably is async ingestion to Spark receiver buffers,
hence WAL is required I would say.
Petr
On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> If MQTT can be configured with long enough timeout for ACK and can buffer
> enough events w
Nice, thanks.
So is the note in the build instructions for 2.11 obsolete? Or are there
still some limitations?
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
On Fri, Sep 11, 2015 at 2:19 PM, Petr Novak <oss.mli...@gmail.com> wrote:
> Nice, thanks.
>
, 2015 at 11:26 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> I should read my posts at least once to avoid so many typos. Hopefully you
> are brave enough to read through.
>
> Petr
>
> On Mon, Sep 21, 2015 at 11:23 AM, Petr Novak <oss.mli...@gmail.com> wrote:
>
>
add @transient?
On Mon, Sep 21, 2015 at 11:36 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> add @transient?
>
> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitt...@gmail.com>
> wrote:
>
>> Hello All,
>>
>> How can i pass sparkCo
Great work.
On Fri, Sep 18, 2015 at 6:51 PM, Harish Butani wrote:
> Hi,
>
> I have just posted a Blog on this:
> https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
>
> regards,
> Harish Butani.
>
> On Tue, Sep 1, 2015 at
Internally, Spark is using json4s and the Jackson parser v3.2.10, AFAIK. So
if you are using Scala, they should be available without adding
dependencies. There is v3.2.11 already available, but adding it to my app
was causing a NoSuchMethod exception, so I would have to shade it. I'm
simply staying on v3.2.10; the provided version is fine for us for now.
Regards,
Petr
On Mon, Sep 21, 2015 at 11:06 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> Internally Spark is using json4s and jackson parser v3.2.10, AFAIK. So if
> you are using Scala they should be available without adding dependencies.
> T
I think you would have to persist events somehow if you don't want to miss
them. I don't see any other option there: either in MQTT, if it is supported
there, or routing them through Kafka.
There is a WriteAheadLog in Spark, but you would have to decouple MQTT
stream reading and processing into 2
I'd just need a confirmation from the community that checkpointing and
graceful shutdown actually work with KafkaDirectStream on 1.5.0, so that I
can look for a problem on my side.
Many thanks,
Petr
On Sun, Sep 20, 2015 at 12:58 PM, Petr Novak <oss.mli...@gmail.com> wrote:
> Hi Michal,
> ye
I should read my posts at least once to avoid so many typos. Hopefully you
are brave enough to read through.
Petr
On Mon, Sep 21, 2015 at 11:23 AM, Petr Novak <oss.mli...@gmail.com> wrote:
> I think you would have to persist events somehow if you don't want to miss
> them. I don't s
>
>
> On 18 September 2015 at 10:28, Petr Novak <oss.mli...@gmail.com> wrote:
>
>> It might be connected with my problems with gracefulShutdown in Spark
>> 1.5.0 2.11
>> https://mail.google.com/mail/#search/petr/14fb6bd5166f9395
>>
>> Maybe Ctrl+C corrupts
val topics="first"
shouldn't it be val topics = Set("first") ?
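For reference, the direct-stream API in the 1.x spark-streaming-kafka integration does take the topics as a Set[String]. A minimal sketch (the broker address and app name are hypothetical values):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(
  new SparkConf().setAppName("sketch"), Seconds(10))

// Hypothetical broker address; adjust for your cluster.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

// A Set[String], not a plain String.
val topics = Set("first")

val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
```

Passing a plain String where a Set[String] is expected would not compile, which is usually how this mistake shows up.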
On Sun, Sep 20, 2015 at 1:01 PM, Petr Novak <oss.mli...@gmail.com> wrote:
> val topics="first"
>
> shouldn't it be val topics = Set("first") ?
>
> On Sat, Sep 19, 2015 a
It might be connected with my problems with gracefulShutdown in Spark 1.5.0
2.11
https://mail.google.com/mail/#search/petr/14fb6bd5166f9395
Maybe Ctrl+C corrupts checkpoints and breaks gracefulShutdown?
Petr
On Fri, Sep 18, 2015 at 4:10 PM, Petr Novak <oss.mli...@gmail.com>
text)
[2015-09-11 22:33:05,899] INFO Shutdown hook called
(org.apache.spark.util.ShutdownHookManager)
[2015-09-11 22:33:05,899] INFO Deleting directory
/dfs/spark/tmp/spark-b466fc2e-9ab8-4783-87c2-485bac5c3cd6
(org.apache.spark.util.ShutdownHookManager)
Thanks,
Petr
On Mon, Sep 14, 2015 at 3:10
This one is generated, I suppose, after Ctrl+C
15/09/18 14:38:25 INFO Worker: Asked to kill executor
app-20150918143823-0001/0
15/09/18 14:38:25 INFO Worker: Asked to kill executor
app-20150918143823-0001/0
15/09/18 14:38:25 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor]
I have tried it on Spark 1.3.0 2.10 and it works. The same code doesn't
work on Spark 1.5.0 2.11. It would be nice if anybody could try it on
another installation to ensure it is something wrong on my cluster.
Many thanks,
Petr
On Fri, Sep 18, 2015 at 4:07 PM, Petr Novak <oss.mli...@gmail.com>
...to ensure it is not something wrong on my cluster.
On Fri, Sep 18, 2015 at 4:09 PM, Petr Novak <oss.mli...@gmail.com> wrote:
> I have tried it on Spark 1.3.0 2.10 and it works. The same code doesn't on
> Spark 1.5.0 2.11. It would be nice if anybody could try on another
>
Hi all,
it throws FileBasedWriteAheadLogReader: Error reading next item, EOF reached
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.spark.streaming.util.FileBasedWriteAheadLogReader.hasNext(FileBasedWriteAheadLogReader.scala:47)
WAL is not
Does it still apply for 1.5.0?
What actual limitations does switching to 2.11 bring? No JDBC
Thriftserver? No JDBC DataSource? No JdbcRDD (which is already obsolete, I
believe)? Some more?
What library is the blocker to upgrade JDBC component to 2.11?
Is there any estimate when it could be
Hello,
my Spark streaming v1.3.0 code uses
sys.ShutdownHookThread {
ssc.stop(stopSparkContext = true, stopGracefully = true)
}
to stop it with Ctrl+C from the command line. It returned to the command
line after it finished the batch, but it doesn't with v1.4.0-v1.5.0. Was
the behaviour or required
Hello,
sqlContext.parquetFile(dir)
throws exception " Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient"
The strange thing is that the second attempt to open the file is
successful:
try {
  sqlContext.parquetFile(dir)
} catch {
  case e: Exception =>
    sqlContext.parquetFile(dir) // the second attempt succeeds
The same as
https://mail.google.com/mail/#label/Spark%2Fuser/14f64c75c15f5ccd
Please follow the discussion there.
On Tue, Aug 25, 2015 at 5:02 PM, Petr Novak oss.mli...@gmail.com wrote:
Hi all,
when I read parquet files with required fields aka nullable=false they
are read correctly. Then I
Hi all,
when I read parquet files with required fields, aka nullable=false, they
are read correctly. Then I save them (df.write.parquet) and read them
again; all my fields are saved and read as optional, aka nullable=true.
Which means I suddenly have files with incompatible schemas. This happens on
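One workaround sometimes used for this is to force the schema on read instead of relying on what the writer recorded in the footer. A sketch, assuming Spark 1.4+ (mySchema and the field names/path here are hypothetical placeholders for your own required schema):

```scala
import org.apache.spark.sql.types._

// Hypothetical schema mirroring the required (nullable = false) fields.
val mySchema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = false)))

// Applying the schema explicitly keeps nullable = false on read,
// regardless of how the Parquet writer marked the fields.
val df = sqlContext.read.schema(mySchema).parquet("/path/to/files")
```

This doesn't change what is written, only how the files are interpreted on the read side.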
{
(_: Nothing) =
}
}
}
Many thanks for any advice, I'm sure it's a noob question.
Petr
On Mon, Aug 17, 2015 at 1:12 PM, Petr Novak oss.mli...@gmail.com wrote:
Or can I generally create new RDD from transformation and enrich its
partitions with some metadata so that I would copy OffsetRanges in my
Or can I generally create a new RDD from a transformation and enrich its
partitions with some metadata, so that I would copy OffsetRanges into my
new RDD in the DStream?
On Mon, Aug 17, 2015 at 1:08 PM, Petr Novak oss.mli...@gmail.com wrote:
Hi all,
I need to transform KafkaRDD into a new stream
Hi all,
I need to transform a KafkaRDD into a new stream of deserialized case
classes. I want to use the new stream to save it to a file and to perform
additional transformations on it.
To save it I want to use the offsets in the filenames, hence I need
OffsetRanges in the transformed RDD. But KafkaRDD is
Hello,
I would like to switch from Scala 2.10 to 2.11 for Spark app development.
It seems that the only thing blocking me is a missing
spark-streaming-kafka_2.11 Maven package. Any plan to add it, or am I just
blind?
Many thanks,
Vladimir
Thank you. HADOOP_CONF_DIR was missing.
On Wed, Sep 24, 2014 at 4:48 PM, Matt Narrell matt.narr...@gmail.com
wrote:
Yes, this works. Make sure you have HADOOP_CONF_DIR set on your Spark
machines
mn
On Sep 24, 2014, at 5:35 AM, Petr Novak oss.mli...@gmail.com wrote:
Hello,
if our
Hello,
if our Hadoop cluster is configured with HA and fs.defaultFS points to a
namespace instead of a namenode hostname (hdfs://namespace_name/), then
our Spark job fails with an exception. Is there anything to configure, or
is it not implemented?
Exception in thread main
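As the reply above notes, the usual fix is to make the HDFS HA client configuration visible to Spark via HADOOP_CONF_DIR. A config sketch (the paths and master URL are hypothetical for your installation):

```shell
# The directory must contain core-site.xml and hdfs-site.xml with the
# HA nameservice definition (dfs.nameservices, dfs.ha.namenodes.<ns>,
# dfs.namenode.rpc-address.<ns>.<nn>, ...) so the client can resolve
# hdfs://namespace_name/ without a namenode hostname.
export HADOOP_CONF_DIR=/etc/hadoop/conf

spark-submit --master spark://master:7077 my-app.jar hdfs://namespace_name/data
```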