2020-04-16 09:55:02 UTC - Adelina Brask: I believe you are right....but I don't 
know how to fix it...
----
2020-04-16 09:55:47 UTC - Adelina Brask:
```
/opt/apache-pulsar-2.5.0/bin/pulsar-admin sinks create --name elastic \
  --sink-config-file /opt/apache-pulsar-2.5.0/conf/elastic_sink.yml \
  --inputs public/default/test \
  --archive /opt/apache-pulsar-2.5.0/connectors/pulsar-io-elastic-search-2.5.0.nar
```
----
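For reference, a minimal `elastic_sink.yml` for the 2.5 Elasticsearch sink might look roughly like this; the endpoint and index name are placeholders, not the actual values from this setup:
```
# Sketch of the sink config file referenced above (values are placeholders)
cat > /opt/apache-pulsar-2.5.0/conf/elastic_sink.yml <<'EOF'
configs:
  elasticSearchUrl: "http://localhost:9200"  # placeholder Elasticsearch endpoint
  indexName: "test-index"                    # placeholder index the sink writes to
  indexNumberOfShards: 1
  indexNumberOfReplicas: 1
EOF
```
----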
2020-04-16 09:56:01 UTC - Adelina Brask: In the logs, the sink starts with schema "BYTES"
----
2020-04-16 09:56:23 UTC - Adelina Brask: then I create the source
----
2020-04-16 09:56:28 UTC - Adelina Brask:
```
/opt/apache-pulsar-2.5.0/bin/pulsar-admin sources create --name netty \
  --source-config-file /opt/apache-pulsar-2.5.0/conf/netty-source-config.yml \
  --destination-topic-name public/default/test \
  --archive /opt/apache-pulsar-2.5.0/connectors/pulsar-io-netty-2.5.0.nar
```
----
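Likewise, a netty source config of this shape might look roughly like the following; the values are placeholders based on the 2.5 netty source docs (port 10999 matches the one discussed below):
```
# Sketch of the source config file referenced above (values are placeholders)
cat > /opt/apache-pulsar-2.5.0/conf/netty-source-config.yml <<'EOF'
configs:
  type: "tcp"        # tcp, http, or udp
  host: "127.0.0.1"  # address the source listens on
  port: 10999        # port the source binds to
  numberOfThreads: 1
EOF
```
----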
2020-04-16 09:57:12 UTC - Adelina Brask: in the logs, it inherits the schema type from the elastic sink (BYTES) and theoretically starts, but port 10999 is not bound
----
2020-04-16 09:57:29 UTC - Adelina Brask: I can't find any schema-related logs...
----
2020-04-16 09:59:11 UTC - Adelina Brask: If I start them in the opposite order: 1. the SOURCE starts with no schema and the port is bound correctly. 2. the SINK fails to start - maybe because it has schema BYTES and...
----
2020-04-16 09:59:38 UTC - Adelina Brask: what I don't understand is why both work in `localrun` but not with `create`
----
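For comparison, the `localrun` variant runs the connector as a local process instead of submitting it to the function workers; a sketch using the same sink, with a placeholder broker URL:
```
# Run the sink locally instead of on the workers (broker URL is a placeholder)
/opt/apache-pulsar-2.5.0/bin/pulsar-admin sinks localrun --name elastic \
  --sink-config-file /opt/apache-pulsar-2.5.0/conf/elastic_sink.yml \
  --inputs public/default/test \
  --archive /opt/apache-pulsar-2.5.0/connectors/pulsar-io-elastic-search-2.5.0.nar \
  --broker-service-url pulsar://localhost:6650
```
----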
2020-04-16 10:00:07 UTC - Adelina Brask: aren't they using the same schema policies?
----
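For anyone following along, the schema actually registered on a topic can be inspected directly; this is the standard admin command, using the topic name from the commands above:
```
# Show the schema currently attached to the topic
/opt/apache-pulsar-2.5.0/bin/pulsar-admin schemas get persistent://public/default/test
```
----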
2020-04-16 10:01:19 UTC - Adelina Brask: So Presto is started in cluster mode
----
2020-04-16 10:01:24 UTC - Adelina Brask:
```
[root@clstpulsar01 apache-pulsar-2.5.0]# bin/pulsar sql
presto> SELECT * FROM system.runtime.nodes;
               node_id                |         http_uri          | node_version | coordinator | state
--------------------------------------+---------------------------+--------------+-------------+--------
 ffffffff-gggg-gggg-gggg-ffffffffffff | http://10.220.37.193:8081 | testversion  | false       | active
 ffffffff-ffff-ffff-ffff-ffffffffffff | http://10.220.37.191:8081 | testversion  | true        | active
 ffffffff-eeee-eeee-eeee-ffffffffffff | http://10.220.37.192:8081 | testversion  | false       | active
(3 rows)

Query 20200416_100101_00007_fye7x, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:00 [3 rows, 237B] [12 rows/s, 989B/s]
```
----
2020-04-16 10:01:37 UTC - Adelina Brask: according to this: 
<http://pulsar.apache.org/docs/en/sql-deployment-configurations/>
----
2020-04-16 10:01:46 UTC - Adelina Brask: But when I query a table I get:
----
2020-04-16 10:02:08 UTC - Adelina Brask: `Query 20200416_100158_00008_fye7x failed: java.io.IOException: Failed to initialize ledger manager factory`
----
2020-04-16 10:10:09 UTC - Adelina Brask: I feel that I don't get enough logs, or am I missing something? Is there a configuration for Presto logging?
----
2020-04-16 10:12:38 UTC - Adelina Brask: ok... I found the --debug flag :slightly_smiling_face:
----
2020-04-16 10:13:25 UTC - Adelina Brask:
```
Query 20200416_101214_00000_czgjw failed: java.io.IOException: Failed to initialize ledger manager factory
java.lang.RuntimeException: java.io.IOException: Failed to initialize ledger manager factory
        at org.apache.pulsar.sql.presto.PulsarSplitManager.getSplits(PulsarSplitManager.java:134)
        at com.facebook.presto.split.SplitManager.getSplits(SplitManager.java:64)
        at com.facebook.presto.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:146)
        at com.facebook.presto.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:122)
        at com.facebook.presto.sql.planner.plan.TableScanNode.accept(TableScanNode.java:136)
        at com.facebook.presto.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:108)
        at com.facebook.presto.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:113)
        at com.facebook.presto.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:85)
        at com.facebook.presto.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:385)
        at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:287)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to initialize ledger manager factory
        at org.apache.bookkeeper.client.BookKeeper.<init>(BookKeeper.java:520)
        at org.apache.bookkeeper.client.BookKeeper.<init>(BookKeeper.java:368)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$DefaultBkFactory.<init>(ManagedLedgerFactoryImpl.java:183)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.<init>(ManagedLedgerFactoryImpl.java:122)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.<init>(ManagedLedgerFactoryImpl.java:114)
        at org.apache.pulsar.sql.presto.PulsarConnectorCache.initManagedLedgerFactory(PulsarConnectorCache.java:108)
        at org.apache.pulsar.sql.presto.PulsarConnectorCache.<init>(PulsarConnectorCache.java:66)
        at org.apache.pulsar.sql.presto.PulsarConnectorCache.getConnectorCache(PulsarConnectorCache.java:83)
        at org.apache.pulsar.sql.presto.PulsarSplitManager.getSplitsNonPartitionedTopic(PulsarSplitManager.java:224)
        at org.apache.pulsar.sql.presto.PulsarSplitManager.getSplits(PulsarSplitManager.java:126)
        ... 12 more
Caused by: org.apache.bookkeeper.meta.exceptions.MetadataException: Failed to initialized ledger manager factory
        at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.getLedgerManagerFactory(ZKMetadataDriverBase.java:243)
        at org.apache.bookkeeper.client.BookKeeper.<init>(BookKeeper.java:518)
        ... 21 more
Caused by: java.io.IOException: Empty Ledger Root Path.
        at org.apache.bookkeeper.meta.AbstractZkLedgerManagerFactory.newLedgerManagerFactory(AbstractZkLedgerManagerFactory.java:158)
        at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.getLedgerManagerFactory(ZKMetadataDriverBase.java:239)
        ... 22 more
```
----
2020-04-16 10:41:56 UTC - Adelina Brask: This is SOLVED :slightly_smiling_face: 
it was the wrong broker service URI in the config.
+1 : Ebere Abanonu, Sijie Guo, David Kjerrumgaard
----
2020-04-16 12:06:54 UTC - Ebere Abanonu: Thanks!
----
2020-04-16 12:09:03 UTC - Ebere Abanonu: I guess byte[] is also not fully supported? Because I got an error about converting to byte[]
----
2020-04-16 12:13:31 UTC - Ebere Abanonu: You could use Pulsar SQL; it is a Presto connector
----
2020-04-16 12:14:23 UTC - Damien: @Sijie Guo hi, I'm also very interested in a possible answer. Could you please point out, in Flink-Pulsar/Spark-Pulsar or even Pulsar-SQL, the part of the code that is responsible for calling the segment reader or state store?
----
2020-04-16 13:04:02 UTC - Penghui Li: byte[] in Pulsar converts to Presto `VarbinaryType.VARBINARY`
----
2020-04-16 13:57:19 UTC - charlie: @charlie has joined the channel
----
2020-04-16 14:45:29 UTC - Subash Kunjupillai: Hi,
I'm trying to understand the need to "Initialize Cluster Metadata" (<http://pulsar.apache.org/docs/en/deploy-bare-metal/#initialize-cluster-metadata>).

The reason for the question is that we are not planning to have a DNS server, so we need to list each host's information during this initialization process. The doc explicitly says "You only need to write *once*", so I believe this should not be required when adding additional brokers to the cluster. I'm trying to understand how this initialization is used by the broker. Can someone please help me understand?
----
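For context, the one-time initialization step from the linked doc looks roughly like this; all hostnames below are placeholders. Without DNS, each broker host gets listed explicitly in the two service URLs:
```
# One-time cluster metadata initialization (hostnames are placeholders)
bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper zk1.example.com:2181 \
  --configuration-store zk1.example.com:2181 \
  --web-service-url http://broker1.example.com:8080,broker2.example.com:8080 \
  --broker-service-url pulsar://broker1.example.com:6650,broker2.example.com:6650
```
----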
2020-04-16 15:20:11 UTC - Sijie Guo: @Adelina Brask thank you for sharing the logs. I will check them.
----
2020-04-16 15:20:43 UTC - Sijie Guo: Glad to know that it is solved.
+1 : Adelina Brask
----
2020-04-16 15:53:03 UTC - Vladimir Shchur: Hi! I'm testing schemas; according to <http://pulsar.apache.org/docs/en/schema-evolution-compatibility/> there should be a setting `schemaRegistryCompatibilityCheckers` somewhere, but I have no idea where this setting resides. I can't find any place to configure the schema compatibility check strategy either
----
2020-04-16 16:52:59 UTC - Ebere Abanonu: I need further clarification. If the topic already has messages, will they be deleted too?
----
2020-04-16 16:55:46 UTC - Ebere Abanonu: The topic got deleted even with messages produced but no consumer. I just want to produce and not consume, and then use Presto to query the data @Sijie Guo
----
2020-04-16 17:06:27 UTC - Adelina Brask: <http://pulsar.apache.org/docs/en/schema-manage/#adjust-compatibility>
`schemaRegistryCompatibilityCheckers` is not a setting you/we have access to. It's an automatic check that the producer/consumer uses. Using this link you can check and change the compatibility strategy.
To check the compatibility policy of your namespace, use `bin/pulsar-admin namespaces policies tenant/namespace`
----
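Concretely, that check might look like the following; the namespace name is a placeholder:
```
# Dump the namespace policies and pick out the schema-related fields
bin/pulsar-admin namespaces policies public/default | grep -i schema
```
----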
2020-04-16 17:12:24 UTC - Adelina Brask: Hi Subash, if you don't initialize the cluster metadata, ZooKeeper won't know which bookies it is responsible for. In the other configuration files (broker or Presto) we only give the ZooKeeper server names and not the bookies. The communication between brokers and bookies happens only through ZooKeeper. I understand it's a pain to do it again if you add another broker, but it's necessary for using all your resources. If you had a DNS you would only need to do it once.
----
2020-04-16 17:13:39 UTC - David Kjerrumgaard: @Vladimir Shchur You can 
configure that at the namespace level.  
<https://pulsar.apache.org/docs/en/pulsar-admin/#set-schema-autoupdate-strategy>
----
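From that doc page, applying the strategy at the namespace level looks roughly like this; the namespace and compatibility level are placeholders, and it's worth checking `--help` for the exact values accepted by your version:
```
# Set the schema auto-update compatibility strategy on a namespace
# (namespace and strategy value are placeholders)
bin/pulsar-admin namespaces set-schema-autoupdate-strategy \
  --compatibility Full public/default
```
----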
2020-04-16 17:13:53 UTC - Adelina Brask: @Sijie Guo I appreciate having your eyes and help on this. I am getting desperate :slightly_smiling_face: I am failing to understand why the sink + source won't work together in cluster mode (only separately, or only in localrun)
----
2020-04-16 17:15:48 UTC - David Kjerrumgaard: @Adelina Brask I saw your earlier post about having issues running a Source and Sink concurrently. I believe that using the above command to disable compatibility checks at the NS level should resolve that issue, although longer term you probably need to get them both using the same schema
----
2020-04-16 17:17:08 UTC - Adelina Brask: It's a little misleading... the name of the parameter is pulsar.broker-service-url (we are used to <pulsar://whatever:6650> from other configs) but here it expects the web URL :slightly_smiling_face: (<http://whatever:8080>)
----
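In other words, the Pulsar SQL catalog config ends up looking something like this; the addresses are placeholders, and the key point is the web service URL on port 8080 rather than the pulsar:// binary URL:
```
# Sketch of conf/presto/catalog/pulsar.properties (addresses are placeholders)
cat > /opt/apache-pulsar-2.5.0/conf/presto/catalog/pulsar.properties <<'EOF'
connector.name=pulsar
pulsar.broker-service-url=http://localhost:8080
pulsar.zookeeper-uri=localhost:2181
EOF
```
----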
2020-04-16 17:19:18 UTC - Vladimir Shchur: Thank you! Missed that documentation 
paragraph
+1 : Adelina Brask
----
2020-04-16 17:21:12 UTC - Adelina Brask: Hi David, I got them to use the same schema type, but it didn't help. My namespace compatibility strategy is now set to
 `"schema_auto_update_compatibility_strategy" : "Full",`
  `"schema_compatibility_strategy" : "UNDEFINED",`
----
2020-04-16 17:21:24 UTC - Adelina Brask: I'll try to play with those settings then :slightly_smiling_face: and turn off the compatibility checks
----
2020-04-16 17:23:27 UTC - David Kjerrumgaard: `ALWAYS_COMPATIBLE` might be the 
right option.
----
2020-04-16 17:24:47 UTC - Adelina Brask: I just tried it, deleted the topics & sources/sink and created them again - the problem persists
----
2020-04-16 17:25:17 UTC - Adelina Brask: But everything works in localrun... localrun uses the same settings/policies, right?
----
2020-04-16 17:39:07 UTC - David Kjerrumgaard: It should.
----
2020-04-16 17:45:53 UTC - Sijie Guo: yeah. that is a good point. I am creating 
a github issue to improve it.
----
2020-04-16 17:49:38 UTC - Sijie Guo: 
<https://github.com/apache/pulsar/issues/6748>
----
2020-04-16 17:49:49 UTC - Sijie Guo: Contributions are welcome! 
:slightly_smiling_face:
+1 : Adelina Brask
----
2020-04-16 18:08:40 UTC - Adelina Brask: Nice. I feel like I am part of the 
family now :sunglasses:
wave : Sijie Guo
----
2020-04-16 18:08:54 UTC - Matteo Merli: For non-persistent topics, the messages are not retained anyway, so the ack path is not really needed
----
2020-04-16 18:09:38 UTC - Matteo Merli: since messages are not persisted, there are also no message ids (I agree it would be better to have them for debugging purposes, though)
----
2020-04-16 18:14:31 UTC - Sijie Guo: @Adelina Brask there are two separate issues here:

1. The netty source can't bind to a port. I think this is unrelated to the schema issue. It's just that the netty source wasn't designed to run in a non-containerized environment. <http://pulsar.apache.org/docs/en/io-netty-source/#configuration> (check this documentation; it's clarified at the top). The issue is that if you run multiple netty source instances on the same machine, you will run into port conflicts.
2. The schema issue. Currently the elastic sink isn't designed for schema awareness; it just takes raw bytes. So if you are using the elastic sink with our sources, make sure the input topic exists first before creating the elastic sink. Then it will work as well.
      That being said, if you are using datagen and elastic together, make sure to start datagen first, so datagen will create a topic with an Avro schema. The elastic sink will be able to run, but it probably won't work well since the bytes are Avro-encoded binary data; I don't think Elastic is able to index them.

        If you are using the netty source with the elastic sink, these should work in general. I think the problem is more about the port conflict.
----
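Following the advice in point 2, pre-creating the input topic is a one-liner; the topic name matches the one used earlier in this thread:
```
# Create the input topic before creating the elastic sink
/opt/apache-pulsar-2.5.0/bin/pulsar-admin topics create persistent://public/default/test
```
----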
2020-04-16 18:27:43 UTC - Adelina Brask: I will try it out tomorrow. However, the port binds fine if I create the netty source first (but then the elastic sink fails). If I create the elastic sink first, it starts perfectly, but then netty won't start up. I fail to understand the logic, and why everything works perfectly in localrun while we have issues in cluster mode.
----
2020-04-16 18:27:59 UTC - Adelina Brask: @Sijie Guo thanks for your time.
----
2020-04-16 18:32:35 UTC - Rahul: @Matteo Merli As per the Release Plan documents, I see the following dates provided.
```December 2019 - March 2020
March 2020 - June 2020
June 2020 - September 2020
September 2020 - December 2020```
Currently I see there are some release candidate tags (`2.5.1`), which are for bug fixes. When can we expect feature releases (`2.6.0`)?
----
2020-04-16 18:51:16 UTC - Sijie Guo: okay. I will give it a try as well.
----
2020-04-16 18:54:42 UTC - Ebere Abanonu: ```java.lang.ClassCastException: 
java.nio.HeapByteBuffer cannot be cast to [B
2020-04-16T18:53:25.559818923Z  at 
org.apache.pulsar.sql.presto.PulsarRecordCursor.getSlice(PulsarRecordCursor.java:496)
2020-04-16T18:53:25.559823223Z  at 
com.facebook.presto.$gen.CursorProcessor_20200416_185325_28.project_0(Unknown 
Source)
2020-04-16T18:53:25.559827723Z  at 
com.facebook.presto.$gen.CursorProcessor_20200416_185325_28.process(Unknown 
Source)
2020-04-16T18:53:25.559832623Z  at 
com.facebook.presto.operator.ScanFilterAndProjectOperator.processColumnSource(ScanFilterAndProjectOperator.java:237)
2020-04-16T18:53:25.559836923Z  at 
com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:229)
2020-04-16T18:53:25.559848923Z  at 
com.facebook.presto.operator.Driver.processInternal(Driver.java:373)
2020-04-16T18:53:25.559852723Z  at 
com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)
2020-04-16T18:53:25.559856623Z  at 
com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
2020-04-16T18:53:25.559860423Z  at 
com.facebook.presto.operator.Driver.processFor(Driver.java:276)
2020-04-16T18:53:25.559864323Z  at 
com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)
2020-04-16T18:53:25.559868223Z  at 
com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
2020-04-16T18:53:25.559872323Z  at 
com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:477)
2020-04-16T18:53:25.559876423Z  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2020-04-16T18:53:25.559880323Z  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2020-04-16T18:53:25.559884223Z  at java.lang.Thread.run(Thread.java:748)```
----
2020-04-16 18:55:17 UTC - Ebere Abanonu: byte[] issues in presto
----
2020-04-16 19:21:26 UTC - Sijie Guo: Interesting. Can you create an issue for 
that?
----
2020-04-16 19:26:05 UTC - Rahul: Found it. It's due by `May 15`. That means the next release cycle.
----
2020-04-16 19:34:32 UTC - Ebere Abanonu: Issue created!
----
2020-04-16 20:01:42 UTC - Jared Mackey: I see, so acking is really pointless in 
non-persistent topics. What does it do when I ask to redeliver messages?
----
2020-04-16 21:28:43 UTC - Matteo Merli: Nothing really, because brokers are not buffering messages up :slightly_smiling_face:
----
2020-04-16 21:35:25 UTC - Frederic COLLIN: @Frederic COLLIN has joined the 
channel
----
2020-04-17 03:00:32 UTC - Tymm: Hi guys, I'm new to Pulsar and need some advice on designing the topic structure. I have a couple of IoT devices: temperature sensors, infrared sensors, etc. Each provides data in a different structure with its own unique ID. Imagine a company has 10 sites and each site has 5 temperature sensors, and I want to generate hourly, daily, and weekly reports using aggregation in a Pulsar Function, with enrichment from an external MSSQL db (the start and stop times for the data differ across sites).

Should I create topics that are module-based (temperature-sensor, infrared-sensor, ...) or device-based (device-unique-id-temperature, device-unique-id-infrared, ...)?
Is Pulsar SQL a better choice in terms of data aggregation?
----
2020-04-17 07:13:11 UTC - Adelina Brask: @Sijie Guo I fixed the issue. The incompatibility was somehow due to the S3 offloader; I think it was acting like a 'consumer'. The moment I turned it off and let some messages into the topic, both the source and sink started normally. :slightly_smiling_face:
----
2020-04-17 07:22:04 UTC - Adelina Brask: If I were you I would create one namespace per site, then one topic per data structure (module-based). Each topic will have a device ID field you can do aggregations on. But you are free to do whatever you wish... I mean, if you need each sensor to have a different output, it may be easier to make one topic per module per device.
----
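A sketch of that layout with hypothetical tenant and site names:
```
# One namespace per site, one topic per sensor module (all names are placeholders)
bin/pulsar-admin tenants create acme
bin/pulsar-admin namespaces create acme/site-01
bin/pulsar-admin topics create persistent://acme/site-01/temperature-sensor
bin/pulsar-admin topics create persistent://acme/site-01/infrared-sensor
```
----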
2020-04-17 07:32:25 UTC - Adelina Brask: Hi guys. If I need data from a topic to be both consumed by the Elastic sink and offloaded to S3 at the same time, what configuration/settings would you recommend? Should I duplicate the topic to another namespace that is made exclusively for offloading?
----
2020-04-17 07:34:36 UTC - Sijie Guo: No, you don't need to duplicate the topic. You can configure a topic to be offloaded, and the same topic can still be consumed by any consumers.
----
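For example, automatic offload can be enabled per namespace once a size threshold is hit; the namespace and threshold below are placeholders:
```
# Offload ledgers to tiered storage once the backlog exceeds 10G
bin/pulsar-admin namespaces set-offload-threshold --size 10G public/default
```
----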
2020-04-17 07:36:07 UTC - Sijie Guo: hmm interesting …
----
2020-04-17 07:36:44 UTC - Sijie Guo: ideally the S3 offloader shouldn't change any schema information.
----
2020-04-17 07:36:54 UTC - Adelina Brask: But wouldn't the topic be empty after the consumer acks?
----
2020-04-17 07:37:00 UTC - Sijie Guo: Can you consistently reproduce this issue?
----
2020-04-17 07:39:50 UTC - Sijie Guo: Ah, the messages will be kept after they are consumed if retention is configured. So I take back what I said above: you need to enable retention for this to work.
----
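For example (namespace and limits are placeholders):
```
# Keep messages for 7 days or up to 10G even after they are acknowledged,
# so the offloader still has data to work with
bin/pulsar-admin namespaces set-retention --size 10G --time 7d public/default
```
----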
2020-04-17 07:40:29 UTC - Adelina Brask: ah, makes sense :slightly_smiling_face: Thanks a lot
----
2020-04-17 07:44:38 UTC - JG: yes, I will use BookKeeper as event storage and am going to replay events to rebuild the read side
----
2020-04-17 08:21:17 UTC - Adelina Brask: No, I can't... it still fails when I try to reproduce the setup without the offloaders. I need to spend some more time researching it. :slightly_smiling_face:
----
2020-04-17 08:59:01 UTC - Subash Kunjupillai: Hi Adelina,
Thanks for your response. During initialization we provide only the ZooKeeper and broker information; in that case, how does the broker identify the bookie information? Is it hardcoded somewhere in the broker to look for a specific znode for the bookie information? Also, if the broker is able to identify the bookies through ZooKeeper, I don't think we have to run this initialization again when adding more brokers later. Please share your thoughts.
----
