Hi everyone,

This might be a totally premature optimization, but I want to at least 
start off in the right direction.

I'm pulling messages from a slack channel into an Airflow DAG and writing 
events to a table  if certain attributes of the slack message has changed. 
Slack message structures are really variable depending on how a message was 
posted (i.e., you just type something, you share a message from another 
channel, it has a file attachment, or you share it and add a message, etc. 
The dictionary you get is totally different and unreliable.) So I have a 
SlackMessage class that susses all of this out and creates an object with 
standardized attributes that I can rely on. It looks like this:

class SlackMessage:
    def __init__(self, message):
        # print(message + "\n")
        self.reactions = []
        self.ts = ""
        self.uuid = ""
        self.datetime = ""
        self.teams = []
        self.codename = ""
        self.needs_ticket = ""
        self.triage_status = ""
        self.resolution_type = ""
        self.message_type = ""
        self.message_text = ""
        self.file_urls = []
        self.message_url = ""
        self.local_url = ""
        self.jira_title = ""
        self.datetime_resolved = ""

        self.parse_message(message)

The init creates the attributes so I don't have to check for them to exist, 
and the parse_message method populates anything that can be at the time.

I'm pushing these objects to a triage_message table in which I want to add 
new records when a message has changed based on the reactions, 
triage_status, ticket_status, or teams values have changed. This is sort of 
intentionally not upsert. I want to preserve event data so that I can do 
analytics, like how long from the time the message was created until it 
obtained the "complete" reaction.

My SQLAlchemy model is different from the slack_message class and has some 
additional fields like a generated bigint ID, a foreign key to my data 
warehouse date dimension, and some automated date fields for creation and 
update time (although in this design, there really shouldn't be an updated 
date.)

The model looks like this:

class TriageMessage(Base):
    __tablename__ = "triage_message"

    uuid = Column(Text, nullable=False)
    dim_date_id = Column(
        ForeignKey("dim_date.id", ondelete="CASCADE", onupdate="CASCADE"),
        nullable=False,
    )
    ts = Column(Integer, nullable=False)
    datetime = Column(Date, nullable=False)
    datetime_resolved = Column(Date)
    codename = Column(Text, nullable=False)
    jira_title = Column(Text, nullable=False)
    local_url = Column(Text, nullable=False)
    message_type = Column(Text, nullable=False)
    message_url = Column(Text, nullable=False)
    needs_ticket = Column(Boolean, nullable=False)
    reactions = Column(ARRAY(Text()), nullable=False)
    teams = Column(ARRAY(Text()), nullable=False)
    triage_status = Column(Text, nullable=False)
    file_urls = Column(ARRAY(Text()))
    resolution_type = Column(Text)

I'm trying to figure out what the best way is to decide if a new record 
should be created. These aren't the same class, so it doesn't immediately 
make sense to me to create __eq__ and __ne__ method overrides on these.

There are two different approaches that do seem reasonable to me.

1. a comparison function

def slack_msg_eq_sqla_msg(slack_msg, sqla_msg):
    <compare the attributes I care about here and return True or False>

2. convert the slack message object to the sqla object and test for 
equality.

Does one of these approaches make more sense than the other to the group 
here?

Also bonus question! If the uuid of the message doesn't exist in the 
database at all, I can skip all of this and just write the new event. If 
I'm pulling 20k messages at a time from the slack channel and postgres, is 
it worth it to create a dict with uuid as the key so that I'm searching a 
hash table instead of doing a list comprehension to gather new items based 
on the uuid value?

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/da9401d3-53d2-48eb-bb3d-34c54acf2606n%40googlegroups.com.

Reply via email to