Hi everyone, This might be a totally premature optimization, but I want to at least start off in the right direction.
I'm pulling messages from a slack channel into an Airflow DAG and writing events to a table if certain attributes of the slack message has changed. Slack message structures are really variable depending on how a message was posted (i.e., you just type something, you share a message from another channel, it has a file attachment, or you share it and add a message, etc. The dictionary you get is totally different and unreliable.) So I have a SlackMessage class that susses all of this out and creates an object with standardized attributes that I can rely on. It looks like this: class SlackMessage: def __init__(self, message): # print(message + "\n") self.reactions = [] self.ts = "" self.uuid = "" self.datetime = "" self.teams = [] self.codename = "" self.needs_ticket = "" self.triage_status = "" self.resolution_type = "" self.message_type = "" self.message_text = "" self.file_urls = [] self.message_url = "" self.local_url = "" self.jira_title = "" self.datetime_resolved = "" self.parse_message(message) The init creates the attributes so I don't have to check for them to exist, and the parse_message method populates anything that can be at the time. I'm pushing these objects to a triage_message table in which I want to add new records when a message has changed based on the reactions, triage_status, ticket_status, or teams values have changed. This is sort of intentionally not upsert. I want to preserve event data so that I can do analytics, like how long from the time the message was created until it obtained the "complete" reaction. My SQLAlchemy model is different from the slack_message class and has some additional fields like a generated bigint ID, a foreign key to my data warehouse date dimension, and some automated date fields for creation and update time (although in this design, there really shouldn't be an updated date.) The model looks like this: class TriageMessage(Base): __tablename__ = "triage_message" uuid = Column(Text, nullable=False) dim_date_id = Column( ForeignKey("dim_date.id", ondelete="CASCADE", onupdate="CASCADE"), nullable=False, ) ts = Column(Integer, nullable=False) datetime = Column(Date, nullable=False) datetime_resolved = Column(Date) codename = Column(Text, nullable=False) jira_title = Column(Text, nullable=False) local_url = Column(Text, nullable=False) message_type = Column(Text, nullable=False) message_url = Column(Text, nullable=False) needs_ticket = Column(Boolean, nullable=False) reactions = Column(ARRAY(Text()), nullable=False) teams = Column(ARRAY(Text()), nullable=False) triage_status = Column(Text, nullable=False) file_urls = Column(ARRAY(Text())) resolution_type = Column(Text) I'm trying to figure out what the best way is to decide if a new record should be created. These aren't the same class, so it doesn't immediately make sense to me to create __eq__ and __ne__ method overrides on these. There are two different approaches that do seem reasonable to me. 1. a comparison function def slack_msg_eq_sqla_msg(slack_msg, sqla_msg): <compare the attributes I care about here and return True or False> 2. convert the slack message object to the sqla object and test for equality. Does one of these approaches make more sense than the other to the group here? Also bonus question! If the uuid of the message doesn't exist in the database at all, I can skip all of this and just write the new event. If I'm pulling 20k messages at a time from the slack channel and postgres, is it worth it to create a dict with uuid as the key so that I'm searching a hash table instead of doing a list comprehension to gather new items based on the uuid value? -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/da9401d3-53d2-48eb-bb3d-34c54acf2606n%40googlegroups.com.