Option #1 seems much simpler; I'd likely start with that.
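A minimal sketch of what option #1 could look like, using the `slack_msg_eq_sqla_msg` name proposed in the quoted message below. Which attributes to compare is an assumption on my part; "ticket_status" is mentioned in the thread but doesn't appear in the model shown, so I've left it out.

```python
from types import SimpleNamespace

# Attributes whose change should trigger a new event row. This tuple is an
# assumption based on the thread; adjust it to the real list.
WATCHED_ATTRS = ("reactions", "triage_status", "teams")

def slack_msg_eq_sqla_msg(slack_msg, sqla_msg):
    """True when none of the watched attributes differ between the objects."""
    return all(
        getattr(slack_msg, attr) == getattr(sqla_msg, attr)
        for attr in WATCHED_ATTRS
    )

# Works on any two objects exposing those attributes, so the SlackMessage
# and TriageMessage classes never need to know about each other:
a = SimpleNamespace(reactions=["eyes"], triage_status="open", teams=["data"])
b = SimpleNamespace(reactions=["eyes"], triage_status="open", teams=["data"])
# slack_msg_eq_sqla_msg(a, b) → True
```

Because the function takes both objects as plain arguments, neither class grows an `__eq__` that would be misleading for objects of different types.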
Re uuid: yes, absolutely — I usually take the "existing" ids and put them in a dictionary so I know which ones to skip. I don't understand what the "list comprehension" approach would entail that isn't using a hash lookup of some kind.

On Sat, Apr 10, 2021, at 5:18 PM, Andrew Martin wrote:
> Hi everyone,
>
> This might be a totally premature optimization, but I want to at least
> start off in the right direction.
>
> I'm pulling messages from a Slack channel into an Airflow DAG and writing
> events to a table if certain attributes of the Slack message have changed.
> Slack message structures are really variable depending on how a message
> was posted (i.e., you just type something, you share a message from
> another channel, it has a file attachment, or you share it and add a
> message, etc. The dictionary you get is totally different and unreliable.)
> So I have a SlackMessage class that susses all of this out and creates an
> object with standardized attributes that I can rely on. It looks like this:
>
> class SlackMessage:
>     def __init__(self, message):
>         # print(message + "\n")
>         self.reactions = []
>         self.ts = ""
>         self.uuid = ""
>         self.datetime = ""
>         self.teams = []
>         self.codename = ""
>         self.needs_ticket = ""
>         self.triage_status = ""
>         self.resolution_type = ""
>         self.message_type = ""
>         self.message_text = ""
>         self.file_urls = []
>         self.message_url = ""
>         self.local_url = ""
>         self.jira_title = ""
>         self.datetime_resolved = ""
>
>         self.parse_message(message)
>
> The init creates the attributes so I don't have to check for them to
> exist, and the parse_message method populates anything that can be at
> the time.
>
> I'm pushing these objects to a triage_message table, in which I want to
> add new records when the reactions, triage_status, ticket_status, or
> teams values have changed. This is sort of intentionally not upsert.
> I want to preserve event data so that I can do analytics, like how long
> from the time the message was created until it obtained the "complete"
> reaction.
>
> My SQLAlchemy model is different from the SlackMessage class and has
> some additional fields like a generated bigint ID, a foreign key to my
> data warehouse date dimension, and some automated date fields for
> creation and update time (although in this design, there really
> shouldn't be an updated date.)
>
> The model looks like this:
>
> class TriageMessage(Base):
>     __tablename__ = "triage_message"
>
>     uuid = Column(Text, nullable=False)
>     dim_date_id = Column(
>         ForeignKey("dim_date.id", ondelete="CASCADE", onupdate="CASCADE"),
>         nullable=False,
>     )
>     ts = Column(Integer, nullable=False)
>     datetime = Column(Date, nullable=False)
>     datetime_resolved = Column(Date)
>     codename = Column(Text, nullable=False)
>     jira_title = Column(Text, nullable=False)
>     local_url = Column(Text, nullable=False)
>     message_type = Column(Text, nullable=False)
>     message_url = Column(Text, nullable=False)
>     needs_ticket = Column(Boolean, nullable=False)
>     reactions = Column(ARRAY(Text()), nullable=False)
>     teams = Column(ARRAY(Text()), nullable=False)
>     triage_status = Column(Text, nullable=False)
>     file_urls = Column(ARRAY(Text()))
>     resolution_type = Column(Text)
>
> I'm trying to figure out the best way to decide whether a new record
> should be created. These aren't the same class, so it doesn't
> immediately make sense to me to create __eq__ and __ne__ method
> overrides on these.
>
> There are two different approaches that do seem reasonable to me:
>
> 1. A comparison function:
>
>     def slack_msg_eq_sqla_msg(slack_msg, sqla_msg):
>         <compare the attributes I care about here and return True or False>
>
> 2. Convert the Slack message object to the SQLA object and test for
>    equality.
>
> Does one of these approaches make more sense than the other to the
> group here?
>
> Also bonus question!
> If the uuid of the message doesn't exist in the database at all, I can
> skip all of this and just write the new event. If I'm pulling 20k
> messages at a time from the Slack channel and Postgres, is it worth it
> to create a dict with uuid as the key so that I'm searching a hash table
> instead of doing a list comprehension to gather new items based on the
> uuid value?
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sqlalchemy/da9401d3-53d2-48eb-bb3d-34c54acf2606n%40googlegroups.com
> <https://groups.google.com/d/msgid/sqlalchemy/da9401d3-53d2-48eb-bb3d-34c54acf2606n%40googlegroups.com?utm_medium=email&utm_source=footer>.
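To make the uuid point concrete: the list comprehension itself is fine either way — what matters is whether the `in` test inside it runs against a list (O(n) per check) or a set/dict (O(1) per check). At 20k messages that's the difference between ~400 million comparisons and ~20k hash lookups. A sketch, with `split_new_messages` being an illustrative name of my own:

```python
from types import SimpleNamespace

def split_new_messages(slack_messages, existing_uuids):
    """Partition messages into (new, already_seen) by uuid membership."""
    existing = set(existing_uuids)  # hash table: O(1) per membership test
    new = [m for m in slack_messages if m.uuid not in existing]
    seen = [m for m in slack_messages if m.uuid in existing]
    return new, seen

# Stand-in message objects; any object with a .uuid attribute works.
msgs = [SimpleNamespace(uuid=u) for u in ("a", "b", "c")]
new, seen = split_new_messages(msgs, existing_uuids={"b"})
# new holds uuids "a" and "c"; seen holds "b"
```

Only the `seen` messages then need the attribute comparison against their latest stored row; the `new` ones can be written straight away.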