option #1 seems much simpler I'd likely start with that

re uuid, I usually take the "existing" ids and put them in a dictionary so I 
know which ones to skip, absolutely.  I don't understand what the "list 
comprehension" approach would entail that isn't using a hash lookup of some 
kind.



On Sat, Apr 10, 2021, at 5:18 PM, Andrew Martin wrote:
> Hi everyone,
> 
> This might be a totally premature optimization, but I want to at least start 
> off in the right direction.
> 
> I'm pulling messages from a slack channel into an Airflow DAG and writing 
> events to a table  if certain attributes of the slack message has changed. 
> Slack message structures are really variable depending on how a message was 
> posted (i.e., you just type something, you share a message from another 
> channel, it has a file attachment, or you share it and add a message, etc. 
> The dictionary you get is totally different and unreliable.) So I have a 
> SlackMessage class that susses all of this out and creates an object with 
> standardized attributes that I can rely on. It looks like this:
> 
> class SlackMessage:
>     def __init__(self, message):
>         # print(message + "\n")
>         self.reactions = []
>         self.ts = ""
>         self.uuid = ""
>         self.datetime = ""
>         self.teams = []
>         self.codename = ""
>         self.needs_ticket = ""
>         self.triage_status = ""
>         self.resolution_type = ""
>         self.message_type = ""
>         self.message_text = ""
>         self.file_urls = []
>         self.message_url = ""
>         self.local_url = ""
>         self.jira_title = ""
>         self.datetime_resolved = ""
> 
>         self.parse_message(message)
> 
> The init creates the attributes so I don't have to check for them to exist, 
> and the parse_message method populates anything that can be at the time.
> 
> I'm pushing these objects to a triage_message table in which I want to add 
> new records when a message has changed based on the reactions, triage_status, 
> ticket_status, or teams values have changed. This is sort of intentionally 
> not upsert. I want to preserve event data so that I can do analytics, like 
> how long from the time the message was created until it obtained the 
> "complete" reaction.
> 
> My SQLAlchemy model is different from the slack_message class and has some 
> additional fields like a generated bigint ID, a foreign key to my data 
> warehouse date dimension, and some automated date fields for creation and 
> update time (although in this design, there really shouldn't be an updated 
> date.)
> 
> The model looks like this:
> 
> class TriageMessage(Base):
>     __tablename__ = "triage_message"
> 
>     uuid = Column(Text, nullable=False)
>     dim_date_id = Column(
>         ForeignKey("dim_date.id", ondelete="CASCADE", onupdate="CASCADE"),
>         nullable=False,
>     )
>     ts = Column(Integer, nullable=False)
>     datetime = Column(Date, nullable=False)
>     datetime_resolved = Column(Date)
>     codename = Column(Text, nullable=False)
>     jira_title = Column(Text, nullable=False)
>     local_url = Column(Text, nullable=False)
>     message_type = Column(Text, nullable=False)
>     message_url = Column(Text, nullable=False)
>     needs_ticket = Column(Boolean, nullable=False)
>     reactions = Column(ARRAY(Text()), nullable=False)
>     teams = Column(ARRAY(Text()), nullable=False)
>     triage_status = Column(Text, nullable=False)
>     file_urls = Column(ARRAY(Text()))
>     resolution_type = Column(Text)
> 
> I'm trying to figure out what the best way is to decide if a new record 
> should be created. These aren't the same class, so it doesn't immediately 
> make sense to me to create __eq__ and __ne__ method overrides on these.
> 
> There are two different approaches that do seem reasonable to me.
> 
> 1. a comparison function
> 
> def slack_msg_eq_sqla_msg(slack_msg, sqla_msg):
>     <compare the attributes I care about here and return True or False>
> 
> 2. convert the slack message object to the sqla object and test for equality.
> 
> Does one of these approaches make more sense than the other to the group here?
> 
> Also bonus question! If the uuid of the message doesn't exist in the database 
> at all, I can skip all of this and just write the new event. If I'm pulling 
> 20k messages at a time from the slack channel and postgres, is it worth it to 
> create a dict with uuid as the key so that I'm searching a hash table instead 
> of doing a list comprehension to gather new items based on the uuid value?
> 
> 

> -- 
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/da9401d3-53d2-48eb-bb3d-34c54acf2606n%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/da9401d3-53d2-48eb-bb3d-34c54acf2606n%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/114f29c0-a9eb-4e42-a3ea-023228c16042%40www.fastmail.com.

Reply via email to