On Tuesday, 9 September 2025 18:32:14 Central European Summer Time Sasha Levin wrote:
> Add a new 'b4 dig' subcommand that uses AI agents to discover related
> emails for a given message ID. This helps developers find all relevant
> context around patches including previous versions, bug reports, reviews,
> and related discussions.
>
> The command:
> - Takes a message ID and constructs a detailed prompt about email
>   relationships
> - Calls a configured AI agent script to analyze and find related messages
> - Downloads all related threads from lore.kernel.org
> - Combines them into a single mbox file for easy review
>
> Key features:
> - Outputs a simplified summary showing only relationships and reasons
> - Creates a combined mbox with all related threads (deduped)
> - Provides detailed guidance to AI agents about kernel workflow patterns
>
> Configuration:
> The AI agent script is configured via:
>   -c AGENT=/path/to/agent.sh   (command line)
>   dig-agent: /path/to/agent.sh (config file)
>
> The agent script receives a prompt file and should return JSON with
> related message IDs and their relationships.
>
> Example usage:
>
> $ b4 -c AGENT=agent.sh dig [email protected]
> Analyzing message: [email protected]
> Fetching original message...
> Looking up https://lore.kernel.org/[email protected]
> Grabbing thread from lore.kernel.org/all/[email protected]/t.mbox.gz
> Subject: [PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
> From: Harry Yoo <[email protected]>
> Constructing agent prompt...
> Calling AI agent: agent.sh
> Calling agent: agent.sh /tmp/tmpz1oja9_5.txt
> Parsing agent response...
> Found 17 related messages:
>
> Related Messages Summary:
> ------------------------------------------------------------
> [PARENT] Greg KH's stable tree failure notification that initiated this 6.6.y backport request
> [V1] V1 of the 6.6.y backport patch
> [V2] V2 of the 6.6.y backport patch
> [RELATED] Same patch backported to 5.15.y stable branch
> [RELATED] Greg KH's stable tree failure notification for 5.15.y branch
> [RELATED] Same patch backported to 6.1.y stable branch
> [COVER] V5 mainline patch series cover letter that was originally merged
> [RELATED] V5 mainline patch 1/3: move page table sync declarations
> [RELATED] V5 mainline patch 2/3: the original populate_kernel patch that's being backported
> [RELATED] V5 mainline patch 3/3: x86 ARCH_PAGE_TABLE_SYNC_MASK definition
> [RELATED] RFC V1 cover letter - earliest version of this patch series
> [RELATED] RFC V1 patch 1/3 - first introduction of populate_kernel helpers
> [RELATED] RFC V1 patch 2/3 - x86/mm definitions
> [RELATED] RFC V1 patch 3/3 - convert to _kernel variant
> [RELATED] Baoquan He's V3 patch touching same file (mm/kasan/init.c)
> [RELATED] Baoquan He's V2 patch touching same file (mm/kasan/init.c)
> [RELATED] Baoquan He's V1 patch touching same file (mm/kasan/init.c)
> ------------------------------------------------------------
>
> The resulting mbox would look like this:
>
>    1 O Jul 09 Harry Yoo       ( 102) [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
>    2 O Jul 09 Harry Yoo       ( 143) ├─>[RFC V1 PATCH mm-hotfixes 1/3] mm: introduce and use {pgd,p4d}_populate_kernel()
>    3 O Jul 11 David Hildenbra (  33) │ └─>
>    4 O Jul 13 Harry Yoo       (  56) │ └─>
>    5 O Jul 13 Mike Rapoport   (  67) │ └─>
>    6 O Jul 14 Harry Yoo       (  46) │ └─>
>    7 O Jul 15 Harry Yoo       (  65) │ └─>
>    8 O Jul 09 Harry Yoo       ( 246) ├─>[RFC V1 PATCH mm-hotfixes 2/3] x86/mm: define p*d_populate_kernel() and top-level page table sync
>    9 O Jul 09 Andrew Morton   (  12) │ ├─>
>   10 O Jul 10 Harry Yoo       (  23) │ │ └─>
>   11 O Jul 11 Harry Yoo       (  34) │ │ └─>
>   12 O Jul 11 Harry Yoo       (  35) │ │ └─>
>   13 O Jul 10 kernel test rob (  79) │ └─>
>   14 O Jul 09 Harry Yoo       ( 300) ├─>[RFC V1 PATCH mm-hotfixes 3/3] x86/mm: convert {pgd,p4d}_populate{,_init} to _kernel variant
>   15 O Jul 10 kernel test rob (  80) │ └─>
>   16 O Jul 09 Harry Yoo       (  31) └─>Re: [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
>   17 O Aug 18 Harry Yoo       ( 262) [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
>   18 O Aug 18 Harry Yoo       (  72) ├─>[PATCH V5 mm-hotfixes 1/3] mm: move page table sync declarations to linux/pgtable.h
>   19 O Aug 18 David Hildenbra (  20) │ └─>
>   20 O Aug 18 Harry Yoo       ( 239) ├─>[PATCH V5 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()
>   21 O Aug 18 David Hildenbra (  60) │ ├─>
>   22 O Aug 18 kernel test rob ( 150) │ ├─>
>   23 O Aug 18 Harry Yoo       ( 161) │ │ └─>
>   24 O Aug 21 Harry Yoo       (  85) │ ├─>[PATCH] mm: fix KASAN build error due to p*d_populate_kernel()
>   25 O Aug 21 kernel test rob (  18) │ │ ├─>
>   26 O Aug 21 Lorenzo Stoakes ( 100) │ │ ├─>
>   27 O Aug 21 Harry Yoo       (  62) │ │ │ └─>
>   28 O Aug 21 Lorenzo Stoakes (  18) │ │ │ └─>
>   29 O Aug 21 Harry Yoo       (  90) │ │ └─>[PATCH v2] mm: fix KASAN build error due to p*d_populate_kernel()
>   30 O Aug 21 kernel test rob (  18) │ │ ├─>
>   31 O Aug 21 Dave Hansen     (  24) │ │ └─>
>   32 O Aug 22 Harry Yoo       (  56) │ │ └─>
>   33 O Aug 22 Andrey Ryabinin (  91) │ │ ├─>
>   34 O Aug 27 Harry Yoo       (  98) │ │ │ └─>
>   35 O Aug 22 Dave Hansen     (  63) │ │ └─>
>   36 O Aug 25 Andrey Ryabinin (  72) │ │ └─>
>   37 O Aug 22 Harry Yoo       ( 103) │ └─>[PATCH v3] mm: fix KASAN build error due to p*d_populate_kernel()
>   38 O Aug 18 Harry Yoo       ( 113) ├─>[PATCH V5 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
>   39 O Aug 18 David Hildenbra (  72) │ └─>
>   40 O Aug 18 David Hildenbra (  15) └─>Re: [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
>   41 O Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   42 O Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   43 O Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   44 O Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.6-stable tree
>   45 O Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   46 O Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   47 O Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   48 O Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.1-stable tree
>   49 O Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   50 O Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   51 O Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   52 O Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 5.15-stable tree
>   53 O Sep 08 Harry Yoo       ( 273) ├─>[PATCH 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   54 O Sep 09 Harry Yoo       ( 260) ├─>[PATCH V2 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   55 O Sep 09 Harry Yoo       ( 262) └─>[PATCH V3 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>
> The prompt includes extensive documentation about lore.kernel.org's search
> capabilities, limitations (like search index lag), and kernel workflow
> patterns to help AI agents effectively find related messages.
>
> Assisted-by: Claude Code

Hi Sasha,

It doesn't seem like Assisted-by is the right terminology here, as the code itself makes me believe it was written wholesale by your preferred LLM with minimal oversight, and then posted to the list. A non-exhaustive code review follows inline; it quickly became clear this wasn't worth investing further review time in.

> Signed-off-by: Sasha Levin <[email protected]>
> ---
>  src/b4/command.py |  17 ++
>  src/b4/dig.py     | 630 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 647 insertions(+)
>  create mode 100644 src/b4/dig.py
>
> diff --git a/src/b4/command.py b/src/b4/command.py
> index 455124d..f225ae5 100644
> --- a/src/b4/command.py
> +++ b/src/b4/command.py
> @@ -120,6 +120,11 @@ def cmd_diff(cmdargs: argparse.Namespace) -> None:
>      b4.diff.main(cmdargs)
>
>
> +def cmd_dig(cmdargs: argparse.Namespace) -> None:
> +    import b4.dig
> +    b4.dig.main(cmdargs)
> +
> +
>  class ConfigOption(argparse.Action):
>      """Action class for storing key=value arguments in a dict."""
>      def __call__(self, parser: argparse.ArgumentParser,
> @@ -399,6 +404,18 @@ def setup_parser() -> argparse.ArgumentParser:
>                           help='Submit the token received via verification email')
>      sp_send.set_defaults(func=cmd_send)
>
> +    # b4 dig
> +    sp_dig = subparsers.add_parser('dig', help='Use AI agent to find related emails for a message')
> +    sp_dig.add_argument('msgid', nargs='?',
> +                        help='Message ID to analyze, or pipe a raw message')
> +    sp_dig.add_argument('-o', '--output', dest='output', default=None,
> +                        help='Output mbox filename (default: <msgid>-related.mbox)')
> +    sp_dig.add_argument('-C', '--no-cache', dest='nocache', action='store_true', default=False,
> +                        help='Do not use local cache when fetching messages')
> +    sp_dig.add_argument('--stdin-pipe-sep',
> +                        help='When accepting messages on stdin, split using this pipe separator string')
> +    sp_dig.set_defaults(func=cmd_dig)
> +
>      return parser
>
>
> diff --git a/src/b4/dig.py b/src/b4/dig.py
> new file mode 100644
> index 0000000..007f7d0
> --- /dev/null
> +++ b/src/b4/dig.py
> @@ -0,0 +1,630 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +#
> +# b4 dig - Use AI agents to find related emails
> +#
> +__author__ = 'Sasha Levin <[email protected]>'
> +
> +import argparse
> +import logging
> +import subprocess
> +import sys
> +import os
> +import tempfile
> +import json
> +import urllib.parse
> +import gzip
> +import mailbox
> +import email.utils
> +from typing import Optional, List, Dict, Any
> +
> +import b4
> +
> +logger = b4.logger
> +
> +
> +def construct_agent_prompt(msgid: str) -> str:
> +    """Construct a detailed prompt for the AI agent to find related emails."""
> +
> +    # Clean up the message ID
> +    if msgid.startswith('<'):
> +        msgid = msgid[1:]
> +    if msgid.endswith('>'):
> +        msgid = msgid[:-1]

str.removeprefix and str.removesuffix exist for this precise purpose.
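And since main() below duplicates this exact stanza, it wants to be a helper anyway. A minimal sketch, assuming b4 can rely on Python 3.9+ (where these methods were added); the name is just a suggestion:

    def _clean_msgid(msgid: str) -> str:
        # Strip the optional surrounding angle brackets from a message ID.
        return msgid.removeprefix('<').removesuffix('>')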
> [... snipped robot wrangling ...]
> +
> +
> +def call_agent(prompt: str, agent_cmd: str) -> Optional[str]:
> +    """Call the configured agent script with the prompt."""
> +
> +    # Expand user paths
> +    agent_cmd = os.path.expanduser(agent_cmd)
> +
> +    if not os.path.exists(agent_cmd):
> +        logger.error('Agent command not found: %s', agent_cmd)
> +        return None
> +
> +    if not os.access(agent_cmd, os.X_OK):
> +        logger.error('Agent command is not executable: %s', agent_cmd)
> +        return None

Why does this check exist? Why does the previous check exist? Wouldn't it be better to just handle the exception subprocess.run will throw? Both checks are also racy: the script can disappear or lose its executable bit between the check and the run.

> +
> +    try:
> +        # Write prompt to a temporary file to avoid shell escaping issues
> +        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as tmp:
> +            tmp.write(prompt)
> +            tmp_path = tmp.name

I'm so glad we now have tmp_path so I don't have to write out tmp.name every time.

> +
> +        # Call the agent script with the prompt file as argument
> +        logger.info('Calling agent: %s %s', agent_cmd, tmp_path)
> +        result = subprocess.run(
> +            [agent_cmd, tmp_path],
> +            capture_output=True,
> +            text=True
> +        )
> +
> +        if result.returncode != 0:
> +            logger.error('Agent returned error code %d', result.returncode)
> +            if result.stderr:
> +                logger.error('Agent stderr: %s', result.stderr)
> +            return None
> +
> +        return result.stdout
> +
> +    except subprocess.TimeoutExpired:

You don't set a timeout in the subprocess.run parameters, so this is dead code.

> +        logger.error('Agent command timed out after 5 minutes')
> +        return None
> +    except Exception as e:
> +        logger.error('Error calling agent: %s', e)
> +        return None
> +    finally:
> +        # Clean up temp file
> +        if 'tmp_path' in locals():
> +            try:
> +                os.unlink(tmp_path)
> +            except:
> +                pass

This is pointless. Had you (or rather, Claude doing business as you) not set delete=False, and simply indented everything that needs the temporary file within the `with` clause, this whole block could be removed.
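Putting those three points together, the whole function collapses to something like this. Untested sketch, reusing only the module's existing imports; the 300-second timeout is a stand-in I picked to match the '5 minutes' your log message already promises:

    def call_agent(prompt: str, agent_cmd: str) -> Optional[str]:
        """Call the configured agent script with the prompt."""
        agent_cmd = os.path.expanduser(agent_cmd)
        # The temp file stays open for the duration of the call and is
        # removed automatically when the 'with' block exits.
        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt') as tmp:
            tmp.write(prompt)
            tmp.flush()  # make the prompt visible to the child process
            logger.info('Calling agent: %s %s', agent_cmd, tmp.name)
            try:
                result = subprocess.run([agent_cmd, tmp.name],
                                        capture_output=True, text=True,
                                        timeout=300)
            except (OSError, subprocess.TimeoutExpired) as ex:
                # OSError covers the missing and non-executable script
                # cases, without the racy up-front checks.
                logger.error('Error calling agent: %s', ex)
                return None
        if result.returncode != 0:
            logger.error('Agent returned error code %d', result.returncode)
            if result.stderr:
                logger.error('Agent stderr: %s', result.stderr)
            return None
        return result.stdout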
> +
> +
> +def parse_agent_response(response: str) -> List[Dict[str, str]]:
> +    """Parse the agent's response to extract message IDs."""
> +
> +    related = []
> +
> +    try:
> +        # Try to find JSON in the response
> +        # Agent might return additional text, so we look for JSON array
> +        import re
> +        json_match = re.search(r'\[.*?\]', response, re.DOTALL)
> +        if json_match:
> +            json_str = json_match.group(0)
> +            data = json.loads(json_str)
> +
> +            if isinstance(data, list):
> +                for item in data:
> +                    if isinstance(item, dict) and 'msgid' in item:
> +                        related.append({
> +                            'msgid': item.get('msgid', ''),
> +                            'relationship': item.get('relationship', 'related'),
> +                            'reason': item.get('reason', 'No reason provided')
> +                        })
> +        else:
> +            # Fallback: try to extract message IDs from plain text
> +            # Look for patterns that look like message IDs
> +            msgid_pattern = re.compile(r'[a-zA-Z0-9][a-zA-Z0-9\.\-_]+@[a-zA-Z0-9][a-zA-Z0-9\.\-]+\.[a-zA-Z]+')
> +            for match in msgid_pattern.finditer(response):
> +                msgid = match.group(0)
> +                if msgid != '':  # Don't include the original
> +                    related.append({
> +                        'msgid': msgid,
> +                        'relationship': 'related',
> +                        'reason': 'Found in agent response'
> +                    })
> +
> +    except json.JSONDecodeError as e:
> +        logger.warning('Could not parse JSON from agent response: %s', e)
> +    except Exception as e:
> +        logger.error('Error parsing agent response: %s', e)
> +
> +    return related
> +
> +
> +def get_message_info(msgid: str) -> Optional[Dict[str, Any]]:
> +    """Retrieve basic information about a message."""
> +
> +    msgs = b4.get_pi_thread_by_msgid(msgid, onlymsgids={msgid}, with_thread=False)
> +    if not msgs:
> +        return None
> +
> +    msg = msgs[0]
> +
> +    return {
> +        'subject': msg.get('Subject', 'No subject'),
> +        'from': msg.get('From', 'Unknown'),
> +        'date': msg.get('Date', 'Unknown'),
> +        'msgid': msgid
> +    }
> +
> +
> +def download_and_combine_threads(msgid: str, related_messages: List[Dict[str, str]],
> +                                 output_file: str, nocache: bool = False) -> int:
> +    """Download thread mboxes for all related messages and combine into one mbox file."""
> +
> +    message_ids = [msgid]  # Start with original message
> +
> +    # Add all related message IDs
> +    for item in related_messages:
> +        if 'msgid' in item:
> +            message_ids.append(item['msgid'])
> +
> +    # Collect all messages from all threads
> +    seen_msgids = set()
> +    all_messages = []
> +
> +    # Download thread for each message
> +    # But be smart about what we include - don't mix unrelated series
> +    for msg_id in message_ids:
> +        logger.info('Fetching thread for %s', msg_id)
> +
> +        # For better control, fetch just the specific thread, not everything
> +        # Use onlymsgids to limit scope when possible
> +        msgs = b4.get_pi_thread_by_msgid(msg_id, nocache=nocache)
> +
> +        if msgs:
> +            # Try to detect thread boundaries and avoid mixing unrelated series
> +            thread_messages = []
> +            base_subject = None
> +
> +            for msg in msgs:
> +                msg_msgid = b4.LoreMessage.get_clean_msgid(msg)
> +
> +                # Skip if we've already seen this message
> +                if msg_msgid in seen_msgids:
> +                    continue
> +
> +                # Get the subject to check if it's part of the same series
> +                subject = msg.get('Subject', '')
> +
> +                # Extract base subject (remove Re:, [PATCH], version numbers, etc)
> +                import re
> +                base = re.sub(r'^(Re:\s*)*(\[.*?\]\s*)*', '', subject).strip()
> +
> +                # Set the base subject from the first message
> +                if base_subject is None and base:
> +                    base_subject = base
> +
> +                # Add the message
> +                if msg_msgid:
> +                    seen_msgids.add(msg_msgid)
> +                    thread_messages.append(msg)
> +
> +            all_messages.extend(thread_messages)
> +        else:
> +            logger.warning('Could not fetch thread for %s', msg_id)
> +
> +    # Sort messages by date to maintain chronological order
> +    all_messages.sort(key=lambda m: email.utils.parsedate_to_datetime(m.get('Date', 'Thu, 1 Jan 1970 00:00:00 +0000')))
> +
> +    # Write all messages to output mbox file using b4's proper mbox functions
> +    logger.info('Writing %d messages to %s', len(all_messages), output_file)
> +
> +    total_messages = len(all_messages)
> +
> +    if total_messages > 0:
> +        # Use b4's save_mboxrd_mbox function which properly handles mbox format
> +        with open(output_file, 'wb') as outf:
> +            b4.save_mboxrd_mbox(all_messages, outf)
> +
> +    logger.info('Combined mbox contains %d unique messages', total_messages)
> +    return total_messages
> +
> +
> +def main(cmdargs: argparse.Namespace) -> None:
> +    """Main entry point for b4 dig command."""
> +
> +    # Get the message ID
> +    msgid = b4.get_msgid(cmdargs)
> +    if not msgid:
> +        logger.critical('Please provide a message-id')
> +        sys.exit(1)
> +
> +    # Clean up message ID
> +    if msgid.startswith('<'):
> +        msgid = msgid[1:]
> +    if msgid.endswith('>'):
> +        msgid = msgid[:-1]

Well, good thing we're duplicating the subpar code from before. The helper sketched above would cover both call sites.

> +
> +    logger.info('Analyzing message: %s', msgid)
> +
> +    # Get the agent command from config
> +    config = b4.get_main_config()
> +    agent_cmd = None
> +
> +    # Check command-line config override
> +    if hasattr(cmdargs, 'config') and cmdargs.config:
> +        if 'AGENT' in cmdargs.config:
> +            agent_cmd = cmdargs.config['AGENT']

dict.get exists.
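As do str.translate and getattr, for the two analogous spots flagged further down. Untested one-liners; the maketrans table just mirrors your replace chain:

    # dict.get returns None on a miss, replacing the 'in' + index dance:
    agent_cmd = cmdargs.config.get('AGENT')

    # One translation table instead of four chained .replace() calls:
    safe_msgid = msgid.translate(
        str.maketrans({'/': '_', '@': '_at_', '<': None, '>': None}))

    # cmdargs is an argparse.Namespace, not a dict, so this one is getattr:
    nocache = getattr(cmdargs, 'nocache', False)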
> +
> +    # Fall back to main config
> +    if not agent_cmd:
> +        agent_cmd = config.get('dig-agent', config.get('agent', None))
> +
> +    if not agent_cmd:
> +        logger.critical('No AI agent configured. Set dig-agent in config or use -c AGENT=/path/to/agent.sh')
> +        logger.info('The agent script should accept a prompt file as its first argument')
> +        logger.info('and return a JSON array of related message IDs to stdout')
> +        sys.exit(1)
> +
> +    # Get info about the original message
> +    logger.info('Fetching original message...')
> +    msg_info = get_message_info(msgid)
> +    if msg_info:
> +        logger.info('Subject: %s', msg_info['subject'])
> +        logger.info('From: %s', msg_info['from'])
> +    else:
> +        logger.warning('Could not retrieve original message info')
> +
> +    # Construct the prompt
> +    logger.info('Constructing agent prompt...')
> +    prompt = construct_agent_prompt(msgid)
> +
> +    # Call the agent
> +    logger.info('Calling AI agent: %s', agent_cmd)
> +    response = call_agent(prompt, agent_cmd)
> +
> +    if not response:
> +        logger.critical('No response from agent')
> +        sys.exit(1)
> +
> +    # Parse the response
> +    logger.info('Parsing agent response...')
> +    related = parse_agent_response(response)
> +
> +    if not related:
> +        logger.info('No related messages found')
> +        sys.exit(0)
> +
> +    # Display simplified results
> +    logger.info('Found %d related messages:', len(related))
> +    print()
> +    print('Related Messages Summary:')
> +    print('-' * 60)
> +
> +    for item in related:
> +        relationship = item.get('relationship', 'related')
> +        reason = item.get('reason', '')
> +
> +        print(f'[{relationship.upper()}] {reason}')
> +
> +    print('-' * 60)
> +    print()
> +
> +    # Generate output mbox filename
> +    if hasattr(cmdargs, 'output') and cmdargs.output:
> +        mbox_file = cmdargs.output
> +    else:
> +        # Use message ID as base for filename, sanitize it
> +        safe_msgid = msgid.replace('/', '_').replace('@', '_at_').replace('<', '').replace('>', '')

str.translate exists; see the maketrans one-liner above.

> +        mbox_file = f'{safe_msgid}-related.mbox'
> +
> +    # Download and combine all threads into one mbox
> +    logger.info('Downloading and combining all related threads...')
> +    nocache = hasattr(cmdargs, 'nocache') and cmdargs.nocache

getattr with a default exists (cmdargs is an argparse.Namespace, not a dict); again, see above.

> +    total_messages = download_and_combine_threads(msgid, related, mbox_file, nocache=nocache)
> +
> +    if total_messages > 0:
> +        logger.info('Success: Combined mbox saved to %s (%d messages)', mbox_file, total_messages)
> +        print(f'✓ Combined mbox file: {mbox_file}')
> +        print(f'  Total messages: {total_messages}')
> +        print(f'  Related threads: {len(related) + 1}')  # +1 for original
> +    else:
> +        logger.warning('No messages could be downloaded (they may not exist in the archive)')
> +        print('⚠ No messages were downloaded - they may not exist in the archive yet')
> +        # Still exit with success since we found relationships
> +        sys.exit(0)
>

I did not even remotely look over all the code, but when people on your other agentic evangelism series pointed out that it would result in lazy patches from people who should know better, this is the kind of thing they probably meant.
