Hi,

First, thank you very much for your reply.

On Tue, Jan 09, 2018 at 10:25:11PM +0000, Alan Gauld via Tutor wrote:
On 09/01/18 14:20, YU Bo wrote:

But I am facing an interesting question. I have no idea how to deal with it.

I don't think you have given us enough context to
be able to help much. We would need some idea of
the input and output data (both current and desired).



It sounds like you are building some kind of pretty printer.
Maybe you could use Python's pretty-print module as a design
template? Or maybe even use some of it directly. It just
depends on your data formats etc.
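
For reference, a minimal sketch of the standard-library `pprint` module mentioned above. The `message` dictionary is made-up sample data, not anything taken from the real pages, and it only shows the kind of layout the module produces.

```python
from pprint import pformat

# Made-up sample data (an assumption for illustration only).
message = {
    "subject": "example subject",
    "author": "example author",
    "files": ["a.c", "b.h"],
}

# pformat returns the formatted text as a string instead of printing it;
# a small width forces the dictionary onto several lines.
formatted = pformat(message, width=40)
print(formatted)
```

Whether this layout is useful depends on the data format, as noted above.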

Yes. I think Python can deal with it directly.


In fact, this is a patch from LKML; my goal is to build a kernel podcast
for myself, to keep track of what is happening in the kernel.

Sorry, I've no idea what lkml is nor what kernel you are talking about.

Can you show us what you are receiving, what you are
currently producing and what you are trying to produce?

Some actual code might be an idea too.
And the python version and OS.

Sorry, I didn't explain it well. Also, my code is terrible.

lkml.py:

```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: lkml.py
# Author: Bo Yu

"""Fetch the source of the pages that I want to read."""
import sys
reload(sys)
sys.setdefaultencoding('utf8')

import urllib2
from bs4 import BeautifulSoup
import requests

import chardet
import re

# import my own print function

from get_content import print_content

if __name__ == '__main__':
    comment_url = []
    target = 'https://www.spinics.net/lists/kernel/threads.html'
    req = requests.get(url=target)
    req.encoding = 'utf-8'
    content = req.text
    bf = BeautifulSoup(content, 'lxml')  # There is no problem

    # collect every <strong> tag on the index page
    context = bf.find_all('strong')
    for ret in context[0:1]:  # only the first one for now
        for test in ret:
            print '\t'
            x = re.split(' ', str(test))
            # take the first double-quoted value; it is used below as the
            # relative link that replaces "threads.html"
            y = re.search('"(.+?)"', str(x)).group(1)
            comment_url.append(target.replace("threads.html", str(y)))

    for tmp_target in comment_url:
        print "===This is a new file ==="
        print_content(tmp_target, 'utf-8', 'title')

```
get_content.py:

```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: get_content.py

import urllib2
from bs4 import BeautifulSoup
import requests

import chardet
import re

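def print_content(url, charset, find_id):
    """Fetch url and print the <h1> title and the first <pre> block."""
    # find_id is passed by the caller but is not used yet
    req = requests.get(url=url)
    req.encoding = charset
    content = req.text
    bf = BeautifulSoup(content, 'lxml')
    article_title = bf.find('h1')
    #author = bf.find_all('li')
    commit = bf.find('pre')  # the message body, including any patch
    print '\t'
    print article_title.get_text()
    print '\t'
    x = str(commit.get_text())
    print x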
```
python --version: Python 2.7.13
OS: Debian 9
usage: python lkml.py
output: oh...
https://pastecode.xyz/view/04645424

Please ignore my debug print formatting.

This is my code, and with it I can get text like the output above.
So, to put my question simply:
I don't know how to delete everything after a particular marker string. For example:

```text
The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e0..3b9b238 100644
...
```
I want to delete everything from *diff --git* to the end, because there is too much code there.
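
To make the desired result concrete, here is a rough sketch of the behaviour I mean. It assumes the marker always appears at the start of a line; `strip_diff` and `MARKER` are hypothetical names that are not in my code above, and the sample text is just the excerpt shown earlier.

```python
# Hypothetical marker: the first line starting with this begins the patch.
MARKER = "diff --git"


def strip_diff(text, marker=MARKER):
    """Return text up to (not including) the first line starting with marker."""
    kept = []
    for line in text.splitlines():
        if line.startswith(marker):
            break  # everything from here on is the patch itself
        kept.append(line)
    # drop the blank lines left between the description and the patch
    return "\n".join(kept).rstrip()


sample = (
    "The registers rax, rcx and rdx are touched when controlling IBRS\n"
    "so they need to be saved when they can't be clobbered.\n"
    "\n"
    "diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h\n"
    "index 45a63e0..3b9b238 100644\n"
)

result = strip_diff(sample)
print(result)
```

Text that does not contain the marker comes back unchanged, apart from trailing whitespace.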

Anyway, thanks!



--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor