Several ways to strip HTML tags in Python

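import re

from bs4 import BeautifulSoup
from lxml import etree

html = '<p>你好</p><br/><font>哈哈</font><b>大家好</b>'

# Method 1: remove anything that looks like a tag with a regular expression
pattern = re.compile(r'<[^>]+>', re.S)
result = pattern.sub('', html)
print(result)

# Method 2: parse with BeautifulSoup and extract the text
soup = BeautifulSoup(html, 'html.parser')
print(soup.get_text())

# Method 3: parse with lxml and take the XPath string value of the root element
root = etree.HTML(html)
print(root.xpath('string(.)'))

# Output (one line per method, all identical):
# 你好哈哈大家好
# 你好哈哈大家好
# 你好哈哈大家好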
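
The three methods above rely on `re`, `bs4`, and `lxml`. If installing third-party packages is not an option, roughly the same result can be sketched with the standard library's `html.parser` module. This is not one of the original methods, only a minimal illustration; `TagStripper` and `strip_tags` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags
        self._chunks.append(data)

    def get_text(self):
        return ''.join(self._chunks)


def strip_tags(html):
    """Return the text content of an HTML fragment (assumed helper name)."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


html = '<p>你好</p><br/><font>哈哈</font><b>大家好</b>'
print(strip_tags(html))
```

Unlike the regular expression in method 1, this version also decodes character references, so `strip_tags('a &amp; b')` gives `a & b` rather than leaving the entity in place.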

 


