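Python Character Encodings
Notes on the trouble I had with Japanese text in Python (these are personal notes and may contain misunderstandings).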
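Python was designed around text that only uses "abc". Most of the trouble with Japanese becomes easy to understand once you see how Python gives "abc" (ASCII) preferential treatment.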
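First, some terms.
A code here means a character code (the correspondence between a byte representation and characters).
utf-8 is a character encoding.
Python's unicode is the character code used inside the program, on the CPU and in memory.
Python's str is a byte string that follows some character encoding, and is used for input and output.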
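Creating a unicode object
The code point of the character 'あ' is 3042 in hexadecimal, or 12354 in decimal. Use 12354 to create a unicode object that represents 'あ'.
    >>> unichr(12354)
    u'\u3042'
Check the type of the object.
    >>> type(unichr(12354))
    <type 'unicode'>
It is unicode.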
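The relation between the character and its code point can be checked in the other direction as well. The following is a minimal sketch, not part of the original session; the names HIRAGANA_A, code_point and as_hex are made up for illustration, and the character is written as an escape so that the snippet does not depend on the encoding of the source file.

```python
# Hiragana "a", written as an escape sequence (works on Python 2 and 3).
HIRAGANA_A = u'\u3042'

# ord() returns the integer code point of a one-character text string.
code_point = ord(HIRAGANA_A)

# The same number written as hexadecimal digits, without the 0x prefix.
as_hex = format(code_point, 'x')

print(code_point)
print(as_hex)
```

ord() is the inverse of unichr() in Python 2 (and of chr() in Python 3).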
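Output to the terminal
Try writing the unicode object to the terminal from the interactive interpreter.
    >>> import sys
    >>> sys.stdout.write(unichr(12354))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)
This is an error. Encoding is required: a unicode object cannot be written out as it is.
    >>> sys.stdout.write(unichr(12354).encode('utf-8'))
    あ
This works.
To write to a terminal or a file, unicode is not acceptable. The text has to be encoded into a byte string in a concrete encoding. My terminal is set to utf-8, so here the text is encoded as utf-8.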
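The difference between the failing and the working call can be reproduced without a terminal. This is a small sketch under the assumption that the failing write amounts to an implicit encode with the 'ascii' codec, as the traceback suggests; the variable names are illustrative only.

```python
text = u'\u3042'

# Explicit encoding to utf-8 gives three bytes for this character.
utf8_bytes = text.encode('utf-8')

# Encoding with the ascii codec fails, because the code point is above 127.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(repr(utf8_bytes))
print(ascii_ok)
```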
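In the interactive interpreter, print handles a unicode object well: it encodes the object and writes it to the terminal.
    >>> print unichr(12354)
    あ
This prints correctly (at least in my environment). It does not work, however, when the statement is put in a script and the output is redirected to a file.
    $ echo "print unichr(12354)" > test.py
    $ python test.py
    あ
    $ python test.py > test.txt
    Traceback (most recent call last):
      File "test.py", line 1, in <module>
        print unichr(12354)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)
My Python apparently knows that my terminal is utf-8 and is considerate enough to encode to utf-8 when writing to standard output, but once the output goes to a file it does not know which encoding I want. (Does python treat standard output differently when it is connected to a file?)
Note that print can also output an encoded byte string.
    >>> print unichr(12354).encode('utf-8')
    あ
This form can also be written to a file through redirection.
Now look at the type of the encoded value.
    >>> type(unichr(12354).encode('utf-8'))
    <type 'str'>
It is a str object. Thinking of a str as a string of one-byte characters is a misunderstanding.
An encoded object (a str object encoded in utf-8) can be written to a file.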
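One way to avoid depending on what Python guesses about the output is to encode explicitly before writing. The sketch below assumes a binary stream as the destination; write_text is a hypothetical helper, and io.BytesIO merely stands in for a file or a redirected standard output.

```python
import io


def write_text(stream, text, encoding='utf-8'):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    data = text.encode(encoding)
    stream.write(data)
    return len(data)  # number of bytes written, not number of characters


# BytesIO is used here in place of a real file opened in binary mode.
buffer = io.BytesIO()
written = write_text(buffer, u'\u3042\n')

print(repr(buffer.getvalue()))
print(written)
```

The same helper works unchanged whether the destination is a terminal, a pipe or a file, because the encoding is chosen by the caller and not by the interpreter.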
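Input from the terminal
Text entered at the terminal arrives as a byte string in the terminal's encoding.
    >>> 'あ'
    '\xe3\x81\x82'
My terminal is utf-8, so the single character 'あ' was entered as a 3-byte byte string. Create a unicode object from this byte string.
    >>> unicode('あ', encoding='utf-8')
    u'\u3042'
Here encoding='utf-8' states that the first argument 'あ' is a byte string encoded in utf-8, and that it is to be converted to a unicode object.
In python, converting a byte string to a unicode object is called decoding (the codec is still specified through the encoding argument).
    unicode  <-- decode --  str (byte string)
    unicode  -- encode -->  str (byte string)
The example above can also be written as
    >>> 'あ'.decode('utf-8')
    u'\u3042'
which is the same thing.
My interactive environment assumes that input from the terminal is utf-8, so
    >>> u'あ'
    u'\u3042'
gives the same result.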
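The two directions of the conversion can be put side by side. This is a minimal sketch using the three bytes shown in the session; the variable names are illustrative, and the failing ascii decode is included only to show what happens when no suitable codec is given.

```python
# The bytes that a utf-8 terminal sends for the single character.
raw = b'\xe3\x81\x82'

decoded = raw.decode('utf-8')         # byte string -> text (decode)
re_encoded = decoded.encode('utf-8')  # text -> byte string (encode)

# Decoding with the ascii codec fails on the first byte (0xe3 > 127).
try:
    raw.decode('ascii')
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

print(len(raw), len(decoded))
print(ascii_decode_ok)
```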
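File input and output
So far this has covered standard input and output on a utf-8 terminal. Input and output of text files is basically the same.
    >>> import codecs
    >>> f = codecs.open('a.txt', 'r', 'utf-8')
    >>> l = f.readline()
    >>> f.close()
    >>> l
    u'\u3042\n'
    >>> type(l)
    <type 'unicode'>
    >>> print l.encode('utf-8')
    あ
The encoding of the file can be specified when it is opened. As a result, the object read from the file is already decoded to unicode.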
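A round trip through a file can be sketched as follows. This uses io.open instead of the codecs.open call in the session, which is a substitution made here because io.open behaves the same way on Python 2.6+ and Python 3; write_then_read and the temporary file are hypothetical names introduced for the example.

```python
import io
import os
import tempfile


def write_then_read(text, encoding='utf-8'):
    """Write text to a temporary file, then return (raw bytes, decoded first line)."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # newline='\n' keeps the line ending identical on every platform.
        with io.open(path, 'w', encoding=encoding, newline='\n') as f:
            f.write(text)
        # Binary mode shows the bytes that actually reached the disk.
        with io.open(path, 'rb') as f:
            raw = f.read()
        # Text mode with an encoding returns already-decoded text.
        with io.open(path, 'r', encoding=encoding) as f:
            decoded = f.readline()
    finally:
        os.remove(path)
    return raw, decoded


raw_bytes, line = write_then_read(u'\u3042\n')
print(repr(raw_bytes))
```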
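The same story applies when Python reads a program file: if the encoding of the program file is declared incorrectly, the file is misinterpreted.
    $ cat test.py
    #coding: euc-jp
    print type('あ')
    print type(u'あ')
    print u'あ'.encode('utf-8')
Running this file, which is saved as utf-8 but mistakenly declares #coding: euc-jp, gives
    $ python test.py
      File "test.py", line 2
    SyntaxError: 'euc_jp' codec can't decode bytes in position 12-13: illegal multibyte sequence
This is an error. Convert the program file to euc, so that it matches the declared encoding, and run it again.
    $ nkf -e test.py > test_euc.py
    $ python test_euc.py
    <type 'str'>
    <type 'unicode'>
    あ
The 'あ' written in euc is a byte string (str) encoded in euc, and u'あ' has been decoded to unicode. In the last line of the example, the unicode object is encoded to utf-8 and displayed on the utf-8 terminal.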
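The SyntaxError comes from decoding utf-8 bytes with the euc_jp codec, and that step can be reproduced on its own. The sketch below assumes that this decode is what the interpreter attempts for the declared encoding; the encode call is only a rough stand-in for the nkf -e conversion, and all variable names are illustrative.

```python
# The character as it is stored in a utf-8 source file.
utf8_bytes = u'\u3042'.encode('utf-8')

# Decoding those bytes as euc_jp fails, like the SyntaxError in the session.
try:
    utf8_bytes.decode('euc_jp')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Re-encoding the text as euc_jp makes the bytes match the declaration.
euc_bytes = u'\u3042'.encode('euc_jp')
round_trip = euc_bytes.decode('euc_jp')

print(wrong_codec_failed)
print(repr(euc_bytes))
```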
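Preferential treatment of ASCII
Python treats ASCII characters specially. Characters defined in ASCII are far easier to use than Japanese characters.
When a unicode object is created from the integer that represents a code point, creating 'あ'
    >>> unichr(12354)
    u'\u3042'
is displayed as an escape sequence, whereas creating 'a'
    >>> unichr(97)
    u'a'
is displayed as a character that a person can read.
In the interactive interpreter, when the input is Japanese,
    >>> 'あ'
    '\xe3\x81\x82'
the sequence of bytes is shown, whereas for a character defined in ASCII
    >>> 'a'
    'a'
the display is readable.
Writing the unicode object for 'あ' to standard output with write raises an error,
    >>> sys.stdout.write(unichr(12354))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)
whereas the unicode object for 'a' is displayed as it is, without any encoding.
    >>> sys.stdout.write(unichr(97))
    a
When a unicode object is created from an entered byte sequence and the character is Japanese,
    >>> unicode('あ')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
does not work, and
    >>> unicode('あ', encoding='utf-8')
    u'\u3042'
the encoding has to be given as the second argument. Within the range defined by ASCII,
    >>> unicode('a')
    u'a'
it works without being told anything.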
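A byte string in the range that receives this preferential treatment can be created with the chr() function. It takes an ASCII code (an integer) as its argument and creates a byte string. ASCII defines the values 0 through 127 (in Python 2, chr() itself accepts 0 through 255).
    >>> chr(97)
    'a'
For comparison, the byte string for 'あ' is
    >>> unichr(12354).encode('utf-8')
    '\xe3\x81\x82'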
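The + operator concatenates decoded objects with each other, and encoded byte strings with each other.
Try concatenating a byte string in the ASCII range with a unicode object.
    >>> chr(97) + unichr(12355)
    u'a\u3043'
The byte string is decoded implicitly and no error is raised. This is ASCII preferential treatment.
If the byte string is 'あ' rather than 'a':
    >>> unichr(12354).encode('utf-8') + unichr(12355)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
This is an error.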
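One way to stay clear of the implicit ascii decode is to decode every byte string explicitly before concatenating. The following is a sketch, not the original author's code; to_text and concat are hypothetical helpers, and utf-8 is assumed as the encoding of incoming byte strings.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


def concat(*parts):
    """Concatenate a mix of byte strings and text after decoding the bytes."""
    return u''.join(to_text(p) for p in parts)


# The combination that raised UnicodeDecodeError with the + operator.
joined = concat(b'\xe3\x81\x82', u'\u3043')
print(len(joined))
```

Because the decoding is explicit, ASCII and non-ASCII byte strings go through the same code path.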
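When the len() function is used to count characters, the string has to be unicode. A byte string gives the right count only within the range that receives ASCII preferential treatment.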
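The difference is easy to see by taking both lengths of the same text. This is a minimal sketch; lengths is a hypothetical helper and utf-8 is assumed as the encoding.

```python
def lengths(text, encoding='utf-8'):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))


ascii_lengths = lengths(u'a')      # the two counts agree for ASCII
kana_lengths = lengths(u'\u3042')  # one character, but three utf-8 bytes

print(ascii_lengths)
print(kana_lengths)
```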
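Code that relies on this preferential treatment of ASCII tends to fail with exceptions as soon as Japanese text is handled. The basic approach is to write code that is also OK with multi-byte characters. The preferential treatment is limited to ASCII, so everywhere else the correct interpretation has to be supplied where it is needed (using Japanese is not simple).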
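Problems seen in real applications (1) sqlite3
Encoding problems do not arise only with standard input and output or with files. When a library is imported and used, the question of whether to pass an encoded str object or a unicode object also causes problems.
sqlite3 is a simple SQL database. In python the module is built in, which makes it convenient to use.
The next example inserts a unicode object created with unichr(12354) into a database.
    import sqlite3

    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("create table test (u)")
    # Insert a unicode object as the bound parameter.
    cur.execute("insert into test(u) values (?)", (unichr(12354),))
    cur.execute("select * from test")
    r = cur.fetchone()
    print r[0]
    print type(r[0])
The select returns a unicode object.
If unichr(12354) is replaced with unichr(12354).encode('utf-8'), so that a str object encoded in utf-8 is passed, the result is
    sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
This is an error. Byte strings cannot be inserted (insert decoded objects). However, no error is raised when a byte string in the ASCII range, such as chr(97), is used (ASCII preferential treatment; perhaps anything up to latin-1 is fine?). In that case too, the select returns a unicode object.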
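A defensive variant is to decode byte strings before they reach the database, so that only text is ever bound to the statement. The sketch below uses the same in-memory table as the example; store_and_fetch is a hypothetical helper, and utf-8 is an assumption about how incoming byte strings are encoded.

```python
import sqlite3


def store_and_fetch(value, encoding='utf-8'):
    """Insert value into an in-memory table as text and read it back."""
    # Decode byte strings up front so that only text reaches sqlite3.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    con = sqlite3.connect(':memory:')
    try:
        cur = con.cursor()
        cur.execute('create table test (u)')
        cur.execute('insert into test(u) values (?)', (value,))
        cur.execute('select u from test')
        return cur.fetchone()[0]
    finally:
        con.close()


from_text = store_and_fetch(u'\u3042')
from_bytes = store_and_fetch(u'\u3042'.encode('utf-8'))
print(from_text == from_bytes)
```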
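Problems seen in real applications (2) tweepy
Encoding problems do not arise only with standard input and output or with files. When a library is imported and used, the kind of object it returns also causes problems.
tweepy is a library for the twitter API. twitter supports Unicode, so check whether tweepy returns decoded objects or byte strings.
    #!/usr/bin/env python
    import tweepy

    sqlite3file = '/home/hoshino/diary/twitter_log.sqlite'
    credential = {
        'CONSUMER_KEY': 'xxxxxxxxxxxxxxxxxxxxx',
        'CONSUMER_SECRET': 'xxxxxxxxxxxxxxxxxxxxx',
        'ACCESS_KEY': 'xxxxxxxxxxxxxxxxxxxxx',
        'ACCESS_SECRET': 'xxxxxxxxxxxxxxxxxxxxx'}
    maxcount = -1
    # Authenticate with the OAuth keys above.
    auth = tweepy.OAuthHandler(credential['CONSUMER_KEY'], credential['CONSUMER_SECRET'])
    auth.set_access_token(credential['ACCESS_KEY'], credential['ACCESS_SECRET'])
    api = tweepy.API(auth)
    # Fetch the newest entry on the home timeline and check the types.
    t = api.home_timeline(count=1)
    print type(t[0].author.screen_name)
    print type(t[0].text)
The screen name is returned as str, while the text of the post is returned as unicode.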
Python3
Character handling in Python changes considerably in Python3 and becomes simpler, because str and unicode are unified.
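As a rough illustration of what this means in practice, the sketch below runs on Python 3 only. It relies on two facts about Python 3: str always holds text, and bytes is a separate type that is never decoded implicitly, not even for ASCII. The variable names are illustrative.

```python
# Python 3 only: str is text, bytes is a separate type.
text = '\u3042'
data = text.encode('utf-8')

# Mixing bytes and text raises TypeError; there is no implicit ascii decode.
try:
    data + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# The same holds for ASCII, so ASCII no longer gets special treatment here.
try:
    b'a' + 'a'
    ascii_mixing_allowed = True
except TypeError:
    ascii_mixing_allowed = False

print(type(text).__name__, type(data).__name__)
print(mixing_allowed, ascii_mixing_allowed)
```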
Links
http://www.python.jp/doc/2.5/lib/encodings-overview.html
http://docs.python.org/release/3.0.1/howto/unicode.html
(Japanese translation: http://www.geocities.jp/tan9ent/unicode.html)
(For Python3)
http://d.hatena.ne.jp/fgshun/20090901/1251818730