Awk с潟若拷
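Using GNU Awk, this page shows how to build a KWIC concordance from a corpus (text files), sort the output, convert it to CSV for loading into a spreadsheet, and search a tagged corpus by part of speech.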
障潟若鴻若 GUI с若吟祉潟罔羣阪ュ<潟莟р
鐚茗 to 潟若拷潟宴障鴻荅鐚鐚
kwic.awk
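A GNU Awk script that searches text files for a word, or for a run of one or more consecutive words, and prints the hits in KWIC format (similar in spirit to ptx).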
$ gawk -f kwic.awk target="squalid" before=2 after=2 sample.txt
limbo of squalid London; his 0038_0258 sample.txt
starved and squalid; but of 0083_0170 sample.txt
fog and squalid pitfalls, amid 0160_0205 sample.txt
origin and squalid life, with 0218_0104 sample.txt
the same squalid defeat; yet 0244_0287 sample.txt
Total hits: 5 out of 30355 words
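The idea behind a keyword-in-context line can be sketched in a few lines of awk. This is only an illustration, not the kwic.awk source: the function name `kwic_sketch` and its positional arguments are hypothetical, it handles a single-word target only, and it assumes that context never crosses a line boundary and that missing context words are padded with a backslash.

```shell
# kwic_sketch TARGET BEFORE AFTER FILE...
# Hypothetical helper (not kwic.awk): prints every word matching the regular
# expression TARGET with BEFORE preceding and AFTER following words,
# then LINE_WORD (zero-padded) and the file name.
kwic_sketch () {
    target=$1 before=$2 after=$3
    shift 3
    awk -v target="$target" -v before="$before" -v after="$after" '
    {
        for (i = 1; i <= NF; i++) {
            if ($i !~ target) continue
            line = ""
            # preceding context, padded with a backslash at the line start
            for (j = i - before; j < i; j++) {
                w = (j >= 1) ? $j : "\\"
                line = line w " "
            }
            line = line $i
            # following context, padded with a backslash at the line end
            for (j = i + 1; j <= i + after; j++) {
                w = (j <= NF) ? $j : "\\"
                line = line " " w
            }
            printf "%s %04d_%04d %s\n", line, FNR, i, FILENAME
        }
    }' "$@"
}
```

For example, `kwic_sketch squalid 2 2 sample.txt` would print one line per hit, with two words of context on each side.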
純若 sort 潟潟罔羣阪ュ<潟 :-)
Code
Revised 20 February 2008: added the trunc option for tagged corpora, and short forms of the option names (t for target, b for before, a for after).
Syntax
gawk -f kwic.awk OPTIONS input-file [input-file ...]
Output
preceding-words  target-word(s)  following-words  line_word-position  file-name
Options
鐚罎膣鐚 target 障 t 罎膣√茯罩h頫憗筝"" у茯絲乗院鐚鐚 荀篁ヤ love of 筝障 FS 鐚鐚c若 祉若帥筝や札筝鴻若刻 阪荐垩茲違茯蚊love of 緇 鐚strict="" 鐚с茯茯 ^[^a-zA-Z]* 茯絨障 [^a-zA-Z]*$ ^[^a-zA-Z]*love[^a-zA-Z]*$ ^[^a-zA-Z]*of[^a-zA-Z]*$ 緇 FS 阪荐垩蚊ュ<ゃ筝茯 罩h頫憗筝ゃゃ 鐚IGNORECASE=1鐚с紊ф絖絨絖阪ャ 篁 GNU Awk 泣若罩h頫憗障鞘戎 1 茯障 2 茯茯違羆冴障c罩h 茵憗筝号 [箴] word word word, loved? love loved love.* love lover lovely as .* as as soon as as far as (have|had) have had like[ds]? like liked likes strict ゃ 0 篁ュс違 target ゃ茹i藥 ^[^a-zA-Z]* 臀 [^a-zA-Z]*$ 緇臀茵 0 ctarget="love" strict=1 lovely, glove, beloved strict 激с潟с generous :-) IGNORECASE GNU Awk 激с潟 "" 潟違 紊ф絖絨絖阪ャ kwic.awk 1 lselect target 茵腟莨若 ュ茵絲障罩h頫憗筝 target 筝ゃ筝紊翫綺筝 鐚茵腓削 before 障 b 帥若蚊茯臂ゃ茵茯よ;腓冴 1 after 障 a 帥若蚊茯臂ゃ緇膓茯よ;腓冴 1 width, offset 帥若蚊茯臂よ;腓阪鐚"offset" + "width" * (帥若蚊茯-1) 茵茯臂よ;腓阪鐚"offset" + "width" * ("before"-1) 茵茯臂よ;腓阪鐚"offset" + "width" * ("before"-1) offset=10, width=7 pad 茵茯緇膓茯違 before after ゃ羣 篁c腥堺茵腓冴c帥 \ id 1 с<ゃ筝罎膣√茯臂ゅ茯篏臀茵腓冴 0 с罩≪ 篏茵_篏茯綵√<ゃ篏帥 sfind.awk 茯祉潟潟鴻緇с 1 filename 1 с絲障<ゃ茵腓冴0 ф罩≪ 1 msg 2 с違紫茯違罔羣若阪鐚鐚 1 с罔羣阪0 с阪吟 GNU Awk Windows (DOS) 筝ц軌c翫0 1 trunc ゃ "" 篁ュ茯帥若割札緇茵腓冴 帥違ゃ鴻帥違茵腓冴 "" idformat 茯鐚鐚ゃ若awk fprint 後 %04d_%04d fnameformat <ゃ茵腓冴若awk fprint 後 %-8s 鐚篁鐚 RS GNU Awk 激с潟ュ茵阪 kwic.awk "\r\n|\r|\n"鐚壕鐚 FS GNU Awk 激с潟茯阪 kwic.awk " *"鐚筝や札筝鴻若刻
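The option list shows the target wrapped as `^[^a-zA-Z]*love[^a-zA-Z]*$` when strict matching is in effect. The sketch below illustrates what that wrapping does to a word list. The function name `strict_match` is hypothetical and the code is not taken from kwic.awk; it assumes the target is a single plain word with no alternation.

```shell
# strict_match TARGET
# Hypothetical helper: reads text on stdin and prints each word that matches
# TARGET as a whole word, allowing only non-letters before and after it.
strict_match () {
    awk -v target="$1" '
    BEGIN { re = "^[^a-zA-Z]*" target "[^a-zA-Z]*$" }
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ re) print $i
    }'
}
```

With the target `love`, words such as `love,` or `"love"` pass, while `lovely`, `glove` and `beloved` do not, because letters adjoin the target in those words.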
腮 GNU Awk сュ<ゃ腟茵壕хc<ゃ緇c帥茯粋昭障
腮 GNU Awk сIGNORECASE=1 鐚kwic.awk с鐚с\W 罩h頫憗鴻宴若激若宴潟鴻 \w 荅箴<
sfind.awk
篏茵篏茯筝鴻<ゃ茯祉潟潟鴻緇 GNU Awk 鴻鐚篏茵篏茯 kwic.awk 阪鐚
$ gawk -f sfind.awk id=0244_0287 sample.txt
Sampson Brass and his sister, whose crime against society is much more
serious, pass their later years in the same <squalid> defeat; yet we
feel assured that the virile Sally, at all events, made a much better
fight against the consequences of her rascality. (L0244 W0287 sample.txt)
Code
Syntax
gawk -f sfind.awk OPTIONS input-file
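Options
id      INTEGER_INTEGER. The first integer is the number of the line in the source text file that contains the search word; the second is the position of that word within the line.
de      Separator between the two integers of id. Default: "_"
es      String inserted before the search word to highlight it. Default: <
ee      String inserted after the search word to highlight it. Default: >
abb     Pattern for words such as Mr., given as a regular expression. Default: "Mrs\.|Prof\.|[A-Z][a-z]?\."
RS      GNU Awk option: input line separator. Default in sfind.awk: "\r\n|\r|\n" (line break)
FS      GNU Awk option: word separator. Default in sfind.awk: " *" (one or more spaces)
trunc   If the value is not "", the part of each word from that string onward is not displayed. Use it to hide the tags in tagged text. Default: ""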
莇
腮激сс腮腴篏帥翫罨<罎膣∵莎よ蚊ゃ
$ gawk -f sfind.awk es=$'\e[31m' ee=$'\e[0m' id=0244_0091 sample.txt
We learn the tone of voice, the trick of utterance; he declared that
every word spoken by his characters was audible to him. (L0244 W0091 sample.txt)
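Usage examples
Shell functions
To save typing, the usage examples in this section use the following shell functions, written in .bashrc. Replace /PATH/TO/ with the directory where kwic.awk and sfind.awk are installed.
kwic      builds a KWIC concordance from text files
kwic2csv  converts kwic output to CSV
sfind     retrieves the text around a given line and word position in a text file
text2csv  converts text output to CSV
kwicex    looks up the example passage for each line of kwic output
csvkwic   does all of the above in one step and writes a KWIC concordance as CSV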
# KWIC
kwic () {
    gawk -f /PATH/TO/kwic.awk "$@"
}

sfind () {
    gawk -f /PATH/TO/sfind.awk "$@"
}

# Strip leading spaces, double embedded quotes, then turn each run of
# spaces into a field boundary and quote every field.
kwic2csv () {
    sed -e "
        s/^ *//;
        s/\"/\"\"/g;
        s/  */\",\"'/g;
        s/^/\"'/;
        s/$/\"/" $*
}

text2csv () {
    sed -e 's/\"/\"\"/g; s/\\\ */\",\"/g; s/^/\"/; s/$/\"/' $*
}

# The last argument is the kwic output file; its last two columns
# (ID and file name) are passed to sfind line by line.
kwicex () {
    for n in $@; do fil=$n; done
    gawk 'BEGIN{RS="\r\n|\r|\n"}{print $(NF-1) " " $NF}' $fil |
    while read line
    do
        sfind "$@" id=$line
    done
}

csvkwic () {
    kwictext=`mktemp /tmp/temp.XXXXXX`
    kwic "$@" > "$kwictext"
    paste -d "," <(kwic2csv "$kwictext") <(kwicex $@ "$kwictext" | text2csv)
    rm "$kwictext"
}
Sorting
kwic.awk 阪 sort 潟潟ф翫罨<箴2,1,4,5 潟篏帥茲翫鐚紊ф絖絨絖阪ャ荐埌∴茯腥榊純∴激с潟ф腓冴鐚
$ kwic a=2 b=2 t=word sample.txt | sort -bdf -k 2,2 -k 1,1 -k 4,5
Total hits: 16 out of 30355 words
\ A word is called 0077_002 sample.txt
-- a word unknown to 0328_374 sample.txt
disrespectful a word, he must 0152_130 sample.txt
(a convenient word of our 0038_058 sample.txt
indeed; every word has its 0158_317 sample.txt
that every word spoken by 0292_091 sample.txt
Had the word been in 0216_026 sample.txt
if the word be permitted; 0309_019 sample.txt
known the word). Indeed, was 0214_119 sample.txt
of the word; and in 0307_181 sample.txt
of the word; but to 0047_151 sample.txt
of the word. He gives 0286_117 sample.txt
of the word he never 0077_214 sample.txt
of the word, is Harold 0314_201 sample.txt
of the word, would assuredly 0071_067 sample.txt
a vile word -- which, 0215_209 sample.txt
Windows 潟潟祉潟罕сmsg 激с潟1 0 鐚 2鐚awk gawk 3.1.5 for Windows, 純若 sortf (c) 莟絣倶d2001鐚 http://www.vector.co.jp ユ鐚帥softf GNU sort 篌若激с潟絎号ゃ sort 潟潟
C:\tmp>gawk -f kwic.awk msg=0 target="word" sample.txt | sortf -bdf +0 -1 +2 -3
箴ゃ KWIC 潟潟潟若潟鴻茵荐膊純篏
$ csvkwic t=squalid b=2 a=2 sample.txt > sample.csv
Total hits: 5 out of 30355 words
茵荐膊純ц粋昭с帥
c桁鐚純若茵荐膊純違鐚篁ヤcsvkwic 茵cу憗
障sample.txt squalid 茯緇 2 茯 kwic 綣翫kwic.txt 篆絖с筝荳荀
$ kwic t="squalid" b=2 a=2 sample.txt > kwic.txt
Total hits: 5 out of 30355 words
$ cat kwic.txt
limbo of squalid London; his 0038_258 sample.txt
starved and squalid; but of 0083_170 sample.txt
fog and squalid pitfalls, amid 0160_205 sample.txt
origin and squalid life, with 0218_104 sample.txt
the same squalid defeat; yet 0244_287 sample.txt
CSV <ゃ紊帥
$ kwic2csv kwic.txt > kwic.csv
$ cat kwic.csv
"limbo","of","squalid","London;","his","0038_258","sample.txt"
"starved","and","squalid;","but","of","0083_170","sample.txt"
"fog","and","squalid","pitfalls,","amid","0160_205","sample.txt"
"origin","and","squalid","life,","with","0218_104","sample.txt"
"the","same","squalid","defeat;","yet","0244_287","sample.txt"
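The conversion shown above comes down to two rules: double any quote character inside a field, then wrap each field in quotes and join the fields with commas. As a rough alternative sketch in awk (the name `fields_to_csv` is hypothetical and is not one of the functions defined on this page), assuming fields are separated by runs of whitespace:

```shell
# fields_to_csv [FILE...]
# Hypothetical helper: turns whitespace-separated fields into one CSV row
# per input line, quoting every field.
fields_to_csv () {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(/"/, "\"\"", f)          # double embedded quotes
            sep = (i > 1) ? "," : ""
            out = out sep "\"" f "\""     # wrap the field in quotes
        }
        print out
    }' "$@"
}
```

Because every word becomes its own column, a multi-word target spreads over several columns, just as it does when splitting on spaces with sed.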
ID 篏帥c緇箴篏荀
$ sfind id=0244_287 sample.txt
Sampson Brass and his sister, whose crime against society is much
(...)
pass their later years in the same <squalid> defeat; yet
(...)
of her rascality. (L0244 W287 sample.txt)
蚊ゃ蚊ゃ√茵 kwic.txt 絲上箴 sample.txt 篏帥
$ kwicex kwic.txt > example.txt
箴 (esample.txt) 茵 1 潟 CSV <ゃ (example.csv) 篏KWIC 潟潟潟若潟鴻 CSV <ゃ鐚kwic.csv鐚篏帥
$ text2csv example.txt > example.csv
$ paste -d "," kwic.csv example.csv > sample.csv
荅帥違ゃ潟若鴻篏帥
c荅帥違ゃ鴻篏帥c帥箴word 荅翫綵√壕ゃ茯帥鴻帥
絎号ユс Linux ссTreeTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/) 篏帥c帥荅帥違ャ≪宴若激с潟鴻<ゃュ筝Penn-Treebank tagset 鐚http://www.cis.upenn.edu/~treebank/home.html鐚緇c鐚絨≦宍鐚帥違ゃ阪
$ treetagger_root/cmd/tree-tagger-english sample.txt | sed -e 's/SENT/\ /' > sample.tag
$ cat sample.tag
More     JJR   more
than     IN    than
thirty   CD    thirty
years    NNS   year
have     VHP   have
elapsed  VVN   elapse
.        SENT  .
sample.txt 鴻<ゃ TreeTagger 帥阪 3 潟阪c冴1 潟鴻с2 潟荅帥違罨<祉潟潟鴻腥肴с祉篏帥сsed у綏ャsample.tag 篆絖 sample.tag ュkwic.awk 帥
茯阪荐垩絎激с鰹FS鐚ゃ壕(с \n Windows сc吟 \r\n 鐚阪荐垩絎激с鰹RS鐚腥肴鐚с \n\n Windows с \r\n\r\n 鐚絎違壕ャ<ゃ鴻吾宴
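The value of the target option is .*\tJJ\t.*\n.*\tNNS?\tword
any word [TAB] JJ [TAB] any lemma [newline] any word [TAB] NN or NNS [TAB] lemma word
This is a regular expression. JJ is an adjective, NN a singular noun and NNS a plural noun, so the pattern finds an adjective followed by a noun whose lemma is word.
trunc="\t" hides everything from the tab onward (the tag and what follows it) in each tagged word, so the results are easier to read.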
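To make the record layout concrete, here is a minimal sketch of the same adjective-plus-noun search written directly in awk. It is not how kwic.awk does it: the function name `adj_noun` is hypothetical, and the code assumes TreeTagger-style lines of the form token, TAB, tag, TAB, lemma, with one blank line between sentences. It uses awk's paragraph mode (`RS = ""`) instead of `RS="\n\n"` so that it does not depend on GNU Awk.

```shell
# adj_noun LEMMA [FILE...]
# Hypothetical helper: prints "adjective noun" for every token tagged JJ
# that is directly followed by a token tagged NN or NNS with lemma LEMMA.
adj_noun () {
    lemma=$1
    shift
    awk -v lemma="$lemma" '
    BEGIN { RS = ""; FS = "\n" }          # one sentence per record, one token per field
    {
        for (i = 1; i < NF; i++) {
            split($i, a, "\t")            # a[1]=token a[2]=tag a[3]=lemma
            split($(i + 1), b, "\t")
            if (a[2] == "JJ" && b[2] ~ /^NNS?$/ && b[3] == lemma)
                print a[1], b[1]          # tokens only, tags dropped
        }
    }' "$@"
}
```

Printing only `a[1]` and `b[1]` plays the role that trunc="\t" plays above: the tag and lemma columns are left out of the output.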
阪 2 茯鐚ゃ障 word 茯鐚с純若帥
$ kwic RS="\n\n" FS="\n" t=".*\tJJ\t.*\n.*\tNNS?\tword" trunc="\t" sample.tag | sort -bdf -k 2,2
Total hits: 10 out of 34702 words
a convenient word of 0006_0023 sample.tag
\ Hard words are 0595_0001 sample.tag
In other words , 0405_0002 sample.tag
in other words , 0147_0027 sample.tag
in other words , 0876_0023 sample.tag
's own words ) 0328_0021 sample.tag
In plain words , 0898_0002 sample.tag
some such words as 0743_0030 sample.tag
a vile word -- 0728_0030 sample.tag
the weighty words of 0296_0018 sample.tag
The adjectives that precede word(s) are: convenient, hard, other, own, plain, such, vile, weighty.
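The word position (ID) obtained this way can be used to retrieve the surrounding sentence. The RS="\n\n" FS="\n" options are the same as above, and trunc must be specified as well.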
$ sfind RS="\n\n" FS="\n" trunc="\t" id=0898_0002 sample.tag
In <plain> words , then , we are speaking of a very loathsome creature ;
a sluttish , drunken , avaricious , dishonest woman . (L0898 W0002 sample.tag)
Frequency
Count how often each single preceding word occurs, sorted by frequency and then alphabetically.
gawk -f kwic.awk target="words?" sample.txt | awk '{print $1}' | sort -bfd | uniq -ci | sort -bdf -k 1nr,1 -k 2,2
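Count how often each pair of preceding words occurs. Entries with the same frequency are sorted alphabetically, using the word immediately before the target as the first key.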
gawk -f kwic.awk before=2 target="words?" sample.txt | awk '{print $1, $2}' | sort -bfd -k 2,2 -k 1,1 | uniq -ci | sort -bdf -k 1nr,1 -k 3,3 -k 2,2
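The sort, uniq, sort pipelines above can also be written with an awk associative array doing the counting. The following is a sketch only: the name `left_freq` is hypothetical, it assumes the preceding word is the first column of each KWIC line, and it folds case with `tolower` where the pipelines above use `uniq -ci`.

```shell
# left_freq [FILE...]
# Hypothetical helper: counts the first column of each input line,
# ignoring case, and lists the counts from most to least frequent,
# with ties in alphabetical order.
left_freq () {
    awk '{ n[tolower($1)]++ }
         END { for (w in n) printf "%d %s\n", n[w], w }' "$@" |
    LC_ALL=C sort -k 1,1nr -k 2,2
}
```

It could be used as `kwic t="words?" sample.txt | left_freq`, assuming the kwic shell function defined earlier on this page.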