Concordancing with Awk
GNU Awk 篏帥潟若刻鴻<ゃ鐚KWIC 綣潟潟潟若潟鴻篏阪茲純若CSV 紊茵荐膊純茯粋昭帥遺潟若鴻篏帥c荅絎罎膣≪帥
障潟若鴻若 GUI с若吟祉潟罔羣阪ュ<潟莟р
鐚茗 to 潟若拷潟宴障鴻荅鐚鐚
kwic.awk
鴻若鴻у阪茯c罕鴻<ゃ茵鐚筝膓鐚茯罎膣≪KWIC 翫鴻阪 GNU Awk 鴻鐚ptx 帥鐚
$ gawk -f kwic.awk target="squalid" before=2 after=2 sample.txt
limbo of squalid London; his 0038_0258 sample.txt
starved and squalid; but of 0083_0170 sample.txt
fog and squalid pitfalls, amid 0160_0205 sample.txt
origin and squalid life, with 0218_0104 sample.txt
the same squalid defeat; yet 0244_0287 sample.txt
Total hits: 5 out of 30355 words
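As a rough illustration of the output format shown above, here is a minimal KWIC extractor in POSIX shell and awk. It is not the author's kwic.awk. The function name kwic_sketch and its positional arguments are hypothetical, and the sketch assumes that words are whitespace-separated, that context never crosses a line boundary, and that missing context is padded with a backslash.

```shell
# kwic_sketch TARGET CONTEXT FILE...
# Hypothetical helper, not the real kwic.awk: prints CONTEXT words on each
# side of every word matching the regex TARGET, then a line_word ID and the
# file name.
kwic_sketch() {
    target=$1
    ctx=$2
    shift 2
    awk -v target="$target" -v ctx="$ctx" '
    {
        for (i = 1; i <= NF; i++) {
            if ($i !~ target)
                continue
            out = ""
            for (j = i - ctx; j <= i + ctx; j++) {
                # Pad with a backslash where the line has no word.
                out = out ((j < 1 || j > NF) ? "\\" : $j) " "
            }
            # ID = line number and word position, zero-padded to 4 digits.
            printf "%s%04d_%04d %s\n", out, FNR, i, FILENAME
        }
    }' "$@"
}
```

For example, kwic_sketch squalid 2 sample.txt would list every word containing "squalid" with two words of context on each side.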
純若 sort 潟潟罔羣阪ュ<潟 :-)
潟若
2008綛220ユ壕鐚帥違ゃ潟若合 trunc 激с潟菴遵激с喝сtarget t, before b, after a ュ筝鐚
Usage
gawk -f kwic.awk OPTIONS input-file [input-file ...]
Output
preceding-words target-words following-words line_word file-name
Options
鐚罎膣鐚
target 障 t 罎膣√茯罩h頫憗筝"" у茯絲乗院鐚鐚
荀篁ヤ
love of 筝障 FS 鐚鐚c若
祉若帥筝や札筝鴻若刻
阪荐垩茲違茯蚊love of 緇
鐚strict="" 鐚с茯茯
^[^a-zA-Z]* 茯絨障 [^a-zA-Z]*$
^[^a-zA-Z]*love[^a-zA-Z]*$ ^[^a-zA-Z]*of[^a-zA-Z]*$
緇
FS 阪荐垩蚊ュ<ゃ筝茯
罩h頫憗筝ゃゃ
鐚IGNORECASE=1鐚с紊ф絖絨絖阪ャ
篁 GNU Awk 泣若罩h頫憗障鞘戎
1 茯障 2 茯茯違羆冴障c罩h
茵憗筝号
[Examples]
Pattern        Matches
word           word    word,
loved?         love    loved
love.*         love    lover    lovely
as .* as       as soon as    as far as
(have|had)     have    had
like[ds]?      like    liked    likes
strict ゃ 0 篁ュс違
target ゃ茹i藥 ^[^a-zA-Z]* 臀
[^a-zA-Z]*$ 緇臀茵 0
ctarget="love" strict=1
lovely, glove, beloved
strict 激с潟с generous :-)
IGNORECASE GNU Awk 激с潟 "" 潟違
紊ф絖絨絖阪ャ
kwic.awk 1
lselect target 茵腟莨若
ュ茵絲障罩h頫憗筝
target 筝ゃ筝紊翫綺筝
鐚茵腓削
before 障 b 帥若蚊茯臂ゃ茵茯よ;腓冴 1
after 障 a 帥若蚊茯臂ゃ緇膓茯よ;腓冴 1
width, offset 帥若蚊茯臂よ;腓阪鐚"offset" + "width" * (帥若蚊茯-1)
茵茯臂よ;腓阪鐚"offset" + "width" * ("before"-1)
茵茯臂よ;腓阪鐚"offset" + "width" * ("before"-1)
offset=10, width=7
pad 茵茯緇膓茯違 before after ゃ羣
篁c腥堺茵腓冴c帥 \
id 1 с<ゃ筝罎膣√茯臂ゅ茯篏臀茵腓冴
0 с罩≪
篏茵_篏茯綵√<ゃ篏帥
sfind.awk 茯祉潟潟鴻緇с
1
filename 1 с絲障<ゃ茵腓冴0 ф罩≪
1
msg 2 с違紫茯違罔羣若阪鐚鐚
1 с罔羣阪0 с阪吟
GNU Awk Windows (DOS) 筝ц軌c翫0 1
trunc ゃ "" 篁ュ茯帥若割札緇茵腓冴
帥違ゃ鴻帥違茵腓冴
""
idformat 茯鐚鐚ゃ若awk fprint 後
%04d_%04d
fnameformat <ゃ茵腓冴若awk fprint 後
%-8s
鐚篁鐚
RS GNU Awk 激с潟ュ茵阪
kwic.awk "\r\n|\r|\n"鐚壕鐚
FS GNU Awk 激с潟茯阪
kwic.awk " *"鐚筝や札筝鴻若刻
腮 GNU Awk сュ<ゃ腟茵壕хc<ゃ緇c帥茯粋昭障
腮 GNU Awk сIGNORECASE=1 鐚kwic.awk с鐚с\W 罩h頫憗鴻宴若激若宴潟鴻 \w 荅箴<
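The strict option in the table above wraps the target between ^[^a-zA-Z]* and [^a-zA-Z]*$. The sketch below shows what that anchoring does to a handful of words. The function name strict_match is hypothetical, and the code is an illustration rather than an excerpt from kwic.awk.

```shell
# strict_match TARGET WORD...
# Hypothetical helper: prints each WORD that matches TARGET once TARGET is
# wrapped in the anchors quoted in the strict option description.
# Note: awk -v processes backslash escapes in TARGET.
strict_match() {
    target=$1
    shift
    for w in "$@"; do
        printf '%s\n' "$w"
    done | awk -v target="$target" '
    BEGIN { re = "^[^a-zA-Z]*" target "[^a-zA-Z]*$" }
    $0 ~ re'
}
```

With the target love, the words love, and "love" pass, while lovely, glove and beloved are rejected because letters adjoin the target.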
sfind.awk
篏茵篏茯筝鴻<ゃ茯祉潟潟鴻緇 GNU Awk 鴻鐚篏茵篏茯 kwic.awk 阪鐚
$ gawk -f sfind.awk id=0244_0287 sample.txt
Sampson Brass and his sister, whose crime against society is much more serious, pass their later years in the same <squalid> defeat; yet we feel assured that the virile Sally, at all events, made a much better fight against the consequences of her rascality. (L0244 W0287 sample.txt)
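The lookup shown above can be sketched in a few lines of awk. This assumes the ID has the line_word form printed by kwic.awk. The function name mark_word is hypothetical, the awk variables de, es and ee merely borrow the names of the sfind.awk options, and unlike sfind.awk the sketch prints the whole line instead of extracting the surrounding sentence.

```shell
# mark_word ID FILE
# Hypothetical helper, not sfind.awk: prints the line named by ID
# (line_word, e.g. 0002_0003) with the addressed word wrapped in < and >.
mark_word() {
    awk -v id="$1" -v de="_" -v es="<" -v ee=">" '
    BEGIN {
        split(id, part, de)
        line = part[1] + 0      # "0002" -> 2
        word = part[2] + 0      # "0003" -> 3
    }
    FNR == line {
        $word = es $word ee     # rebuilds $0 with single spaces
        printf "%s (L%04d W%04d %s)\n", $0, line, word, FILENAME
        exit
    }' "$2"
}
```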
潟若
Usage
gawk -f sfind.awk OPTIONS input-file
Options
id 倶_倶違筝筝ょ倶違純若鴻
<ゃ筝罎膣√茯障茵垩
篋ょ倶違綵荅画с綵荅峨茯篏茯
de id 篋ゃ倶違阪筝 "_"
es 綣決粋;腓冴罎膣√茯水ャ絖 <
ee 綣決粋;腓冴罎膣√茯緇水ャ絖 >
abb Mr. 茯腓冴ゃ帥若潟
罩h頫憗т "Mrs\.|Prof\.|[A-Z][a-z]?\."
RS GNU Awk 激с潟ュ茵阪
sfind.awk "\r\n|\r|\n"鐚壕鐚
FS GNU Awk 激с潟茯阪
sfind.awk " *"鐚筝や札筝鴻若刻
trunc ゃ "" 篁ュ茯帥若割札緇茵腓冴
帥違ゃ鴻帥違茵腓冴
""
莇
腮激сс腮腴篏帥翫罨<罎膣∵莎よ蚊ゃ
$ gawk -f sfind.awk es=$'\e[31m' ee=$'\e[0m' id=0244_0091 sample.txt
We learn the tone of voice, the trick of utterance; he declared that every word spoken by his characters was audible to him. (L0244 W0091 sample.txt)
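The $'...' quoting used above is a bash feature. In a plain POSIX shell the same escape sequences can be built with printf, as in this sketch. The variable names simply mirror the es and ee options.

```shell
# Build the ANSI "red" and "reset" sequences without bash-specific $'...'.
es=$(printf '\033[31m')
ee=$(printf '\033[0m')

# Wrap a word in the two markers, as the es and ee options do.
printf '%s%s%s\n' "$es" "squalid" "$ee"
```

The variables can then be passed on the command line in the same way, for example es="$es" ee="$ee".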
篏睡箴
激с∽育
祉激с潟篏睡箴с帥ゃ潟違膀膣篁ヤ激с∽違 .bashrc 吾篋膊 /PATH/TO/ kwic.awk sfind.awk 臀c臀篏帥
kwic 鴻<ゃ KWIC 綣潟潟潟若潟鴻篏 kwic2csv kwic 阪 csv 紊 sfind 鴻<ゃ茵茯茯緇 text2csv 鴻激ュ阪鴻 csv 紊 kwicex kwic 阪茵絲上 csvkwic 祉潟潟劫篏篁 kwic 潟潟潟若潟鴻 CSV т
# KWIC
kwic ()
{
gawk -f /PATH/TO/kwic.awk "$@"
}
sfind ()
{
gawk -f /PATH/TO/sfind.awk "$@"
}
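kwic2csv ()
{
# Strip leading spaces, double embedded quotes, then turn each run of
# spaces into a field boundary (each field is prefixed with an apostrophe).
sed -e "
s/^ *//;
s/\"/\"\"/g;
s/  */\",\"'/g;
s/^/\"'/;
s/$/\"/" "$@"
}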
text2csv ()
{
sed -e 's/\"/\"\"/g; s/\\\ */\",\"/g; s/^/\"/; s/$/\"/' $*
}
kwicex ()
{
for n in $@; do fil=$n; done
gawk 'BEGIN{RS="\r\n|\r|\n"}{print $(NF-1) " " $NF}' $fil |
while read line
do
sfind "$@" id=$line
done
}
csvkwic ()
{
# Write the KWIC lines to a temporary file, then join them with their examples.
kwictext=$(mktemp /tmp/temp.XXXXXX)
kwic "$@" > "$kwictext"
paste -d "," <(kwic2csv "$kwictext") <(kwicex "$@" "$kwictext" | text2csv)
rm "$kwictext"
}
純若
kwic.awk 阪 sort 潟潟ф翫罨<箴2,1,4,5 潟篏帥茲翫鐚紊ф絖絨絖阪ャ荐埌∴茯腥榊純∴激с潟ф腓冴鐚
$ kwic a=2 b=2 t=word sample.txt | sort -bdf -k 2,2 -k 1,1 -k 4,5
Total hits: 16 out of 30355 words
\ A word is called 0077_002 sample.txt
-- a word unknown to 0328_374 sample.txt
disrespectful a word, he must 0152_130 sample.txt
(a convenient word of our 0038_058 sample.txt
indeed; every word has its 0158_317 sample.txt
that every word spoken by 0292_091 sample.txt
Had the word been in 0216_026 sample.txt
if the word be permitted; 0309_019 sample.txt
known the word). Indeed, was 0214_119 sample.txt
of the word; and in 0307_181 sample.txt
of the word; but to 0047_151 sample.txt
of the word. He gives 0286_117 sample.txt
of the word he never 0077_214 sample.txt
of the word, is Harold 0314_201 sample.txt
of the word, would assuredly 0071_067 sample.txt
a vile word -- which, 0215_209 sample.txt
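The key order in the sort command above can be tried on a few hand-made lines. The input below is invented for illustration. With two context words on each side, field 3 is the target, so -k 2,2 sorts on the word immediately before it and -k 1,1 breaks ties with the word before that. LC_ALL=C is set only to make the result reproducible.

```shell
# Three invented KWIC lines (before=2, after=2), deliberately out of order.
sort_demo() {
    printf '%s\n' \
        'of the word he never' \
        'that every word spoken by' \
        'if the word be permitted' |
    LC_ALL=C sort -bdf -k 2,2 -k 1,1 -k 4,5
}
sort_demo
```

The line with "every" before the target comes first, and the two lines with "the" follow in the order of their first words ("if", then "of").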
Windows 潟潟祉潟罕сmsg 激с潟1 0 鐚 2鐚awk gawk 3.1.5 for Windows, 純若 sortf (c) 莟絣倶d2001鐚 http://www.vector.co.jp ユ鐚帥softf GNU sort 篌若激с潟絎号ゃ sort 潟潟
C:\tmp>gawk -f kwic.awk msg=0 target="word" sample.txt | sortf -bdf +0 -1 +2 -3
箴ゃ KWIC 潟潟潟若潟鴻茵荐膊純篏
$ csvkwic t=squalid b=2 a=2 sample.txt > sample.csv
Total hits: 5 out of 30355 words
茵荐膊純ц粋昭с帥

c桁鐚純若茵荐膊純違鐚篁ヤcsvkwic 茵cу憗
障sample.txt squalid 茯緇 2 茯 kwic 綣翫kwic.txt 篆絖с筝荳荀
$ kwic t="squalid" b=2 a=2 sample.txt > kwic.txt
Total hits: 5 out of 30355 words
$ cat kwic.txt
limbo of squalid London; his 0038_258 sample.txt
starved and squalid; but of 0083_170 sample.txt
fog and squalid pitfalls, amid 0160_205 sample.txt
origin and squalid life, with 0218_104 sample.txt
the same squalid defeat; yet 0244_287 sample.txt
CSV <ゃ紊帥
$ kwic2csv kwic.txt > kwic.csv
$ cat kwic.csv
"limbo","of","squalid","London;","his","0038_258","sample.txt"
"starved","and","squalid;","but","of","0083_170","sample.txt"
"fog","and","squalid","pitfalls,","amid","0160_205","sample.txt"
"origin","and","squalid","life,","with","0218_104","sample.txt"
"the","same","squalid","defeat;","yet","0244_287","sample.txt"
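The same conversion can also be written in awk. The following sketch follows the idea of kwic2csv: double any embedded quotes, then quote and comma-join the fields. The function name kwic2csv_awk is hypothetical, and unlike the sed function it does not prefix fields with an apostrophe.

```shell
# kwic2csv_awk [FILE...]
# Hypothetical awk variant of kwic2csv: one quoted CSV field per
# whitespace-separated word.
kwic2csv_awk() {
    awk '{
        line = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            gsub(/"/, "\"\"", field)         # CSV-escape embedded quotes
            line = line (i > 1 ? "," : "") "\"" field "\""
        }
        print line
    }' "$@"
}
```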
ID 篏帥c緇箴篏荀
$ sfind id=0244_287 sample.txt
Sampson Brass and his sister, whose crime against society is much [...] pass their later years in the same <squalid> defeat; yet [...] of her rascality. (L0208 W287 sample.txt)
蚊ゃ蚊ゃ√茵 kwic.txt 絲上箴 sample.txt 篏帥
$ kwicex kwic.txt > example.txt
箴 (esample.txt) 茵 1 潟 CSV <ゃ (example.csv) 篏KWIC 潟潟潟若潟鴻 CSV <ゃ鐚kwic.csv鐚篏帥
$ text2csv example.txt > example.csv
$ paste -d "," kwic.csv example.csv > sample.csv
荅帥違ゃ潟若鴻篏帥
c荅帥違ゃ鴻篏帥c帥箴word 荅翫綵√壕ゃ茯帥鴻帥
絎号ユс Linux ссTreeTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/) 篏帥c帥荅帥違ャ≪宴若激с潟鴻<ゃュ筝Penn-Treebank tagset 鐚http://www.cis.upenn.edu/~treebank/home.html鐚緇c鐚絨≦宍鐚帥違ゃ阪
$ treetagger_root/cmd/tree-tagger-english sample.txt | sed -e 's/SENT/\ /' > sample.tag
$ cat sample.tag
More	JJR	more
than	IN	than
thirty	CD	thirty
years	NNS	year
have	VHP	have
elapsed	VVN	elapse
.	SENT	.
sample.txt 鴻<ゃ TreeTagger 帥阪 3 潟阪c冴1 潟鴻с2 潟荅帥違罨<祉潟潟鴻腥肴с祉篏帥сsed у綏ャsample.tag 篆絖 sample.tag ュkwic.awk 帥
茯阪荐垩絎激с鰹FS鐚ゃ壕(с \n Windows сc吟 \r\n 鐚阪荐垩絎激с鰹RS鐚腥肴鐚с \n\n Windows с \r\n\r\n 鐚絎違壕ャ<ゃ鴻吾宴
潟違激с target ゃ.*\tJJ\t.*\n.*\tNNS?\tword
篁紙茯 [TAB] JJ [TAB] 篁紙潟 [壕] 篁紙茯 [TAB] NN 障 NNS [TAB] 潟 word
茵憗JJ 綵√壕NN 筝荅莖荅NNS 筝荅茲医就ゃ障綵√壕鐚潟 word 荅帥若潟罎膣≪
trunc="\t" 帥違ゃ茯帥遺札緇鐚帥篁ュ鐚с腟荀ャ篁鴻
阪 2 茯鐚ゃ障 word 茯鐚с純若帥
$ kwic RS="\n\n" FS="\n" t=".*\tJJ\t.*\n.*\tNNS?\tword" trunc="\t" sample.tag |
sort -bdf -k 2,2
Total hits: 10 out of 34702 words
a convenient word of 0006_0023 sample.tag
\ Hard words are 0595_0001 sample.tag
In other words , 0405_0002 sample.tag
in other words , 0147_0027 sample.tag
in other words , 0876_0023 sample.tag
's own words ) 0328_0021 sample.tag
In plain words , 0898_0002 sample.tag
some such words as 0743_0030 sample.tag
a vile word -- 0728_0030 sample.tag
the weighty words of 0296_0018 sample.tag
The adjectives modifying word(s) are: convenient, hard, other, own, plain, such, vile, weighty
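How the RS and FS settings turn each tagged token into one awk field can be seen in a small sketch. It uses POSIX paragraph mode (RS="") instead of the gawk regex RS used above, so that it runs in any awk. The three-column token/tag/lemma layout is taken from the sample.tag listing, and the function name adj_noun is hypothetical.

```shell
# adj_noun LEMMA FILE
# Hypothetical helper: with blank-line-separated records and one token per
# field, print adjective + noun pairs whose noun lemma is LEMMA.
adj_noun() {
    awk -v lemma="$1" '
    BEGIN { RS = ""; FS = "\n" }
    {
        for (i = 1; i < NF; i++) {
            split($i, a, "\t")        # token, tag, lemma
            split($(i + 1), n, "\t")
            if (a[2] == "JJ" && n[2] ~ /^NNS?$/ && n[3] == lemma)
                print a[1], n[1]
        }
    }' "$2"
}
```

Splitting each field on the tab plays the role of the trunc="\t" setting: only the token before the first tab is printed.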
ゃс茯篏臀若随ID鐚篏帥c祉潟潟鴻緇帥RS="\n\n" FS="\n" 激с潟筝鏆荀trunc 荐絎荀ャ
$ sfind RS="\n\n" FS="\n" trunc="\t" id=0898_0002 sample.tag
In <plain> words , then , we are speaking of a very loathsome creature ; a sluttish , drunken , avaricious , dishonest woman . (L0898 W0002 sample.tag)
糸墾
茵 1 茯糸墾違糸墾ゃ≪<純若帥
gawk -f kwic.awk target="words?" sample.txt |
awk '{print $1}' | sort -bfd | uniq -ci | sort -bdf -k 1nr,1 -k 2,2
茵 2 茯腟糸墾違糸墾ゃ緇筝ゃゃ若篏帥≪<純若帥
gawk -f kwic.awk before=2 target="words?" sample.txt |
awk '{print $1, $2}' | sort -bfd -k 2,2 -k 1,1 |
uniq -ci | sort -bdf -k 1nr,1 -k 3,3 -k 2,2
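The counting step can be checked on a few invented lines. The sketch below applies the same sort and uniq -c idea to the first column, then orders the result by descending frequency. The input is made up for illustration, and case is folded with tolower before counting rather than with uniq -i, which POSIX does not require.

```shell
# Count how often each word occurs in column 1 of some invented KWIC lines.
freq_demo() {
    printf '%s\n' 'other words' 'In words' 'other words' 'in words' 'plain words' |
    awk '{ print tolower($1) }' |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -b -k 1,1nr -k 2,2 |
    awk '{ print $1, $2 }'          # normalise the padding added by uniq -c
}
freq_demo
```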
