
How to use the Python jieba library

2022-01-02 11:46:10

Using the jieba library

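jieba is an excellent third-party Chinese word segmentation library for Python. It supports three segmentation modes: precise mode, full mode, and search engine mode. Their characteristics are as follows.

Precise mode: tries to cut the sentence into the most accurate segmentation.

Full mode: scans out every word that could possibly be formed from the sentence. It is fast, but the output contains redundant words.

Search engine mode: starts from the precise-mode result and cuts long words again into shorter ones.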
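The three modes can be compared side by side with a small helper. This is a minimal sketch, not part of the original article: the function name `segment_three_ways` and the sample sentence are chosen here for illustration, while `jieba.lcut`, its `cut_all` parameter, and `jieba.lcut_for_search` are the library's own calls.

```python
def segment_three_ways(sentence):
    """Segment `sentence` with each of jieba's three modes (illustrative helper)."""
    # Imported inside the function so that loading this sketch does not
    # fail on a machine where jieba has not been installed yet.
    import jieba

    return {
        "precise": jieba.lcut(sentence),              # default: precise mode
        "full": jieba.lcut(sentence, cut_all=True),   # full mode
        "search": jieba.lcut_for_search(sentence),    # search engine mode
    }
```

Calling `segment_three_ways("我来到北京清华大学")` returns a dictionary with one word list per mode, which makes it easy to see the extra, overlapping words that full mode and search engine mode produce.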

1. Installing the jieba library

  • Fully automatic installation: easy_install jieba or pip install jieba / pip3 install jieba
  • Semi-automatic installation: download the package from http://pypi.python.org/pypi/jieba, extract it, and run python setup.py install
  • Manual installation: place the jieba directory in the current directory or in the site-packages directory.
  • Load the library with import jieba
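Whichever installation route is used, it can help to confirm that the interpreter actually sees the package. The following is a small sketch using only the standard library; the helper name `describe_install` is hypothetical and not something jieba provides.

```python
import importlib.util
import sysconfig


def describe_install(package="jieba"):
    """Return a short status line saying whether `package` can be imported."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        # purelib is the site-packages directory of the running interpreter
        site_dir = sysconfig.get_paths()["purelib"]
        return f"{package} not found; site-packages for this interpreter is {site_dir}"
    return f"{package} found at {spec.origin}"


print(describe_install())
```

If the package is reported as missing even though pip finished without errors, pip and python probably belong to different interpreters; running pip as `python -m pip install jieba` ties the installation to the interpreter that will import it.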

The fully automatic installation is done as follows.

Win+R ==> cmd
pip install jieba

A pip version error may occur during installation. If it does, go to the pip directory and upgrade pip:

python.exe -m pip install --upgrade pip

Installing in PyCharm

Open Settings, search for Project Interpreter, click the + sign in the right-hand pane, type jieba in the search box, and click Install.

2. Word frequency statistics for the lotus essay

lotus.txt (the input text, shown here in English translation)

The past few days have been quite unsettling. Tonight, sitting in the courtyard, I suddenly remembered that the lotus pond, which I walked past day after day, should have a different look in the light of the full moon. The moon was rising, and the laughter of the children on the road outside the wall was no longer audible; my wife was humming a sleep song in a daze as she slapped a leap in the house. I quietly put on my big shirt and took the door with me to go out. Along the lotus pond is a small winding road of coal dust. This is a secluded road; few people walk during the day, and it is even lonelier at night. Around the lotus pond, many trees grew, luxuriant (wěng) and lush. On the side of the road, there are some willows and some trees whose names are unknown. On nights when there is no moonlight, the road is gloomy and a bit scary. Tonight is very good, although the moonlight is still faint. I was the only one on the road, pacing with my hands behind my back (duó). This world seems to be mine; I am also like beyond my usual self, in another world. I love to be lively, but also love to be calm; I love to live in groups, but also love to be alone. Like tonight, under the pale moon, I can think about everything and nothing, and I feel free. The things I must do and say during the day can now be ignored. This is the beauty of being alone, and I will enjoy the endless lotus fragrance and moonlight. The pond of lotus flowers is full of leaves. The leaves are very high out of the water, like the skirt of a pavilion dancer. In the middle of the layers of leaves, there are some white flowers dotted sporadically, some curling (niǎo,nuó) open, some shyly beating; just like a grain of pearl, and like the stars in the blue sky, and like the beauty just out of the bath. The breeze sent wisps of fragrance, as if a distant tall building's faint song. At this time, the leaves and flowers also had a slight tremor, like lightning, which suddenly spread across the lotus pond.
The leaves were shoulder-to-shoulder and close together, so there was a solid blue ripple. The leaves are veined underneath (mò) the flowing water, which is covered and cannot be seen in some colors; but the leaves are more beautiful. Moonlight, like flowing water, quietly cascaded on the leaves and flowers. A thin green mist floated in the lotus pond. The leaves and flowers seemed to be washed in buttermilk; and they were like a dream covered with light veils. Although it was a full moon, there was a light cloud in the sky, so it could not shine brightly; but I thought it was just the right time - a sound sleep is indispensable, but a nap is also a special flavor. The moonlight was shining through the trees, and the bushes high up in the sky were falling in dappled black shadows, as craggy as ghosts; the sparse shadows of the curved willows were painted on the lotus leaves. The moonlight in the pond was not uniform; but the light and shadows had a harmonious melody, like the famous song played on the Van Gogh (ē) Ling (the translation of the English violin). On all sides of the lotus pond, far and near, high and low are trees, and willows are the most numerous. These trees surrounded the pond; only a few gaps were left on the side of the path, as if for moonlight. The color of the trees was always shady, and at first glance it looked like a cloud of smoke; but the richness of the willows was discernible even in the smoke. The tops of the trees were faintly covered with distant hills, with only some carelessness. There are one or two road lights leaking through the trees, and they are the eyes of a thirsty sleeper. The most lively thing at this time is the sound of cicadas in the trees and frogs in the water; but the lively thing is theirs, I have nothing. Suddenly, I remembered about lotus picking. It seems to be an old custom in the south of the Yangtze River, and it was prevalent during the Six Dynasties; I know it from the poems.
The lotus-pickers are young women, who go there in small boats, singing lively songs. It goes without saying that there were many lotus-pickers and people watching the lotus-picking. It was a lively season and also a season of flirtation. Emperor Liang Yuandi's "Cai Lian Fu" said well: "So the demon child Yuan (yuà) is a young woman. So the demon children Yuan (yuàn) women, swinging the boat heart Xu; tidy (yì) head Xu back, and pass the feather cup; scull (zhào) will move and algae hanging, the boat wants to move and Ping open. You are slender waist bundle of vegetation, delayed GuBu; summer beginning of spring, the leaves are tender flowers, afraid to stain the clothes and shallow smile, fear to pour the boat and converge the train (jū). You can see the scenery of the playfulness at that time. This is really interesting, but unfortunately, we have long been unable to enjoy it. Then I remembered the lines from "Xi Zhou Qu". The lotus seeds are as clear as water when you look down and get them. If there are lotus pickers tonight, the lotus here is also considered to be "over the head"; only the shadow of some flowing water, is not possible. This makes me think about Jiangnan in the end. --I was thinking about this, and I looked up, but I was already in front of my own door; I gently pushed the door in, and there was no sound, my wife had been asleep for a long time.

Chinese dummy words.txt (the words to exclude, shown here in English translation)

from, since, since, fight, to, to, to, in, by, to, in, to, while, when, when with, along, along with
by, according to, follow, in accordance with, by, in the spirit of, with, through, according to, take, than
Because, because, due to, for, for the sake of, for the sake of
be, give, let, call, return, by, put, will, manage
to, for, about, with, and, to, for, to, to, with, except
with, and, with, with, and, or, and
And, and, and, and, or
not only, not only, although, but, however, if, with, because, so
of, got, ground
with, had, over
also, this, in
like, like, generally
to, even, we, the
the, the, bar, it, ah, with, well, chant, just, just, also, also, la, my, with, with
What, what, ah, bar
(a), (b), (c), (d).


Code

```python
import jieba


# Read the contents of a file
def read_content(path="lotus.txt"):
    # Set the encoding explicitly when reading the file
    with open(path, encoding='utf-8') as f:
        return f.read()


# Print the information
def print_info(values=()):
    for item in values:
        print(item)


# Main function
if __name__ == '__main__':
    # print_info(read_content())
    content = read_content()
    article = jieba.lcut(content)  # split the text into a list of words
    # Count the number of occurrences of each word
    dic = {}
    for word in article:
        dic[word] = dic.get(word, 0) + 1
    # Sort the (word, count) pairs from highest to lowest count
    swd = sorted(dic.items(), key=lambda item: item[1], reverse=True)
    # Load the words to exclude: dummy words, conjunctions, punctuation, etc.
    stop_wds = read_content('Chinese dummy words.txt')
    for kword, times in swd:
        # Print the word and its count unless it occurs in the exclusion text
        if kword not in stop_wds:
            print(kword, times)
```


Result of running the script.
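The script tests `kword not in stop_wds` against the raw text of the exclusion file, which is a substring match: any token that happens to appear inside a longer excluded word is dropped as well. A possible alternative, sketched below under assumptions, is to parse the file into a set and count with `collections.Counter`. The helper names `parse_stop_words` and `top_words` are hypothetical, and the separator characters are a guess at the file layout; in real use the tokens would come from `jieba.lcut(content)`.

```python
from collections import Counter


def parse_stop_words(text, separators=",，、"):
    """Split the exclusion text into a set of individual words.

    Assumes words are separated by commas (ASCII or full-width), the
    Chinese enumeration comma, or line breaks; adjust `separators`
    if the file uses something else.
    """
    for sep in separators:
        text = text.replace(sep, "\n")
    return {word.strip() for word in text.splitlines() if word.strip()}


def top_words(tokens, stop_words, limit=10):
    """Count tokens, skipping stop words and whitespace-only tokens."""
    counts = Counter(
        tok for tok in tokens if tok.strip() and tok not in stop_words
    )
    return counts.most_common(limit)


# Hand-made token list standing in for the output of jieba.lcut(...)
stop_words = parse_stop_words("的, 了\n和、与")
tokens = ["荷塘", "的", "月色", "和", "荷塘", "\n", "月色", "荷塘"]
print(top_words(tokens, stop_words))  # [('荷塘', 3), ('月色', 2)]
```

Exact set membership keeps a word unless it is listed as a stop word itself, and `most_common` replaces the manual sort while keeping ties in first-seen order.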

Summary

That concludes this article. I hope it has been helpful, and that you will take a look at the other content on BinaryDevelop.