This tool is making Dialogue Model for Slack
Thsi tool has 3 functions
1: Slack Communication 2: Learning by the Chainer 3: Collect Twitter Data
This tool is making the Dialogue Model
If you see the detail about it, you see the below
http://qiita.com/GushiSnow/items/79ca7deeb976f50126d7
If you don't install pyenv and virtualenv you have to install bellow ####Prepare Install linux
apt-get install pyenv
apt-get install virtualenv
Mac
brew install pyenv
brew install virtualenv
brew install homebrew/science/hdf5
####Prepare Inastall2
pyenv install 3.4.1
pyenv rehash
pyenv local 3.4.1
virtualenv -p ~/.pyenv/versions/3.4.1/bin/python3.4 my_env
source my_env/bin/activate
pip install -r requirement.txt
pip install chainer=="1.5.1"
####Prepare the Data
PreTrain
>[Wikipedia for Japanese]https://dumps.wikimedia.org/jawiki/latest/<br>
Train
>[Dialogue Data]https://sites.google.com/site/dialoguebreakdowndetection/<br>
SQLite
touch twitter_data.db
1: You set the wikipedia title data on the word2vec folder
2: Rename the data to jawiki-latest-all-titles-in-ns0
3: You execute the bellow script. You can get the Word2Vec Model
If you try to confirm the this code, you have to reduce the wikipedia data. The below script is the get the 5000 random data
sh random_choice.sh {Wikipedia Title data name} > {Random 5000 Choosing Wikipedia Title data name}
1: You have to make the data
folder
2: You get the Broken Dialogue corpus. And you make the file bellow
dev/〇〇.json
dev/■ ■.json
dev/◇◇◇.json
3: It is possible to split the data player_1
, player_2
in the bellow script
https://github.com/SnowMasaya/Chainer-Slack-Twitter-Dialogue/blob/master/data_load.py
4: You have to split the each word in the sentence. You use mecab library.
And you set the bellow data on the data
folder
player_1_wakati
player_2_wakati
####Prepare Twitter Key
https://apps.twitter.com/
####Prepare enviroment.yml
twitter:
consumer_key: your consumer key
consumer_secret: your consumer secret
token: your api token
token_secret: your token secret
mecab: your mecab dictionary
Slack
slack:
api_token: your api token
channel: your channel
user: your user token
mecab: your mecab dictionary
Installing a library bellow
##Requirements
Python 3.4+
Mecab and neolog-dict
numpy
chainer
ipython
notebook
jinja2
pyzmq
tornado
cython
gensim
PyYAML
requests
requests_oauthlib
djehuty
flask-slackbot
flask
mecab-python
future
websocket-client
####Confirm library
ipython
Learning Chainer
*You execute python
ipython notebook
Slack Communication
*You execute python
cd slack
python app.py
Get the Twitter Data
*You execute python
cd twitter
python twitter_get_usr_timeline.py
python sqlite_twitter.py
Dialogue ipython notebook and Encoder Decoder Model
- slack/ ... Slack Code
- util/ ... Encoder Decoder tools
- twitter/ ... Twitter Code
- word2vec/ ... Word2Vec Code
The MIT License (MIT)
Copyright (c) 2015 Masaya Ogushi
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Chainer
Python Slack
Chainer Machine Translation
Dialogue Data
Chainer Word2Vec
Wikipedia WordNet 日本語 Wikipedia Entity ベクトル