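In Python, just as the pymysql package is used to connect to a MySQL database, the impala.dbapi module from the impyla package (imported as impala) can be used to connect to a Hive database. Once the connection and cursor are established, you run cursor.execute(sql), and the rest of the query workflow is much the same as with pymysql. The required libraries are installed as follows: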

Installation environment: Ubuntu 20.04 LTS

Run the following commands in order:

pip install six
pip install bit_array
pip install thriftpy
apt-get install python-dev libsasl2-dev gcc
pip install thrift_sasl
pip install impyla

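The apt-get install python-dev libsasl2-dev gcc step is needed because pip install thrift_sasl builds its sasl dependency from source. Without the compiler and the SASL development headers (libsasl2-dev provides sasl/sasl.h), the build fails with the following error (on Windows, see here):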

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting thrift_sasl
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/73/d3/588654faef5511afadc1a091d32fcdbb24ae5f2d90b380874aee68a717f9/thrift_sasl-0.4.2.tar.gz
Collecting thrift>=0.10.0 (from thrift_sasl)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/97/1e/3284d19d7be99305eda145b8aa46b0c33244e4a496ec66440dac19f8274d/thrift-0.13.0.tar.gz (59kB)
|████████████████████████████████| 61kB 684kB/s
Collecting sasl>=0.2.1 (from thrift_sasl)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8e/2c/45dae93d666aea8492678499e0999269b4e55f1829b1e4de5b8204706ad9/sasl-0.2.1.tar.gz
Collecting six>=1.13.0 (from thrift_sasl)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/65/eb/1f97cb97bfc2390a276969c6fae16075da282f5058082d4cb10c6c5c1dba/six-1.14.0-py2.py3-none-any.whl
Building wheels for collected packages: thrift-sasl, thrift, sasl
Building wheel for thrift-sasl (setup.py) ... done
Created wheel for thrift-sasl: filename=thrift_sasl-0.4.2-cp37-none-any.whl size=4010 sha256=d9aff46bdb4423f147da5e2809198b9feff9d54259b627c6b6d716640b3cc842
Stored in directory: /root/.cache/pip/wheels/92/22/93/59527f7435acb500da2c80d4eb038377e752009fa47e842fba
Building wheel for thrift (setup.py) ... done
Created wheel for thrift: filename=thrift-0.13.0-cp37-cp37m-linux_x86_64.whl size=414111 sha256=113f6ddd0744dea046e5d8764c5858c9d397ca9d910004719662e6ccda1c9792
Stored in directory: /root/.cache/pip/wheels/dc/f4/14/0cd659ffc6431d0a24534f04087f6239494daf4fb3531c542a
Building wheel for sasl (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/cuper/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zp2a5lr8/sasl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-zp2a5lr8/sasl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-fc4awf9y --python-tag cp37
cwd: /tmp/pip-install-zp2a5lr8/sasl/
Complete output (30 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/sasl
copying sasl/__init__.py -> build/lib.linux-x86_64-3.7/sasl
running egg_info
writing sasl.egg-info/PKG-INFO
writing dependency_links to sasl.egg-info/dependency_links.txt
writing requirements to sasl.egg-info/requires.txt
writing top-level names to sasl.egg-info/top_level.txt
reading manifest file 'sasl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'sasl.egg-info/SOURCES.txt'
copying sasl/saslwrapper.cpp -> build/lib.linux-x86_64-3.7/sasl
copying sasl/saslwrapper.h -> build/lib.linux-x86_64-3.7/sasl
copying sasl/saslwrapper.pyx -> build/lib.linux-x86_64-3.7/sasl
running build_ext
building 'sasl.saslwrapper' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/sasl
gcc -pthread -B /home/cuper/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Isasl -I/home/cuper/anaconda3/include/python3.7m -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-3.7/sasl/saslwrapper.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from sasl/saslwrapper.cpp:254:
sasl/saslwrapper.h:22:10: fatal error: sasl/sasl.h: 没有那个文件或目录
22 | #include <sasl/sasl.h>
| ^~~~~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for sasl
Running setup.py clean for sasl
Successfully built thrift-sasl thrift
Failed to build sasl
ERROR: astroid 2.3.1 requires typed-ast<1.5,>=1.4.0; implementation_name == "cpython" and python_version < "3.8", which is not installed.
ERROR: astroid 2.3.1 has requirement six==1.12, but you'll have six 1.14.0 which is incompatible.
Installing collected packages: six, thrift, sasl, thrift-sasl
Found existing installation: six 1.12.0
Uninstalling six-1.12.0:
Successfully uninstalled six-1.12.0
Running setup.py install for sasl ... error
ERROR: Command errored out with exit status 1:
command: /home/cuper/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zp2a5lr8/sasl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-zp2a5lr8/sasl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-fm0zp25n/install-record.txt --single-version-externally-managed --compile
cwd: /tmp/pip-install-zp2a5lr8/sasl/
Complete output (30 lines):
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/sasl
copying sasl/__init__.py -> build/lib.linux-x86_64-3.7/sasl
running egg_info
writing sasl.egg-info/PKG-INFO
writing dependency_links to sasl.egg-info/dependency_links.txt
writing requirements to sasl.egg-info/requires.txt
writing top-level names to sasl.egg-info/top_level.txt
reading manifest file 'sasl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'sasl.egg-info/SOURCES.txt'
copying sasl/saslwrapper.cpp -> build/lib.linux-x86_64-3.7/sasl
copying sasl/saslwrapper.h -> build/lib.linux-x86_64-3.7/sasl
copying sasl/saslwrapper.pyx -> build/lib.linux-x86_64-3.7/sasl
running build_ext
building 'sasl.saslwrapper' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/sasl
gcc -pthread -B /home/cuper/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Isasl -I/home/cuper/anaconda3/include/python3.7m -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-3.7/sasl/saslwrapper.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from sasl/saslwrapper.cpp:254:
sasl/saslwrapper.h:22:10: fatal error: sasl/sasl.h: 没有那个文件或目录
22 | #include <sasl/sasl.h>
| ^~~~~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /home/cuper/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zp2a5lr8/sasl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-zp2a5lr8/sasl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-fm0zp25n/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
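
The fatal error in the log above is the compiler failing to find sasl/sasl.h. Before rerunning pip, a quick check along these lines can confirm whether the header is present. This is only a sketch: the function name find_sasl_header and the two default search directories are assumptions made for illustration, and a compiler may search other locations as well.

```python
import os

# Assumed default search locations; a compiler may also look elsewhere.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_sasl_header(include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first existing path to sasl/sasl.h, or None if it is absent."""
    for base in include_dirs:
        candidate = os.path.join(base, "sasl", "sasl.h")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_sasl_header()
    if path:
        print("found", path)
    else:
        print("sasl/sasl.h not found; install libsasl2-dev before pip install thrift_sasl")
```

If the function returns None on Ubuntu, installing libsasl2-dev should place the header under /usr/include/sasl/.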

After installation, test it with the following Python code:

from impala.dbapi import connect

If the import raises no error, the installation succeeded and you can connect to a Hive database from Python.
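
To illustrate the connection, cursor and execute pattern, here is a minimal sketch. The helper names fetch_rows and open_hive_connection are hypothetical, and the Hive settings (HiveServer2 on port 10000 with PLAIN authentication) are assumptions that will differ between clusters. Because impyla follows the same DB-API 2.0 interface as pymysql, the demo at the bottom uses the standard-library sqlite3 module as a stand-in so it can run without a Hive server.

```python
import sqlite3


def fetch_rows(conn, sql):
    """Run one query on any DB-API 2.0 connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def open_hive_connection(host, port=10000):
    """Open a HiveServer2 connection through impyla.

    Assumptions: HiveServer2 listens on `port` and accepts PLAIN SASL
    authentication; adjust auth_mechanism, user and password for your cluster.
    """
    # Imported lazily so the sqlite3 demo below runs even without impyla.
    from impala.dbapi import connect
    return connect(host=host, port=port, auth_mechanism="PLAIN")


if __name__ == "__main__":
    # Stand-in demo: sqlite3 exposes the same cursor/execute/fetchall calls.
    demo = sqlite3.connect(":memory:")
    print(fetch_rows(demo, "SELECT 1 + 1"))
    demo.close()
```

Against a real cluster, the same helper would be called as fetch_rows(open_hive_connection("your-hive-host"), "SHOW TABLES"), where the host name is a placeholder.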