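# The bike-sharing data come from the competition data page of the "Urban Management Big Data" topic in the big-data track of the 2021 Digital China Innovation Contest (Fujian Province)
# https://data.xm.gov.cn/contest-series/digit-china-2021/index.html#/3/competition_data
#
# The code builds on the code provided in the trial lesson of the 立方数据学院 travel-data-analysis course, with some additions of my own based on the competition's problem walkthrough tutorial
# https://www.lifangshuju.com/#/exercise/154/474/1535
# https://coggle.club/learn/dcic2021/task1
#
#
# Python setup: import the required packages and set the PATH variable
#
# Suppress Python warnings
import warnings
warnings.filterwarnings("ignore")
# Import the required packages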
import os, codecs
import pandas as pd
import numpy as np
import geopandas as gpd
import folium
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
# Set the path to the input data directory
PATH = '../input/'
#
# Identify the parking records and find the parking demand at a given time
#
# Read the bike-sharing data
data_bike = pd.read_csv(PATH + 'gxdc_dd.csv',header = None)
data_bike.head(5)
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | BICYCLE_ID | LATITUDE | LONGITUDE | LOCK_STATUS | UPDATE_TIME |
1 | f8f99ef8d9bd3942580c2f8f5d1232ba | 24.495537 | 118.126619 | 0 | 2020/12/22 6:25:56 |
2 | 8d1abc077be52af3eecf8340f4ea6981 | 24.443596 | 118.083372 | 0 | 2020/12/22 6:25:57 |
3 | 1122da7c68701a8d60df2eb7a89b6452 | 24.485108 | 118.092266 | 1 | 2020/12/22 6:25:58 |
4 | 324943a3613a133055f4f2e4162cef5f | 24.501391 | 118.083 | 1 | 2020/12/22 6:25:59 |
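# With header=None the file's own header row was read as data (row 0 above),
# so re-read the file, replacing that header with the column names used below
data_bike = pd.read_csv(PATH + 'gxdc_dd.csv', header = 0,
                        names = ['BIKE_ID','LATITUDE','LONGITUDE','LOCK_STATUS','DATA_TIME'])
# Sort by bike and time. Sort on parsed timestamps, because the hours in the raw
# strings are not zero-padded ('10:00:00' would sort before '6:00:00')
data_bike['TIME_PARSED'] = pd.to_datetime(data_bike['DATA_TIME'])
data_bike = data_bike.sort_values(by = ['BIKE_ID','TIME_PARSED'])
# Shift each column up by one row into a new column with suffix '1',
# so that every row also carries the next record
for i in ['BIKE_ID', 'LATITUDE', 'LONGITUDE', 'LOCK_STATUS', 'DATA_TIME']:
    data_bike[i+'1'] = data_bike[i].shift(-1)
# Drop rows whose next record belongs to a different bike
data_bike = data_bike[data_bike['BIKE_ID'] == data_bike['BIKE_ID1']]
# Extract the parking records (rows with LOCK_STATUS == 1)
data_bike = data_bike[data_bike['LOCK_STATUS'] == 1]
# Keep only the useful columns
data_bike = data_bike[['BIKE_ID','LATITUDE','LONGITUDE','DATA_TIME','DATA_TIME1']]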
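# A minimal sketch of the pairing step on made-up records (the bike IDs and times are
# hypothetical). Instead of shifting the whole table and then dropping rows where the
# bike changes, it shifts within each bike via groupby, which is one possible
# alternative and not necessarily what the course code intends.

```python
import pandas as pd

# Hypothetical records for two bikes; LOCK_STATUS 1 marks the parking records
records = pd.DataFrame({
    'BIKE_ID': ['a', 'a', 'a', 'b', 'b'],
    'LOCK_STATUS': [1, 0, 1, 1, 0],
    'DATA_TIME': pd.to_datetime([
        '2020-12-25 07:00:00', '2020-12-25 09:00:00', '2020-12-25 09:20:00',
        '2020-12-25 07:30:00', '2020-12-25 07:45:00',
    ]),
})
records = records.sort_values(['BIKE_ID', 'DATA_TIME'])

# Shift within each bike, so the last record of a bike gets NaT instead of
# borrowing the first record of the next bike
records['DATA_TIME1'] = records.groupby('BIKE_ID')['DATA_TIME'].shift(-1)

# A parking interval is a parking record that has a following record of the same bike
parked = records[(records['LOCK_STATUS'] == 1) & records['DATA_TIME1'].notna()]
print(parked[['BIKE_ID', 'DATA_TIME', 'DATA_TIME1']])
```

# Bike 'a' parked at 09:20 has no later record, so that row yields no interval.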
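# Pick a point in time
timestamp = pd.Timestamp('2020/12/25 8:00:00')
# Extract the parking demand at that time: bikes parked at or before it whose next record is at or after it.
# Compare parsed timestamps rather than the raw strings, whose hours are not zero-padded
start = pd.to_datetime(data_bike['DATA_TIME'])
end = pd.to_datetime(data_bike['DATA_TIME1'])
parking_points = data_bike[(start <= timestamp) & (end >= timestamp)]
parking_points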
| | BIKE_ID | LATITUDE | LONGITUDE | DATA_TIME | DATA_TIME1 |
---|---|---|---|---|---|
403452 | 0003fd17bc68116eb180bf44a52f732e | 24.485607 | 118.08957 | 2020/12/24 7:51:59 | 2020/12/25 8:17:04 |
175105 | 00099beec7fae3b22e0623e5ce418a94 | 24.494702 | 118.11861 | 2020/12/23 7:29:52 | 2020/12/25 9:31:02 |
204214 | 0009ae177de68e6e980e9dde661b8893 | 24.489962 | 118.099409 | 2020/12/25 7:21:59 | 2020/12/25 8:05:16 |
213637 | 000b454e98643295e788e53aa5211339 | 24.49453 | 118.077935 | 2020/12/25 7:22:11 | 2020/12/25 8:56:11 |
257663 | 000bad3dd680aafe9e1649f520425cdb | 24.502134 | 118.13618 | 2020/12/25 7:37:10 | 2020/12/25 8:24:07 |
... | ... | ... | ... | ... | ... |
471055 | ffe89d92569993a825f59b922634c436 | 24.479525 | 118.188519 | 2020/12/24 9:29:08 | 2020/12/25 8:35:01 |
525046 | ffec978f70ad5ff0c79fd06556e317e7 | 24.521295 | 118.112611 | 2020/12/24 9:37:33 | 2020/12/25 8:02:29 |
196169 | fff026902d1941f62e887345af0aebd3 | 24.490286 | 118.091058 | 2020/12/25 7:05:05 | 2020/12/25 8:06:25 |
207177 | fffa31a63e060b5dba9a7299935786f4 | 24.465929 | 118.075376 | 2020/12/25 7:46:54 | 2020/12/25 8:07:35 |
472735 | fffd0d92511854adf6fb89064d4e5540 | 24.540753 | 118.138672 | 2020/12/24 6:46:56 | 2020/12/25 8:12:12 |
14626 rows × 5 columns
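# The timestamps in this file are written without zero-padded hours (for example 6:25:56),
# so comparing them as strings can disagree with comparing them as times. A small sketch
# with a hypothetical record two hours after the query time; the format string FMT is an
# assumption read off the values shown above.

```python
from datetime import datetime

FMT = '%Y/%m/%d %H:%M:%S'
query = '2020/12/25 8:00:00'
later = '2020/12/25 10:05:00'   # hypothetical record, after the query time

# String comparison is character by character: '1' < '8', so the later time looks earlier
string_says_earlier = later <= query

# Parsing first compares the actual instants
parsed_says_earlier = datetime.strptime(later, FMT) <= datetime.strptime(query, FMT)

print(string_says_earlier, parsed_says_earlier)
```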
#
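# Next, process the geographic information of the electronic fences (parking spots)
#
# Read the electronic-fence data
data_fence = pd.read_csv(PATH + 'gxdc_tcd.csv')
# Build a polygon geometry for each fence. ast.literal_eval parses the coordinate
# string as a Python literal without executing it, unlike exec
import ast
geometry = []
for loc in data_fence['FENCE_LOC']:
    points = ast.literal_eval('[' + loc + ']')
    geometry.append(Polygon(points))
data_fence['geometry'] = geometry
# Convert the table to a GeoDataFrame
data_fence = gpd.GeoDataFrame(data_fence)
data_fence = data_fence.drop('FENCE_LOC',axis = 1)
data_fence.head(10)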
| | FENCE_ID | geometry |
---|---|---|
0 | 长乐路0_L_A17001 | POLYGON ((118.10320 24.52734, 118.10322 24.527... |
1 | 长乐路0_L_A17002 | POLYGON ((118.10317 24.52730, 118.10320 24.527... |
2 | 长乐路0_L_A17003 | POLYGON ((118.10323 24.52739, 118.10326 24.527... |
3 | 长乐路0_L_A17004 | POLYGON ((118.10326 24.52742, 118.10328 24.527... |
4 | 长乐路0_L_A17005 | POLYGON ((118.10295 24.52700, 118.10298 24.527... |
5 | 长乐路0_L_A17006 | POLYGON ((118.10292 24.52696, 118.10294 24.526... |
6 | 长乐路0_L_A17007 | POLYGON ((118.10258 24.52651, 118.10261 24.526... |
7 | 长乐路0_L_A17008 | POLYGON ((118.10266 24.52661, 118.10268 24.526... |
8 | 长乐路0_L_A17009 | POLYGON ((118.10269 24.52665, 118.10271 24.526... |
9 | 长乐路0_L_A17010 | POLYGON ((118.10272 24.52669, 118.10274 24.526... |
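# A minimal sketch of turning a coordinate string into a list of vertices without
# executing it. It assumes FENCE_LOC holds comma-separated [lon, lat] pairs; that format
# is inferred from the brackets the code wraps around it and is not confirmed here.
# The helper name parse_fence_loc and the sample string are made up.

```python
import ast

def parse_fence_loc(text):
    """Parse a string of [lon, lat] pairs into a list of (lon, lat) tuples."""
    # literal_eval accepts only Python literals, so code in the string is rejected
    pairs = ast.literal_eval('[' + text + ']')
    return [(float(lon), float(lat)) for lon, lat in pairs]

# Hypothetical FENCE_LOC value (assumed format)
sample = '[118.1032,24.5273],[118.1033,24.5274],[118.1034,24.5273],[118.1032,24.5273]'
ring = parse_fence_loc(sample)
print(ring[0], len(ring))

# A string containing a function call is refused instead of being run
try:
    parse_fence_loc('__import__("os").getcwd()')
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```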
# Save the data in shapefile format
data_fence.to_file(r'data_fence')
# Save the data in GeoJSON format; the file can be uploaded to http://geojson.io/ to inspect the parking spots on a map
data_fence.to_file(r'data_fence.json',driver = 'GeoJSON',encoding = 'utf-8')
# Show all parking spots in Xiamen on a map
m = folium.Map(location=[24.482426, 118.136506], zoom_start=13)
folium.GeoJson(
    data_fence.to_json(),
    name='geojson'
).add_to(m)
m
#
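# Next, spatially match the parking records to the electronic fences (parking spots)
#
# Find which parking demand falls inside a fence: spatial join
# Show the documentation of geopandas' spatial-join function
help(gpd.sjoin)
# Convert the parking demand to a GeoDataFrame for the matching below
parking_points = gpd.GeoDataFrame(parking_points.copy())
parking_points['geometry'] = gpd.points_from_xy(parking_points['LONGITUDE'],parking_points['LATITUDE'])
# Spatial join: attach to each parking point the fence that contains it, if any
parking_points = gpd.sjoin(parking_points,data_fence,how = 'left')
# Parking demand inside a fence (~ negates the boolean mask; unary - is not supported for booleans)
a = parking_points[~parking_points['FENCE_ID'].isnull()]
# Parking demand outside every fence
b = parking_points[parking_points['FENCE_ID'].isnull()]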
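# A small sketch of how a left join separates matched from unmatched rows. An ordinary
# pandas merge on a made-up 'cell' key stands in for the spatial join here; the tables
# and the key are hypothetical and only illustrate the null test on FENCE_ID.

```python
import pandas as pd

points = pd.DataFrame({'BIKE_ID': ['a', 'b', 'c'], 'cell': [1, 2, 3]})
fences = pd.DataFrame({'cell': [1, 3], 'FENCE_ID': ['F1', 'F2']})

# A left join keeps every point; points without a match get a null FENCE_ID
joined = points.merge(fences, on='cell', how='left')

outside_mask = joined['FENCE_ID'].isnull()
inside = joined[~outside_mask]    # matched to a fence
outside = joined[outside_mask]    # no fence found
print(inside['BIKE_ID'].tolist(), outside['BIKE_ID'].tolist())
```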
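# Handle the parking demand that falls outside the fences: nearest-fence matching with a KD-tree
# Extract the boundary of each electronic fence
data_fence_boundary = data_fence.copy()
data_fence_boundary['geometry'] = data_fence_boundary.boundary
# Plot the first few fences as a check
data_fence_boundary.iloc[:5].plot()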
# Define functions that use cKDTree to match points to points and points to lines
import numpy as np
from scipy.spatial import cKDTree
import itertools
from operator import itemgetter
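def ckdnearest_point(gdA, gdB):
    '''
    Take two GeoDataFrames of points, gdA and gdB. Join to each row of gdA the
    nearest point in gdB and add the distance as column dist (in coordinate units).
    '''
    # Extract the coordinates of all points in gdA
    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    # Extract the coordinates of all points in gdB
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    # Build a KD-tree on the points of gdB
    btree = cKDTree(nB)
    # Query the tree with the points of gdA: dist is the distance, idx the position in gdB of the nearest point
    dist, idx = btree.query(nA, k=1)
    # Assemble the result; idx holds positions, so select rows with iloc rather than loc
    gdf = pd.concat(
        [gdA.reset_index(drop=True), gdB.loc[:, gdB.columns != 'geometry'].iloc[idx].reset_index(drop=True),
         pd.Series(dist, name='dist')], axis=1)
    return gdf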
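# A minimal sketch of what the KD-tree query returns, on made-up coordinates: for each
# query point, the distance to its nearest target and that target's row position.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical coordinates
targets = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
queries = np.array([[0.1, 0.0], [4.0, 4.0]])

tree = cKDTree(targets)
# k=1 asks for the single nearest neighbour of every query point
dist, idx = tree.query(queries, k=1)
print(idx, dist)
```

# idx holds positions into targets, not index labels, which is why a lookup by position fits here.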
def ckdnearest_line(gdfA, gdfB):
    '''
    Take two GeoDataFrames, gdfA of points and gdfB of lines. Join to each row of gdfA
    the line in gdfB that owns the vertex nearest to the point, and add the distance
    to that vertex as column dist (in coordinate units).
    '''
    # Extract the coordinates of all points in gdfA
    A = np.concatenate(
        [np.array(geom.coords) for geom in gdfA.geometry.to_list()])
    # Extract the vertex coordinates of gdfB: B is a list with one coordinate array per geometry
    # B=[[[feature 1 coord 1],[feature 1 coord 2],...],[[feature 2 coord 1],[feature 2 coord 2],...]]
    B = [np.array(geom.coords) for geom in gdfB.geometry.to_list()]
    # B_ix records which geometry in B each vertex belongs to
    B_ix = np.repeat(np.arange(len(B)), [len(coords) for coords in B])
    # Flatten B: B=[[feature 1 coord 1],[feature 1 coord 2],...,[feature 2 coord 1],[feature 2 coord 2],...]
    B = np.concatenate(B)
    # Build a KD-tree on the vertices in B
    ckd_tree = cKDTree(B)
    # Query the tree with the points in A: dist is the distance, idx the position in B of the nearest vertex
    dist, idx = ckd_tree.query(A, k=1)
    # Map each vertex back to the geometry it belongs to
    idx = B_ix[idx]
    # Assemble the result; idx holds positions, so select rows with iloc rather than loc
    gdf = pd.concat(
        [gdfA.reset_index(drop=True), gdfB.loc[:, gdfB.columns != 'geometry'].iloc[idx].reset_index(drop=True),
         pd.Series(dist, name='dist')], axis=1)
    return gdf
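# Because the tree is built on boundary vertices, the match is the line with the nearest
# vertex, which can differ from the line that is nearest along its edges. A small sketch
# with two made-up lines; the helper point_segment_distance and all coordinates are
# hypothetical and are not part of the course code.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (a and b must differ)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    # Position of the projection of p along the segment, clamped to its ends
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

# Hypothetical lines given as vertex lists
lines = {
    'long_fence': [(0.0, 0.0), (10.0, 0.0)],
    'short_fence': [(5.0, 2.0), (5.0, 3.0)],
}
p = (5.0, 0.5)

# Nearest by vertex, as a KD-tree over vertices would report
nearest_by_vertex = min(
    lines, key=lambda k: min(np.hypot(x - p[0], y - p[1]) for x, y in lines[k]))

# Nearest by distance to the edges themselves
nearest_by_segment = min(
    lines, key=lambda k: min(point_segment_distance(p, a, b)
                             for a, b in zip(lines[k][:-1], lines[k][1:])))

print(nearest_by_vertex, nearest_by_segment)
```

# For small fences with closely spaced vertices the two answers will usually agree.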
# Match the parking points outside the fences to the nearest fence
b = ckdnearest_line(b, data_fence_boundary)
# Tidy the two tables and combine them. In b, the empty FENCE_ID left by the spatial join
# is renamed FENCE_ID1 and the fence matched by the KD-tree becomes FENCE_ID
a = a[['BIKE_ID', 'LATITUDE', 'LONGITUDE', 'DATA_TIME', 'DATA_TIME1',
       'geometry','FENCE_ID']]
b.columns = ['BIKE_ID', 'LATITUDE', 'LONGITUDE', 'DATA_TIME', 'DATA_TIME1',
             'geometry', 'index_right', 'FENCE_ID1', 'FENCE_ID', 'dist']
b = b[['BIKE_ID', 'LATITUDE', 'LONGITUDE', 'DATA_TIME', 'DATA_TIME1',
       'geometry','FENCE_ID', 'dist']]
parking_points = pd.concat([a,b])
# Visualization
# Count the parking demand per fence
parking_points['FENCE_ID'].value_counts()
观日路(望海路至会展路段 )_R_1    156
观日路0_L_1    92
望海路0_R_1    86
观日路0_R_2    74
前埔东路_R_1    54
...
机场北路_R_2    1
东渡路0_R_12    1
枋湖南路_R_5    1
嘉禾路0_L_34    1
豆仔尾路(禾祥西路至湖滨南路段) _R_5    1
Name: FENCE_ID, Length: 5594, dtype: int64
# The 40 fences with the highest parking demand
parking_points['FENCE_ID'].value_counts().head(40)
观日路(望海路至会展路段 )_R_1    156
观日路0_L_1    92
望海路0_R_1    86
观日路0_R_2    74
前埔东路_R_1    54
云顶中路0_L_A03002    53
望海路0_R_2    50
云顶北路0_R_45    47
象屿路0_R_1    46
望海路0_L_1    46
岭兜西路 _R_1    43
蔡岭路 _R_2    42
育秀东路_L_A09001    40
吕岭路0_L_A07005    40
虎仔路0_L_1    39
长岸路_L_13    36
莲前东路_R_4    31
领事馆路_L_B10001    31
仙岳路0_L_A14001    31
创新路_R_11    30
吕岭路_R_B10001    30
翔云路0_L_1    29
吕岭路0_L_A07006    29
长岸路_L_8    28
观日路0_R_1    26
展鸿路0_R_A07001    26
蔡岭路 _R_3    26
展鸿路辅路_L_B10004    25
展鸿路_L_B10003    25
高崎南五路0_R_7    25
金边路0_R_A04004    25
天湖路_L_21    24
嘉禾路_L_9    24
云顶北路_L_4    24
展鸿路_L_B10001    23
云顶北路0_L_47    23
高崎南十二路0_R_1    23
护安路_L_1    22
创新路_L_3    22
创新路(马垅路至火炬路)_R_3    21
Name: FENCE_ID, dtype: int64
# Create the figure
fig = plt.figure(1,(10,10),dpi = 200)
ax = plt.subplot(111)
# Choose a fence
fence = '育秀东路_L_A09001'
# Plot the parking demand assigned to this fence: all points first, then the points inside the fence (dist is null) in red
parking_points[parking_points['FENCE_ID']==fence].plot(ax = ax)
parking_points[(parking_points['FENCE_ID']==fence)&(parking_points['dist'].isnull())].plot(ax = ax,color = 'r')
# Plot the fence
data_fence[data_fence['FENCE_ID']==fence].plot(ax = ax,color = 'g',alpha=0.3)
# Show the figure
plt.show()