Urban public bicycle dispatching optimization method

Fei Lin; Yang Yang; Shihua Wang; Yudi Xu; Hong Ma; Ritai Yu

doi:10.7717/peerj-cs.224

Fei Lin¹, Yang Yang¹,², Shihua Wang¹, Yudi Xu¹, Hong Ma¹, Ritai Yu¹

¹ School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China

² Institute of Intelligent Media Technology, Communication University of Zhejiang, Hangzhou, China

Published: 2019-10-14
Accepted: 2019-09-05
Received: 2019-03-02

Academic Editor: Xiangjie Kong

Subject Areas: Data Mining and Machine Learning, Software Engineering
Keywords: Multi-objective optimization, Public bicycle system, Community discovery algorithm, Regional scheduling workload, Elite strategy

Copyright: © 2019 Lin et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: Lin F, Yang Y, Wang S, Xu Y, Ma H, Yu R. 2019. Urban public bicycle dispatching optimization method. PeerJ Computer Science 5:e224 https://doi.org/10.7717/peerj-cs.224

The authors have chosen to make the review history of this article public.

Abstract

An unreasonable division of public bicycle dispatching areas seriously affects the operational efficiency of a public bicycle system. To solve this problem, this paper proposes an improved community discovery algorithm based on multi-objective optimization (CDoMO). The data set is preprocessed into a lease/return relationship, from which a similarity matrix is calculated, and the Fast Unfolding community discovery algorithm is executed on the matrix to obtain a scheduling scheme. The results are then adjusted according to workload indicators (scheduling distance, number of sites, and number of scheduled bicycles) to maximize the overall benefit, and the entire process is continuously optimized by the multi-objective optimization algorithm NSGA2. The experimental results show that, compared with the clustering algorithm and the community discovery algorithm, the method shortens the estimated scheduling distance by 20%–50% and effectively balances the scheduling workload of each area. The method can provide theoretical support for public bicycle dispatching departments and improve the efficiency of public bicycle dispatching systems.

Introduction

With the progress of urbanization, people’s awareness of low-carbon living and health is increasing. The public bicycle system provides a green and healthy way to travel and has gradually become an important part of the public transport system. However, the study of the division of public bicycle dispatching areas is still at an early stage. Dividing the public bicycle scheduling area serves two purposes: decomposing the scheduling among a large number of sites, and reducing the computational complexity of path planning.

At present, the mainstream regional division method is based on urban administrative areas, with each administrative area acting as an independent scheduling area. However, the boundaries of residents’ travel are not as clear as administrative boundaries, and as the city develops the areas become more closely linked, so a division based on urban administrative areas lacks a scientific basis. Moreover, because the size and population density of each administrative area differ, the number of sites in each area varies greatly. Where an administrative area is large and densely populated, there are more sites, public bicycle turnover is high, and the dispatching workload is large; where there are fewer sites, turnover is low and the dispatching workload is small. Above all, the lack of a scientific planning method often leads to higher scheduling capital costs.

Several studies have addressed related problems. Tulabandhula & Bodas (2018) proposed a passenger monitoring system for dispatching vehicles in a public transportation network; it monitors passengers at the station, vehicle scheduling information and processed hardware equipment. Pan, Jun-Yi & Min (2015) designed a heuristic simulated annealing hybrid search algorithm for large-scale VRP distributed problems: a mathematical model is first established on the actual GIS road network, and the large-scale VRP path planning problem is then studied. Forma, Raviv & Tzur (2015) considered the spatial nature of public bicycle rentals and the original inventory factor of bicycles, established a regional maximum diameter distance constraint model, and obtained the best classification results with heuristic algorithms that minimize the overall inventory cost. Schuijbroek, Hampshire & Hoeve (2016) applied the maximum algebra algorithm to the division of the public bicycle scheduling area and established the corresponding partition mathematical model, in which the goal of zoning is to minimize the maximum completion time based on a reasonable level of service.

To address these problems, this paper proposes an improved community discovery algorithm based on multi-objective optimization. The results show that the algorithm brings three major benefits: it shortens the public bicycle scheduling distance, improves the scheduling efficiency, and balances the regional scheduling workload.

Related Work

The division of public bicycle dispatching areas belongs to operational research, and researchers have made significant contributions to it. Public bicycles, buses, and cargo transport vehicles all belong to transport systems whose operations have similarities, so dispatching methods can be borrowed from one to another. Kloimllner (Miranda-Bront et al., 2017) decomposes the public bicycle problem into two sub-problems, scheduling area partitioning and scheduling path planning, and then creates an integer programming model to achieve as few bicycle rental points as possible.

In addition, other researchers chose to use clustering algorithms. Phanikrishnakishore & Madamsetti (2014) used the rental rules between public bicycle stations, the spatial and non-spatial attributes of the stations, and their self-flow characteristics, applying association rules to classify strongly correlated sites into the same category; the areas enclosed by each category of sites then serve as the scheduling areas for public bicycles. Zhang, Liang & Wei (2017) proposed a public bicycle scheduling area division scheme based on an improved K-means clustering algorithm. Through data analysis, the algorithm estimates the k initial central sites; after the K-means division, the edge sites are clustered and adjusted again according to the scheduling requirements. Xu, Qin & Ma (2017) integrated the spatial relationship between the sites and the lease relationship of the bicycles to establish the similarity matrix of the sites, proposed a regional coupling parameter to quantify the degree of connection between regions, and finally used a clustering algorithm to obtain the division. Long, Szeto & Huang (2014) established a dynamic regional scheduling model for large-scale public bicycle scheduling problems and proposed a multi-stage re-optimized dynamic clustering algorithm that progressively clusters on three factors: optimal division, task balance between regions, and balance of demand within regions. During the solution process, abnormal sites are continuously split off to gradually improve the clustering results. Dziauddin, Powe & Alvanides (2015) studied the public bicycle dispatching area, found that abnormal sites often appear in the division, and proposed a K-Center algorithm that adaptively limits the capacity of the rental site. Hartmann Tolić, Martinović & Crnjac Milić (2018) analyzed the spatial attributes and community structure of public bicycles, applied the community discovery algorithm to the public bicycle networks of Washington, London and Boston, and verified the existence of community structure in the public bicycle network.

The main methods of scheduling area division are the model method (Dubey & Borkar, 2015) and the clustering algorithm (Sun, Zhang & Du, 2015). The model method requires abstract modelling, has many constraints, and is not easy to solve. With a clustering algorithm it is difficult to determine the number of clusters and to evaluate the result. Moreover, the scheduling workload has no evaluation criterion, and neither approach considers whether the workload is balanced. Therefore, this paper proposes a new method to solve the problem.

Scheduling Area Division Model Design

This section establishes the division model of the public bicycle scheduling area, including the description of the model, its assumptions, and the interpretation of its parameters. Finally, it proposes a lease/return demand forecasting model; the data obtained from this model are used to verify whether CDoMO’s estimated total dispatch distance is the shortest.

Problem description

At present, clustering algorithms are mainly used to solve the scheduling area division problem. The data set (DS) is first converted by a data preprocessing program into the lease/return relationship (LRR) between sites. Then, through the similarity calculation between the sites, the similarity matrix (SM) is generated (Yanping, Decai & Duoning, 2017). The conversion from DS to SM is shown in Eq. (3.1), where R_ij represents the similarity between site i and site j, Q_ij represents the number of bicycles rented from site i and returned to site j, and Q_ji represents the number of bicycles rented from site j and returned to site i. (3.1) $DS \overset{c_{1}}{\to} LRR = \begin{matrix} \begin{bmatrix} Q_{11} & Q_{12} & \dots & Q_{1j} \\ Q_{21} & Q_{22} & \dots & Q_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{i1} & Q_{i2} & \dots & Q_{ij} \end{bmatrix} \\ \begin{bmatrix} Q_{11} & Q_{12} & \dots & Q_{1i} \\ Q_{21} & Q_{22} & \dots & Q_{2i} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{j1} & Q_{j2} & \dots & Q_{ji} \end{bmatrix} \end{matrix} \overset{c_{2}}{\to} SM = \begin{bmatrix} R_{11} & R_{12} & \dots & R_{1j} \\ R_{21} & R_{22} & \dots & R_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ R_{i1} & R_{i2} & \dots & R_{ij} \end{bmatrix}$

The conversions c₁ and c₂ are defined in Eqs. (3.2) and (3.3), where M represents the time range in days:
(3.2) $c_{1} : \text{preprocessing program}$
(3.3) $c_{2} : R_{ij} = \frac{Q_{ij} + Q_{ji}}{M}$
(3.4) $SM = \begin{bmatrix} R_{11} & R_{12} & \dots & R_{1j} \\ R_{21} & R_{22} & \dots & R_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ R_{i1} & R_{i2} & \dots & R_{ij} \end{bmatrix} \overset{CA}{\to} DR = \{R_{1}, R_{2}, \dots, R_{n}\}$
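The conversion from trip records to the similarity matrix in Eq. (3.3) can be illustrated with a minimal Python sketch. The function name `similarity_matrix`, the `(start_site, end_site)` trip format, and the toy data are assumptions made for illustration; they are not taken from the authors' preprocessing program.

```python
from collections import defaultdict

def similarity_matrix(trips, days):
    """Build R[i][j] = (Q_ij + Q_ji) / M from (start_site, end_site) trips.

    `trips` and `days` (the M of Eq. (3.3)) are illustrative names.
    """
    # q[(i, j)]: number of bicycles rented at site i and returned at site j.
    q = defaultdict(int)
    sites = set()
    for start, end in trips:
        q[(start, end)] += 1
        sites.update((start, end))

    # Eq. (3.3): symmetric similarity averaged over the M days.
    return {
        i: {j: (q[(i, j)] + q[(j, i)]) / days for j in sites}
        for i in sites
    }

# Toy data: four trips observed over M = 2 days.
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C")]
R = similarity_matrix(trips, days=2)
print(R["A"]["B"], R["B"]["C"], R["A"]["C"])  # 1.5 0.5 0.0
```

Because Q_ij and Q_ji are summed, the resulting matrix is symmetric, which is what an undirected community discovery algorithm expects as edge weights.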

Finally, the clustering algorithm (CA) is used for the division; as shown in Eq. (3.4), the division result (DR) consists of n independent scheduling areas R_1, …, R_n. If DR conforms to the lease/return law (LRL), users themselves complete part of the scheduling work, which reduces the scheduling workload. However, in the actual scheduling area division, in order to obtain the highest comprehensive benefit, the regional division should not only conform to this law but also balance the scheduling workload as much as possible (Shpak et al., 2017). The regional scheduling workload is mainly determined by the dispatch distance within the area and the number of sites in the area, so a balanced division is one in which Z₁ and Z₂ are as small as possible. This balance problem can be transformed into a multi-objective optimization problem with the objective function f shown in Eq. (3.5): (3.5) $DR = \{R_{1}, R_{2}, \dots, R_{n}\} \overset{MOO}{\to} \min f = {[Z_{1}, Z_{2}]}^{T}$ where Z₁ is the variance of the dispatch distance and Z₂ is the variance of the number of sites.

MOO: Multi-objective optimization

Z₁ is calculated by Eq. (3.6), where n represents the number of areas, D_i represents the estimated dispatch distance of area i, and $\bar{D}$ represents the average of the estimated dispatch distances: (3.6) $Z_{1} = \frac{1}{n - 1} \sum_{i = 1}^{n} {(D_{i} - \bar{D})}^{2}$

Z₂ is calculated by Eq. (3.7), where N_i represents the number of sites in area i and $\bar{N}$ represents the average number of sites per area: (3.7) $Z_{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} {(N_{i} - \bar{N})}^{2}$ subject to the constraints in Eqs. (3.8)–(3.10): (3.8) $S = [(S_{i} - P) \cup (S_{j} - P)] \cup P$

Equation (3.8) indicates that every site must be assigned to an area. S_i and S_j represent the site sets of areas i and j, and P represents the set of parking lot sites: (3.9) $[S_{i} - P] \cap [S_{j} - P] = \emptyset \; (i \neq j)$

Equation (3.9) indicates that a site can be assigned to only one area: (3.10) $S_{i} \cap P \neq \emptyset, \; S_{j} \cap P \neq \emptyset$

Equation (3.10) indicates that each scheduling area contains at least one dispatch center. There are two optimization goals for this issue:

Minimize the variance of the estimated dispatch distance across areas;
Minimize the variance of the number of sites across areas.
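As a small worked illustration of the two objectives, the sketch below evaluates Z₁ and Z₂ for a hypothetical division into three areas. The function name `balance_objectives` and the numbers are assumptions; `statistics.variance` is used because it applies the same 1/(n − 1) normalisation as Eqs. (3.6) and (3.7).

```python
from statistics import variance

def balance_objectives(distances, site_counts):
    """Return (Z1, Z2): sample variances of per-area distance and site count."""
    # statistics.variance divides by n - 1, matching Eqs. (3.6) and (3.7).
    return variance(distances), variance(site_counts)

# Hypothetical division into three areas (distances in km, sites as counts).
z1, z2 = balance_objectives([10.0, 12.0, 14.0], [30, 30, 33])
print(z1, z2)
```

A division with smaller values of both Z₁ and Z₂ is more balanced; when neither of two divisions is better on both objectives, they are incomparable, which is why a multi-objective method is needed.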

Model assumptions and parameter description

The scheduling area division process is complicated, and the abstract model involves many parameters. To keep the model as close as possible to the actual division, the following assumptions are made before the model is established:

The scheduling distance of each area can be estimated theoretically, and the estimated scheduling distance is approximately equal to the actual scheduling distance;
Dispatching vehicles are not limited by driving time or mileage;
Only one dispatching vehicle in each area is responsible for bicycle dispatch;
All dispatching vehicles are of the same model, with identical parameters;
The dispatching vehicle departs from the dispatching center, completes the dispatching task, and then returns to the original dispatching center; vehicle failures and other unexpected factors are not considered.

Based on the problem description and model assumptions, the parameters and variables of the model are defined in Table 1.

Table 1:

Parameters and variables of the model.

Parameters/variables	Parameter/variable meaning
n	The number of areas
i, j	Area number
D_i	The estimated scheduling distance of the area i
D_j	The estimated scheduling distance of the area j
$\bar{D}$	Regional estimated dispatch distance average
N_i	The number of sites in area i
N_j	The number of sites in area j
$\bar{N}$	Average number of sites within the area
S	Collection of sites
S_i	Site division set for area i
S_j	Site division set for area j
P	Parking lot site collection

DOI: 10.7717/peerjcs.224/table-1

Table 2:

Exact values.

Weather	Value
Heavy snow, heavy rain	1
Snow, light snow, moderate rain, light rain	0.75
Foggy	0.5
Sunny and cloudy	0.25

DOI: 10.7717/peerjcs.224/table-2

Leasing demand forecasting model

After the scheduling area is divided, the estimated total scheduling distance can only be calculated if the demand at each lease/return site is known, so the future scheduling demand of each site must be predicted. Each day is divided into 24 one-hour time periods t, $t \in \{0, 1, \dots, 23\}$. A Meteorology Similarity Weighted K-Nearest-Neighbour (MSWK) method is introduced to predict the number of leased and returned bicycles at each site.

Leasing number forecast model

MSWK is an improved method, based on the KNN algorithm, for predicting the quantity of leased and returned bicycles. It analyses the amount of leasing in similar time periods to predict future leasing. Similarity is measured with five indicators: weather, temperature, humidity, wind speed, and visibility.

In the measurement of the similarity of weather, the weather is split into five levels and assigned corresponding values. The exact values are shown in Table 2.

The quantified weather conditions in time period t of days p and q are denoted by $W_{D_{p}^{t}}$ and $W_{D_{q}^{t}}$, respectively, and the weather similarity of period t on days p and q is defined as follows (Eq. (3.11)): (3.11) $λ_{1} = \frac{1}{2 π σ_{1}} e^{- \frac{{(W_{D_{p}^{t}} - W_{D_{q}^{t}})}^{2}}{σ_{1}^{2}}}$

The temperatures in time period t of days p and q are denoted by $F_{D_{p}^{t}}$ and $F_{D_{q}^{t}}$. The temperature similarity of period t on days p and q is defined as follows (Eq. (3.12)): (3.12) $λ_{2} = \frac{1}{2 π σ_{2}} e^{- \frac{{(F_{D_{p}^{t}} - F_{D_{q}^{t}})}^{2}}{σ_{2}^{2}}}$

The three dimensions of humidity, wind speed, and visibility are represented by a 3-D Gaussian kernel function, where $H_{D_{p}^{t}}, S_{D_{p}^{t}}, V_{D_{p}^{t}}$ represent the humidity, wind speed, and visibility in time period t of day p, respectively. The humidity, wind speed, and visibility similarity of period t on days p and q is defined as follows (Eq. (3.13)): (3.13) $λ_{3} = \frac{1}{2 π σ} e^{- (\frac{{(H_{D_{p}^{t}} - H_{D_{q}^{t}})}^{2}}{σ_{3}^{2}} + \frac{{(S_{D_{p}^{t}} - S_{D_{q}^{t}})}^{2}}{σ_{4}^{2}} + \frac{{(V_{D_{p}^{t}} - V_{D_{q}^{t}})}^{2}}{σ_{5}^{2}})}$

To simplify the calculation, the temperature, humidity, wind speed, and visibility are normalized and σ₁, σ₂, σ₃, σ₄, σ₅ are all set to 1. Finally, by weighting the above three similarity indexes, the overall similarity of days p and q in time period t is obtained as follows (Eq. (3.14)): (3.14) $M (D_{p}^{t}, D_{q}^{t}; a) = δ_{w} (D_{p}^{t}, D_{q}^{t}) \sum_{i = 1}^{3} a_{i} λ_{i}$

Here $δ_{w} (D_{p}^{t}, D_{q}^{t})$ is an indicator function: when p and q are both working days or both non-working days, $δ_{w} (D_{p}^{t}, D_{q}^{t}) = 1$; otherwise $δ_{w} (D_{p}^{t}, D_{q}^{t}) = 0$. To predict the rental amount in time period t of day q, the K most similar days are selected and the predicted value is calculated with the MSWK algorithm, as in Eq. (3.15): (3.15) $s_{i} \cdot p d (D_{q}^{t}; a) = \frac{\sum_{p = 1}^{K} M (D_{p}^{t}, D_{q}^{t}; a) s_{i} \cdot p d (D_{p}^{t})}{\sum_{p = 1}^{K} M (D_{p}^{t}, D_{q}^{t}; a)}$
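The prediction in Eqs. (3.11)–(3.15) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the dictionary keys (`workday`, `weather`, `temp`, `humidity`, `wind`, `visibility`, `rentals`), the equal weights a, and K = 2 are all assumptions, and every σ is set to 1 as described above.

```python
import math

def similarity(day_p, day_q, a=(1 / 3, 1 / 3, 1 / 3)):
    """Overall similarity M of Eq. (3.14), with every sigma set to 1."""
    # delta_w: only days of the same type (working / non-working) count.
    if day_p["workday"] != day_q["workday"]:
        return 0.0
    norm = 1.0 / (2.0 * math.pi)
    lam1 = norm * math.exp(-(day_p["weather"] - day_q["weather"]) ** 2)
    lam2 = norm * math.exp(-(day_p["temp"] - day_q["temp"]) ** 2)
    lam3 = norm * math.exp(-sum(
        (day_p[key] - day_q[key]) ** 2
        for key in ("humidity", "wind", "visibility")
    ))
    return a[0] * lam1 + a[1] * lam2 + a[2] * lam3

def predict_rentals(history, target, k=2):
    """Eq. (3.15): similarity-weighted mean over the K most similar days."""
    scored = [(similarity(day, target), day["rentals"]) for day in history]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(m for m, _ in top)
    if total == 0:
        return None  # no comparable day in the history
    return sum(m * rentals for m, rentals in top) / total

# Hypothetical, already normalised observations for one site and one period t.
base = {"workday": True, "weather": 0.25, "temp": 0.5,
        "humidity": 0.4, "wind": 0.2, "visibility": 0.9}
history = [
    dict(base, rentals=10),
    dict(base, rentals=20),
    dict(base, workday=False, rentals=100),  # different day type: weight 0
]
pred = predict_rentals(history, base)
print(pred)
```

In this toy case the two working days are identical to the target and receive equal weight, so the prediction is their mean (about 15), while the non-working day is excluded by δ_w.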

Returning number forecast model

After renting a bicycle, a user often returns it to a nearby site within a certain period of time. The number of bicycles returned to a site is therefore predicted from the rental forecasts of its neighbouring sites. Bicycles rented from site i during time period t may be returned to a site j adjacent to i during period t or t+1. To forecast the number of returned bicycles, it is first necessary to estimate the number of bicycles that are rented from site i within time period t and returned at site j. The specific formula is Eq. (3.16): (3.16) $e_{i j}^{t} = s_{i} \cdot p d (t) \frac{e_{i j} \cdot f}{s_{i} \cdot p d}$

Here $s_{i} \cdot p d (t)$ is the predicted number of bicycles rented from site i in time period t, e_ij⋅f is the historical number of bicycles rented from site i and returned at site j, and s_i⋅pd is the historical total number of bicycles rented from site i. Analysis of the historical data shows that the distribution of users’ riding time can be fitted by the sum of two Gaussian functions. Therefore, the riding time $D_{i j} (t)$ between rental sites i and j can be estimated by Eq. (3.17): (3.17) $D_{i j} (t) = g_{1} (t; μ_{1}, σ_{1}) + g_{2} (t; μ_{2}, σ_{2}) .$

Assume that the rental moment is evenly distributed within the time period and that the leased bicycle is returned within period t or period t+1. If a user rents a bicycle from site i at moment $t_{1}'$ within period t and rides for a time $t_{2}$, the probabilities that the bicycle is returned at site j within period t and within period t+1 are given by Eqs. (3.18) and (3.19), respectively: (3.18) $P_{i j}^{t} = \frac{1}{| t |} \int_{0}^{| t |} d t_{1}' \int_{0}^{|t| - t_{1}'} d t_{2} \, D_{i j} (t_{2})$ (3.19) $P_{i j}^{t + 1} = \frac{1}{| t |} \int_{0}^{| t |} d t_{1}' \int_{|t| - t_{1}'}^{+ \infty} d t_{2} \, D_{i j} (t_{2}) .$
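The two integrals can be approximated numerically. The sketch below is an illustration under stated assumptions: the riding-time density D_ij is taken to be a weighted two-Gaussian mixture (the mixture weights, means, and standard deviations are hypothetical, since Eq. (3.17) does not specify them), the inner integral is evaluated with the Gaussian CDF via `math.erf`, and the outer integral with the midpoint rule.

```python
import math

def mixture_cdf(x, components):
    """CDF of a weighted Gaussian mixture; components = [(w, mu, sigma), ...]."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        for w, mu, sigma in components
    )

def return_probabilities(period, components, steps=1000):
    """Approximate P^t and P^{t+1} of Eqs. (3.18)-(3.19) by the midpoint rule."""
    h = period / steps
    cdf_zero = mixture_cdf(0.0, components)
    total_mass = sum(w for w, _, _ in components)
    same, following = 0.0, 0.0
    for step in range(steps):
        t1 = (step + 0.5) * h            # rental moment inside the period
        cdf_rem = mixture_cdf(period - t1, components)
        same += (cdf_rem - cdf_zero) * h          # ride ends within period t
        following += (total_mass - cdf_rem) * h   # ride ends after period t
    return same / period, following / period

# Hypothetical riding-time mixture in minutes, for a 60-minute period.
components = [(0.7, 15.0, 5.0), (0.3, 40.0, 10.0)]
p_same, p_next = return_probabilities(60.0, components)
print(round(p_same, 3), round(p_next, 3))
```

With these toy parameters most rides end within the rental period, so P^t exceeds P^{t+1}. The two probabilities sum to slightly less than 1 because a Gaussian assigns a small mass to negative riding times.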

Finally, considering the traffic patterns and the corresponding probabilities of the adjacent sites, the formula for predicting the number of bicycles returned to a site is obtained as follows: (3.20) $s_{i} \cdot d d (t) = \sum_{j \neq i} \left( e_{i j}^{t} P_{i j}^{t} + e_{i j}^{t - 1} P_{i j}^{t + 1} \right) .$

Combining the predicted rental and return numbers of site i, the demand ΔN of site i at future time t is calculated as follows: (3.21) $Δ N = s_{i} \cdot d d (t) - s_{i} \cdot p d (t) .$

If ΔN is less than zero, site i will not be able to meet users’ rental demand at time t, and bicycles must be delivered to the site by dispatching (Feng, Zhu & Liu, 2018). If ΔN is greater than the number of parking spots at the site, site i will not be able to satisfy users’ demand for returning bicycles at time t, and the number of bicycles at the site must be reduced by dispatching.
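The decision rule built on Eq. (3.21) is simple enough to state as code. The following sketch is only an illustration; the function name `dispatch_action`, the parameter `docks` (the number of parking spots), and the returned labels are assumptions.

```python
def dispatch_action(predicted_returns, predicted_rentals, docks):
    """Classify a site for period t from Delta N = returns - rentals."""
    delta_n = predicted_returns - predicted_rentals  # Eq. (3.21)
    if delta_n < 0:
        # Rental demand exceeds returns: bicycles must be delivered.
        return delta_n, "deliver bicycles"
    if delta_n > docks:
        # Returns exceed the parking spots: bicycles must be taken away.
        return delta_n, "remove bicycles"
    return delta_n, "no dispatch needed"

print(dispatch_action(5, 12, docks=20))   # (-7, 'deliver bicycles')
print(dispatch_action(40, 10, docks=20))  # (30, 'remove bicycles')
print(dispatch_action(12, 10, docks=20))  # (2, 'no dispatch needed')
```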

Community Discovery Algorithm Based on Multi-Objective Optimization

The community discovery algorithm based on multi-objective optimization combines quantitative indicators of regional scheduling workload, a community discovery algorithm (Shivach, Nautiyal & Ram, 2018), and a multi-objective optimization algorithm (Mori & Saito, 2016). First, the Fast Unfolding community discovery algorithm (Sun et al., 2018) is performed on the similarity matrix of the sites. Second, the results of the community discovery algorithm are adjusted according to the workload. Throughout the process, the results are continuously optimized by the multi-objective optimization algorithm.

CDoMO scheduling workload analysis

Scheduling workload is an indicator that measures the workload of a dispatch line. Scheduling itself involves many fields, so there is no uniform standard (Kim, Jeong & Lee, 2017). The generalized scheduling workload is mainly determined by the scheduling distance, the delivery volume and the number of service outlets. By weighting and combining these three parameters, the workload of the dispatching line can be quantified. Suppose W is the generalized scheduling workload, D is the driving distance (km), N is the number of service outlets, S is the delivery amount (pieces), and ρ₁, ρ₂, ρ₃ are the weights of the driving distance, the number of service outlets, and the delivery amount, respectively, as in Eq. (4.1): (4.1) $W = ρ_{1} \cdot D + ρ_{2} \cdot N + ρ_{3} \cdot S$

This paper applies the generalized scheduling workload to public bicycles and obtains a quantitative formula for the regional scheduling workload. W_i is the scheduling workload of area i; D_i is the scheduling distance of area i, which is calculated by the maximum generation star algorithm; N_i is the number of stations in area i; S_i is the number of bicycles to be scheduled in area i; and ρ₁, ρ₂, ρ₃ are the corresponding weight coefficients, as in Eq. (4.2): (4.2) $W_{i} = ρ_{1} \cdot D_{i} + ρ_{2} \cdot N_{i} + ρ_{3} \cdot S_{i}$

Regional scheduling covers all stations in the area, and at the scheduling area division stage the sites awaiting scheduling and the scheduling quantities of each area are unknown. In this paper the impact of S_i on the scheduling workload is therefore ignored, that is, ρ₃ = 0, and Eq. (4.2) simplifies to Eq. (4.3): (4.3) $W_{i} = ρ_{1} \cdot D_{i} + ρ_{2} \cdot N_{i} .$
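A short sketch shows how Eq. (4.3) turns a candidate weight pair (ρ₁, ρ₂) into per-area workloads whose spread can then be measured. The function name `regional_workloads` and the toy values are assumptions for illustration.

```python
from statistics import variance

def regional_workloads(distances, site_counts, rho1, rho2):
    """Eq. (4.3): W_i = rho1 * D_i + rho2 * N_i for every area i."""
    return [rho1 * d + rho2 * n for d, n in zip(distances, site_counts)]

# Two hypothetical areas: distances in km and numbers of stations.
workloads = regional_workloads([10.0, 20.0], [4, 2], rho1=0.5, rho2=1.0)
v = variance(workloads)
print(workloads, v)  # [9.0, 12.0] 4.5
```

Different weight pairs rank the same division differently, which is why the weights are left to the optimization algorithm rather than fixed by hand.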

In the quantitative formula of scheduling workload, the weight coefficients cannot be determined manually. However, when the scheduling workload is balanced, the variance of the estimated scheduling distance and the variance of the number of stations across areas should both be as small as possible, so the scheduling balance problem can be turned into a multi-objective optimization problem with the objective function $\min f = {[Z_{1}, Z_{2}]}^{T}$. NSGA2 is a widely used multi-objective genetic algorithm. It first applies genetic operators to the population P to obtain the population Q; the two populations are then merged, and a new population is established from the merged one by non-inferior sorting and crowding distance sorting. This process is repeated until the termination condition is met. The detailed process is as follows:

(1) Randomly generate the initial population P₀, sort it non-inferiorly, and assign a non-dominated rank to each individual; then apply selection, crossover, and mutation to P₀ to obtain a new population Q₀, and set i = 0.
(2) Merge the parent and offspring populations into a new population R_i = P_i∪Q_i, sort R_i non-inferiorly to obtain the non-inferior layers F1, F2, ⋯, and select individuals by non-inferior layer and crowding distance to form the new population P_i+1.
(3) Apply the replication, crossover, and mutation operators to population P_i+1 to form population Q_i+1.
(4) If the termination condition holds, stop; otherwise set i = i+1 and go to step (2).
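The non-inferior sorting used in the steps above can be sketched as follows. This is a deliberately simple version that peels off one non-dominated layer at a time; it yields the same layers F1, F2, ⋯ as the fast non-dominated sort of NSGA2 but without its bookkeeping optimizations. The function names and the sample points are assumptions, and both objectives are minimized.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated_fronts(points):
    """Return index lists F1, F2, ... by repeatedly removing the
    non-dominated individuals from the remaining population."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical (Z1, Z2) values of five candidate divisions.
points = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
fronts = non_dominated_fronts(points)
print(fronts)  # [[0, 1, 2], [3], [4]]
```

The first three candidates trade one objective against the other and form F1; (3, 3) is dominated by (2, 2), and (6, 6) is dominated by every other point.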

The main process diagram of NSGA2 is shown in Fig. 1:

Figure 1: The main process of NSGA2 algorithm.

DOI: 10.7717/peerjcs.224/fig-1

This paper uses the NSGA2 multi-objective optimization algorithm to solve the scheduling area partition model (Wu, 2014). The chromosome length in NSGA2 is 2 (Basch et al., 2015), corresponding to the two weight parameters of the regional scheduling workload. Each individual therefore corresponds to one scheduling workload formula, according to which the division result of the community discovery algorithm is adjusted. Figure 2 shows the specific flow of the NSGA2 algorithm.

Figure 2: Specific flow of NSGA2 algorithm.

DOI: 10.7717/peerjcs.224/fig-2

CDoMO algorithm design

The community discovery algorithm based on multi-objective optimization integrates quantitative indicators of regional scheduling workload, a community discovery algorithm, and a multi-objective optimization algorithm. First, the Fast Unfolding community discovery algorithm is run on the similarity matrix of the rental sites; second, the workload index is used to adjust the results of the community discovery algorithm. The entire process is continuously optimized by the multi-objective optimization algorithm.

Table 3 displays the detailed algorithm calculation.
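The workload-based adjustment (step 5 of Table 3) can be sketched as a greedy traversal. This is a simplified illustration, not the authors' implementation: ΔV is interpreted as the reduction of the workload variance, the distance estimator is passed in as a hypothetical callback `estimate_distance`, the dispatch-center constraint of Eq. (3.10) is ignored, and the toy estimator simply grows with the number of sites.

```python
from statistics import variance

def workload_variance(areas, rho1, rho2, estimate_distance):
    """Variance V of W_i = rho1 * D_i + rho2 * N_i (needs >= 2 areas)."""
    loads = [
        rho1 * estimate_distance(sites) + rho2 * len(sites)
        for sites in areas.values()
    ]
    return variance(loads)

def adjust_sites(areas, rho1, rho2, estimate_distance):
    """One traversal: move each site to the area that most reduces V."""
    for site in sorted(s for sites in areas.values() for s in sites):
        home = next(k for k, sites in areas.items() if site in sites)
        if len(areas[home]) == 1:
            continue  # never empty an area
        base = workload_variance(areas, rho1, rho2, estimate_distance)
        best_gain, best_area = 0.0, None
        for k in areas:
            if k == home:
                continue
            # Tentatively move the site and measure the reduction of V.
            areas[home].remove(site)
            areas[k].add(site)
            gain = base - workload_variance(
                areas, rho1, rho2, estimate_distance)
            areas[k].remove(site)
            areas[home].add(site)
            if gain > best_gain:
                best_gain, best_area = gain, k
        if best_area is not None:
            areas[home].remove(site)
            areas[best_area].add(site)
    return areas

def toy_distance(sites):
    # Hypothetical estimator: 2 km of driving per site in the area.
    return 2.0 * len(sites)

areas = {0: {1, 2, 3, 4, 5}, 1: {6}}
adjust_sites(areas, rho1=0.5, rho2=1.0, estimate_distance=toy_distance)
print(areas)
```

In this toy run two sites migrate from the overloaded area to the small one, after which no further move reduces the variance. In the full algorithm the same traversal would be repeated for every individual (ρ₁, ρ₂) of the NSGA2 population.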

Experiment and Analysis

The rest of the paper presents the experiments and their analysis. Two groups of experiments were conducted, using New York public bicycle data and Chicago public bicycle data, respectively. In the analysis, both groups use the K-means clustering algorithm and the Fast Unfolding community discovery algorithm as comparisons, and the results are compared in three respects: the number of rental sites, the variance of the number of scheduled bicycles, and the estimated total scheduling distance. The comparative data show that the algorithm is effective in both groups of experiments.

New York public bicycle

Data set introduction

Citi Bike (Jiang et al., 2018) is a public-benefit project launched by the New York City Government. Figure 3 displays the spatial distribution of the rental sites: blue represents Manhattan, with 250 rental sites, and green represents Brooklyn, with 77 sites. Each Citi Bike rental site has GPS location information, so the rental sites are easy to locate. The system records data for each ride, including the location and time of the start and end of the ride, the bicycle ID, and the user’s gender and birth date. This experiment uses the May 2016 rent-return dataset of New York public bicycles, a total of 961,986 rent-return records. The dataset contains 16 fields; those related to this experiment are shown in Table 4.

Table 3:

Detailed algorithm calculation.

Algorithm:	Community discovery algorithm based on multi-objective optimization
Input:	Site similarity matrix X, population number popsize, maximum number of iterations MaxGen.
Output:	Optimal regional division results and workload index parameters $ρ_{1}^{*}, ρ_{2}^{*}$.
1.	Initialize the historical optimal solution f^∗ and its workload index parameters $ρ_{1}^{*}, ρ_{2}^{*}$.
2.	Perform one pass of the Fast Unfolding community discovery algorithm, and record the preliminary zoning result as R.
3.	Calculate the estimated distance D_i and the number of sites N_i of each area in R.
4.	Take the genes of each individual in the population as the weight coefficients ρ₁, ρ₂, and calculate the scheduling workload of each area by the formula W_i = ρ₁⋅D_i + ρ₂⋅N_i. The variance of the regional workloads is denoted as V.
5.	For each rental site i, try to move i into each of the other communities and calculate the gain ΔV in workload balance (the reduction of V) produced by the adjustment, recording the maximum ΔV_max and the corresponding community k. If ΔV_max < 0, site i is not moved; if ΔV_max > 0, site i is moved to community k. Traverse all sites until every site has been considered, and record the result as R^∗.
6.	Define the variance function f₁ of the number of regional sites and the variance function f₂ of the regional dispatch distance as the two objective functions, and perform fast non-dominated sorting on the results. The optimal solution in the current population is recorded as f′, and its corresponding scheduling workload parameters are denoted as $ρ_{1}', ρ_{2}'$. If f′ > f^∗ after comparison, let $ρ_{1}^{*} = ρ_{1}', ρ_{2}^{*} = ρ_{2}'$.
7.	Determine whether the number of iterations exceeds the maximum number of iterations MaxGen. If it does, output the optimal regional division results and the workload index parameters $ρ_{1}^{*}, ρ_{2}^{*}$; otherwise, generate a new population through elite strategy selection, which ensures that elite individuals are not discarded during evolution and thereby improves the accuracy of the optimization results and expands the sampling space, then apply gene crossover and mutation and continue execution from step 1.

DOI: 10.7717/peerjcs.224/table-3

Figure 3: Spatial distribution map of public bicycles in New York.

DOI: 10.7717/peerjcs.224/fig-3

This paper uses the preprocessing program to convert the lease data into the lease-return relationship between the rental sites (Guo et al., 2017), and generates a similarity matrix from this relationship. The similarity between rental sites is calculated as follows (Eq. (4.4)): (4.4) $R_{i j} = \frac{Q_{i j} + Q_{j i}}{M}$

Here, R_ij represents the similarity between site i and site j; Q_ij represents the number of times a bicycle was rented at site i and returned at site j; Q_ji represents the number of times a bicycle was rented at site j and returned at site i; M represents the time range in days. In this experiment, the data set covers the 31 days of May 2016, so M = 31. The corresponding abstract network can be generated from the lease-return relationships (Fig. 4). Because of its dense population, dense sites, and prosperous business, the sites in Manhattan are closely linked. Brooklyn is separated from Manhattan by a river, so the connections between the sites of the two regions are sparse, except for leases along the river.
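As a minimal sketch of Eq. (4.4), the similarity of each pair of sites can be computed from a list of (rental site, return site) trips. The function name `similarity_matrix` and the toy trip list are hypothetical, and trips that start and end at the same site are skipped here as an assumption, since the text does not say how they are handled.

```python
from collections import Counter


def similarity_matrix(trips, days):
    """Return {(i, j): R_ij} with i < j, where R_ij = (Q_ij + Q_ji) / M.

    trips: iterable of (start_station_id, end_station_id) pairs.
    days:  M, the length of the observation period in days.
    """
    q = Counter(trips)  # q[(i, j)] = rentals at i that were returned at j
    r = {}
    for i, j in q:
        if i == j:
            continue  # ASSUMPTION: round trips to the same site are ignored
        key = (min(i, j), max(i, j))
        if key not in r:
            # A Counter returns 0 for a direction that never occurred.
            r[key] = (q[(i, j)] + q[(j, i)]) / days
    return r


# Hypothetical toy data: four trips observed over two days.
demo = similarity_matrix([(1, 2), (1, 2), (2, 1), (3, 1)], days=2)
```

For the May 2016 data described above, `days` would be 31. The resulting dictionary is symmetric by construction and can be used directly as the weighted edge list of the abstract network.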

Table 4: The dataset contains 16 fields; the fields used in this paper are listed below.

No.	Fields	Meaning
1	start time	Starting time
2	stop time	End time
3	start_station_id	Bicycle rental site ID
4	start_station_name	Name of the bicycle rental site
5	start_station_longitude	Longitude of the bicycle rental site
6	start_station_latitude	Latitude of the bicycle rental site
7	end_station_id	Bicycle return site ID
8	end_station_name	Name of the bicycle return site
9	end_station_longitude	Longitude of the bicycle return site
10	end_station_latitude	Latitude of the bicycle return site

DOI: 10.7717/peerjcs.224/table-4

Figure 4: Abstract network of public bicycles in New York City.

DOI: 10.7717/peerjcs.224/fig-4

Figure 5: Schematic diagram of community discovery algorithm.

DOI: 10.7717/peerjcs.224/fig-5

Figure 6: Schematic diagram of analysis results of the Gephi Visual Network Analysis platform.

DOI: 10.7717/peerjcs.224/fig-6

Experimental result

In the experiment, we first used the Gephi visual network analysis platform to analyse the community structure in the data (Hu, An & Wang, 2018). Gephi integrates the Fast Unfolding algorithm, which divides the public bicycle abstract network according to the rental patterns of public bicycles. The Fast Unfolding algorithm consists of two phases. The first phase, Modularity Optimization, moves each node into the community of one of its neighbouring nodes whenever doing so increases the modularity. The second phase, Community Aggregation, merges each community found in the first phase into a single node, that is, it rebuilds the network based on the community structure generated in the previous step. The two phases are repeated until the structure of the network no longer changes (Fig. 5).
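The first phase can be sketched in a few lines of plain Python. This is a simplified illustration and not Gephi's implementation: it recomputes the modularity from scratch for every tentative move instead of using the incremental gain formula, it omits the aggregation phase, and the names `modularity` and `local_moving` as well as the toy graph are hypothetical.

```python
def modularity(adj, comm):
    """Modularity of a weighted undirected graph.

    adj:  {u: {v: weight}}, symmetric, without self-loops.
    comm: {u: community label}.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    inside, total = {}, {}
    for u, nbrs in adj.items():
        c = comm[u]
        total[c] = total.get(c, 0.0) + sum(nbrs.values())
        inside[c] = inside.get(c, 0.0) + sum(
            w for v, w in nbrs.items() if comm[v] == c
        )
    return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj):
    # Phase 1: start with one community per node, then repeatedly move a
    # node into a neighbour's community if that increases the modularity.
    comm = {u: u for u in adj}
    improved = True
    while improved:
        improved = False
        for u in adj:
            best_c, best_q = comm[u], modularity(adj, comm)
            for c in {comm[v] for v in adj[u]}:
                old = comm[u]
                comm[u] = c  # tentative move
                q = modularity(adj, comm)
                comm[u] = old
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            if best_c != comm[u]:
                comm[u] = best_c
                improved = True
    return comm


# Hypothetical toy network: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
graph = {n: {} for n in range(6)}
for a, b in edges:
    graph[a][b] = graph[b][a] = 1.0
communities = local_moving(graph)
```

On this toy network the two triangles end up as separate communities. On the bicycle network, the edge weights would be the similarities R_ij, so that sites with frequent mutual rentals tend to be grouped together.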

Fig. 6 shows the internal community structure of the abstract network of New York’s public bicycles after the Fast Unfolding algorithm is applied to the New York rental sites. The dots represent sites, sites in different communities are shown in different colours, and the lines represent the relationships between sites. The network clearly splits into six communities: lease links within the same community are dense, whereas links between different communities are relatively sparse. The results of the Fast Unfolding community discovery algorithm are then mapped onto the map of New York (Fig. 7). Manhattan is a densely populated administrative district, and the vast majority of public bicycle trips that start there also end there, so Manhattan is divided into five areas according to the rental patterns, while Brooklyn forms a single area. Although the division results are relatively reasonable, there are still many abnormal sites, which lie far away from their respective areas; the number of sites in each area is uniform.

Figure 7: Results of the Fast Unfolding community discovery algorithm in the New York dataset.

DOI: 10.7717/peerjcs.224/fig-7

CDoMO builds on the community discovery algorithm and additionally considers the regional scheduling workload, which is determined by the estimated dispatch distance and the number of lease sites in the region. If the community discovery result contains abnormal sites, the estimated scheduling distance of the affected regions becomes larger, so the variance of the regional scheduling distances increases. Likewise, if the number of sites differs greatly between areas, the variance of the numbers of sites increases. The goal of CDoMO is therefore to optimize both the variance of the regional scheduling distances and the variance of the numbers of sites. During optimization, the division results are adjusted to make them more reasonable, which remedies the deficiency that the original division process does not consider the workload balance of each scheduling area. Solving the division model of the public bicycle scheduling areas with the community discovery algorithm based on multi-objective optimization yields the experimental results shown in Fig. 8. The comparison shows that the sites along the Williamsburg Bridge and along the Manhattan riverside are now divided into the same dispatch area, which is more in line with the rental patterns of public bicycles and resolves the abnormal sites (Zhang et al., 2011). It also addresses the problem that the number of sites differs too much between regions (Manju & Fred, 2018).
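The two objectives and the notion of non-domination used by NSGA2 can be sketched as follows. The sketch assumes that both variances are to be minimized; the function names and the numeric values are hypothetical and are not results from the paper. Only the first non-dominated front is extracted, not the full fast non-dominated sort.

```python
from statistics import pvariance


def objectives(site_counts, distances):
    # f1: variance of the number of sites per area,
    # f2: variance of the estimated dispatch distance per area.
    return (pvariance(site_counts), pvariance(distances))


def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (ASSUMPTION: smaller is better).
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def first_front(points):
    # Solutions that no other solution dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective values (f1, f2) of three candidate divisions.
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 3.0)]
front = first_front(candidates)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

A division whose areas all have the same number of sites and the same estimated distance would score (0, 0) and dominate every other candidate. In practice the two objectives conflict, which is why a set of non-dominated solutions is kept rather than a single optimum.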

Figure 8: Results of region partition based on multi-objective optimization algorithm in the New York dataset.

DOI: 10.7717/peerjcs.224/fig-8

To keep the experiments consistent, the value of k in the classical K-means clustering algorithm is set to 6 (Lin & Song, 2017), and clustering is performed on the same data set. The spatial area enclosed by the sites in each cluster is taken as a scheduling area, which gives the regional division based on the clustering algorithm (Fig. 9). With k = 6, the clustering algorithm achieves a poor regional division. The cluster shown in red contains a very large number of sites, while the clusters shown in purple and beige contain very few, so the number of sites varies greatly between clusters. In addition, the boundaries of the scheduling areas are unclear and overlap (Zhen et al., 2016).
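For reference, the K-means baseline can be sketched with a plain implementation of Lloyd's algorithm on site coordinates. This is an illustration under stated assumptions, not the configuration used in the experiment: the initial centroids are simply the first k points, plain Euclidean distance is used on the coordinate pairs, and the function name `kmeans` and the toy points are hypothetical.

```python
from collections import Counter
from math import dist  # available from Python 3.8


def kmeans(points, k, iters=100):
    # ASSUMPTION: deterministic initialisation with the first k points.
    centroids = list(points[:k])
    labels = None
    for _ in range(iters):
        # Assignment step: each site goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical toy coordinates forming two well-separated groups.
sites = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(sites, k=2)
sizes = Counter(labels)  # number of sites per cluster
```

Because K-means uses only the positions of the sites and ignores the lease-return relationships, nothing in it balances the cluster sizes, which is consistent with the uneven site counts observed for k = 6.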

Figure 9: Results of region partition based on clustering algorithm in the New York dataset.

DOI: 10.7717/peerjcs.224/fig-9

Figure 10: Difference in the number of bicycle rental points under three algorithms in the New York dataset.

DOI: 10.7717/peerjcs.224/fig-10

Figure 11: Distance difference of bicycle scheduling area under three algorithms in the New York dataset.

DOI: 10.7717/peerjcs.224/fig-11

Algorithm performance comparison results

Based on the overall experimental results of the above three methods, the multi-objective optimization-based community discovery algorithm proposed in this paper makes the division of the areas consistent with the rental patterns and makes the regional scheduling workload as balanced as possible. In addition to analysing the overall spatial distribution of the regional division, the paper also compares the three methods along three dimensions: the variance of the number of regional rental sites, the variance of the regional dispatch distance, and the estimated total dispatch distance. Figure 10 compares the variance of the numbers of sites. The data show that CDoMO reduces this variance by 63.31% compared with the K-means algorithm and by 32.32% compared with the Fast Unfolding algorithm. Figure 11 compares the variance of the regional dispatch distance. The data show that CDoMO reduces this variance by 88.06% compared with the K-means algorithm and by 38.14% compared with the Fast Unfolding algorithm. Figure 12 compares the estimated total scheduling distances. The data show that CDoMO shortens the estimated total distance by 55.17% compared with the K-means algorithm and by 27.54% compared with the Fast Unfolding algorithm. When the scheduling areas are partitioned with the multi-objective optimization algorithm, the estimated scheduling distance is shortened; because the estimated scheduling distance is positively related to the actual scheduling distance, the actual scheduling distance will also be shortened. In addition, the scheduling work of each area becomes relatively balanced. Figure 13 gives a comparative display of the experimental results.

Figure 12: Total distance of estimated scheduling under three algorithms in the New York dataset.

DOI: 10.7717/peerjcs.224/fig-12

Figure 13: Comparison of experimental results of region partition under three algorithms in New York dataset.

DOI: 10.7717/peerjcs.224/fig-13