Laparoscope arm automatic positioning for robot-assisted surgery based on reinforcement learning

Compared with traditional laparoscopic surgery, preoperative planning for robot-assisted laparoscopic surgery is more complex and more essential. Based on an analysis of the surgical procedures and the surgical environment, a laparoscope arm preoperative planning algorithm built on an artificial pneumoperitoneum model and a lesion parametrization model is proposed, which ensures that the laparoscope arm satisfies both the distance principle and the direction principle. The algorithm has two parts, determining the optimum incision and determining the optimum angle of laparoscope entry, so that the laparoscope provides a reasonable initial visual field. A set of parameters based on the actual situation is given to illustrate the algorithm flow in detail. The preoperative planning algorithm offers significant improvements in planning time and quality for robot-assisted laparoscopic surgery. An improved method that combines the preoperative planning algorithm with the deep deterministic policy gradient (DDPG) algorithm is applied to laparoscope arm automatic positioning for robot-assisted laparoscopic surgery. It takes the fixed-point position and the lesion parameters as input, and outputs the optimum incision, the optimum angle and the motor movements without kinematics. The proposed algorithm is verified through simulations in a virtual environment built with pyglet. The results validate the correctness, feasibility, and robustness of this approach.


Introduction
With the development of robotic technology and the application of minimally invasive surgery (MIS), the laparoscopic MIS robotic system has been widely used in surgical specialties such as urology (prostate, bladder and kidney cancer) and gynecology (hysterectomy and myomectomy). Compared with traditional laparoscopic surgery, robot-assisted laparoscopic surgery displays a high-definition, 3-D image of the lesion to the surgeon via the console and allows the surgeon to perform complex operations by manipulating the master controls. Robot-assisted laparoscopic surgery is more precise, flexible, and controllable than conventional techniques, so it has become a research hotspot in recent years.
Although robot-assisted surgery has many advantages over traditional surgery, there are also some thorny problems, such as control switching between master controls and robotic arms, real-time synchronization of master-slave position and attitude, and MIS robotic system preoperative planning. Reasonable preoperative planning can significantly reduce the operation time, whereas poor planning may increase surgical risks.
For MIS robotic system preoperative planning, scholars have proposed many different methods, which fall into three categories: (1) heuristic methods based on surgeon experience; (2) methods based on a virtual surgical environment; (3) methods based on multi-objective optimization algorithms. Hanna et al. (1997a) investigated the impact of port placement on endoscopic manipulations, especially knotting. The optimal azimuth and elevation angles were obtained by comparing the execution time and performance quality score of tying a surgeon's knot (Hanna et al., 1997a). Austad et al. (2001) completed coronary artery bypass grafting (CABG) procedures on pigs using the Zeus robot-assisted surgical system. The Zeus system configurations, such as port placement and the pigs' position, were set based on recommendations from hospitals and surgeon experience (Austad et al., 2001). Ferzli and Fingerhut (2004) proposed recommendations for trocar placement in laparoscopic surgery. The abdominal cavity is divided into six parts according to the operation area, and recommendations are given according to different operations and patient posture characteristics (Ferzli and Fingerhut, 2004). Pick et al. (2004) proposed an anatomic guide to port placement for laparoscopic radical prostatectomy, which was performed on the da Vinci robot-assisted surgical system. Compared to traditional port placement, the pubic bone was used as the optimal landmark (Pick et al., 2004). Badani et al. (2008) proposed a novel technique of port placement for robotic renal surgery, which aimed to maximize the range of motion and eliminate external collisions (Badani et al., 2008). Cestari et al. (2010) proposed a new method of port placement for laparoscopic radical prostatectomy, which used a nautical inclinometer and a homemade triangle mold (Cestari et al., 2010).
The heuristic method based on surgeon experience is convenient and practical for the surgeon, so it is widely used in clinical practice. However, this method depends on the surgeon's operating habits and requires extensive surgical experience. More importantly, the advantages of the surgical robot system are not fully exploited. Hayashibe et al. (2005) developed a simulation system for preoperative planning of abdominal surgery. The core of the simulation system was kinematics and haptics; the effectiveness of preoperative planning was validated by the surgeon's evaluation (Hayashibe et al., 2005). Hayashibe et al. (2006) developed a new simulation system with volume rendering of medical images and automatic positioning by kinematics (Hayashibe et al., 2006). Sun et al. (2007) developed a simulator of the da Vinci system, which was mainly used for surgeon training. Its primary functions were the simulation of port placement and the practice of simple surgical operations (Sun et al., 2007). Bauernschmitt et al. (2007) developed a simulator for port placement and enhanced guidance in robot-assisted heart surgery. The simulation was completed off-line, and the simulation model was established from the patient's computed tomography (CT) images to obtain the best port positions. Through this system, preoperative planning was optimized, the operation time was reduced, and operation quality was improved (Bauernschmitt et al., 2007). Konietschke et al. (2011) developed a simulator of the DLR MiroSurge system, which used the VR-Map device to establish the simulator quickly. Its primary functions were preoperative optimization and intraoperative simulation (Konietschke et al., 2011).
The method based on the virtual surgical environment visualizes the port placement and verifies the effect in advance. Compared with the former method, this method simplifies the steps of port placement and reduces the time required. However, this method also requires surgeons with extensive surgical experience, and because it lacks an analysis of surgical robot performance and allows only a finite number of attempts, it is difficult to obtain optimized preoperative planning. Sun and Yeung (2007) proposed the selection of optimal port placement and the determination of optimal robot attitude based on multi-objective optimization. This method used two performance indices, the global isotropy index (GII) and the efficiency index (EI). Through the interaction of these two indices, the flexibility and operability of the robot were improved, and the workspace and visual space were also increased (Sun and Yeung, 2007). Azimian et al. (2010) proposed a preoperative planning method for robot-assisted minimally invasive CABG. This method used sequential quadratic programming to optimize the kinematic and geometric requirements. In the optimization process, individualized preoperative planning can be achieved by taking into account the surgeon's experience (Azimian et al., 2010). Ma et al. (2014) proposed a preoperative positioning method, which was mainly aimed at the collision problem of the multi-arm system. It used the maximum distance index to achieve collision-free optimal preoperative positioning (Ma et al., 2014). Yu et al. (2014) proposed a preoperative positioning method, which was mainly aimed at cooperation between two instrument arms. It used the percentage of collaboration workspace to achieve the optimal cooperation between two manipulators (Yu et al., 2014). Wang et al. (2016) proposed a preoperative planning algorithm for robot-assisted minimally invasive CABG. This algorithm used two performance indices, the isotropy index based on CV (IICV) and the index of instrument collaboration space (IICS), to select the optimal port placement and determine the manipulator poses (Wang et al., 2016).
Compared with the former two methods, the method based on a multi-objective optimization algorithm is more scientific. More importantly, in addition to the surgeon's experience, the robot's characteristics are also taken into account, so the preoperative planning is more conducive to the operation.
In general, after the preoperative planning is obtained by the above methods, the joint variables of the manipulator are obtained by inverse kinematics. At present, the telecentric fixed-point positioning mechanism of the surgical robot system is mostly an undriven mechanism, which needs to be manually adjusted to the target position. Due to errors in manual adjustment and in the mechanical kinematics parameters, the configuration actually achieved is not the optimal solution previously determined. Therefore, it is necessary to use a new method to complete preoperative planning instead of manual configuration.
Traditional manipulator control calculates the joint variables by inverse kinematics for a given target position. At present, the trend has turned to end-to-end solutions. In other words, the controller learns diverse strategies directly from sensor data, rather than relying on fixed strategies such as kinematics (James and Johns, 2016; Otte et al., 2016; Phaniteja et al., 2017; Gu et al., 2017; Mohammadi et al., 2018). James and Johns (2016) proposed a method that took images as its input and output motor movements and the target position. Thus, the control of a 7-DOF robot arm can be realized in a virtual environment without any prior knowledge (James and Johns, 2016). The telecentric fixed-point positioning mechanism is a redundant mechanism; an accurate inverse kinematic solution can only be obtained under appropriate constraints. In order to improve the effect of preoperative planning, it is necessary to explore a new method that tackles the problems caused by previous methods.
This paper proposes a laparoscope arm preoperative planning algorithm, which is based on the lesion parametrization model and evaluation indexes. In addition, an improved method based on a reinforcement learning algorithm is proposed to achieve preoperative laparoscope arm automatic positioning. More importantly, it is a crucial step towards the automation of robot-assisted laparoscopic surgery.
The rest of the paper is organized as follows. Section 2 introduces the surgical procedures and the MIS robotic system. The laparoscope arm preoperative planning algorithm is introduced in Sect. 3. The improved DDPG algorithm is introduced in Sect. 4. The simulation results are presented in Sect. 5. Discussion and conclusion are given in Sects. 6 and 7, respectively.

The MIS procedures
A common MIS procedure has three steps: (1) According to the actual surgical needs, a surgeon makes several small incisions (usually 5-15 mm) and inserts a thin tube called a trocar. The trocar serves as a means of introduction for the laparoscope or laparoscopic instruments, such as scissors and graspers, and provides an access port during surgery. (2) A pneumoperitoneum is created by inflating the abdomen with carbon dioxide, which separates the organs and increases the operating space of the surgical instruments. (3) The surgeon views the magnified image of the patient's internal organs provided by the laparoscope on a video monitor. Using different instruments, the surgeon performs a series of surgical operations in the pneumoperitoneum.
This paper takes laparoscopic cholecystectomy (LC) as an example. The surgeon makes three incisions and inserts trocars. In LC, the patient is always in a supine position. The three incisions are arranged in an isosceles triangle for better operating space, as shown in Fig. 1. A laparoscope is placed through one trocar, and specialized instruments are placed through the other trocars. By operating the laparoscope and instruments, the surgeon delicately separates the gallbladder from its attachments to the liver and the bile duct and then removes it through one incision.

Layout design of MIS robotic system
The MIS robotic system includes a master-slave manipulator system and a depth camera. The slave manipulator consists of one laparoscope arm and two instrument arms. The laparoscope arm is equipped with a laparoscope, and the instrument arms are equipped with different laparoscopic instruments. The laparoscope arm and the instrument arms are located on both sides of the operating bed. A depth camera is installed above the operating bed to acquire the positions of the incisions and the robotic arms, as shown in Fig. 2.
The three arms have the same mechanical structure. Each arm is divided into three parts: the telecentric fixed-point positioning mechanism, the remote center of motion mechanism and the end effector, as shown in Fig. 3. The first part adjusts the spatial position of the telecentric fixed-point by three revolute joints and one prismatic joint. The second part adjusts the position and posture of the end effector under the control of the master manipulator operated by a surgeon; at its end, there is a versatile quick-change mechanism for installing end effectors.

Laparoscope arm preoperative planning
One of the critical issues for MIS is preoperative planning, including preparation for interventions and the decision about the optimum surgical incisions. Currently, the surgeon often completes preoperative planning by trial and error or from experience, which may not meet the requirements of the optimum incisions. Therefore, it is necessary to use a preoperative planning algorithm instead of these methods. Preoperative planning covers both the laparoscope arm and the instrument arms. This paper studies the former, including the optimum incision and the optimum angle of laparoscope entry.

The mathematical model of artificial pneumoperitoneum
The mathematical model of the pneumoperitoneum is established before preoperative planning. The shape of the artificial pneumoperitoneum is approximately ellipsoidal (Mulier et al., 2008; Oda et al., 2012), so the abdominal wall is simplified to an ellipsoid, defined as Eq. (1). The artificial pneumoperitoneal coordinate frame is established by combining the patient's CT images and anatomy. According to anatomy, there are three principal planes, namely the sagittal plane, the coronal plane, and the transverse plane. In the coordinate frame, there are also three reference planes, namely the A plane, the B plane, and the C plane. The A plane coincides with the sagittal plane; the B plane coincides with the coronal plane; the C plane is parallel to the transverse plane and divides the pneumoperitoneum equally. The origin of the coordinate frame is at the intersection of the three reference planes. The $x_p$-axis is defined along the mediolateral direction; the $y_p$-axis is defined along the superior-inferior direction; the $z_p$-axis is defined along the anteroposterior direction. In this coordinate frame, the mathematical model of the artificial pneumoperitoneum is established, as shown in Fig. 4.
During actual operation, the model parameters ($a_p$, $b_p$, $c_p$) are determined by the medical image and the gas insufflation volume. Suppose an adult's chest width is 3.15 dm, chest thickness is 2.45 dm, and chest length is 2.9 dm. The corresponding parameters in Fig. 6 are $a_p$ = 1.55 dm, $b_p$ = 1.45 dm and $h$ = 1.2 dm. Chen suggested that the gas insufflation volume is about 3 L (Chen, 1999). According to Eq. (2), $c_p$ = 2.27 dm.
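The value of $c_p$ can be checked numerically. The sketch below assumes that Eq. (2) equates the insufflation volume with the volume of the ellipsoid cap lying above the plane $z = h$; that form is an assumption (the equation body is not available here), chosen because it reproduces the reported value. The function names are hypothetical.

```python
import math


def cap_volume(a_p, b_p, c_p, h):
    """Volume (dm^3) of the ellipsoid x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 above z = h.

    Assumed form of Eq. (2): integrate the elliptical cross-section area
    pi * a_p * b_p * (1 - z^2 / c_p^2) from z = h to z = c_p.
    """
    return math.pi * a_p * b_p * ((c_p - h) - (c_p**3 - h**3) / (3.0 * c_p**2))


def solve_c_p(a_p, b_p, h, volume, tol=1e-9):
    """Find c_p by bisection; the cap volume grows monotonically with c_p > h."""
    lo, hi = h, h + 1.0
    while cap_volume(a_p, b_p, hi, h) < volume:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cap_volume(a_p, b_p, mid, h) < volume:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Parameters from the text: lengths in dm, volume in L (1 L = 1 dm^3).
c_p = solve_c_p(a_p=1.55, b_p=1.45, h=1.2, volume=3.0)
print(f"c_p = {c_p:.2f} dm")
```

Under this assumption the solver returns a value that rounds to the 2.27 dm quoted above.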

The lesion parametrization model
The surgeon should be clear about the information of the surgical site, including the lesion location, the lesion anatomy, and the surrounding tissues. At present, the conventional method is an imaging (radiology) test, and the lesion model and its sur-  3), and no barrier (Hanna et al., 1997b). (2) Observation direction principle: for the axis-to-target view angle β, the smaller β is, the better the operative field. When β = 0, the operative field is optimum; in other words, the laparoscope visual axis is perpendicular to the plane τ (Hanna and Cuschieri, 1999).

The preoperative planning algorithm framework
Based on the mathematical model of the artificial pneumoperitoneum, the lesion parametrization model and the preoperative planning principles, the laparoscope arm preoperative planning algorithm is proposed, as shown in Fig. 6. It includes three stages: data processing and modeling, optimum incision determination, and optimum angle determination.
In the first stage, patient information is obtained from the medical images, the mathematical model of the artificial pneumoperitoneum is established, and finally the lesion location and the lesion parametrization model are determined. This stage is the basis of the entire algorithm and also the most time-consuming stage.
In the second stage, all allowable surgical incisions are obtained from the first stage, and the candidate incisions are then determined according to the two principles. The candidate base positions are obtained from the candidate incisions. According to the actual situation of the operating room, one of these positions is selected as the base position. The candidate incisions and the base position are then combined to determine the optimum incision.
In the third stage, the candidate entry angles are determined by combining the optimum incision, the lesion location, and the initial entry angle. The optimum angle is determined according to the observation direction principle. Since there may be no direction in which the visual axis is perpendicular to the plane τ, the angle with the minimum β is chosen as the optimum angle.
Finally, the laparoscope arm preoperative planning algorithm is completed, yielding the optimum incision and the optimum angle.
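The third stage can be illustrated with a minimal sketch. It assumes that β is measured between a candidate visual-axis direction and the normal of the plane τ (consistent with β = 0 meaning the axis is perpendicular to τ); the function names and the sample vectors are hypothetical.

```python
import math


def view_angle(axis, plane_normal):
    """Angle beta (rad) between a visual-axis direction and the normal of plane tau."""
    dot = sum(a * n for a, n in zip(axis, plane_normal))
    norm = math.hypot(*axis) * math.hypot(*plane_normal)
    # The sign of the normal is arbitrary, so |cos| keeps beta in [0, pi/2].
    return math.acos(min(1.0, abs(dot) / norm))


def optimum_axis(candidate_axes, plane_normal):
    """Pick the candidate direction with the smallest beta."""
    return min(candidate_axes, key=lambda axis: view_angle(axis, plane_normal))


# Hypothetical candidates: none is exactly perpendicular to tau,
# so the one with the minimum beta is selected.
normal = (0.0, 0.0, 1.0)
candidates = [(1.0, 0.0, -1.0), (0.2, 0.0, -1.0), (0.0, 1.0, -0.5)]
best = optimum_axis(candidates, normal)
print(best, math.degrees(view_angle(best, normal)))
```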

The candidate incisions
The telecentric fixed-point positioning mechanism has four degrees of freedom; the mechanism diagram is shown in Fig. 7. $o_4$ is the telecentric fixed-point, $o_5$ is the end of the laparoscope, and α is determined by the remote center of motion mechanism. The prismatic joint is used to adjust the vertical position of $o_4$, and the three revolute joints are used to   First, candidate incisions are determined based on the distance principle. Assume that the positions of the candidate incision and the lesion are $P_i$ and $P_l$, respectively. If $d(P_i P_l) = |P_i - P_l| \le d_{7\max}$, then $P_i$ satisfies the distance principle. Second, based on the direction principle, the candidate incisions lie on the generatrices of a right circular cone with apex at $P_l$ and aperture $\pi - 2\alpha$. Candidate incisions are therefore the incisions that satisfy both principles. The following is a mathematical derivation of the candidate incisions.
The lesion position $P_l(x_l, y_l, z_l)$ is obtained by the imaging test; the candidate incisions are located on the intersection (red, Eq. 3) of the abdominal wall (navy blue) and the right circular cone with apex at $P_l$ (light blue), as shown in Fig. 8. The intersecting line is not a plane curve, so it is projected onto the plane $x_p o_p y_p$ for convenience. The projection curve (Eq. 4) is an ellipse whose expression can be obtained by fitting four points on it. Two planes through $P_l$, parallel to $y_p o_p z_p$ and $x_p o_p z_p$, give the points $m_1$, $m_2$, $m_3$ and $m_4$; one plane through $P_l^{z_5}(x_l, y_l, z_5)$, parallel to $x_p o_p y_p$, gives the points $m_5$ and $m_6$, as shown in Fig. 9 and Eqs. (5)-(7). The coefficients of the equation can be obtained from any four of these six points, and the remaining two points are used to verify them.
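The four-point fit and two-point check can be sketched as follows. The sketch assumes the projected ellipse is axis-aligned, written as $x^2 + q y^2 + r x + s y + t = 0$, so that four points determine the four coefficients; this form, the function names and the sample points are assumptions for illustration, not the source's Eq. (4).

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the four points do not determine the ellipse")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_axis_aligned_ellipse(points):
    """Fit x^2 + q*y^2 + r*x + s*y + t = 0 through exactly four (x, y) points."""
    rows = [[y * y, x, y, 1.0] for x, y in points]
    rhs = [-(x * x) for x, _ in points]
    return tuple(solve_linear(rows, rhs))


def ellipse_residual(coeffs, point):
    """Left-hand side of the ellipse equation; zero when the point lies on the curve."""
    q, r, s, t = coeffs
    x, y = point
    return x * x + q * y * y + r * x + s * y + t


# Hypothetical points standing in for m1..m4 (fit) and m5, m6 (verification).
fit_points = [(4.0, 2.0), (-2.0, 2.0), (1.0, 4.0), (1.0, 0.0)]
check_points = [(2.8, 3.6), (-0.8, 0.4)]
coeffs = fit_axis_aligned_ellipse(fit_points)
print(coeffs, [ellipse_residual(coeffs, p) for p in check_points])
```

Residuals close to zero at the two held-out points indicate that the fitted coefficients are consistent, which mirrors the verification step described above.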

The candidate base positions
In addition, the base position also affects the surgical incisions. With the prismatic joint removed, the telecentric fixed-point positioning mechanism is a 3-RRR planar redundant mechanism. When $o_4$ remains unchanged, it is simplified as a planar four-bar mechanism, as shown in Fig. 10. In this case, the link length relationship determines whether the laparoscope trajectory is a whole cone, which makes it possible to provide the optimum operative field. In other words, $o_3 o_4$ should be able to rotate fully around $o_4$ while $o_4$ remains fixed; that is, $o_3 o_4$ must be a crank. The link length relationship is determined from the conditions for crank existence. Assume that the lengths of links $o_1 o_2$, $o_2 o_3$, $o_3 o_4$ and $o_4 o_1$ are $a_2$, $a_3$, $a_4$ and $l$, respectively. The value of $l$ determines whether a crank exists, which is discussed in three cases, as shown in Eqs. (8)-(10). In summary, the distance from the base to the fixed-point should be less than $a_2 + a_3 - a_4$ to ensure that the laparoscope has a complete operative field.
1. The link $o_1 o_4$ is the longest link:
2. The link $o_1 o_4$ is the shortest link:
$$0 < l \le a_4,\quad l + a_2 \le a_3 + a_4 \;\Rightarrow\; 0 < l \le \min(a_4,\, a_3 + a_4 - a_2) \tag{9}$$
3. The link $o_1 o_4$ is neither the longest nor the shortest link:
According to Sect. 3.4, the allowable base range is an ellipse. Compared with the projection of the candidate incisions on the plane $x_p o_p y_p$, the center coordinate is unchanged and the semi-major axis and semi-minor axis each increase by $l_{\max}$. The intersection of the allowable base range and the non-interference area in the operating room gives the candidate base positions, as shown in Eq. (11).
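The crank-existence check can be sketched with the standard Grashof criterion for a planar four-bar linkage. This is a generic textbook test rather than a transcription of Eqs. (8)-(10); the function name and the link lengths in the example are hypothetical.

```python
def link_o3o4_is_crank(a2, a3, a4, l):
    """Return True if link o3o4 (length a4) can rotate fully about o4.

    Links: o1o2 = a2, o2o3 = a3 (coupler), o3o4 = a4, o4o1 = l (frame).
    Grashof: shortest + longest <= sum of the other two. A side link is a
    crank when the shortest link is that side link (crank-rocker) or the
    frame (double crank).
    """
    links = [a2, a3, a4, l]
    shortest, longest = min(links), max(links)
    grashof = shortest + longest <= sum(links) - shortest - longest
    return grashof and shortest in (a4, l)


# Hypothetical link lengths in mm; l_max = a2 + a3 - a4 when l is the longest link.
a2, a3, a4 = 300.0, 300.0, 100.0
l_max = a2 + a3 - a4
for l in (50.0, 400.0, l_max, 550.0):
    print(l, link_o3o4_is_crank(a2, a3, a4, l))
```

With these sample lengths the check passes up to $l = a_2 + a_3 - a_4$ and fails beyond it, in line with the bound stated in the text.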

The optimum incision and the optimum angle
$P_b(x_b, y_b, z_b)$ is chosen as the base position, so the optimum incisions lie within the allowable incisions circle centered at $P_b$ with radius $l_{\max}$. The optimum incisions (red) are located on the intersection of the candidate incisions (green) and the allowable incisions circle (black), as shown in Fig. 11 and Eq. (12). The optimum angle of laparoscope entry is β = 0, that is, the laparoscope visual line coincides with the line connecting the incision to the lesion. To sum up, combined with Sects. 3.5 and 3.6, the laparoscope arm preoperative planning algorithm is completed.
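The intersection of the candidate incisions with the allowable incisions circle can be sketched numerically by sampling the projected ellipse and keeping the samples within $l_{\max}$ of the base. The sampling approach, the function names and all numeric values are assumptions for illustration.

```python
import math


def sample_ellipse(cx, cy, rx, ry, n=360):
    """Sample n points on an axis-aligned ellipse in the x_p-y_p plane."""
    return [
        (cx + rx * math.cos(2 * math.pi * k / n), cy + ry * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def optimum_incisions(candidates, base_xy, l_max):
    """Keep the candidate incisions inside the circle of radius l_max around the base."""
    bx, by = base_xy
    return [(x, y) for x, y in candidates if math.hypot(x - bx, y - by) <= l_max]


# Hypothetical values: candidate ellipse centered at the origin, base at (3, 0).
candidates = sample_ellipse(0.0, 0.0, 2.0, 1.0)
selected = optimum_incisions(candidates, base_xy=(3.0, 0.0), l_max=1.5)
print(len(selected), "of", len(candidates), "candidate samples are reachable")
```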

Step 1 Determine candidate incisions
Take the data in Sect. 3.1, $0 \le d_7 \le 320$ mm, d(  (sqrt = square root); the result shows that any point on the abdominal wall can be used as a candidate incision. The candidate incisions are located on the curve shown in Eq. (13). According to Sect. 3.4, the projection of the curve on the plane $x_p o_p y_p$ is shown in Eq. (14).

Step 3 Determine the optimum incision and the optimum angle
First, $x_i$ is determined based on the surgical needs, the patient's body condition and the surgeon's operating habits; $y_i$ is calculated from Eq. (16); and $z_i$ is calculated from the mathematical model of the pneumoperitoneum. The optimum incision is $(x_i, y_i, z_i)$. Second, the optimum visual axis direction is the line connecting the optimum incision to the lesion. The optimum incision and the optimum angle are shown in Fig. 12.

Problem description
Reinforcement learning describes the set of learning problems in which an agent should learn how to map states to actions in an environment to maximize a defined reward function. Throughout the learning process, the agent is not told which actions to take but instead should find out which actions yield the most reward by trying various actions. In most cases, actions may affect not only the immediate reward but also the next state, and through that all subsequent rewards. To solve a practical problem, one should define a reasonable reward function to compute the reward for taking actions, and a goal relating to the state of the environment. One should also quantify all the variables that describe the environment and have access to these variables at each step or state.
In this paper, the agent is the 3-RRR planar redundant mechanism, which is a simplified model of the telecentric fixed-point positioning mechanism plus the laparoscope. The environment is the lesion and the surgical incision obtained through the preoperative planning algorithm. The actions are the movements of the three revolute joints. The agent-environment interaction is shown in Fig. 13.

Deep deterministic policy gradient (DDPG)
In this paper, laparoscope arm automatic positioning is achieved by DDPG, which is a model-free, off-policy actor-critic algorithm based on the deterministic policy gradient (DPG) (Silver et al., 2014). Deep neural network (DNN) function approximators are used to estimate the action-value function. Thus, the algorithm can learn policies in high-dimensional, continuous action spaces.
Based on DPG, DDPG combines the ideas underlying the success of Deep Q Network (DQN) (Mnih et al., 2013, 2015). It can learn value functions stably and robustly due to two aspects. First, the network is trained off-policy with samples from a replay buffer to minimize correlations between samples. Second, the network is trained with a target Q network to give consistent targets during temporal difference backups. Meanwhile, batch normalization is used to accelerate deep network training and improve the accuracy of the model (Ioffe and Szegedy, 2015).
DDPG contains a parameterized actor function $\mu(s \mid \theta^{\mu})$ and a critic network $Q(s, a \mid \theta^{Q})$ with weights $\theta^{\mu}$ and $\theta^{Q}$. The critic network is learned using the Bellman equation (Eqs. 17-18) so that the loss $L(\theta^{Q})$ becomes smaller and smaller. In other words, $Q(s, a \mid \theta^{Q})$ gets closer to the actual value. The actor function is updated by applying the chain rule (Eq. 19) to the expected return $J$ with respect to the actor parameters.
Every n steps, DDPG updates the target networks of the actor and the critic using "soft" target updates (Eq. 20), rather than directly copying the weights.
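For reference, the standard DDPG updates take the following form. This is a sketch in the usual notation, with target networks $Q'$ and $\mu'$, minibatch size $N$, discount factor $\gamma$ and soft-update rate $\tau$; the correspondence of these four lines to Eqs. (17)-(20) is assumed rather than confirmed by the text.

```latex
\begin{align}
L(\theta^{Q}) &= \frac{1}{N}\sum_{i}\bigl(y_i - Q(s_i, a_i \mid \theta^{Q})\bigr)^{2}, \\
y_i &= r_i + \gamma\, Q'\bigl(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\bigr), \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N}\sum_{i}
  \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\;
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}, \\
\theta^{Q'} &\leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}.
\end{align}
```

The first two lines are the critic loss and its temporal-difference target, the third is the deterministic policy gradient for the actor, and the last is the soft target update.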

Reward function construction
In the training process, the telecentric fixed-point (marked point) position and the lesion location are taken as the input of the DDPG algorithm. The fixed-point position is obtained by a depth camera, while the optimum incision, the optimum angle and the base position are obtained by the preoperative planning algorithm. Combined with the preoperative planning algorithm, the DDPG algorithm can learn policies directly from the inputs to achieve laparoscope arm automatic positioning for robot-assisted laparoscopic surgery. The reward function is essential for the algorithm to learn policies successfully. It consists of an intermediate reward and a final reward: the former is a continuous, guiding negative reward given while the task is not completed, and the latter is a positive reward, one to two orders of magnitude larger than the former, given when the task is completed. A continuous reward function improves the convergence of the algorithm.
In the $o_p$-$x_p y_p z_p$ coordinate frame, the fixed-point position is $P_f(x_f, y_f, z_f)$, the incision position is $P_i(x_i, y_i, z_i)$, the laparoscope end position is $P_e(x_e, y_e, z_e)$, and the lesion location is $P_l(x_l, y_l, z_l)$. The goal of the task is $|P_f P_i| + |P_e P_l| = 0$ (for programming convenience, $l \sin\alpha$, defined in Fig. 7, is set equal to $|P_i P_l|$). The intermediate reward is $-(|P_f P_i| + |P_e P_l|)$, normalized to the interval [−1, 0]. The final reward is 10.
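The reward described above can be sketched as follows. The normalization constant `d_max` and the goal tolerance `tol` are assumptions introduced for illustration; the text states only that the intermediate reward is normalized to [−1, 0] and that the final reward is 10.

```python
import math

FINAL_REWARD = 10.0  # final reward stated in the text


def reward(p_f, p_i, p_e, p_l, d_max, tol=1e-3):
    """Return (reward, done) for one step.

    p_f: fixed-point, p_i: incision, p_e: laparoscope end, p_l: lesion.
    d_max (assumed) is the largest distance sum expected in the workspace;
    tol (assumed) is the tolerance used in place of an exact zero.
    """
    dist = math.dist(p_f, p_i) + math.dist(p_e, p_l)
    if dist <= tol:
        return FINAL_REWARD, True
    # Continuous negative reward, clipped so it stays in [-1, 0).
    return -min(dist / d_max, 1.0), False


print(reward((0, 0, 0), (3, 4, 0), (1, 1, 1), (1, 1, 1), d_max=10.0))
print(reward((0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), d_max=10.0))
```

Clipping at −1 keeps the intermediate reward inside the stated interval even if the arm moves farther than `d_max` from the targets.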

States description
In addition to the reward function, the state variables play a crucial role in the convergence of the algorithm. If the state variables adequately represent the environment, the algorithm can learn policies quickly. Because the image from the depth camera contains all the state information of the environment, it would be reasonable to use the image directly as input. However, due to hardware limitations, processing image data is very slow. To speed up training, the algorithm uses a low-dimensional state description, such as joint variables and positions, instead of high-dimensional renderings of the environment.
The algorithm aims to move the laparoscope arm to the target position, so the joint variables are first used as the state variables. However, the training results show that these variables cannot adequately describe the environment; in other words, the algorithm cannot achieve automatic movement of the laparoscope arm. Therefore, the distance from the telecentric fixed-point to the incision, the distance from the laparoscope end to the lesion, and whether the target is reached are added to the state variables. The experimental results for these two state descriptors are described in Sect. 5.2.
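The two state descriptors can be sketched as below. The exact layout of the state vector used in the simulations is not given, so the ordering, the scalar form of the distances and the tolerance `tol` are assumptions, and the function names are hypothetical.

```python
import math


def state_descriptor_one(joint_angles):
    """Descriptor one: the three joint variables only."""
    return list(joint_angles)


def state_descriptor_two(joint_angles, p_f, p_i, p_e, p_l, tol=1e-3):
    """Descriptor two: joint variables plus two distances and a reached flag."""
    d_fi = math.dist(p_f, p_i)  # fixed-point to incision
    d_el = math.dist(p_e, p_l)  # laparoscope end to lesion
    reached = 1.0 if d_fi + d_el <= tol else 0.0
    return list(joint_angles) + [d_fi, d_el, reached]


joints = (0.1, -0.4, 0.8)  # hypothetical joint angles in rad
print(state_descriptor_one(joints))
print(state_descriptor_two(joints, (0, 0, 0), (3, 4, 0), (0, 0, 0), (0, 0, 0)))
```

The added distances give the agent a direct measure of progress toward the goal, which the joint variables alone do not provide.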

Simulation details
The environment is simulated using Pyglet and includes a lesion point, a surgical incision and a simplified model of the telecentric fixed-point positioning mechanism. For this environment, a lesion point is randomly specified within a reasonable range, and an incision and a base location are obtained by the preoperative planning algorithm. Batch normalization is used on the state input, all layers of the actor network and all layers of the critic network before the action input. In this way, the algorithm can learn effectively across tasks with different types of units, without needing to ensure manually that the units are within a set range.
TensorFlow is used in the code for high-performance numerical computation. The simulations use Adam (Kingma and Ba, 2015) for learning the neural network parameters with a learning rate of $10^{-5}$ for the actor and the critic. For Q, the training includes an $L_1$ weight decay of 0.1, an $L_2$ weight decay of $10^{-3}$ and a discount factor of γ = 0.9. For the soft target updates, τ = 0.01 is used. The neural networks use the rectified non-linearity for all hidden layers (Glorot et al., 2011). The networks have three hidden layers with 900, 900 and 60 units, respectively, and the final output layer of the actor is a tanh layer, to bound the actions. The actions are not included until the third hidden layer of Q. The weights and biases of the layers of both the actor and the critic are initialized from a uniform distribution [−x, x], where $x = \sqrt{6/(\mathrm{in} + \mathrm{out})}$. Training uses a minibatch size of 16 and a replay buffer size of $6 \times 10^{4}$. The behavior policy during training is ε-greedy, with ε annealed linearly from 1 to 0.1 over the first hundred episodes and fixed at 0.1 after that. The simulations train for a total of 2000 episodes; an episode is terminated if the goal is not completed after 600 steps.
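Two of the settings above are simple enough to write out directly: the linear ε schedule and the uniform initialization bound. The sketch below uses hypothetical function names and reads "in" and "out" as the input and output sizes of a layer, which is an assumption.

```python
import math


def epsilon(episode, start=1.0, end=0.1, anneal_episodes=100):
    """Linearly anneal epsilon from start to end over the first anneal_episodes."""
    frac = min(episode / anneal_episodes, 1.0)
    return start + frac * (end - start)


def uniform_init_bound(fan_in, fan_out):
    """Bound x of the uniform distribution [-x, x], x = sqrt(6 / (in + out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


for ep in (0, 50, 100, 1999):
    print(ep, round(epsilon(ep), 3))
print(round(uniform_init_bound(900, 900), 4))  # bound for a 900-to-900 hidden layer
```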

Simulation results
Two simulations are set up to evaluate the performance of the improved method applied to laparoscope arm automatic positioning for robot-assisted laparoscopic surgery. The two simulations differ only in the states description used during training, and use the same network architecture, learning algorithm and hyperparameter settings. States descriptor one is the three joint variables, and states descriptor two is the former plus the distance from the fixed-point to the incision, the distance from the laparoscope end to the lesion, and whether the target is reached.
The two simulations evaluate the policy periodically during training by testing it without exploration noise. The improved method with 3 action dimensions and 20 state dimensions runs ten times in the simulated environment. Performance is reported after training in the environment for at most 2000 episodes. The results of the ten training sessions report both the total reward per episode and the steps to target, as shown in Figs. 14-17. The solid line in each figure represents the average over the ten sessions, the upper boundary of the shaded area represents the maximum over the ten sessions, and the lower boundary represents the minimum.
Figure 14 shows that the average total reward per episode stabilizes at a negative value, and the total reward is positive in only a few episodes. Figure 15 shows that the steps to target are always 600. These two figures show that the agent never reaches the goal. Figure 16 shows that the average total reward per episode increases from −300 to about 120. After 400 episodes, the total reward converges to around 120. Figure 17 shows that the steps to target stabilize at about 150. These two figures show that the agent reaches the goal after 400 episodes. The results illustrate that states descriptor two outperforms states descriptor one: the former enables the agent to converge to a good solution, whereas the latter does not. In other words, the improved method using states descriptor two can learn the right policies for laparoscope arm automatic positioning.
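The per-episode curves described above (mean line with a max/min band over the sessions) can be computed as in this minimal sketch; the function name and the sample numbers are hypothetical and are not results from the simulations.

```python
def summarize_sessions(sessions):
    """Aggregate per-episode values over several training sessions.

    sessions: list of equal-length lists, one value per episode.
    Returns (mean, upper, lower) lists, one entry per episode.
    """
    per_episode = list(zip(*sessions))
    mean = [sum(values) / len(values) for values in per_episode]
    upper = [max(values) for values in per_episode]
    lower = [min(values) for values in per_episode]
    return mean, upper, lower


# Placeholder data: two sessions of three episodes each.
mean, upper, lower = summarize_sessions([[-300.0, -50.0, 100.0], [-280.0, 10.0, 140.0]])
print(mean, upper, lower)
```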

Discussion
The preoperative planning algorithm, based on the artificial pneumoperitoneum model and the lesion parametrization model, appears to offer significant improvements in planning time and quality for robot-assisted laparoscopic surgery over experience-based or literature-based methods. The distance principle and the direction principle ensure that the proposed algorithm meets the surgeon's surgical requirements. Furthermore, preoperative planning does not require an additional landmark on the abdominal wall or particular patient positioning. The proposed algorithm is designed to simulate the actual clinical procedure of robot-assisted surgery or to be applied to a virtual surgery training system, and a standardized procedure is proposed for preoperative planning. Taking LC as an example, the results indicate that port placement and laparoscope entry angle selection perform satisfactorily, which is especially helpful for less experienced surgeons.
Preoperative laparoscope arm automatic positioning is achieved based on DDPG. In this algorithm, the states descriptor plays a crucial role and affects performance. The results show that states descriptor two outperforms states descriptor one. Although the controller does not learn a reasonable strategy directly from states descriptor one, it still improves relative to its initial performance as the episodes progress. Therefore, it is crucial to select the states descriptor reasonably. The controller learns a reasonable strategy from states descriptor two, but there is room to reduce the steps to target and to improve the learning efficiency of the controller. Furthermore, the laparoscope arm automatic positioning is independent of robot configuration and can be extended to any surgical robot system.
This method successfully learns a controller in simulation. The next step is to study how to learn a controller on real robots without lengthy training, and to extend the method to the preoperative planning of other operations or even other surgical procedures. The implementation of the algorithm for robot-assisted surgery could thus help to realize telesurgery, thereby improving the level of medical care in many areas.

Conclusions
This paper completes the preoperative planning by analyzing the surgical procedures and surgical environment of robot-assisted laparoscopic surgery. Based on the lesion parametrization model, two principles of laparoscope arm preoperative planning are designed: the distance principle and the direction principle. According to the two principles, the laparoscope arm preoperative planning algorithm is divided into two parts, the optimum incision and the optimum angle of laparoscope entry. A set of parameters based on the actual situation is given to verify the effectiveness of the algorithm. Preoperative laparoscope arm automatic positioning is achieved by the improved method, which combines the preoperative planning algorithm with the DDPG algorithm. The improved method takes as input the fixed-point position captured by a depth camera and the lesion location obtained by an imaging test. Based on this input, the optimum incision and optimum angle are obtained through the algorithm, and the laparoscope arm then moves automatically to the target position. Unlike the traditional method, kinematics is not used to calculate the motor movements, which reduces errors caused by inaccurate kinematic parameters and improves the effectiveness of preoperative planning. The simulation results show that the improved method can realize preoperative laparoscope arm automatic positioning and that it is robust.
The automatic positioning algorithm provides a theoretical basis for the laparoscope arm preoperative planning of robot-assisted laparoscopic surgery. It avoids the disadvantage of the heuristic method based on surgeon experience, simplifies the preoperative planning process, and reduces the operation time. However, the algorithm is implemented in a virtual environment, and a certain gap remains between the simulation and the actual system. Therefore, implementing the algorithm in the actual system is the primary direction of subsequent research.

Figure 1. The schematic diagram of surgical incisions.

Figure 3. The structure of the robotic arm.

Figure 4. The coordinate frame and pneumoperitoneum model.
rounding environment are obtained by the 3-D reconstruction technology. The relationship between lesion and incision is described in parametric form, as shown in Fig. 5. Plane τ represents the target operation plane, a represents the normal vector of plane τ, d represents the distance from the lesion to the laparoscope, β represents the angle between the laparoscope visual axis and a, and γ represents the laparoscope deviation angle. The two principles of laparoscope arm preoperative planning can therefore be expressed as follows: (1) Observation distance principle: laparoscope-to-target distance d = 75-150 mm, d ≤ the maximum joint variable of d 7 (definition in Fig.

Figure 5. The definition of lesion parameters.

Figure 6. Flow chart of the laparoscope arm preoperative planning algorithm.

Figure 7. The mechanism diagram of the telecentric fixed-point positioning mechanism.

Figure 8. The schematic diagram of candidate incisions.

Figure 9. The projection of three planes.

Figure 10. The schematic diagram of the simplified mechanism.

Figure 11. The schematic diagram of optimum incisions.

Figure 12. The sketch map of the optimum incision and angle.

Figure 13. The agent-environment interaction in reinforcement learning.

Figure 14. The total reward per episode with states descriptor one.

Figure 15. The steps to target with states descriptor one.

Figure 16. The total reward per episode with states descriptor two.

Figure 17. The steps to target with states descriptor two.