This paper presents a method for biped dynamic walking and balance control using reinforcement learning, which learns dynamic walking without a priori knowledge of the dynamic model. The learning architecture is designed to solve complex control problems in robotic actuation by mapping the action space from a discretized domain to a continuous one: it employs discrete actions to construct a policy for continuous action. The architecture allows the dimensionality of the state space and the cardinality of the action set to scale as new knowledge, or new requirements for a desired task, are introduced. The balance learning method uses the motion of the robot's arms and legs to shift the zero moment point (ZMP) on the soles of the robot, maintaining the biped in a statically stable state. This balancing algorithm is applied to biped walking on a flat surface and on a seesaw, making the biped's gait more stable. Simulation results show that the proposed method enables the robot to learn to improve its behavior in terms of walking speed. Finally, the methods are implemented on a physical biped robot to demonstrate the feasibility and effectiveness of the proposed learning scheme.
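As a rough illustration (not the paper's exact formulation), one common way to build a continuous action from a discrete action set is to take a value-weighted blend of discrete candidates; the function name, softmax weighting, and torque levels below are all illustrative assumptions:

```python
import numpy as np

def continuous_action(q_values, discrete_actions, temperature=1.0):
    """Blend a discrete action set into a single continuous command.

    q_values: estimated value of each discrete action (hypothetical
              output of a reinforcement learner).
    discrete_actions: candidate actuation levels (e.g., joint torques).
    The continuous action is the softmax-weighted average of the
    discrete candidates, so higher-valued actions dominate smoothly.
    """
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # shift for numerical stability
    w = np.exp(z)
    w /= w.sum()                      # normalize to a probability vector
    return float(np.dot(w, discrete_actions))

# Example: three discrete torque levels; the learner prefers the middle one,
# so the blended command lands near zero.
torque = continuous_action([0.2, 1.5, 0.4], [-1.0, 0.0, 1.0])
```

Because the weights form a convex combination, the resulting command always stays inside the range spanned by the discrete candidates.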