(猜測是舊系統的網頁編碼為UTF-8,但送出表單時的charset為BIG5,因無法找到對應字元所造成的)
String name = "王蘐老蘑先蘒生蘓王蘔老蘕先蘖生蘗王蘘老蘐先蘙生蘚"; Pattern regex = Pattern.compile("&#(\\d{5});"); StringBuffer sb = new StringBuffer(); long time1 = System.currentTimeMillis(); //使用HTMLDocument及HTMLEditorKit HTMLDocument doc = new HTMLDocument(); new HTMLEditorKit().read(new StringReader(name), doc, 0); System.out.println(doc.getText(0, doc.getLength())); long time2 = System.currentTimeMillis(); System.out.println(time2 - time1); //60毫秒 //使用正則表示式 Matcher match = regex.matcher(name); while(match.find()){ match.appendReplacement(sb, new String(Character.toChars(Integer.parseInt(match.group(1))))); } match.appendTail(sb); System.out.println(sb); long time3 = System.currentTimeMillis(); System.out.println(time3 - time2);//1毫秒
沒有留言:
張貼留言