Using regex in VB.NET to extract phone numbers
I wrote code to extract mobile numbers from web links. I have 3 links in a list box and I get each page's source with the code below, but when I use a regex to extract the phone numbers, I get the same number again and again. Here is the full code I wrote. The website I'm extracting the links from is:
http://bolee.com/nf/all-results
' Requires the HtmlAgilityPack package
Imports HtmlAgilityPack
Imports System.Text.RegularExpressions

Dim doc As New HtmlAgilityPack.HtmlDocument()

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    If ListBox1.Items.Count = 0 Then
        MsgBox("Please extract links first")
    Else
        ListBox1.SelectedIndex = 0
    End If
End Sub

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    ScrapLinks()
End Sub

Private Sub ScrapLinks()
    Dim hw As New HtmlWeb()
    Try
        ' Download the listing page and keep only the ad list
        doc = hw.Load(TextBox1.Text)
        doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml)
        ' Collect every link that points to a detail page
        For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")
            Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)
            If hrefValue.Contains("/detail/") Then
                ListBox1.Items.Add(hrefValue)
            End If
        Next
        ' Remove duplicate links
        Dim items(ListBox1.Items.Count - 1) As Object
        ListBox1.Items.CopyTo(items, 0)
        ListBox1.Items.Clear()
        ListBox1.Items.AddRange(items.AsEnumerable().Distinct().ToArray())
        lblLinks.Text = ListBox1.Items.Count.ToString()
    Catch ex As Exception
        MsgBox("Error " & ex.Message)
    End Try
End Sub

Private Sub ListBox1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ListBox1.SelectedIndexChanged
    Try
        Dim re As New Regex("(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
        ' For Each link As String In ListBox1.Items
        Dim hw As New HtmlWeb()
        doc = hw.Load(ListBox1.SelectedItem.ToString())
        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText
        ' For Each match As Match In re.Matches(data)
        TextBox2.Text = data
        ' Next
        ' Next
    Catch ex As Exception
        MsgBox("Error " & ex.Message)
    End Try
End Sub
Here's a sample of the output I'm getting:
03152405552 03152405552 03152405552 03152405552 03152405552 03152405552
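The pattern in the code accepts three shapes of number: a `+92` or `0092` prefix with optional dashes, a run of 11 digits, or 4 digits, a dash and 7 digits. `Regex.Matches` returns one `Match` per occurrence, so a number that appears several times in the scanned text (or text that is scanned more than once) is printed several times. The standalone sketch below lists what the pattern picks out of one string; the module name `PatternDemo`, the helper `AllMatches`, and the sample text are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PatternDemo
    ' The pattern from the question, unchanged
    Const PhonePattern As String = "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}"

    ' Returns every substring of text that the pattern matches, in order of occurrence
    Function AllMatches(text As String) As String()
        Dim found As New List(Of String)
        For Each m As Match In Regex.Matches(text, PhonePattern)
            found.Add(m.Value)
        Next
        Return found.ToArray()
    End Function

    Sub Main()
        ' Hypothetical sample covering each of the three alternatives once
        Dim sample = "Call +92-315-2405552 or 03152405552 or 0315-2405552"
        For Each s In AllMatches(sample)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

For this sample it prints `+92-315-2405552`, `03152405552` and `0315-2405552`, one per line.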
Try using this code instead:
Try
    Dim hw As New HtmlWeb()
    ' Load each link directly: incrementing ListBox1.SelectedIndex inside the loop
    ' can skip an item, re-fires SelectedIndexChanged, and can run past the last index.
    For Each link As String In ListBox1.Items
        doc = hw.Load(link)
        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText
        ' Append every number found on this page on its own line
        For Each match As Match In Regex.Matches(data, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
            TextBox2.Text &= vbNewLine & match.Value
        Next
    Next
Catch ex As Exception
    MsgBox("Error " & ex.Message)
End Try
The idea is to run a new regex match on each new input, so you don't keep reusing a cached result.
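The per-page idea can be tried offline, without HtmlAgilityPack, by feeding each page's text to `Regex.Matches` in turn. This is a sketch under assumptions: the module `PerPageExtraction`, the function `ExtractPhoneNumbers` and the sample page texts are hypothetical stand-ins for the `InnerText` of each scraped link, and dropping repeats within one page with `Distinct()` is an extra step that the answer does not take.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module PerPageExtraction
    ' Pattern copied from the question and the answer
    Const PhonePattern As String = "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}"

    ' Scans one page's text on its own and returns each number once.
    ' Removing repeats within a page is an assumption about the desired output.
    Function ExtractPhoneNumbers(pageText As String) As List(Of String)
        Return Regex.Matches(pageText, PhonePattern).Cast(Of Match)().Select(Function(m) m.Value).Distinct().ToList()
    End Function

    Sub Main()
        ' Hypothetical page texts keyed by link, standing in for downloaded pages
        Dim pages As New Dictionary(Of String, String) From {
            {"/detail/1", "Call 03152405552 or 03152405552"},
            {"/detail/2", "Office 0423-1234567, mobile +92-321-7654321"}
        }
        ' Each page is matched separately, so its numbers are reported under its own link
        For Each page In pages
            For Each number In ExtractPhoneNumbers(page.Value)
                Console.WriteLine(page.Key & ": " & number)
            Next
        Next
    End Sub
End Module
```

Keeping the extraction in a plain function separates the regex logic from the `HtmlWeb` download, so it can be checked against saved page text before running the scraper.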