Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. The proposed architecture uses wrapper induction to extract unstructured, ungrammatical data and presents the results in a structured format. The source data are collected from various posting websites. The collected post pages are processed through page parsing, cleansing, and data extraction to obtain new reference sets. These reference sets are used to map the user's search query, which improves the scale of search over unstructured and ungrammatical post data. We validate our approach with experimental results.
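The pipeline described in the abstract (parse pages, cleanse, extract, then match a query) can be illustrated with a minimal sketch. It assumes the simplest form of wrapper induction, in which a left and a right delimiter are learned from a few labeled pages; this is not necessarily the authors' method. The function names, the sample HTML, and the token-overlap query matching are all hypothetical and chosen only for illustration.

```python
import os
import re


def common_suffix(strings):
    """Return the longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def induce_wrapper(examples):
    """Learn (left, right) delimiters from labeled (page, value) pairs.

    Assumption: each labeled value occurs in its page and is surrounded
    by markup that is shared across all training pages.
    """
    befores, afters = [], []
    for page, value in examples:
        start = page.index(value)
        befores.append(page[:start])
        afters.append(page[start + len(value):])
    left = common_suffix(befores)
    right = os.path.commonprefix(afters)
    if not left or not right:
        raise ValueError("no shared delimiters found in the examples")
    return left, right


def apply_wrapper(wrapper, page):
    """Extract every span that sits between the learned delimiters."""
    left, right = wrapper
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return [m.strip() for m in re.findall(pattern, page, flags=re.S)]


def cleanse(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def search(reference_set, query):
    """Return entries that contain every token of the cleansed query."""
    wanted = set(cleanse(query).split())
    return [r for r in reference_set if wanted <= set(r.split())]


# Hypothetical labeled training pages (not taken from the paper).
training = [
    ("<li><b>Title:</b> 2004 honda civic lx <i>$4500</i></li>",
     "2004 honda civic lx"),
    ("<li><b>Title:</b> toyota camry 99 low miles <i>$2100</i></li>",
     "toyota camry 99 low miles"),
]
wrapper = induce_wrapper(training)

# A new, unlabeled page with the same layout.
new_page = (
    "<ul><li><b>Title:</b> Ford Focus 2008 SE!! <i>$3900</i></li>"
    "<li><b>Title:</b> accord ex 02 <i>$2800</i></li></ul>"
)
reference_set = [cleanse(post) for post in apply_wrapper(wrapper, new_page)]
print(search(reference_set, "Focus"))
```

Learning delimiters from examples, rather than hand-writing a pattern per site, is what lets the same code be reused across posting websites with different layouts, at the cost of needing a few labeled pages per site.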
ZAMBAD, RINA and GADGE, JAYANT
"WEB SCALE INFORMATION EXTRACTION USING WRAPPER INDUCTION APPROACH,"
International Journal of Electronics and Electrical Engineering: Vol. 3, Article 4.
Available at: https://www.interscience.in/ijeee/vol3/iss1/4