Extract paragraphs based on a criteria

Hi Guys

First of all, wonderful job on the informative site. Below is data that I crawled and saved as CSV.

What I need is to extract the "About the author" details from the information below and paste them into cell B1.

So the data looks like

A2 B2 C2
Book description About the author and table of contents.

"Cracking The C, C++ And Java Interview 1st Edition Author: S G GANESH,"""",""Cracking The C, C++ And Java Interview 1st Edition (Paperback) Price: Rs.248 """"The questions and answers...are of high quality. The explanations provided are also very clear and up-to-the-point."""" About Sathyaprakash[ Can be generic ]
Sathyaprakash Dhanabal, Application Developer, ThoughtWorks India, Bangalore
""""...helps in strengthening the conceptual foundation needed to crack the IT interview.""""
V S Murthy Sidagam, Software Engineer, Mascon Global Limited, Bangalore

Table of Contents 1. Cracking the IT Interview: FAQ

Section I: General Programming

2. C Problem Solving 3. Data Structures and Algorithms 4. Object Oriented Programming

Section II: C Programming

5. Multiple Choice Questions 6. Programming Aptitude 7. Theory

Section III: C++ Programming

Thanks in advance

Regards
 
Welcome to the forum!
The ToC is the easier one to extract. Formula is:
=MID(A2,SEARCH("Table of Contents",A2),9E99)
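For anyone doing this step outside Excel, here is a minimal Python sketch of the same idea: find the marker case-insensitively (as SEARCH does) and keep everything from that point on. The function name `text_from_marker` is made up for illustration, and returning an empty string when the marker is missing is an assumption; the formula itself would return an error in that case.

```python
import re


def text_from_marker(text, marker="Table of Contents"):
    """Return text from the first case-insensitive match of marker onward.

    Hypothetical helper mirroring MID(A2, SEARCH(marker, A2), <large number>).
    Assumption: a missing marker gives "" rather than an error.
    """
    match = re.search(re.escape(marker), text, flags=re.IGNORECASE)
    return text[match.start():] if match else ""


if __name__ == "__main__":
    sample = "Some description. Table of Contents 1. Cracking the IT Interview: FAQ"
    print(text_from_marker(sample))
```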

For the author, we need something unique/specific. I'd be afraid to assume that the word "About" only appears before the author section. Are there any line breaks or other special characters that we could use as a key identifier?
 
Hi

Thanks for getting back to me. It gets really tricky because not all pages have the ToC first and then the "About the author" section, and you're right, "About" is the right keyword.
First we have to determine whether the ToC or the "About the author" section comes first, and then extract only up to the last line of either one.

For examples, you could go to Flipkart and look through the books section.

Thanks for your time
 
Hmm. Looking at different books on the site, I see what you mean, there is no straight format/rules for what words they put in. Some don't even have an "about the author" section.
But taking a shot...
Let's say data is in col A
Two helper columns, B and C.
Formula in B2:
=SEARCH("About",A2)
Formula in C2:
=SEARCH("Table of Contents",A2)

Formula to get book description:
=LEFT(A2,MIN(B2:C2)-1)
Formula for about author:
=IFERROR(IF(B2<C2,MID(A2,B2,C2-B2),MID(A2,B2,100000)),"")
Formula for ToC:
=IFERROR(IF(C2<B2,MID(A2,C2,B2-C2),MID(A2,C2,100000)),"")

Copy all of them down as needed.
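The helper-column logic above can also be sketched in Python, which may be handy if the split is done right after the crawl. This is only an illustration under stated assumptions: the names `find_marker` and `split_description` are hypothetical, and a missing marker is treated as sitting at the end of the text. That last part goes beyond the formulas, which, as far as I can tell, return an error or a blank when either SEARCH fails.

```python
import re


def find_marker(text, marker):
    """Return the 0-based position of marker (case-insensitive), or None."""
    match = re.search(re.escape(marker), text, flags=re.IGNORECASE)
    return match.start() if match else None


def split_description(text, about_marker="About", toc_marker="Table of Contents"):
    """Split text into description, about-the-author and ToC parts.

    Hypothetical helper; mirrors the B/C helper columns and the three formulas.
    """
    about_pos = find_marker(text, about_marker)
    toc_pos = find_marker(text, toc_marker)
    end = len(text)

    # Assumption: a missing marker is treated as if it sat at the end of the text.
    about_at = end if about_pos is None else about_pos
    toc_at = end if toc_pos is None else toc_pos

    # Description is everything before whichever marker comes first.
    description = text[:min(about_at, toc_at)]

    # Each section runs up to the other marker if that one comes later,
    # otherwise to the end of the text.
    if about_pos is None:
        about = ""
    elif about_at < toc_at:
        about = text[about_at:toc_at]
    else:
        about = text[about_at:]

    if toc_pos is None:
        toc = ""
    elif toc_at < about_at:
        toc = text[toc_at:about_at]
    else:
        toc = text[toc_at:]

    return {"description": description, "about": about, "toc": toc}


if __name__ == "__main__":
    sample = "Great book. About the author: Jane. Table of Contents 1. Intro"
    for name, part in split_description(sample).items():
        print(name, "->", repr(part))
```

As with the formulas, this still relies on "About" not appearing earlier in the description, so the results need a manual check.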
 
Hi Balaji,
As I suspected, the data is too sporadic to develop a set formula/rule for extracting the information. The Flipkart website seems to use that block just for "description" which, as we can see, has a wide range of what may be included. What is the actual goal you are trying to achieve? You may need to go back to your data crawl and refine what sections of data are being pulled.
 
Hi Luke,

The application that I use to crawl and extract works based on tags. It brings in the whole frame and doesn't let me choose individual tables, which would make my life easier.

I am trying to set up inventory management for my friend who owns a book shop. He gives me the ISBN-13 and the Flipkart link for me to crawl.

Once I have the data crawled, I have to split it manually.

Regards

Balaji Dhanapaul
 
If you do a quick search for "ISBN database excel", I think you'll find several better sites for retrieving the desired info (where author is in its own field), and/or other XL projects.
 
Hi Luke

I think the inconsistency may be due to the fact that the result of =SEARCH("About",$A$) is not displayed.

If I substitute the value of the ToC, then I get the desired result.

Thanks for your help. I will test further data and get back to you by end of tomorrow or early Saturday (24/05/2014).

Regards

Balaji Dhanapaul
 
Good luck with your efforts. Holiday weekend for me, so I'm off to enjoy a nice relaxing 4 day weekend. :)
 